  • Language: Python
  • License: MIT License


Smarter Manual Annotation for Resource-constrained collection of Training data

SMART


SMART is an open source application designed to help data scientists and research teams efficiently build labeled training datasets for supervised machine learning tasks.

If you use SMART for a research publication, please consider citing:

Chew, R., Wenger, M., Kery, C., Nance, J., Richards, K., Hadley, E., & Baumgartner, P. (2019). SMART: An Open Source Data Labeling Platform for Supervised Learning. Journal of Machine Learning Research, 20(82), 1-5.

Development

The simplest way to start developing is to go to the envs/dev directory and run the rebuild script with ./rebuild.sh. This will: clean up any old containers/volumes, rebuild the images, run all migrations, and seed the database with some testing data.

The testing data includes three users (root, user1, and test_user), all with the password password555. There are also a handful of projects with data randomly labeled by the various users.

Docker containers

This project uses docker containers organized by docker-compose to ease dependency management in development. All dependencies are controlled through docker.

Initial Startup

First, install docker and docker-compose. Then navigate to envs/dev and build all the images by running:

docker-compose build

Next, create the docker volumes where persistent data will be stored.

docker volume create --name=vol_smart_pgdata
docker volume create --name=vol_smart_data

Then, migrate the database to ensure the schema is prepared for the application.

docker-compose run --rm backend ./migrate.sh

Workflow During Development

Run docker-compose up to start all docker containers. This starts the containers in the foreground so you can see the logs. If you prefer to run the containers in the background, use docker-compose up -d. When switching between branches there is no need to run any additional commands (except build if there is a dependency change).

Dependency Changes

If there is ever a dependency change, then you will need to rebuild the containers using the following commands:

docker-compose build <container with new dependency>
docker-compose rm <container with new dependency>
docker-compose up

If your database is blank, you will need to run migrations to initialize all the required schema objects. You can start a blank backend container and run the Django migration management command with:

docker-compose run --rm backend ./migrate.sh

Dependency management in Python

We use pip-tools to manage Python dependencies. To change the dependencies:

  1. Edit requirements.in to add, remove, or edit a dependency. You only need to put primary dependencies here, that is, the ones explicitly needed by our source code. pip-tools will take care of adding their dependencies.
  2. Run docker-compose run --rm backend pip-compile docker/requirements.in to generate a new requirements.txt. Note that pip-tools uses the existing requirements.txt file when building a new one, so that it can maintain existing versions. To upgrade a package to the newest version compatible with the other libraries, just remove it from the existing requirements.txt before running pip-compile.
  3. Run docker-compose build backend to install the updated requirements into the Docker image.

Custom Environment Variables

The various services will be available on your machine at their standard ports, but you can override the port numbers if they conflict with other running services. For example, you don't want to run SMART's instance of Postgres on port 5432 if you already have your own local instance of Postgres running on port 5432. To override a port, create a file named .env in the envs/dev directory that looks something like this:

# Default is 5432
EXTERNAL_POSTGRES_PORT=5433

# Default is 3000
EXTERNAL_FRONTEND_PORT=3001

The .env file is ignored by .gitignore.
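Before choosing an override, it can help to check whether a default port is already taken on your machine. The following is a minimal sketch, assuming the services are reached on localhost; the helper name port_in_use is hypothetical and not part of SMART:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # 5432 and 3000 are the default ports named in this section
    for name, port in [("Postgres", 5432), ("frontend", 3000)]:
        status = "in use" if port_in_use(port) else "free"
        print(f"{name} port {port}: {status}")
```

If a port is reported as in use while SMART's containers are stopped, that is a reasonable candidate for an override in .env.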

Timezones

All date-times in the SMART backend and database are set to UTC (Coordinated Universal Time), as recommended by the Django docs. By default, the history and download date-times are displayed in Eastern Time (New York). To change this, go to SMART/backend/django/smart/settings.py and update the TIME_ZONE_FRONTEND variable to the desired time zone.
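To illustrate the split between UTC storage and local display, the sketch below converts a UTC timestamp to a display time zone using Python's standard zoneinfo module. This is not SMART's actual code: the value "America/New_York" is an assumed stand-in for the Eastern default, and to_frontend_time is a hypothetical helper.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# Assumed value for illustration; check settings.py for the real default.
TIME_ZONE_FRONTEND = "America/New_York"


def to_frontend_time(utc_dt: datetime, tz_name: str = TIME_ZONE_FRONTEND) -> datetime:
    """Convert a timezone-aware UTC datetime to the display time zone."""
    if utc_dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return utc_dt.astimezone(ZoneInfo(tz_name))


stored = datetime(2024, 1, 15, 17, 30, tzinfo=timezone.utc)
print(to_frontend_time(stored).isoformat())  # 2024-01-15T12:30:00-05:00
print(to_frontend_time(stored, "Europe/Berlin").isoformat())  # 2024-01-15T18:30:00+01:00
```

Any IANA time zone name (for example "Europe/Berlin" or "Asia/Tokyo") can be used in the same way.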

Running tests

Backend tests use py.test and flake8. To run them, use the following docker-compose command from the envs/dev directory:

docker-compose run --rm backend ./run_tests.sh <args>

Here, <args> are arguments to be passed to py.test. Use py.test -h to see all the options; a few useful ones are highlighted below:

  • -x: Stop running after the first failure
  • -s: Print stdout from the test run (allows you to see temporary print statements in your code)
  • -k <expression>: Only run tests with names containing "expression"; you can combine terms with and, or, and not for more precise control. See py.test -h for more info
  • --reuse-db: Don't drop/recreate the database between test runs. This is useful for reducing test runtime. You must not pass this flag if the schema has changed since the last test run.
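As an illustration of how -k selects tests, consider the small test module below. The file name, the normalize function, and the test names are all made up for this example and do not exist in SMART.

```python
# test_example.py (hypothetical file, for illustration only)


def normalize(label: str) -> str:
    """Toy function under test: trim whitespace and lowercase a label."""
    return label.strip().lower()


def test_normalize_strips_whitespace():
    assert normalize("  spam ") == "spam"


def test_normalize_lowercases():
    assert normalize("SPAM") == "spam"


def test_empty_label_stays_empty():
    assert normalize("") == ""
```

Assuming run_tests.sh forwards its arguments to py.test unchanged, and that nothing else in the collected paths matches the expression, the first command below would select the two normalize tests and the second would select only test_normalize_strips_whitespace, stopping at the first failure:

docker-compose run --rm backend ./run_tests.sh -k normalize
docker-compose run --rm backend ./run_tests.sh -x -k "normalize and not lowercases"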

Frontend tests use mocha and eslint. To run them, use the following docker-compose command from the envs/dev directory:

docker-compose run --rm smart_frontend ./run_tests.sh

Contributing

If you would like to contribute to SMART, feel free to submit issues and pull requests addressing any bugs or features. Before submitting a pull request, make sure to follow the guidelines below:

  • Clearly explain the bug you are experiencing or the feature you wish to implement in the description.
  • For new features include unit tests to ensure the feature is working correctly and the new feature is maintainable going forward.
  • For bug fixes include unit tests to ensure that previously untested code is now covered.
  • Make sure your branch passes all the existing backend and frontend tests.
  • It is recommended that you enable pre-commit hooks. These are format checks that run whenever you commit to the project.
    • In order to run the pre-commit hooks you will need to have pre-commit installed in your local environment.
    • Once your environment is active, you will need to install the pre-commit hooks with pre-commit install.
    • This project uses the following formatters:
      • black: The uncompromising Python code formatter
      • flake8: Your tool for style guide enforcement
      • docformatter: Formats docstrings to follow PEP 257
      • isort: A Python utility to sort imports
      • eslint: A fully pluggable tool for identifying and reporting on patterns in JavaScript
