  • Language
    Python
  • License
    Apache License 2.0
  • Created over 4 years ago
  • Updated over 1 year ago

Repository Details

Configs and boilerplates for Label Studio's Machine Learning backend

What is the Label Studio ML backend?

The Label Studio ML backend is an SDK that lets you wrap your machine learning code and turn it into a web server. You can then connect that server to a Label Studio instance to perform two tasks:

  • Dynamically pre-annotate data based on model inference results
  • Retrain or fine-tune a model based on recently annotated data

If you only need to load static pre-annotated data into Label Studio, running an ML backend might be overkill. Instead, you can import pre-annotated data.
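As a rough illustration of that static alternative, the sketch below builds a list of tasks with predictions attached and serializes it to JSON for import. The tag names (`sentiment`, `text`), the labels, and the scores are assumptions for illustration and must match your own labeling config.

```python
import json

def make_preannotated_task(text, label, score):
    """Build one task with a single 'choices' prediction attached."""
    return {
        "data": {"text": text},
        "predictions": [{
            "result": [{
                "from_name": "sentiment",  # assumed name of the Choices tag
                "to_name": "text",         # assumed name of the Text tag
                "type": "choices",
                "value": {"choices": [label]},
            }],
            "score": score,
        }],
    }

tasks = [
    make_preannotated_task("I love opossums", "Positive", 0.93),
    make_preannotated_task("This is terrible", "Negative", 0.81),
]

# Save this string to a .json file and import that file into a project.
payload = json.dumps(tasks, indent=2)
```

Each entry under `result` uses the same structure that an ML backend returns from its predict method, so a file like this can later be replaced by a live backend without changing the labeling config.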

How it works

  1. Get your model code
  2. Wrap it with the Label Studio SDK
  3. Create a running server script
  4. Launch the script
  5. Connect Label Studio to the ML backend in the UI

Quickstart

Follow this example tutorial to run an ML backend with a simple text classifier:

  1. Clone the repo

    git clone https://github.com/heartexlabs/label-studio-ml-backend  
  2. Set up the environment

    It is highly recommended to use a venv, virtualenv, or conda Python environment. You can use the same environment that Label Studio uses. Read more about creating virtual environments via venv.

    cd label-studio-ml-backend
    
    # Install label-studio-ml and its dependencies
    pip install -U -e .
    
    # Install example dependencies
    pip install -r label_studio_ml/examples/requirements.txt
  3. Initialize an ML backend based on an example script:

    label-studio-ml init my_ml_backend --script label_studio_ml/examples/simple_text_classifier/simple_text_classifier.py

    This ML backend is an example provided by Label Studio. See how to create your own ML backend.

  4. Start the ML backend server

    label-studio-ml start my_ml_backend
  5. Start Label Studio and connect it to the running ML backend on the project settings page.
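Before connecting the backend in the project settings, it can help to confirm that the server is reachable. The sketch below is a minimal check under two assumptions that are not stated in this document: that the server listens on port 9090 by default and that it answers a GET request at `/health`. Adjust both if your setup differs.

```python
import urllib.error
import urllib.request

def health_url(host="localhost", port=9090):
    """Build the URL of the health endpoint (the /health path is an assumption)."""
    return f"http://{host}:{port}/health"

def backend_is_up(host="localhost", port=9090, timeout=2.0):
    """Return True if the server answers the health check with HTTP 200."""
    try:
        with urllib.request.urlopen(health_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or HTTP error status
        return False
```

Calling `backend_is_up()` after `label-studio-ml start my_ml_backend` should return True once the server has finished loading; the URL from `health_url()` without the `/health` suffix is what you would enter in the project settings.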

Create your own ML backend

Follow this tutorial to wrap existing machine learning model code with the Label Studio ML SDK to use it as an ML backend with Label Studio.

Before you start, determine the following:

  1. The expected inputs and outputs for your model. In other words, the type of labeling that your model supports in Label Studio, which informs the Label Studio labeling config. For example, text classification labels of "Dog", "Cat", or "Opossum" could be possible inputs and outputs.
  2. The prediction format returned by your ML backend server.

This example tutorial outlines how to wrap a simple text classifier based on the scikit-learn framework with the Label Studio ML SDK.

Start by creating a class declaration. You can create a Label Studio-compatible ML backend server by inheriting from LabelStudioMLBase.

from label_studio_ml.model import LabelStudioMLBase

class MyModel(LabelStudioMLBase):
    # The __init__, predict, and fit methods described in this tutorial go in this class body
    ...

Then, define loaders and initializers in the __init__ method.

def __init__(self, **kwargs):
    # don't forget to initialize base class...
    super(MyModel, self).__init__(**kwargs)
    self.model = self.load_my_model()

There are special variables provided by the inherited class:

  • self.parsed_label_config is a Python dict that provides a Label Studio project config structure. See ref for details. You might want to use this to align your model input/output with the Label Studio labeling configuration;
  • self.label_config is a raw labeling config string;
  • self.train_output is a Python dict with the results of the previous model training runs (the output of the fit() method described below). Use this if you want to load the model for the next updates for active learning and model fine-tuning.
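To make the shape of self.parsed_label_config more concrete, the sketch below uses a hand-written dict that mimics what you might see for a single Choices tag applied to a Text tag. The tag names, labels, and exact set of keys are assumptions for illustration; inspect the real dict for your own project before relying on any of them.

```python
# Hand-written stand-in for self.parsed_label_config (illustrative values only).
parsed_label_config = {
    "sentiment": {                    # name of the control tag (from_name)
        "type": "Choices",
        "to_name": ["text"],          # object tag(s) this control applies to
        "inputs": [{"type": "Text", "value": "text"}],
        "labels": ["Positive", "Negative"],
    }
}

def describe_first_control(config):
    """Return (from_name, to_name, data_key, labels) for the first control tag."""
    from_name, schema = next(iter(config.items()))
    to_name = schema["to_name"][0]
    data_key = schema["inputs"][0]["value"]   # key to read from task["data"]
    labels = list(schema.get("labels", []))
    return from_name, to_name, data_key, labels

from_name, to_name, data_key, labels = describe_first_control(parsed_label_config)
```

Reading these values once in __init__ and storing them on the instance keeps the inference and training code free of hard-coded tag names.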

After you define the loaders, you can define two methods for your model: an inference call and a training call.

Inference call

Use an inference call to get pre-annotations from your model on the fly. To make the example ML backend scripts work for your specific use case, override the predict(tasks, **kwargs) method, which takes JSON-formatted Label Studio tasks and returns predictions in the format accepted by Label Studio.

Example

def predict(self, tasks, **kwargs):
    predictions = []
    # Get annotation tag first, and extract from_name/to_name keys from the labeling config to make predictions
    from_name, schema = list(self.parsed_label_config.items())[0]
    to_name = schema['to_name'][0]
    for task in tasks:
        # for each task, return classification results in the form of "choices" pre-annotations
        predictions.append({
            'result': [{
                'from_name': from_name,
                'to_name': to_name,
                'type': 'choices',
                'value': {'choices': ['My Label']}
            }],
            # optionally you can include prediction scores that you can use to sort the tasks and do active learning
            'score': 0.987
        })
    return predictions
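The example above returns a fixed label. As a slightly fuller sketch of the same structure, the standalone code below runs a toy keyword classifier over each task and then orders tasks by score, which is one simple way to use scores for active learning. The classifier, the tag names, and the sample tasks are all hypothetical stand-ins, not part of the Label Studio SDK.

```python
def keyword_classifier(text):
    """Hypothetical stand-in model: returns (label, confidence)."""
    positive_words = {"good", "great", "love"}
    hits = sum(word in positive_words for word in text.lower().split())
    if hits:
        return "Positive", min(0.5 + 0.2 * hits, 0.99)
    return "Negative", 0.6

def predict_tasks(tasks, from_name="sentiment", to_name="text", data_key="text"):
    """Mirror the predict() structure: one prediction dict per task."""
    predictions = []
    for task in tasks:
        label, score = keyword_classifier(task["data"][data_key])
        predictions.append({
            "result": [{
                "from_name": from_name,
                "to_name": to_name,
                "type": "choices",
                "value": {"choices": [label]},
            }],
            "score": score,
        })
    return predictions

tasks = [
    {"id": 1, "data": {"text": "I love this great product"}},
    {"id": 2, "data": {"text": "It broke after a day"}},
]
predictions = predict_tasks(tasks)

# Least-confident tasks first: a simple ordering for active learning.
by_uncertainty = sorted(zip(tasks, predictions), key=lambda pair: pair[1]["score"])
```

Inside a real backend, the body of `predict_tasks` would live in the predict method, with `from_name` and `to_name` taken from self.parsed_label_config rather than passed as defaults.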

Training call

Use the training call to update your model with new annotations. This call is optional: if you only want to pre-annotate tasks without retraining the model, you can leave it out. To retrain the model based on annotations from Label Studio, implement this method.

Write your own code to override the fit(annotations, **kwargs) method, which takes JSON-formatted Label Studio annotations (called completions in older versions, which is the parameter name used in the example) and returns an arbitrary dict where some information about the created model can be stored.

Example

def fit(self, completions, workdir=None, **kwargs):
    # ... do some heavy computations, get your model and store checkpoints and resources
    return {'checkpoints': 'my/model/checkpoints'}  # <-- you can retrieve this dict as self.train_output in the subsequent calls
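To show how the returned dict ties training to the next model load, here is a standalone sketch that extracts (text, label) pairs from annotated tasks, saves a trivial "model" to the working directory, and reloads it from the stored path. The task layout (an `annotations` or `completions` list whose first entry holds a `result`), the `skipped` and `was_cancelled` flags, and the helper names are assumptions for illustration; check the payload your Label Studio version actually sends.

```python
import json
import os
import tempfile

def extract_examples(completions, data_key="text"):
    """Collect (text, label) pairs from annotated tasks, skipping unusable ones."""
    examples = []
    for task in completions:
        # Assumption: annotations live under "annotations" ("completions" in older exports).
        annotations = task.get("annotations") or task.get("completions") or []
        if not annotations:
            continue
        first = annotations[0]
        if first.get("skipped") or first.get("was_cancelled") or not first.get("result"):
            continue
        label = first["result"][0]["value"]["choices"][0]
        examples.append((task["data"][data_key], label))
    return examples

def fit_sketch(completions, workdir):
    """Train a trivial 'model' (label counts) and return a train_output-style dict."""
    counts = {}
    for _, label in extract_examples(completions):
        counts[label] = counts.get(label, 0) + 1
    checkpoint = os.path.join(workdir, "model.json")
    with open(checkpoint, "w") as f:
        json.dump(counts, f)
    return {"checkpoints": checkpoint}

def load_from_train_output(train_output):
    """What __init__ might do on the next start: reload from the stored path."""
    with open(train_output["checkpoints"]) as f:
        return json.load(f)

completions = [
    {"data": {"text": "nice"},
     "annotations": [{"result": [{"value": {"choices": ["Positive"]}}]}]},
    {"data": {"text": "meh"},
     "annotations": [{"was_cancelled": True, "result": []}]},
]
with tempfile.TemporaryDirectory() as workdir:
    train_output = fit_sketch(completions, workdir)
    model = load_from_train_output(train_output)
```

Returning only a path, rather than the model object itself, keeps the dict small and serializable, which matters because it is handed back to the class as self.train_output on later calls.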

After you wrap your model code with the class, define the loaders, and define the methods, you're ready to run your model as an ML backend with Label Studio.

For other examples of ML backends, refer to the examples in this repository. These examples aren't production-ready, but can help you set up your own code as a Label Studio ML backend.

Different port

If you don't want to use Docker, you can run the ML backend with uwsgi workers on a custom port like this:

label-studio-ml init --script examples/dummy_model/dummy_model.py my_backend
cd my_backend
python _wsgi.py -p 4242

Deploy your ML backend to GCP

Before you start:

  1. Install gcloud
  2. Set up billing for your account if it isn't already activated
  3. Initialize gcloud by running the following command and logging in through the browser:

    gcloud auth login
  4. Activate your Cloud Build API
  5. Find your GCP project ID
  6. (Optional) Add GCP_REGION with your default region to your ENV variables
To start deployment:

  1. Create your own ML backend
  2. Start deployment to GCP:

    label-studio-ml deploy gcp {ml-backend-local-dir} \
    --from={model-python-script} \
    --gcp-project-id {gcp-project-id} \
    --label-studio-host {https://app.heartex.com} \
    --label-studio-api-key {YOUR-LABEL-STUDIO-API-KEY}
  3. After Label Studio deploys the model, the model endpoint is printed in the console.
