  • Stars: 153
  • Rank: 241,886 (Top 5 %)
  • Language: Python
  • License: Apache License 2.0
  • Created about 3 years ago
  • Updated over 1 year ago

Repository Details

Weakly Supervised End-to-End Learning (NeurIPS 2021)

WeaSEL: Weakly Supervised End-to-end Learning

This is a PyTorch Lightning-based framework, built on our End-to-End Weak Supervision paper (NeurIPS 2021), that lets you train your favorite neural network for weakly supervised classification¹

  • using only multiple labeling functions (LFs)², i.e. without any labeled training data!
  • in an end-to-end manner, i.e. you directly train and evaluate your neural net (the end-model from here on); there is no longer any need to train a separate label model as in Snorkel & co.,
  • with better test set performance and greater robustness against correlated or inaccurate LFs than prior methods like Snorkel.

¹ This includes learning from crowdsourced labels or annotations!
² LFs are labeling heuristics that output noisy labels for (subsets of) the training data (e.g. crowdworkers or keyword detectors).
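
To make the notion of labeling functions concrete, here is a minimal sketch in plain Python of keyword-style LFs that each vote on a subset of the data and abstain elsewhere. Every name in it (ABSTAIN, the lf_* functions, apply_lfs, coverage) and the use of -1 as the abstain value are assumptions made for illustration; this is not the WeaSEL API, so please refer to the examples and notebooks for the actual interface.

```python
from typing import Callable, List

# Hypothetical label convention for this sketch only.
ABSTAIN = -1
NEGATIVE, POSITIVE = 0, 1


def lf_contains_great(text: str) -> int:
    """Keyword detector: votes POSITIVE if 'great' occurs, else abstains."""
    return POSITIVE if "great" in text.lower() else ABSTAIN


def lf_contains_terrible(text: str) -> int:
    """Keyword detector: votes NEGATIVE if 'terrible' occurs, else abstains."""
    return NEGATIVE if "terrible" in text.lower() else ABSTAIN


def lf_exclamation(text: str) -> int:
    """Deliberately noisy heuristic: treats a trailing '!' as POSITIVE."""
    return POSITIVE if text.endswith("!") else ABSTAIN


LFS: List[Callable[[str], int]] = [
    lf_contains_great,
    lf_contains_terrible,
    lf_exclamation,
]


def apply_lfs(texts: List[str], lfs: List[Callable[[str], int]]) -> List[List[int]]:
    """Return the label matrix: one row per example, one column per LF."""
    return [[lf(text) for lf in lfs] for text in texts]


def coverage(label_matrix: List[List[int]]) -> List[float]:
    """Fraction of examples on which each LF does not abstain."""
    n_examples = len(label_matrix)
    n_lfs = len(label_matrix[0])
    return [
        sum(row[j] != ABSTAIN for row in label_matrix) / n_examples
        for j in range(n_lfs)
    ]


texts = [
    "A great movie!",
    "Terrible plot, terrible acting.",
    "It was fine.",
    "Terrible!",
]
label_matrix = apply_lfs(texts, LFS)
for text, votes in zip(texts, label_matrix):
    print(votes, text)
print("coverage per LF:", coverage(label_matrix))
```

Note that the last example receives conflicting votes and the third receives none. A label matrix of this kind, noisy and only partially covering the data, is the sort of input that weak supervision methods learn from in place of ground-truth labels.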

If you use this code, please consider citing our work:

End-to-End Weak Supervision
Salva Rühling Cachay, Benedikt Boecking, and Artur Dubrawski
Advances in Neural Information Processing Systems (NeurIPS), 2021
arXiv:2107.02233v3

Getting Started

This library assumes familiarity with (multi-source) weak supervision. If that's not the case, you may want to first learn its basics from, e.g., these overview slides from Stanford or this Snorkel tutorial.

That being said, have a look at our examples and the notebooks therein, which show you how to use Weasel for your own dataset, LF set, or end-model.

Reproducibility

Please have a look at the research code branch, which operates on pure PyTorch.

Installation

1. New environment (recommended, but optional)
    conda create --name weasel python=3.9
    conda activate weasel
2a. From source
    python -m pip install git+https://github.com/autonlab/weasel#egg=weasel[all]
2b. From source, editable install
    git clone https://github.com/autonlab/weasel.git
    cd weasel
    pip install -e .[all]

Minimal dependencies

Minimal dependencies, in particular without Hydra, can be installed with:

python -m pip install git+https://github.com/autonlab/weasel

The needed environment corresponds to conda env create -f env_gpu_minimal.yml.

If you choose this variant, you won't be able to run some of the examples. You may want to have a look at this notebook, which walks you through how to use Weasel without Hydra as the config manager.
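
If you are unsure which variant ended up in your environment, a small check along the following lines may help. It is only a sketch: it assumes that the package is importable as weasel and that Hydra is importable as hydra, and the helper names is_installed and describe_install are hypothetical.

```python
from importlib.util import find_spec


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(module_name) is not None


def describe_install() -> str:
    """Summarize which install variant appears to be present."""
    # Assumption: the package is importable under the name "weasel".
    if not is_installed("weasel"):
        return "weasel is not installed in this environment"
    # Assumption: the Hydra config manager is importable under the name "hydra".
    if is_installed("hydra"):
        return "weasel with Hydra found (full install)"
    return "weasel without Hydra found (minimal install)"


print(describe_install())
```

Using find_spec avoids actually importing the packages, so the check stays fast and has no side effects.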

Note: Weasel is under active development, so some uncovered edge cases might exist. Any feedback is very welcome!

Apply WeaSEL to your own problem

Configuration with Hydra

Optional: This template config will help you get started with your own application. An analogous config is used in this tutorial script, which you may want to check out.

Pre-defined or custom downstream models & Baselines

Please have a look at the detailed instructions in this Readme.

Using your own dataset and/or labeling heuristics

Please have a look at the detailed instructions in this Readme.

Citation

@article{cachay2021endtoend,
  author={R{\"u}hling Cachay, Salva and Boecking, Benedikt and Dubrawski, Artur},
  journal={Advances in Neural Information Processing Systems}, 
  title={End-to-End Weak Supervision},
  year={2021}
}
