Personalized Multitask Learning

This repo contains code for 3 multitask machine learning methods: deep neural networks, Multitask Multi-kernel Learning (MTMKL), and a hierarchical Bayesian model (HBLR). These methods can be used to personalize the prediction of outcomes such as stress and happiness to each individual, by treating the prediction of outcomes for a single individual (or a cluster of related individuals) as a separate task.

The code accompanies two research papers that explain this approach in more detail:

Taylor, S.*, Jaques, N.*, Nosakhare, E., Sano, A., Picard, R., "Personalized Multitask Learning for Predicting Tomorrow’s Mood, Stress, and Health", IEEE Transactions on Affective Computing, December 2017. (*equal contribution)

Jaques, N.*, Taylor, S.*, Nosakhare, E., Sano, A., Picard, R., "Multi-task Learning for Predicting Health, Stress, and Happiness", NIPS Workshop on Machine Learning for Healthcare, December 2016, Barcelona, Spain. (*equal contribution) Best Paper Award.

If you find this code useful, please cite our work!

If you have any questions about this code or the associated papers, please email us at [email protected] or [email protected].

Models in this code:

Multitask Neural Network (MTL-NN)

The intuition behind the multitask neural network design is that the shared layers will learn to extract information that is useful for summarizing relevant characteristics of any person’s day into an efficient, generalizable embedding. The final, task-specific layers are then expected to learn how to map this embedding to a prediction customized for each person or cluster of people.

For example, if the shared layers learn to condense all of the relevant smartphone app data about phone calls and texting into an aggregate measure of social support, the task-specific layers can then learn a unique weighting of this measure for each cluster of participants. Perhaps a cluster containing participants with high extroversion scores will be more strongly affected by a lack of social support than another cluster.
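To make the shared/task-specific split concrete, here is a minimal forward-pass sketch in plain Python. It is not the repo's implementation: `MultitaskNet`, the layer sizes, the ReLU/sigmoid choices, and the task names are all assumptions for illustration, and training is omitted entirely.

```python
import math
import random


def relu(v):
    return max(0.0, v)


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def dense(x, weights, biases, activation=None):
    """Fully connected layer: one output per row of `weights`."""
    out = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
    return [activation(v) for v in out] if activation else out


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


class MultitaskNet:
    """Hypothetical sketch: shared hidden layers, then one output head per task."""

    def __init__(self, n_features, hidden_sizes, task_names, seed=0):
        rng = random.Random(seed)
        self.shared = []
        n_in = n_features
        for size in hidden_sizes:
            self.shared.append(init_layer(n_in, size, rng))
            n_in = size
        # Each task (a person or a cluster of people) gets its own single-logit
        # output layer on top of the shared embedding.
        self.heads = {name: init_layer(n_in, 1, rng) for name in task_names}

    def embed(self, x):
        """Shared embedding of one day's features; identical for every task."""
        for weights, biases in self.shared:
            x = dense(x, weights, biases, relu)
        return x

    def predict(self, x, task):
        """Probability of the positive label for `task`, using that task's head."""
        weights, biases = self.heads[task]
        return sigmoid(dense(self.embed(x), weights, biases)[0])
```

During training, the shared layers would receive gradients from every task's loss, while each head would only be updated by its own task's examples; that is what lets the embedding generalize across people while the heads stay personalized.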

Multitask Multi-kernel Learning (MTMKL)

MTMKL (originally developed by Kandemir et al.) is a modified version of Multi-Kernel Learning (MKL) in which tasks share information through the kernel weights assigned to each modality. MTMKL uses a least-squares support vector machine (LSSVM) for each task-specific model. Unlike the canonical SVM, which penalizes the "slack" variables with an L1 error, the LSSVM uses a quadratic error on the slack variables. As a result, an LSSVM can be learned by solving a system of linear equations, whereas a canonical SVM model must be learned with quadratic programming.
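The "system of linear equations" can be sketched as follows. This uses the common Suykens-style LSSVM formulation applied to ±1 labels, solving [[0, 1ᵀ], [1, K + I/C]]·[b; α] = [0; y], with the kernel built as a weighted sum of per-modality RBF kernels. It is an illustrative assumption, not the repo's code: here the kernel weights are held fixed, whereas MTMKL learns them and shares them across tasks, and `modality_slices`, `multi_kernel`, `train_lssvm`, and `C` are hypothetical names.

```python
import math


def rbf(x, z, gamma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def modality_slices(modality_dict, n_features):
    """Turn {modality: start_index} into {modality: (start, end)} feature slices."""
    ordered = sorted(modality_dict.items(), key=lambda kv: kv[1])
    slices = {}
    for i, (name, start) in enumerate(ordered):
        end = ordered[i + 1][1] if i + 1 < len(ordered) else n_features
        slices[name] = (start, end)
    return slices


def multi_kernel(x, z, slices, eta):
    """Weighted sum of per-modality kernels (weights `eta` are fixed here, not learned)."""
    return sum(eta[m] * rbf(x[s:e], z[s:e]) for m, (s, e) in slices.items())


def solve(A, rhs):
    """Gaussian elimination with partial pivoting for a square system A x = rhs."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def train_lssvm(X, y, kernel, C=10.0):
    """Fit an LSSVM by solving one linear system; returns (alpha, b, predict)."""
    n = len(X)
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        A.append([1.0] + [kernel(X[i], X[j]) + (1.0 / C if i == j else 0.0) for j in range(n)])
    sol = solve(A, [0.0] + list(y))
    b, alpha = sol[0], sol[1:]

    def predict(x):
        # Sign of the decision value gives the predicted class.
        return sum(a * kernel(xi, x) for a, xi in zip(alpha, X)) + b

    return alpha, b, predict
```

The first row of the system enforces Σα = 0, and the quadratic slack penalty is what turns the usual inequality-constrained QP into this single linear solve.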

Hierarchical Bayesian Logistic Regression (HBLR)

In hierarchical Bayesian MTL approaches, the model for each task draws its parameters from a common prior distribution. As the model is trained, the common prior is updated, allowing information to be shared across tasks. The model we adopt, which was originally proposed by Xue et al., draws logistic regression (LR) weights for each task from a shared Dirichlet Process (DP) prior; we call this model Hierarchical Bayesian Logistic Regression (HBLR).
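One standard way to sample from a DP prior is the truncated stick-breaking construction: v_k ~ Beta(1, α), mixing proportions π_k = v_k ∏_{j<k}(1 − v_j), and one atom (here, an LR weight vector) per component drawn from a base distribution. The sketch below only samples from the prior; it does not perform the repo's inference. The standard normal base distribution, the truncation level, and the function names are assumptions for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking: pi_k = v_k * prod_{j<k} (1 - v_j), v_k ~ Beta(1, alpha)."""
    weights = []
    remaining = 1.0
    for _ in range(truncation):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)
        remaining *= 1.0 - v
    return weights


def sample_task_weights(num_tasks, num_features, alpha=1.0, truncation=20, seed=0):
    """Draw LR weights for each task from a (truncated) DP prior.

    Assumption: atoms come from a standard normal base distribution.
    Tasks assigned to the same atom share identical weights, which is
    how the DP prior induces a clustering of tasks.
    """
    rng = random.Random(seed)
    pi = stick_breaking_weights(alpha, truncation, rng)
    atoms = [[rng.gauss(0.0, 1.0) for _ in range(num_features)] for _ in range(truncation)]
    assignments = rng.choices(range(truncation), weights=pi, k=num_tasks)
    return assignments, [atoms[c] for c in assignments], pi
```

Smaller values of α concentrate mass on a few sticks, so fewer distinct clusters of tasks tend to appear.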

In contrast with the two approaches above (MTL-NN and MTMKL), the HBLR model allows us to define each task directly as predicting a label (e.g., tomorrow's stress level) for a single user, because the model implicitly learns its own (soft) clustering of tasks. It clusters tasks that are most similar in the relationship between their input features and their outcomes (i.e., their decision boundaries) while simultaneously learning the prediction function.
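One simple way to turn a soft clustering into a per-task prediction is to average the cluster-level logistic regression models, weighted by the task's cluster assignment probabilities. The sketch below assumes those weight vectors and probabilities have already been inferred; it is a simplification (no bias term, no posterior uncertainty over weights), not necessarily the exact predictive rule used in this repo or by Xue et al., and `predict_proba` is a hypothetical name.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, cluster_weights, task_assignment_probs):
    """P(y = 1 | x) for one task: cluster LR models mixed by the task's soft assignment."""
    return sum(
        phi * sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)))
        for phi, w in zip(task_assignment_probs, cluster_weights)
    )
```

A task split evenly between two clusters with opposite decision boundaries gets a maximally uncertain prediction, while a task confidently assigned to one cluster inherits that cluster's classifier.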

Single Task Learning models

Code to train a logistic regression model, an LSSVM, and a single-task neural network is also included for comparison purposes.

Structure

Code structure

Wrappers are used to perform a grid search over hyperparameters. The file run_jobs.py can be used to launch the training of several models in sequence and to send emails after they complete. For an example of how to run the training code for each model, see jobs_to_run.txt.
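The wrapper code itself is not described here in detail. As a rough illustration only, an exhaustive grid search over a dictionary of candidate values might look like this; `grid_search`, the `evaluate` callback, and the parameter names are hypothetical and not taken from run_jobs.py or the wrapper files.

```python
import itertools


def grid_search(param_grid, evaluate):
    """Try every combination in `param_grid`; return (best_score, best_params).

    `evaluate` is a hypothetical callback that trains a model with the given
    settings and returns a validation score (higher is better).
    """
    names = sorted(param_grid)
    best_score, best_params = float("-inf"), None
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params
```

The number of training runs is the product of the list lengths, which is why launching many such jobs in sequence (and getting notified when they finish) is convenient.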

Input data format

.csv files

The code assumes that each .csv file has a 'user_id' column, a 'timestamp' column, and one column per outcome label, whose name contains the string '_Label'.
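A minimal sketch of how such a header might be split into feature and label columns with the standard `csv` module, following the conventions above. The helper names and the feature column names in the usage are hypothetical.

```python
import csv
import io

ID_COLUMNS = ("user_id", "timestamp")


def split_columns(header):
    """Split a CSV header into (feature_columns, label_columns)."""
    label_cols = [c for c in header if "_Label" in c]
    feature_cols = [c for c in header if c not in ID_COLUMNS and c not in label_cols]
    return feature_cols, label_cols


def read_rows(csv_text):
    """Parse CSV text into (header, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    return reader.fieldnames, rows
```

For a real file, the same `csv.DictReader` can be given a file opened with `open(path, newline="")` instead of a `StringIO`.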

'Task dict list'

For the multi-task algorithms, we use a special data structure saved to a pickle file to represent the data from multiple tasks. The code for generating files in this format given a .csv file is available in make_datasets.py. To run it, use:

python make_datasets.py --datafile='./example_data.csv' --task_type='users'

File Format details

  • Data for both labels-as-tasks and users-as-tasks are stored in pickled files as a list of dicts (each list item represents a task)

    • Labels-as-tasks:
      • The .csv file will be partitioned such that predicting each related outcome is a separate task (e.g. predicting stress is one task and predicting happiness is another)
      • Normalization is done based on training data for the entire group
    • Users-as-tasks:
      • The .csv file will be partitioned such that predicting the outcome of each user is one task.
      • You must specify which label to target (i.e., the label that you will be predicting)
      • Normalization is done per-person
  • Each task is a dict containing 4 keys:

    • 'Name': gives the name of the task, e.g. "Group_Happiness_Evening_Label" or a user ID
    • 'X': the data matrix. Rows are samples, columns are features. It excludes non-feature columns such as 'user_id' and 'timestamp', and has already been normalized, with empty cells filled
    • 'Y': the classification labels for this task, in the same order as the rows of X
    • 'ModalityDict': used by the MTMKL model. Maps each modality, such as "phys" or "location", to its start index in the feature list
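A minimal sketch of building a users-as-tasks list in this format from parsed CSV rows and pickling it. This is not the logic of make_datasets.py: `make_user_tasks`, `normalize_and_fill`, and the column names are hypothetical; plain Python lists stand in for whatever array type the real files use; and each column is z-scored over all of a user's rows, whereas a real pipeline would typically compute these statistics on training rows only.

```python
import pickle
from collections import defaultdict


def normalize_and_fill(X):
    """Z-score each column; empty cells (None) become 0.0, the column mean in normalized units."""
    if not X:
        return X
    out = [row[:] for row in X]
    for j in range(len(X[0])):
        vals = [row[j] for row in X if row[j] is not None]
        mean = sum(vals) / len(vals) if vals else 0.0
        var = sum((v - mean) ** 2 for v in vals) / len(vals) if vals else 0.0
        std = var ** 0.5 or 1.0  # avoid dividing by zero for constant columns
        for row in out:
            row[j] = 0.0 if row[j] is None else (row[j] - mean) / std
    return out


def make_user_tasks(rows, feature_cols, target_label, modality_dict):
    """One task dict per user, with per-person normalization (users-as-tasks)."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row["user_id"]].append(row)
    tasks = []
    for user_id, user_rows in by_user.items():
        # Keep only rows where the target label is present.
        user_rows = [r for r in user_rows if r[target_label] not in ("", None)]
        X = [[float(r[c]) if r[c] not in ("", None) else None for c in feature_cols]
             for r in user_rows]
        tasks.append({
            "Name": user_id,
            "X": normalize_and_fill(X),
            "Y": [float(r[target_label]) for r in user_rows],
            "ModalityDict": dict(modality_dict),
        })
    return tasks


def save_tasks(tasks, path):
    """Pickle the list of task dicts to `path`."""
    with open(path, "wb") as f:
        pickle.dump(tasks, f)
```

Loading the file back with `pickle.load` yields the same list of dicts, so each multitask model can iterate over tasks and read `task["X"]`, `task["Y"]`, and (for MTMKL) `task["ModalityDict"]`.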
