  • Stars: 174
  • Rank 219,104 (Top 5 %)
  • Language: Python
  • License: BSD 2-Clause "Sim...
  • Created over 2 years ago
  • Updated 5 months ago

Repository Details

NeurIPS'22 | TransTab: Learning Transferable Tabular Transformers Across Tables

TransTab: A flexible transferable tabular learning framework [arxiv]

Documentation is available at https://transtab.readthedocs.io/en/latest/index.html.

The paper is available at https://arxiv.org/pdf/2205.09328.pdf.

A 5-minute blog post introducing TransTab is available at realsunlab.medium.com!

News!

  • [05/04/23] Check out version 0.0.5 of TransTab!

  • [01/04/23] Check out version 0.0.3 of TransTab!

  • [12/03/22] Check out our blog for a quick introduction to TransTab!

  • [08/31/22] Version 0.0.2 supports encoding tabular inputs into embeddings directly. An example is provided here. Several bugs are fixed.

TODO

  • Table embedding.

  • Add support for directly processing tables with missing values.

  • Add regression support.

Features

This repository provides the Python package transtab for building flexible tabular prediction models. Basic usage takes only a couple of lines!

import transtab

# load dataset by specifying dataset name
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
     = transtab.load_data('credit-g')

# build classifier
model = transtab.build_classifier(cat_cols, num_cols, bin_cols)

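# start training; training_arguments is a dict of keyword arguments
# passed to transtab.train (example values, see the documentation for more)
training_arguments = {'num_epoch': 5, 'output_dir': './ckpt'}
transtab.train(model, trainset, valset, **training_arguments)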

# make predictions; df_x is a pd.DataFrame with shape (n, d)
# returns the predictions ypred with shape (n, 1) for binary classification,
# or (n, n_class) for multiclass classification
ypred = transtab.predict(model, df_x)

It's easy, isn't it?
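
The prediction shapes noted above differ between binary and multiclass tasks, so downstream code often needs to turn them into class labels. Below is a minimal sketch of that step, assuming ypred holds class probabilities; the helper name probs_to_labels and the 0.5 threshold are illustrative choices, not part of transtab.

```python
import numpy as np


def probs_to_labels(ypred, threshold=0.5):
    """Convert predicted probabilities to integer class labels.

    Hypothetical helper (not part of transtab); assumes ypred contains
    probabilities in the shapes described for transtab.predict.
    """
    ypred = np.asarray(ypred)
    # Binary case: shape (n,) or (n, 1), one positive-class probability per row.
    if ypred.ndim == 1 or ypred.shape[1] == 1:
        return (ypred.reshape(-1) >= threshold).astype(int)
    # Multiclass case: shape (n, n_class), pick the most probable class per row.
    return ypred.argmax(axis=1)


# Toy inputs standing in for the output of transtab.predict.
binary_labels = probs_to_labels(np.array([[0.2], [0.7]]))
multi_labels = probs_to_labels(np.array([[0.1, 0.8, 0.1], [0.6, 0.3, 0.1]]))
```

For imbalanced binary tasks a threshold other than 0.5 may be more appropriate; it is exposed as a parameter for that reason.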

How to install

First, install the appropriate PyTorch version by following the guide at https://pytorch.org/get-started/locally/.

Then install transtab from PyPI directly:

pip install transtab

or

pip install git+https://github.com/RyanWangZf/transtab.git

Please refer to the documentation for more guidance on installation and troubleshooting.
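
After installing, it can help to confirm that both packages are visible to the current Python environment. The snippet below is a small sketch using only the standard library; the helper name installed_version is made up for illustration, and the distribution names "torch" and "transtab" are assumed to match the pip package names.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report the status of the two packages this guide installs.
for name in ("torch", "transtab"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```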

Transfer learning across tables

A novel feature of transtab is its ability to learn from multiple distinct tables. Starting from a pretrained model, training on a new table can be triggered as follows:

# load the pretrained transtab model
model = transtab.build_classifier(checkpoint='./ckpt')

# load a new tabular dataset
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
     = transtab.load_data('credit-approval')

# update categorical/numerical/binary column map of the loaded model
model.update({'cat':cat_cols,'num':num_cols,'bin':bin_cols})

# then we just trigger the training on the new data
transtab.train(model, trainset, valset, **training_arguments)
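
The dictionary passed to model.update above maps the keys 'cat', 'num', and 'bin' to lists of column names. When those lists are written by hand for a custom table, a small check can catch a column that was assigned to two types by mistake. This is a sketch only; build_column_map is a hypothetical helper, not a transtab function, and the column names in the usage lines are invented.

```python
def build_column_map(cat_cols=None, num_cols=None, bin_cols=None):
    """Build the {'cat', 'num', 'bin'} dict and reject columns listed more than once.

    Hypothetical helper for illustration; not part of transtab.
    """
    col_map = {
        'cat': list(cat_cols or []),
        'num': list(num_cols or []),
        'bin': list(bin_cols or []),
    }
    seen = {}  # column name -> type it was first listed under
    for kind, cols in col_map.items():
        for col in cols:
            if col in seen:
                raise ValueError(
                    f"column {col!r} appears more than once "
                    f"(under {seen[col]!r} and {kind!r})"
                )
            seen[col] = kind
    return col_map


# Invented column names, standing in for a user's own table.
col_map = build_column_map(
    cat_cols=['employment'], num_cols=['age'], bin_cols=['own_telephone'])
# The result has the same shape as the dict passed to model.update above:
# model.update(col_map)
```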

Contrastive pretraining on multiple tables

Contrastive pretraining on multiple distinct tables works as follows:

# load from multiple tabular datasets
dataname_list = ['credit-g', 'credit-approval']
allset, trainset, valset, testset, cat_cols, num_cols, bin_cols \
     = transtab.load_data(dataname_list)

# build contrastive learner, set supervised=True for supervised VPCL
model, collate_fn = transtab.build_contrastive_learner(
    cat_cols, num_cols, bin_cols, supervised=True)

# start contrastive pretraining
transtab.train(model, trainset, valset, collate_fn=collate_fn, **training_arguments)
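
VPCL in the comment above refers to the vertical-partition contrastive learning described in the TransTab paper, where views of a row are built from subsets of its columns. The sketch below illustrates one way such a column split could look. It is an illustration under stated assumptions, not the library's implementation: the function name vertical_partitions and the parameters num_partition and overlap_ratio are chosen here for the example.

```python
from math import ceil


def vertical_partitions(columns, num_partition=3, overlap_ratio=0.5):
    """Split column names into contiguous groups that overlap with a neighbour.

    Illustrative sketch only; names and defaults are assumptions.
    """
    columns = list(columns)
    if not 1 <= num_partition <= len(columns):
        raise ValueError("num_partition must be between 1 and the number of columns")

    # Cut the column list into num_partition contiguous, near-equal groups.
    base, extra = divmod(len(columns), num_partition)
    groups, start = [], 0
    for i in range(num_partition):
        size = base + (1 if i < extra else 0)
        groups.append(columns[start:start + size])
        start += size

    # Number of columns each group borrows from a neighbouring group.
    overlap = ceil(len(groups[0]) * overlap_ratio)
    if overlap == 0 or num_partition == 1:
        return groups

    views = []
    for i, group in enumerate(groups):
        if i < num_partition - 1:
            borrowed = groups[i + 1][:overlap]   # head of the next group
        else:
            borrowed = groups[i - 1][-overlap:]  # tail of the previous group
        views.append(group + borrowed)
    return views


views = vertical_partitions(['a', 'b', 'c', 'd', 'e', 'f'])
```

Each returned list names the columns of one view; sub-tables built from the same row with different views can then serve as positive pairs in a contrastive objective.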

Citation

If you find this package useful, please consider citing the following paper:

@inproceedings{wang2022transtab,
  title={TransTab: Learning Transferable Tabular Transformers Across Tables},
  author={Wang, Zifeng and Sun, Jimeng},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}
