  • Language: Python
  • License: Apache License 2.0


DeepTables: Deep-learning Toolkit for Tabular data

We Are Hiring!

Dear folks, we are opening several positions based in Beijing, for both professionals and interns with a keen interest in AutoML/NAS. Please send your resume/CV to [email protected]. (Application deadline: TBD.)

DeepTables (DT) is an easy-to-use toolkit that enables deep learning to unleash great power on tabular data.

Overview

MLPs (also known as fully connected neural networks) have been shown to be inefficient at learning distribution representations. The "add" operations of the perceptron layer perform poorly at capturing multiplicative feature interactions. In most cases, manual feature engineering is necessary, and this work requires extensive domain knowledge and is very cumbersome. How to learn feature interactions efficiently in neural networks has therefore become the most important problem.

Various models have been proposed for CTR prediction and have continued to outperform existing state-of-the-art approaches in recent years. Well-known examples include FM, DeepFM, Wide&Deep, DCN, PNN, etc. These models can also perform well on tabular data when used appropriately.
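To make "multiplicative feature interactions" concrete, here is a minimal pure-Python sketch of the second-order term used by factorization machines (FM). It is an illustration only and is not taken from the DeepTables code base. The function names, the input vector `x` and the latent factors `v` are made-up assumptions for the demo.

```python
from itertools import combinations


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(a, b))


def fm_pairwise_naive(x, v):
    """Sum of <v_i, v_j> * x_i * x_j over all feature pairs i < j."""
    return sum(
        dot(v[i], v[j]) * x[i] * x[j]
        for i, j in combinations(range(len(x)), 2)
    )


def fm_pairwise_fast(x, v):
    """Same quantity via the FM identity, linear in the number of features.

    For each latent dimension f:
    0.5 * ((sum_i v[i][f] * x[i])^2 - sum_i (v[i][f] * x[i])^2)
    """
    total = 0.0
    for f in range(len(v[0])):
        terms = [v[i][f] * x[i] for i in range(len(x))]
        total += sum(terms) ** 2 - sum(t * t for t in terms)
    return 0.5 * total


# Hypothetical input: three features, each with a 2-dimensional latent vector.
x = [1.0, 0.0, 2.0]
v = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]

naive = fm_pairwise_naive(x, v)
fast = fm_pairwise_fast(x, v)
print(naive, fast)  # the two values agree up to floating-point error
```

Each pair of features contributes a product `x_i * x_j`, which a purely additive layer cannot express directly. The second function shows why this term is cheap to compute in practice.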

DT aims to utilize the latest research findings to provide users with an end-to-end toolkit on tabular data.

DT has been designed with these key goals in mind:

  • Easy to use, even for non-experts.
  • Good performance out of the box.
  • Flexible architecture that users can easily extend.

Tutorials

Please refer to the official docs at https://deeptables.readthedocs.io/en/latest/.

Installation

We recommend installing DeepTables with pip:

pip install tensorflow deeptables

Note:

  • TensorFlow is required by DeepTables; install it before running DeepTables.

GPU Setup (Optional)

To use DeepTables with GPU devices, install tensorflow-gpu instead of tensorflow.

pip install tensorflow-gpu deeptables

Verify the installation:

python -c "from deeptables.utils.quicktest import test; test()"

Optional dependencies

The following libraries are not hard dependencies and are not installed automatically with DeepTables. To use all of DT's functionality, install these optional dependencies:

pip install shap
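To check from Python whether an optional dependency is present, a small standard-library sketch like the following may help. The helper name `missing_packages` is an assumption made for this example and is not part of DeepTables.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# 'shap' is the optional dependency listed above.
missing = missing_packages(["shap"])
if missing:
    print("Optional packages not installed:", ", ".join(missing))
else:
    print("All optional packages are available.")
```

`importlib.util.find_spec` only looks the package up and does not import it, so the check stays fast even for heavy libraries.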

Example:

A simple binary classification example

from deeptables.models import deeptable, deepnets
from deeptables.datasets import dsutils
from sklearn.model_selection import train_test_split

# Load the bundled bank dataset and hold out 20% of the rows for testing
df = dsutils.load_bank()
df_train, df_test = train_test_split(df, test_size=0.2, random_state=42)

# Separate the target column 'y' from the features
y = df_train.pop('y')
y_test = df_test.pop('y')

# Train a DeepFM network for 10 epochs
config = deeptable.ModelConfig(nets=deepnets.DeepFM)
dt = deeptable.DeepTable(config=config)
model, history = dt.fit(df_train, y, epochs=10)

# Evaluate on the held-out set
result = dt.evaluate(df_test, y_test, batch_size=512, verbose=0)
print(result)

# Score the test data
preds = dt.predict(df_test)
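As a follow-up to the scoring step, the sketch below shows one simple way to compare predicted labels with held-out labels. It assumes that `dt.predict` returns one predicted label per row. The `accuracy` helper and the demo lists are placeholders invented for this example and are not real DeepTables output.

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the predicted label equals the true label."""
    y_true, y_pred = list(y_true), list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


# Placeholder labels standing in for y_test and preds from the example above.
y_true_demo = ["yes", "no", "no", "yes"]
y_pred_demo = ["yes", "no", "yes", "yes"]
demo_accuracy = accuracy(y_true_demo, y_pred_demo)
print(demo_accuracy)
```

With the real objects from the example, the call would be `accuracy(y_test, preds)`, provided both hold labels of the same type.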

A solution that used DeepTables to win 1st place in the Kaggle Categorical Feature Encoding Challenge II

Click here

Citation

If you use DeepTables in your research, please cite us as follows:

Jian Yang, Xuefeng Li, Haifeng Wu. DeepTables: A Deep Learning Python Package for Tabular Data. https://github.com/DataCanvasIO/DeepTables, 2022. Version 0.2.x.

BibTeX:

@misc{deeptables,
  author={Jian Yang and Xuefeng Li and Haifeng Wu},
  title={{DeepTables}: {A Deep Learning Python Package for Tabular Data}},
  howpublished={https://github.com/DataCanvasIO/DeepTables},
  note={Version 0.2.x},
  year={2022}
}

DataCanvas

DeepTables is an open source project created by DataCanvas.
