  • Stars: 600
  • Rank 74,102 (Top 2 %)
  • Language: Python
  • License: MIT License
  • Created almost 7 years ago
  • Updated 5 months ago

Repository Details

A Benchmark of Text Classification in PyTorch

Text Classification Benchmark

Motivation

We are trying to build a benchmark for text classification that includes:

Many text classification datasets, covering sentiment and topic classification in popular languages (e.g. English and Chinese). Basic word embeddings are also provided.

Implementations of many popular and state-of-the-art models, especially deep neural networks.

Completed

Some datasets and models are already done.

Datasets done

  • IMDB
  • SST
  • TREC

Models done

  • FastText
  • BasicCNN (KimCNN, MultiLayerCNN, Multi-perspective CNN)
  • InceptionCNN
  • LSTM (BiLSTM, StackLSTM)
  • LSTM with Attention (Self Attention / Quantum Attention)
  • Hybrids between CNN and RNN (RCNN, C-LSTM)
  • Transformer - Attention is all you need
  • ConvS2S
  • Capsule
  • Quantum-inspired NN

Libraries

Install the following libraries:

python3
torch
torchtext (optional)

Dataset

Datasets are set up automatically in the current path. Alternatively, download your data manually into Dataset, step by step.

This includes:

GloVe embeddings
Sentiment classification dataset IMDB
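
For readers who download GloVe manually, the following is a minimal sketch of reading GloVe's plain-text format, in which each line holds a token followed by its space-separated vector components. The function name `load_glove`, the `vocab` filter, and the tiny in-memory sample are assumptions made for illustration; they are not taken from this repository's code.

```python
import io


def load_glove(lines, vocab=None):
    """Parse GloVe-style text lines into a {word: [float, ...]} dict.

    `lines` is any iterable of strings (an open file works).
    `vocab` is an optional set of words to keep; None keeps everything.
    Hypothetical helper, not the repository's own loader.
    """
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        if vocab is not None and word not in vocab:
            continue  # keep memory low by loading only the words we need
        vectors[word] = [float(v) for v in values]
    return vectors


# Tiny made-up sample standing in for a real GloVe file.
sample = io.StringIO(
    "good 0.1 0.2 0.3\n"
    "bad -0.1 -0.2 -0.3\n"
    "film 0.0 0.5 1.0\n"
)
embeddings = load_glove(sample, vocab={"good", "film"})
print(sorted(embeddings))
```

With a real file, the same function could be called as `load_glove(open(path, encoding="utf-8"), vocab=my_vocab)`, where `path` and `my_vocab` are whatever your setup provides.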

Usage

Run with the default settings

python main.py

CNN

python main.py --model cnn

LSTM

python main.py --model lstm
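
As an illustration of how a `--model` flag like the one in the commands above can select a model, here is a minimal sketch using `argparse`. Only the flag name and the values `cnn` and `lstm` come from the commands above; the registry, the placeholder builder functions, and the default value are assumptions, and the actual `main.py` and `opts.py` may be organised differently.

```python
import argparse


# Placeholder builders: a real project would construct PyTorch modules here.
def build_cnn():
    return "cnn-model"


def build_lstm():
    return "lstm-model"


# Hypothetical registry mapping the --model value to a builder.
MODEL_BUILDERS = {"cnn": build_cnn, "lstm": build_lstm}


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Model selection sketch")
    parser.add_argument(
        "--model",
        choices=sorted(MODEL_BUILDERS),
        default="cnn",  # assumed default; the README does not state one
        help="which model to train",
    )
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Look up the builder for the requested model and call it.
    return MODEL_BUILDERS[args.model]()


# Equivalent to running a script with: --model lstm
print(main(["--model", "lstm"]))
```

Keeping the name-to-builder mapping in one dictionary means adding a model only requires one new entry, and `choices` makes `argparse` reject unknown names with a clear error.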

Road Map

  • Data preprocessing framework
  • Model modules
  • Loss, estimator, and hyper-parameter tuning
  • Test modules
  • More Dataset
  • More models

Organisation of the repository

The core of this repository is the models and datasets.

  • dataloader/: loads the datasets, such as IMDB and SST

  • models/: defines the models, such as FastText, LSTM, CNN, Capsule, QuantumCNN, and Multi-Head Attention

  • opts.py: parameters and configuration

  • utils.py: utility functions

  • dataHelper: data helper

Contributing

Issues and contributions are welcome!
