FreedomIntelligence/TextClassificationBenchmark

Language: Python
License: MIT License
Created almost 7 years ago
Updated 7 months ago

A Benchmark of Text Classification in PyTorch

Text Classification Benchmark

Motivation

We are building a benchmark for text classification that includes:

Many text classification datasets, covering sentiment and topic classification in popular languages (e.g. English and Chinese). A basic word embedding is also provided.

Implementations of many popular and state-of-the-art models, especially deep neural networks.

Completed

The following datasets and models are already supported.

Datasets done

IMDB
SST
TREC

Models done

FastText
BasicCNN (KimCNN, MultiLayerCNN, Multi-perspective CNN)
InceptionCNN
LSTM (BILSTM, StackLSTM)
LSTM with Attention (Self Attention / Quantum Attention)
Hybrids between CNN and RNN (RCNN, C-LSTM)
Transformer - Attention is all you need
ConvS2S
Capsule
Quantum-inspired NN

Library

Install the following libraries:

python3
torch
torchtext (optional)

Dataset

Datasets are configured automatically in the current path. Alternatively, download your data manually into Dataset, step by step.

This includes:

GloVe embedding
Sentiment classification dataset IMDB

Usage

Run with the default settings

python main.py

CNN

python main.py --model cnn

LSTM

python main.py --model lstm
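
The commands above select a model through a --model flag. As a rough illustration of how such a flag can be parsed, here is a minimal sketch using Python's standard argparse module. It is not the repository's actual opts.py: the function name parse_opts, the help text, and the default value "lstm" are assumptions made for this example.

```python
import argparse


def parse_opts(argv=None):
    """Parse command-line options (hypothetical sketch, not the repository's opts.py)."""
    parser = argparse.ArgumentParser(
        description="Text classification benchmark (illustrative sketch)"
    )
    # --model mirrors the flag used in the commands above.
    # The default value "lstm" is an assumption made for this example.
    parser.add_argument(
        "--model",
        type=str,
        default="lstm",
        help="name of the model to train, e.g. cnn or lstm",
    )
    # argv=None makes argparse read sys.argv[1:], as a real script would.
    return parser.parse_args(argv)


if __name__ == "__main__":
    opts = parse_opts()
    print(f"Selected model: {opts.model}")
```

Passing an explicit list such as parse_opts(["--model", "cnn"]) makes the parser easy to exercise without touching the real command line.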

Road Map

Organisation of the repository

The core of this repository is the models and datasets.

dataloader/: loads all datasets, such as IMDB and SST
models/: creates all models, such as FastText, LSTM, CNN, Capsule, QuantumCNN and Multi-Head Attention
opts.py: parameters and configuration
utils.py: utility tools
dataHelper: data helper
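
One common way to connect a configuration value (such as a model name from opts.py) to the classes under models/ is a name-to-class registry. The sketch below shows that pattern under stated assumptions: MODEL_REGISTRY, register, build_model, and the two placeholder classes are all hypothetical names invented for this example, and the repository may wire its models together differently.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a lowercase model name to a constructor.
MODEL_REGISTRY: Dict[str, Callable[..., object]] = {}


def register(name: str):
    """Class decorator that records a model class under the given name."""
    def decorator(cls):
        MODEL_REGISTRY[name.lower()] = cls
        return cls
    return decorator


@register("cnn")
class PlaceholderCNN:
    """Stand-in for a real CNN classifier; it only stores its settings."""
    def __init__(self, num_classes: int):
        self.num_classes = num_classes


@register("lstm")
class PlaceholderLSTM:
    """Stand-in for a real LSTM classifier; it only stores its settings."""
    def __init__(self, num_classes: int):
        self.num_classes = num_classes


def build_model(name: str, **kwargs):
    """Look up a model by name (case-insensitive) and instantiate it."""
    try:
        factory = MODEL_REGISTRY[name.lower()]
    except KeyError:
        known = ", ".join(sorted(MODEL_REGISTRY))
        raise ValueError(f"Unknown model {name!r}; choose from: {known}") from None
    return factory(**kwargs)
```

With this layout, adding a new model only requires defining a class and decorating it; the lookup code stays unchanged.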

Contributors

Issues and contributions are welcome!
