• Stars
    6
  • Rank 2,539,965 (Top 51 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created over 4 years ago
  • Updated over 2 years ago


Repository Details

Extract knowledge from a wiki dump file

More Repositories

1

awesome-chatgpt-dataset

Unlock the Power of LLM: Explore These Datasets to Train Your Own ChatGPT!
Python
688 stars
2

TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
Python
537 stars
3

Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark
Python
201 stars
4

tw_stocker

Keep track of and store Taiwan stock information
Python
100 stars
5

TFkit

🤖📇 Handling multiple NLP tasks in one pipeline
Python
56 stars
6

SpeechMix

Explore different ways to combine speech models (wav2vec2, HuBERT) with NLP models (BART, T5, GPT)
Python
41 stars
7

vall-e-encodec

Python
41 stars
8

BertGenerate

Fine-tuning BERT for text generation
Jupyter Notebook
38 stars
9

asr-trainer

One script for XLS-R/XLSR/Whisper fine-tuning
Python
37 stars
10

aidev

Revolutionize your development workflow with AI-powered code assistance, automating mock tests, suggestions, and unit test generation in a single Python CLI tool.
Python
35 stars
11

NLPrep

🍳 NLPrep - dataset tool for many natural language processing tasks
Python
28 stars
12

BDG

Code for "A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies."
Python
27 stars
13

Phraseg

Phraseg (一言): a toolkit for new-word discovery
Jupyter Notebook
26 stars
14

wav2vec2-xlsr-multilingual-56

56 languages, 1 model: multilingual ASR
Python
23 stars
15

FTA

Technical Analysis on Cryptocurrency
Python
23 stars
16

ChineseErrorDataset

CGED & CSC
22 stars
17

asrp

ASR text preprocessing utility
Python
20 stars
18

nlp2go

πŸƒ hosting nlp models in one line
CSS
20
star
19

ipa2

Tools for converting text to IPA in Python
Python
16 stars
20

nlp2

βš™οΈTool for NLP - handle file and text
Python
15
star
21

awesome-question-answering-dataset

A list of awesome machine question answering datasets
15 stars
22

pretrain_bart

training BART from scratch
Python
12 stars
23

SnapShare

Linking your phone to a computer browser with Socket.io.
JavaScript
10 stars
24

causal-lm-trainer

Python
8 stars
25

wav2vec-u-exp

Build and Run Wav2vec Unsupervised Experiment
Dockerfile
8 stars
26

whisper-live-asr-demo

Run Whisper on a CPU/GPU server
JavaScript
8 stars
27

gpu-info-api

πŸ±β€πŸ’» GPU Info API is an API that provides detailed information about Nvidia, AMD, and Intel GPUs. The information is extracted from Wikipedia and stored in JSON format.
Python
8
star
28

t5lephone

phoneme byt5
Python
7 stars
29

MMLM

Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra
Python
7 stars
30

llm-estimator

Effortlessly predict training time, loss, and cost for LLM model training
JavaScript
6 stars
31

react-media-viewer

Ready-to-go media player component for React.
JavaScript
6 stars
32

dtokenizer

discretize everything into tokens
Python
6 stars
33

hubert-cluster-code

Extract clustering features from HuBERT
5 stars
34

pytorch-tta

PyTorch implementation of "Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning".
Python
5 stars
35

GSQA

Generative Spoken Question Answering
Python
4 stars
36

taiwan-company-network

ε°η£ε…¬εΈζŠ•θ³‡ι—œδΏ‚εœ–
CSS
4
star
37

DevLEGO

Create your development Env like LEGO blocks, run your projects on any device - be it a PC, Web, Phone or Tablet!
Shell
4 stars
38

awesome-evaluation-lm

A collection of automated language model assessments
3 stars
39

fastpages

Jupyter Notebook
3 stars
40

modelhub

3 stars
41

Gossiping-Chinese-Positive-Corpus

PTT ε…«ε¦η‰ˆε•η­”-正青-δΈ­ζ–‡θͺžζ–™
3
star
42

survey-builder

survey builder for human evaluation
JavaScript
3 stars
43

voidful

Python
3 stars
44

audio-preprocessing-pipeline

Python
3 stars
45

DG-Showcase

Showcase for "A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies."
CSS
3 stars
46

hubert-pretrain

Using the Hugging Face Trainer to pre-train HuBERT
Python
2 stars
47

dpr-multilingual

A multilingual version of DPR
2 stars
48

telenotify

Python
2 stars
49

tts-corpus-creator

A collection of TTS APIs from different sources for generating corpora.
Python
2 stars
50

diff-aspect-set-dg

Python
2 stars
51

depack

Extract files from any type of archive on the command line
Python
2 stars
52

Data2QA

Unified QA with different modality input
Python
2 stars
53

bindtorchaudio

`bindtorchaudio` is a Python package that allows for easy installation of the `torchaudio` library, which provides audio processing functionalities for the PyTorch machine learning framework.
Python
2 stars
54

seq2seq-lm-trainer

This is a simple example of using the T5 model for sequence-to-sequence tasks, leveraging Hugging Face's `Trainer` for efficient model training.
Python
2 stars
55

PPA

Prompt Pool Agent
Python
2 stars
56

bforce

Brute force is all you need in an unstable system
Python
1 star
57

twcc-usage-slack-bot

TWCC GPU Usage Notification Slack Bot
Python
1 star
58

shows

lib for system monitoring with CPU/GPU/DISK/MEM/NET
Python
1 star
59

get-stat

lib for system monitoring in Python / Web API (CPU/GPU/DISK/MEM/NET/SERVICE)
1 star
60

NLPrep-Datasets

HTML
1 star
61

pearl

PEARL - Optimize Prompt Selection for Enhanced Answer Performance Using Reinforcement Learning
Python
1 star
62

uni-superb

Python
1 star
63

huggingface_notebook

Jupyter Notebook
1 star
64

superb-website

JavaScript
1 star
65

leverage-lm

small lm + RAG > LLM
1 star
66

fbcrawler

Python
1 star
67

SoundON

Python
1 star