Eric Lam (@voidful)

Top repositories

1

awesome-chatgpt-dataset

Unlock the Power of LLM: Explore These Datasets to Train Your Own ChatGPT!
Python
688
star
2

TextRL

Implementation of ChatGPT RLHF (Reinforcement Learning from Human Feedback) on any generation model in Hugging Face's transformers (bloomz-176B/bloom/gpt/bart/T5/MetaICL)
Python
537
star
3

Codec-SUPERB

Audio Codec Speech processing Universal PERformance Benchmark
Python
201
star
4

tw_stocker

Tracks and stores Taiwan stock information
Python
100
star
5

TFkit

🤖📇 handling multiple NLP tasks in one pipeline
Python
56
star
6

SpeechMix

Explore different ways to mix speech models (wav2vec2, HuBERT) and NLP models (BART, T5, GPT) together
Python
41
star
7

vall-e-encodec

Python
41
star
8

BertGenerate

Fine-tuning BERT for text generation
Jupyter Notebook
38
star
9

asr-trainer

one script for xls-r/xlsr/whisper fine-tuning
Python
37
star
10

aidev

Revolutionize your development workflow with AI-powered code assistance, automating mock tests, suggestions, and unit test generation in a single Python CLI tool.
Python
35
star
11

NLPrep

🍳 NLPrep - dataset tool for many natural language processing tasks
Python
28
star
12

BDG

Code for "A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies."
Python
27
star
13

Phraseg

Phraseg (一言): a new-word discovery toolkit
Jupyter Notebook
26
star
14

wav2vec2-xlsr-multilingual-56

56 languages, 1 model: multilingual ASR
Python
23
star
15

FTA

Technical Analysis on Cryptocurrency
Python
23
star
16

ChineseErrorDataset

CGED & CSC
22
star
17

asrp

ASR text preprocessing utility
Python
20
star
18

nlp2go

🏃 hosting NLP models in one line
CSS
20
star
19

ipa2

Tools for converting text to IPA in Python
Python
16
star
20

nlp2

⚙️ Tool for NLP - handle files and text
Python
15
star
21

awesome-question-answering-dataset

A list of awesome machine question answering datasets
15
star
22

pretrain_bart

training BART from scratch
Python
12
star
23

SnapShare

Linking Your Phone To Computer Browser With Socket.io.
JavaScript
10
star
24

causal-lm-trainer

Python
8
star
25

wav2vec-u-exp

Build and Run Wav2vec Unsupervised Experiment
Dockerfile
8
star
26

whisper-live-asr-demo

run whisper on CPU/GPU server
JavaScript
8
star
27

gpu-info-api

🐱‍💻 GPU Info API is an API that provides detailed information about Nvidia, AMD, and Intel GPUs. The information is extracted from Wikipedia and stored in JSON format.
Python
8
star
28

t5lephone

phoneme byt5
Python
7
star
29

MMLM

Toward Multi Modality Language Model - implementation of GPT-4o/Project Astra
Python
7
star
30

llm-estimator

Effortlessly predict training time, loss, and cost for LLM model training
JavaScript
6
star
31

WikiExtractor

Extract Knowledge from wiki dump file
Python
6
star
32

react-media-viewer

Ready to go Media Player Component for React.
JavaScript
6
star
33

dtokenizer

discretize everything into tokens
Python
6
star
34

hubert-cluster-code

Extract clustering features from HuBERT
5
star
35

pytorch-tta

Pytorch implementation of "Fast and Accurate Deep Bidirectional Language Representations for Unsupervised Learning".
Python
5
star
36

GSQA

Generative Spoken Question Answering
Python
4
star
37

taiwan-company-network

Taiwan company investment relationship graph
CSS
4
star
38

DevLEGO

Create your development Env like LEGO blocks, run your projects on any device - be it a PC, Web, Phone or Tablet!
Shell
4
star
39

awesome-evaluation-lm

Collection Of Automated Language Model Assessment
3
star
40

fastpages

Jupyter Notebook
3
star
41

Gossiping-Chinese-Positive-Corpus

PTT Gossiping board Q&A: positive Chinese corpus
3
star
42

survey-builder

survey builder for human evaluation
JavaScript
3
star
43

voidful

Python
3
star
44

audio-preprocessing-pipeline

Python
3
star
45

DG-Showcase

Showcase for "A BERT-based Distractor Generation Scheme with Multi-tasking and Negative Answer Training Strategies."
CSS
3
star
46

modelhub

3
star
47

hubert-pretrain

using huggingface trainer to pre-train hubert
Python
2
star
48

dpr-multilingual

A multilingual version of DPR
2
star
49

telenotify

Python
2
star
50

tts-corpus-creator

Collection of different TTS API sources for generating corpora.
Python
2
star
51

diff-aspect-set-dg

Python
2
star
52

depack

Extract files from any type of archive from the command line
Python
2
star
53

Data2QA

Unified QA with different modality input
Python
2
star
54

bindtorchaudio

`bindtorchaudio` is a Python package that allows for easy installation of the `torchaudio` library, which provides audio processing functionalities for the PyTorch machine learning framework.
Python
2
star
55

seq2seq-lm-trainer

This is a simple example of using the T5 model for sequence-to-sequence tasks, leveraging Hugging Face's `Trainer` for efficient model training.
Python
2
star
56

PPA

Prompt Pool Agent
Python
2
star
57

bforce

brute force is all you need in an unstable system
Python
1
star
58

twcc-usage-slack-bot

TWCC GPU Usage Notification Slack Bot
Python
1
star
59

shows

lib for system monitoring with CPU/GPU/DISK/MEM/NET
Python
1
star
60

get-stat

lib for system monitoring in Python / Web API (CPU/GPU/DISK/MEM/NET/SERVICE)
1
star
61

NLPrep-Datasets

HTML
1
star
62

pearl

PEARL - Optimize Prompt Selection for Enhanced Answer Performance Using Reinforcement Learning
Python
1
star
63

uni-superb

Python
1
star
64

huggingface_notebook

Jupyter Notebook
1
star
65

superb-website

JavaScript
1
star
66

leverage-lm

small lm + RAG > LLM
1
star
67

fbcrawler

Python
1
star
68

SoundON

Python
1
star
69

blog

0
star