• Stars
    star
    128
  • Rank 279,379 (Top 6 %)
  • Language
    Python
  • License
    Apache License 2.0
  • Created about 1 year ago
  • Updated 3 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

๐Ÿ‹ An unofficial implementation of Self-Alignment with Instruction Backtranslation.

๐Ÿ‹ Humback

An unofficial implementation of Self-Alignment with Instruction Backtranslation .

The proposed Humback is a novel framework that can augment the instruction data for supervised fine-tuning with high quality.

๐Ÿšง Currently, this repo is under construction and not finished.

Humback Framework

๐ŸŒด Dependencies

๐Ÿš€ QuickStart

Procedure (2 iters):

  1. Prepare seed data and unlabelled data.
  2. Train the backward model $M_{yx}$ on the reversed seed data.
  3. Self-augment the seed data via $M_{yx}$.
  4. Train a forward model $M_{0}$ on the seed data.
  5. Self-curate the unlabelled data $A_{k}^{(1)}$ via $M_{0}$ (tag quality scores).
  6. Train a forward model $M_{1}$ on the self-curated unlabelled data $A_{k}^{(1)}$.
  7. Use $M_{1}$ to self-curate the unlabelled data $A_{k}^{(2)}$.
  8. Train a forward model $M_{2}$ on the self-curated unlabelled data $A_{k}^{(2)}$.

Seed Data Pre-processing

We follow the original paper and use oasst1 to construct the seed data.

The processed data could be found here .

$ bash data/seed/download.sh
$ python data/seed/convert.py
# #data: 3286, #dump: 3200
# Instruction len: 149ยฑ266, Response len: 1184ยฑ799

Unlabelled Data Pre-processing

Since ClueWeb22 is not a free open-source dataset, we sample texts from falcon-refinedweb instead.

The processed data could be found here .

$ python data/unlabelled/falcon_refinedweb.py

Train Backward Model $M_{yx}$

Item Value
Foundation Model meta-llama/Llama-2-7b-hf
GPUs 8 * A100 40GB
Mixed Precision bf16
Gradient Checkpointing on
ZeRO-Offload Stage 2
Batch size 32
Steps 500
# The first Myx training takes about 30min (on the seed data)
$ bash scripts/train_backward_Myx.sh

The pre-trained $M_{yx}$ is available at Huggingface.

Self-Augmentation via $M_{yx}$

The augmentation data is available at Huggingface .

# Taking about 6:40:45 on the unlabelled data with 8*A100
$ bash scripts/self_aug.sh

Train Seed Model $M_{0}$

Hyper parameters are the same as $M_{yx}$.

$ bash scripts/train_seed.sh

The pre-trained $M_{0}$ is available at Huggingface (Uploading).

Self-Curation Prompting

The curated data is available at Huggingface .

# 33:54:45 with 8*A100 on 482,963 samples
$ bash scripts/self_curation.sh
# scores: [('None', 217203), ('4', 119211), ('3', 102756), ('5', 21301), ('1', 13083), ('2', 9288), ('8', 19), ('0', 15), ('9', 14), ('7', 11), ('6', 9), ('10', 4), ('91', 3), ('83', 2), ('20', 2), ('14', 2), ('75', 2), ('92', 2), ('72', 1), ('93', 1), ('28', 1), ('19', 1), ('728', 1), ('17', 1), ('16', 1), ('100', 1), ('237', 1), ('13', 1), ('73', 1), ('38', 1), ('87', 1), ('94', 1), ('98', 1), ('64', 1), ('52', 1), ('27', 1), ('24', 1), ('762', 1), ('266', 1), ('225', 1), ('80', 1), ('267', 1), ('99', 1), ('90', 1), ('63', 1), ('97', 1), ('78', 1), ('40', 1), ('1986', 1), ('47', 1), ('66', 1), ('45', 1), ('10502', 1), ('21', 1)]
# Number of qualified results (scores=5): 21301/482963
# instruction len: 198 ยฑ 351
# response len: 1601 ยฑ 345
# ---------------------------------------
# v2: (Strict Curation Score Matching: add `$` to the matching regex):
# Scores: [('None', 322324), ('3', 71851), ('4', 53120), ('5', 16460), ('1', 11921), ('2', 7260), ('0', 10), ('7', 4), ('6', 3), ('19', 1), ('8', 1), ('16', 1), ('13', 1), ('10', 1), ('23', 1), ('9', 1), ('90', 1), ('92', 1), ('45', 1)]
# Number of qualified results (scores=5): 15521/482963
# instruction len: 124 ยฑ 113
# response len: 1611 ยฑ 345
# ---------------------------------------
$ cat outputs/m1/unlabelled_curated_data.jsonl data/seed/seed.jsonl > data/curated/m1.jsonl

Train Models $M_{i}$

Most hyper parameters are the same as $M_{yx}$ except for the number of steps (the original Humback trains 1600 steps on 512k samples).

# change the `--data_path` in `scripts/train_seed.sh`
$ bash scripts/train_seed.sh

๐Ÿ“‘ Experimental Results

Other models: HuggingFaceH4/open_llm_leaderboard .

Model Average ARC HellaSwag MMLU TruthfulQA
Llama-2-7b 54.32 53.07 78.59 46.87 38.76
Llama-2-7b-chat 56.34 52.90 78.55 48.32 45.57
Vicuna-7b-v1.3 55.62 50.43 76.92 48.14 47.01
Humback $M_{0}$ 58.13 56.31 81.20 47.45 47.59
Humback $M_{1}$ 54.65 52.99 78.57 45.48 41.54
Humback $M_{1,\text{w/o DiffSysPrompt,TemplateVicuna1.1}}$ 55.85 52.82 78.53 45.86 46.21
Humback $M_{1,\text{w/o DiffSysPrompt,TemplateVicuna1.1,StrictCurationScoreMatching}}$ 54.26 53.50 78.52 45.19 39.83
Humback $M_{1,\text{w/o DiffSysPrompt,TemplateVicuna1.1,StrictCurationScoreMatching,1200steps}}$ 56.67 56.23 81.10 46.46 42.89
Humback $M_{1,\text{w/o DiffSysPrompt,TemplateVicuna1.1,StrictCurationScoreMatching,1800steps}}$ 57.58 57.68 81.78 46.13 44.74
Humback $M_{1,\text{w/o DiffSysPrompt,TemplateVicuna1.1,StrictCurationScoreMatching,2400steps}}$ 56.96 55.89 80.83 45.84 45.30

The results and the trend are not as good as the original paper, but the performance of $M_{0}$ is better than vanilla llama2-7b. Specifically, Humback $M_{1}$ is worse than $M_{0}$, and the different system prompts seem not be helpful on these benchmarks. By the way, although $M_{0}$ is good at these benchmarks, it may be not good at generating high quality and diversified responses on more tasks. Further experiments should be conducted to verify the effectiveness of the reproduced Humback $M_{0}$ (e.g. alpaca_eval with GPT4 as the judge).

Possible reasons are:

  1. The backward model $M_{yx}$ is not good enough to generate high quality instructions.
  2. The seed model $M_{0}$ is not competent to evaluate the generated quality (not all scores are ranging from 1 to 5).

Since I don't have GPT4 API keys, chatgpt_fn is used as the evaluator here (as introduced in alpaca_eval):

                       win_rate  standard_error  n_total  avg_length
gpt4                      73.79            1.54      805        1365
claude                    70.37            1.60      805        1082
chatgpt                   66.09            1.66      805         811
wizardlm-13b              65.16            1.67      805         985
vicuna-13b                64.10            1.69      805        1037
guanaco-65b               62.36            1.71      805        1249
oasst-rlhf-llama-33b      62.05            1.71      805        1079
alpaca-farm-ppo-human     60.25            1.72      805         803
falcon-40b-instruct       56.52            1.74      805         662
text_davinci_003          50.00            0.00      805         307
alpaca-7b                 45.22            1.74      805         396
HumbackM0                 32.30            1.65      805         548
text_davinci_001          28.07            1.56      805         296
HumbackM1                 23.35            1.49      805        1522

๐Ÿ”ฅ Further discussions are fully welcomed.

๐Ÿ“ TODO

  • train more steps on $M_{i}$.
  • remove system prompts when training $M_{0}$, $M_{i}$ and $M_{yx}$.

๐Ÿ’Œ Acknowledgments

๐Ÿ“œ Reference

@misc{li2023selfalignment,
    title={Self-Alignment with Instruction Backtranslation},
    author={Xian Li and Ping Yu and Chunting Zhou and Timo Schick and Luke Zettlemoyer and Omer Levy and Jason Weston and Mike Lewis},
    year={2023},
    eprint={2308.06259},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}

More Repositories

1

DocEE

๐Ÿ•น๏ธ A toolkit for document-level event extraction, containing some SOTA model implementations.
Python
232
star
2

Mirror

๐ŸชžA powerful toolkit for almost all the Information Extraction tasks.
Python
105
star
3

awesome-lm-evaluation

๐Ÿฉบ A collection of ChatGPT evaluation reports on various bechmarks.
49
star
4

random-luck

Automatically select the best random seed based on ancient Chinese I Ching. Good luck and best wishes !
Python
43
star
5

paper-hero

๐Ÿ’ช A toolkit to help search for papers from aclanthology, arXiv and dblp.
Python
42
star
6

watchmen

๐Ÿ˜Ž A simple and easy-to-use toolkit for GPU scheduling.
Python
40
star
7

MoE-SFT

๐Ÿผ Official implementation of Dynamic Data Mixing Maximizes Instruction Tuning for Mixture-of-Experts
Python
33
star
8

NYT-H

๐Ÿ“œ Codes and Data for COLING2020 paper: Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction
Python
31
star
9

REx

๐ŸŽฎ A toolkit for Relation Extraction and more...
Python
23
star
10

CatalogExtraction

๐ŸŒณCED: Catalog Extraction from Documents
Python
15
star
11

writing-comrade

โœ’๏ธ ChatGPT as a writing partner.
Python
14
star
12

MCRF

Multiple CRF implementations for PyTorch
Python
8
star
13

keras-cnn-relation-classificaiton

Relation Classification via Convolutional Neural Network
Python
7
star
14

accept

Will my paper be accepted?
HTML
5
star
15

Spico197.github.io

ZHU Tong's homepage.
JavaScript
4
star
16

ReportShortcut

ไฝฟ็”จๅฟซๆท้”ฎ่พ…ๅŠฉ่พ“ๅ…ฅๅญฆๅทๅ’Œๅง“ๅ
AutoHotkey
4
star
17

smoe-eval

For smoe models evaluation. Commit: b281b0921b636bc36ad05c0b0b0763bd6dd43463
Python
4
star
18

feishu-alert-bots

Python
4
star
19

pytorch-cnn-re

PyTorch version Relation Extraction codes on SemEval2010-Task8
Python
3
star
20

TaskLAMA

๐Ÿšฉ An unoffical implementation of TaskLAMA.
Python
3
star
21

pcnn_one_nyt10

Python
3
star
22

car-senti

CCF-BDCI2018 ๆฑฝ่ฝฆ่กŒไธš็”จๆˆท่ง‚็‚นไธป้ข˜ๅŠๆƒ…ๆ„Ÿ่ฏ†ๅˆซ ่ฏ•ๆฐด
Jupyter Notebook
3
star
23

TaskPlanningPapers

๐Ÿฅ… A collection of task planning papers.
3
star
24

StudentHealthDailyUpdate

Python
2
star
25

TongfuRemotePrinter

A remote printer based on django
Python
2
star
26

PythonCompilerPrinciplesExp

็ผ–่ฏ‘ๅŽŸ็†ๅฎž้ชŒ๏ผˆPLY๏ผ‰
Python
2
star
27

get-an-idea

Geting an idea from enormous paper repositories.
Python
2
star
28

daily-arxiv

Daily arXiv feed
Vue
2
star
29

nlp-mate

You need a mate when processing natural languages ~
1
star
30

solocourse

A series of self-study materials of college knowledges.
1
star
31

elec-fields-waves

Electromagnetic field and electromagnetic wave formulas by latex
TeX
1
star
32

large-corpus-process

Tutorial for simple methods to process corpora.
Python
1
star
33

cnki-spider

cnki-spider
HTML
1
star
34

SUDA-Psychological-Counseling-Appointments

Checking available psychological counseling appointments in SUDA.
Python
1
star
35

BERT-CLS-MLC

Multi-class multi-label classification with BERT CLS token representation.
Python
1
star
36

ShikeLang

Bon Appรฉtit.
Python
1
star
37

Learn-REx

Learn REx by a simple demo project.
Python
1
star
38

server-remote-control

Remote power control by accessing BMI.
Python
1
star