• This repository has been archived on 21/Jun/2024
• Stars: 336
• Language: Python
• License: MIT License

Repository Details

Toolformer

Open-source implementation of Toolformer: Language Models Can Teach Themselves to Use Tools by Meta AI.

Abstract

Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.

How to run

Inference

Models are available on Hugging Face: toolformer_v0 (dmayhem93/toolformer_v0_epoch2).

A quick example of how to load it is below:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline

# Load the fine-tuned GPT-J checkpoint in fp16 and move it to the GPU.
tokenizer = AutoTokenizer.from_pretrained("dmayhem93/toolformer_v0_epoch2")
model = AutoModelForCausalLM.from_pretrained(
    "dmayhem93/toolformer_v0_epoch2",
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
).cuda()

# Wrap the model and tokenizer in a text-generation pipeline on GPU 0.
generator = pipeline(
    "text-generation", model=model, tokenizer=tokenizer, device=0
)
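
A minimal usage sketch (the prompt and generation settings are illustrative, not from the repository):

# The pipeline returns a list of dicts holding the generated text.
output = generator(
    "The capital of France is",
    max_new_tokens=32,
    do_sample=False,
)
print(output[0]["generated_text"])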

Model Performance

v0

The model is currently able to do retrieval. In a one-shot setting it will pick it up without much hand-holding. For zero-shot use, adding a token bias to <TOOLFORMER_API_START> (token index 50257) will get it started.

The right token bias seems to depend on context length: in brief testing, 2.5 worked well with minimal context and 7.5 with a lot of context.
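
One way to apply that bias is with a custom LogitsProcessor. This is a sketch, not code from the repository; it assumes the model and tokenizer loaded in the inference snippet above:

import torch
from transformers import LogitsProcessor, LogitsProcessorList

API_START_TOKEN_ID = 50257  # <TOOLFORMER_API_START>


class TokenBias(LogitsProcessor):
    """Adds a constant bias to one token's logits at every decoding step."""

    def __init__(self, token_id: int, bias: float):
        self.token_id = token_id
        self.bias = bias

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        scores[:, self.token_id] += self.bias
        return scores


# ~2.5 for short prompts, ~7.5 for long prompts (see the note above).
inputs = tokenizer("10 days from now the date will be", return_tensors="pt").to(model.device)
output_ids = model.generate(
    **inputs,
    max_new_tokens=64,
    logits_processor=LogitsProcessorList([TokenBias(API_START_TOKEN_ID, 2.5)]),
)
print(tokenizer.decode(output_ids[0]))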

The Calculator and Calendar tools are a work in progress; you can give them a shot, but don't expect good results.

Tool Integration

WIP

Tool integration into sampling is a work in progress, so for now you will need to perform the tool calls manually.

For example, when the model outputs <TOOLFORMER_API_START>Calculator(1 + 2)<TOOLFORMER_API_RESPONSE>, you will need to input 3<TOOLFORMER_API_END> right after it.
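
A rough sketch of doing the same thing programmatically, assuming the generator pipeline from the inference section (the string handling here is illustrative, not part of the repository):

# 1. Generate until the model (hopefully) opens an API call.
prompt = "What is 1 + 2? "
text = generator(prompt, max_new_tokens=32)[0]["generated_text"]

# 2. If a call was opened, parse it, run the tool yourself, and splice the result in.
if "<TOOLFORMER_API_RESPONSE>" in text:
    call = text.split("<TOOLFORMER_API_START>")[-1].split("<TOOLFORMER_API_RESPONSE>")[0]
    print("Model requested:", call)  # e.g. Calculator(1 + 2)
    result = "3"  # run the real tool here instead of hard-coding the answer
    text = (
        text.split("<TOOLFORMER_API_RESPONSE>")[0]
        + "<TOOLFORMER_API_RESPONSE>" + result + "<TOOLFORMER_API_END>"
    )

# 3. Continue generation from the completed text.
print(generator(text, max_new_tokens=32)[0]["generated_text"])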

For retrieval, copy/pasting search results seems to work, but pasting results from an actual retriever is better if you have one.

To get retrieval working, here is a brief script that sets it up with some data you load in and retrieve from.

from tools import Retriever
import json


if __name__ == '__main__':
    retriever = Retriever()
    ret_val = "location of New Orleans"  # the query
    # Corpus to retrieve from, loaded from a JSON file of passage strings.
    with open('retrieval_test_data.json', encoding='utf-8') as f:
        ret_strings = json.load(f)
    # Print the top-3 passages most relevant to the query.
    print(', '.join(retriever.retrieval(
        ret_strings, ret_val, 3
    )))
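
If you don't have a retrieval_test_data.json yet, the script above appears to expect a plain JSON list of passage strings (judging by how ret_strings is used), so a toy file can be written like this; the passages are only placeholders:

import json

# Any list of passage strings should do; these are example placeholders.
passages = [
    "New Orleans is a city on the Mississippi River in the U.S. state of Louisiana.",
    "Paris is the capital and most populous city of France.",
    "The Great Barrier Reef lies off the coast of Queensland, Australia.",
]
with open("retrieval_test_data.json", "w", encoding="utf-8") as f:
    json.dump(passages, f)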

Data generation

Looking to make your own data?

python data_generator.py --num_devices=x --device_id=y

This lets you run it without collisions across x devices, so if you only have one:

python data_generator.py --num_devices=1 --device_id=0

Each process uses an entire GPU, so if you want to run on a node with multiple GPUs, set CUDA_VISIBLE_DEVICES for each process, e.g.

export CUDA_VISIBLE_DEVICES=5
python data_generator.py --num_devices=8 --device_id=5
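
The two flags presumably split the source data into disjoint shards so parallel runs don't collide; a common pattern for this (not necessarily what data_generator.py does internally) looks like:

# Hypothetical sharding: process `device_id` keeps every `num_devices`-th example,
# so the shards are disjoint and together cover the whole dataset.
def shard(examples, num_devices: int, device_id: int):
    return [ex for idx, ex in enumerate(examples) if idx % num_devices == device_id]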

The easiest way to gather data for multiple tools is to make a data_generator script for each tool you want to use.

Finally, after you have your results, some minimal postprocessing scripts are in this folder.

You'll probably want to look at your data and figure out if there's any filtering needed.

For an example of what the data looks like afterwards, our first dataset generation is here, and the postprocessed outputs ready for the HF trainer are here.
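
To take a quick look at the postprocessed data, you can load it straight from the Hub; the dataset id below is the one used in the training command, and the "train" split name is an assumption:

from datasets import load_dataset

# Dataset id taken from the training command in the next section.
ds = load_dataset("dmayhem93/toolformer-v0-postprocessed")
print(ds)
print(ds["train"][0])  # assumes a "train" split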

How to train

We used Hugging Face's run_clm.py, which we include in this repository as train_gptj_toolformer.py.

We used a batch size of 32 (4 per device); the command used is below:

deepspeed train_gptj_toolformer.py --model_name_or_path=EleutherAI/gpt-j-6B --per_device_train_batch_size=4 \
  --num_train_epochs 10 --save_strategy=epoch --output_dir=finetune_toolformer_v0 --report_to "wandb" \
  --dataset_name dmayhem93/toolformer-v0-postprocessed --tokenizer_name customToolformer \
  --block_size 2048 --gradient_accumulation_steps 1 --do_train --do_eval --evaluation_strategy=epoch \
  --logging_strategy=epoch --fp16 --overwrite_output_dir --adam_beta1=0.9 --adam_beta2=0.999 \
  --weight_decay=2e-02 --learning_rate=1e-05 --warmup_steps=100 --per_device_eval_batch_size=1 \
  --cache_dir="hf_cache" --gradient_checkpointing=True --deepspeed ds_config_gpt_j.json

Citations

@misc{https://doi.org/10.48550/arxiv.2302.04761,
    doi = {10.48550/ARXIV.2302.04761},
    url = {https://arxiv.org/abs/2302.04761},
    author = {Schick, Timo and Dwivedi-Yu, Jane and Dessì, Roberto and Raileanu, Roberta and Lomeli, Maria and Zettlemoyer, Luke and Cancedda, Nicola and Scialom, Thomas},
    keywords = {Computation and Language (cs.CL), FOS: Computer and information sciences},
    title = {Toolformer: Language Models Can Teach Themselves to Use Tools},
    publisher = {arXiv},
    year = {2023},
    copyright = {arXiv.org perpetual, non-exclusive license}
}

@article{dao2022flashattention,
    title={FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness},
    author={Dao, Tri and Fu, Daniel Y and Ermon, Stefano and Rudra, Atri and R{\'e}, Christopher},
    journal={arXiv preprint arXiv:2205.14135},
    year={2022}
}

@software{Liang_Long_Context_Transformer_2023,
    author = {Liang, Kaizhao},
    doi = {10.5281/zenodo.7651809},
    month = {2},
    title = {{Long Context Transformer v0.0.1}},
    url = {https://github.com/github/linguist},
    version = {0.0.1},
    year = {2023}
}

More Repositories

1. PaLM - An open-source implementation of Google's PaLM models. (Python, 805 stars)
2. LaMDA-rlhf-pytorch - Open-source pre-training implementation of Google's LaMDA in PyTorch. Adding RLHF similar to ChatGPT. (Python, 460 stars)
3. t5-pytorch - Implementation of Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer in PyTorch. (Python, 46 stars)
4. flash-gpt - Add Flash-Attention to Huggingface Models. (Python, 33 stars)
5. vit-flax - Implementation of numerous Vision Transformers in Google's JAX and Flax. (Python, 19 stars)
6. PaLM-flax - Implementation of the SOTA Transformer architecture from PaLM - Scaling Language Modeling with Pathways in JAX/Flax. (Python, 14 stars)
7. PaLM-rlhf-jax - Implementation of RLHF (Reinforcement Learning with Human Feedback) on top of the PaLM architecture. Basically ChatGPT but with PaLM. Built in collaboration with Lucidrains. (Python, 9 stars)
8. ViT-Patch-Merger (Python, 7 stars)
9. SemDeDup - An unofficial implementation of SemDeDup: Data-efficient learning at web-scale through semantic deduplication. (Python, 6 stars)
10. Huggingface-deduplicate (Python, 5 stars)
11. hf_fsdp - A very basic fsdp wrapper for HF. (Python, 5 stars)
12. Simple-ViT-flax (Python, 4 stars)
13. flan-Llama - Code for training Llama on the Flan Collection. (Python, 4 stars)
14. dsp-langchain (Python, 4 stars)
15. Sparrow-rlhf-pytorch - An open-source implementation of DeepMind's Sparrow with RLHF. (Python, 4 stars)
16. Adaptive-Token-Sampling-Flax (Python, 4 stars)
17. Llama-rlhf-pytorch (3 stars)
18. Perpetrator - An API for Redteaming large language models. (3 stars)
19. CaiT-Flax (Python, 3 stars)
20. oig-Llama - Open Instruct fine-tuned Llama. (3 stars)
21. jrlx (Python, 2 stars)
22. Twins-SVT-Flax - An open-source implementation of the Twins: Revisiting the Design of Spatial Attention in Vision Transformers research paper in Google's JAX and Flax. (Python, 2 stars)
23. t8t-pytorch - The official repository for Towards 8-bit Transformers. Stable training of LLMs at mixed 8-bit precision. (2 stars)
24. WebGPT - Combining LangChain with LLMs to build WebGPT. (2 stars)
25. BloomCoder (Python, 2 stars)
26. podcasts-are-all-you-need (Python, 2 stars)
27. ViT (Python, 1 star)
28. Crossformer-flax (Python, 1 star)
29. DeepViT-flax - Implementation of Deep Vision Transformer in Flax. (Python, 1 star)
30. ViT-haiku (Python, 1 star)
31. LeViT-flax (Python, 1 star)
32. OPTCode (Python, 1 star)
33. bpt-pytorch (Python, 1 star)
34. deep-reinforcement-learning-go - Creation and implementation of an Alpha Go engine from scratch. (Jupyter Notebook, 1 star)
35. Everything-Machine-Learning (1 star)
36. Token-to-Token-ViT-flax (Python, 1 star)
37. ViT-Small-Datasets-flax (Python, 1 star)
38. job-posting-analysis - Data Science job posting analysis and resume comparison using NLP and Scikit Learn. (Jupyter Notebook, 1 star)
39. charcuterie - A library for sampling Huggingface datasets. (1 star)
40. tokenize_hf (Python, 1 star)
41. annotated-transformer (Python, 1 star)
42. Orchestrator - An API for different data cleaning tools. (Python, 1 star)
43. conceptofmind - Config files for my GitHub profile. (1 star)
44. unsloth-bert (Python, 1 star)