  • Stars: 198
  • Rank: 196,898 (Top 4%)
  • Language: Jupyter Notebook
  • Created: over 1 year ago
  • Updated: over 1 year ago

Repository Details

Code for arXiv 2023: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

GPT-Bargaining: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback

Implementation of the paper: Improving Language Model Negotiation with Self-Play and In-Context Learning from AI Feedback. [arXiv] 2023.

Yao Fu. University of Edinburgh

We study whether multiple large language models (LLMs) can autonomously improve each other in a negotiation game by playing, reflecting, and criticizing. We ask two large language models to negotiate with each other, playing the roles of a buyer and a seller, respectively. They aim to reach a deal with the buyer targeting a lower price and the seller a higher one. A third language model, playing the critic, provides feedback to a player to improve the player’s negotiation strategies. We let the two agents play multiple rounds, using previous negotiation history and the AI feedback as in-context demonstrations to improve the model’s negotiation performance iteratively.
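The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the code in run.py or agent.py: the `Model` type, the function names, and the prompt layout are all assumptions, and each "model" is just a function from a prompt string to a reply string. Following the `criticize_seller` setting used in the Quickstart, only the seller receives earlier games and critic feedback as in-context demonstrations.

```python
from typing import Callable, List

# Hypothetical type: a "model" maps a text prompt to a text reply.
Model = Callable[[str], str]


def play_one_game(buyer: Model, seller: Model, context: str,
                  n_turns: int = 4) -> List[str]:
    """Alternate seller and buyer turns and return the dialogue."""
    dialogue: List[str] = []
    for turn in range(n_turns):
        role, model = ("seller", seller) if turn % 2 == 0 else ("buyer", buyer)
        # Only the player being improved (the seller) sees earlier games
        # and the critic's feedback; the buyer sees the current dialogue only.
        prefix = context if role == "seller" else ""
        prompt = prefix + "\n".join(dialogue)
        dialogue.append(f"{role}: {model(prompt)}")
    return dialogue


def self_play_with_feedback(buyer: Model, seller: Model, critic: Model,
                            n_round: int = 3) -> List[List[str]]:
    """Play n_round games, growing the seller's in-context history each round."""
    context = ""
    games: List[List[str]] = []
    for _ in range(n_round):
        dialogue = play_one_game(buyer, seller, context)
        feedback = critic("\n".join(dialogue))
        # The finished game and the feedback become demonstrations
        # for the next round.
        context += "\n".join(dialogue) + f"\ncritic: {feedback}\n"
        games.append(dialogue)
    return games


if __name__ == "__main__":
    # Toy stand-ins for LLM calls, so the sketch runs without any API key.
    toy_seller = lambda prompt: f"My price is ${20 + prompt.count('critic:')}."
    toy_buyer = lambda prompt: "Can you go lower?"
    toy_critic = lambda transcript: "Hold your price longer."
    for i, game in enumerate(
            self_play_with_feedback(toy_buyer, toy_seller, toy_critic)):
        print(i, game[0])
```

The toy seller raises its opening price by one for every piece of feedback it has seen, which makes the effect of the accumulated context easy to observe. With real LLMs, the three callables would wrap API calls instead.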

We believe our results have profound implications for AI research. On the positive side, they indicate the possibility of continuously improving language models with minimal human intervention. On the risky side, model behavior may be harder to oversee in our framework because the models act autonomously, which calls for future alignment and safety research in the multi-agent game setting.

Quickstart

# Create the outputs directory (no error if it already exists)
mkdir -p outputs

# API credentials: replace the placeholders with your own keys
api_key="<YOUR_OPENAI_API_KEY>"
anthropic_api_key="<YOUR_ANTHROPIC_API_KEY>"

# Experiment settings
game_type=criticize_seller
moderator_instruction=moderator_0509
verbose=1
n_round=10
n_rollout=5
n_exp=200
ver=criticize_claude_instant_seller
seller_engine=claude-instant-v1.0
seller_critic_engine=claude-instant-v1.0

# Name of this run, composed from the settings above
game_version=${game_type}_${n_exp}_runs_${n_rollout}_rollout_ver_${ver}

python run.py \
    --api_key="${api_key}" \
    --anthropic_api_key="${anthropic_api_key}" \
    --seller_engine=${seller_engine} \
    --seller_critic_engine=${seller_critic_engine} \
    --game_type=${game_type} \
    --verbose=${verbose} \
    --n_round=${n_round} \
    --n_exp=${n_exp} \
    --n_rollout=${n_rollout} \
    --moderator_instruction=${moderator_instruction} \
    --ver=${ver} \
    --game_version=${game_version}
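
For running several configurations, the same command can be assembled from Python. The sketch below is an assumption-laden convenience, not part of the repository: `build_run_command` is a hypothetical helper, and the only flag names it uses are the ones that appear in the Quickstart command. It derives `game_version` with the same naming scheme as the shell snippet.

```python
import shlex
from typing import Dict, List


def build_run_command(config: Dict[str, object]) -> List[str]:
    """Turn a flag dictionary into an argv list for run.py.

    Hypothetical helper: flag names are copied from the Quickstart,
    nothing here inspects run.py itself.
    """
    cfg = dict(config)
    # Same naming scheme as the game_version shell variable in the Quickstart.
    cfg.setdefault(
        "game_version",
        f"{cfg['game_type']}_{cfg['n_exp']}_runs_"
        f"{cfg['n_rollout']}_rollout_ver_{cfg['ver']}",
    )
    return ["python", "run.py"] + [f"--{key}={value}" for key, value in cfg.items()]


if __name__ == "__main__":
    example = {
        "api_key": "<YOUR_OPENAI_API_KEY>",            # placeholder, as in the Quickstart
        "anthropic_api_key": "<YOUR_ANTHROPIC_API_KEY>",  # placeholder
        "seller_engine": "claude-instant-v1.0",
        "seller_critic_engine": "claude-instant-v1.0",
        "game_type": "criticize_seller",
        "verbose": 1,
        "n_round": 10,
        "n_exp": 200,
        "n_rollout": 5,
        "moderator_instruction": "moderator_0509",
        "ver": "criticize_claude_instant_seller",
    }
    # Print a shell-quoted version of the command for inspection.
    print(shlex.join(build_run_command(example)))
```

The returned list can be passed to `subprocess.run(command, check=True)` to launch the experiment; building an argv list rather than a single string avoids shell quoting problems with the key values.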
Code structure: 

agent.py: implementation of different agents
lib_api.py: wrappers of LLM APIs
run.py: run the bargaining game!

lib_prompt: prompt library used in this project 
exps: experiments run in this project 
notebooks: visualization tools 

Examples

Bargaining and improving from AI feedback.

[Figure: example_run]

AI feedback about bargaining strategies

[Figure: example_feedback]

Improvements over multiple rounds of AI Feedback

[Figure: example_feedback]

TODOs

  • Include chat-bison-001
  • Finish claude-100k

More Repositories

1. chain-of-thought-hub (Jupyter Notebook, 2,556 stars)
   Benchmarking large language models' complex reasoning ability with chain-of-thought prompting

2. Deep-Generative-Models-for-Natural-Language-Processing (392 stars)
   DGMs for NLP. A roadmap.

3. dgm_latent_bow (Python, 124 stars)
   Implementation of NeurIPS 19 paper: Paraphrase Generation with Latent Bag of Words

4. FlanT5-CoT-Specialization (Jupyter Notebook, 122 stars)
   Implementation of ICML 23 Paper: Specializing Smaller Language Models towards Multi-Step Reasoning.

5. Distributional-Generalization-in-Natural-Language-Processing (Jupyter Notebook, 86 stars)
   Distributional Generalization in NLP. A roadmap.

6. Gumbel-CRF (Python, 53 stars)
   Implementation of NeurIPS 20 paper: Latent Template Induction with Gumbel-CRFs

7. PoincareProbe (Jupyter Notebook, 50 stars)
   Implementation of ICLR 21 paper: Probing BERT in Hyperbolic Spaces

8. Partially-Observed-TreeCRFs (Python, 49 stars)
   Implementation of AAAI 21 paper: Nested Named Entity Recognition with Partially Observed TreeCRFs

9. franxyao.github.io (35 stars)

10. Language-Model-Pretraining-for-Text-Generation (19 stars)
    LM pretraining for generation, reading list, resources, conference mappings.

11. pivot_analysis (Python, 15 stars)
    Implementation of INLG 19 paper: Rethinking Text Attribute Transfer: A Lexical Analysis

12. RDP (Jupyter Notebook, 13 stars)
    Implementation of ICML 22 Paper: Scaling Structured Inference with Randomization

13. Complexity-Based-Prompting (10 stars)
    Complexity Based Prompting for Multi-Step Reasoning

14. prompt-handbook (5 stars)
    Rules of Thumb 👍 for Writing Good Magical Prompts

15. nlu-cw2 (Python, 4 stars)

16. Natural-Ansewr-Generation (Python, 2 stars)

17. Retrieval-Head-with-Flash-Attention (Jupyter Notebook, 2 stars)
    Efficient retrieval head analysis with triton flash attention that supports topK probability

18. SCAN_reproduce (Python, 1 star)