

Grounding Large Language Models with Online Reinforcement Learning

This repository contains the code used for our paper Grounding Large Language Models with Online Reinforcement Learning. We perform functional grounding of LLMs' knowledge in BabyAI-Text:

[Figure: main schema]

We then perform an in-depth analysis of the generalization abilities of our trained agents:

[Figure: generalization schema]

We release our BabyAI-Text environment along with the code to perform our experiments (both training agents and evaluating their performance). We rely on the Lamorel library to use LLMs.

Our repository is structured as follows:

πŸ“¦ Grounding_LLMs_with_online_RL
┣ πŸ“‚ babyai-text -- our BabyAI-Text environment
┣ πŸ“‚ experiments -- code for our experiments
┃ ┣ πŸ“‚ agents -- implementation of all our agents
┃ ┃ ┣ πŸ“‚ bot -- bot agent leveraging BabyAI's bot
┃ ┃ ┣ πŸ“‚ random_agent -- agent playing uniformly random
┃ ┃ ┣ πŸ“‚ drrn -- DRRN agent from here
┃ ┃ ┣ πŸ“‚ ppo -- agents using PPO
┃ ┃ ┃ ┣ πŸ“œ symbolic_ppo_agent.py -- SymbolicPPO adapted from BabyAI's PPO
┃ ┃ ┃ β”— πŸ“œ llm_ppo_agent.py -- our LLM agent grounded using PPO
┃ ┣ πŸ“‚ configs -- Lamorel configs for our experiments
┃ ┣ πŸ“‚ slurm -- utils scripts to launch our experiments on a SLURM cluster
┃ ┣ πŸ“‚ campaign -- SLURM scripts used to launch our experiments
┃ ┣ πŸ“œ train_language_agent.py -- train agents using BabyAI-Text (LLMs and DRRN) -> contains our implementation of PPO loss for LLMs as well as additional heads on top of LLMs
┃ ┣ πŸ“œ train_symbolic_ppo.py -- train SymbolicPPO on BabyAI (with BabyAI-Text's tasks)
┃ ┣ πŸ“œ post-training_tests.py -- generalization tests of trained agents
┃ ┣ πŸ“œ test_results.py -- utils to format results
┃ β”— πŸ“œ clm_behavioral-cloning.py -- code to perform Behavioral Cloning on an LLM using trajectories

Installation steps

  1. Create conda env
conda create -n dlp python=3.10.8; conda activate dlp
  2. Install PyTorch
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
  3. Install packages required by our package
pip install -r requirements.txt
  4. Install BabyAI-Text: see installation details in the babyai-text package
  5. Install Lamorel
git clone https://github.com/flowersteam/lamorel.git; cd lamorel/lamorel; pip install -e .; cd ../..
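
A quick sanity check after these steps (a sketch only: the module names lamorel and babyai_text are assumptions based on the packages installed above; the environment ID comes from the configs further down):
python -c "import lamorel"  # assumed module name for the Lamorel package
python -c "import gym, babyai_text; gym.make('BabyAI-MixedTestLocal-v0')"  # assumed module name for BabyAI-Text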

Launch

Please use Lamorel along with our configs. You can find examples of our training scripts in the campaign folder; a generic launch sketch follows.
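
A minimal launch sketch (the entry point follows Lamorel's launcher convention; the config path and config name below are placeholders to adapt to your setup):
python -m lamorel_launcher.launch --config-path /absolute/path/to/experiments/configs --config-name your_config rl_script_args.path=/absolute/path/to/experiments/train_language_agent.py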

Training a Language Model

To train a Language Model on a BabyAI-Text environment, one must use the train_language_agent.py file. This script (launched with Lamorel) uses the following config entries:

rl_script_args:
  seed: 1
  number_envs: 2 # Number of parallel envs to launch (steps will be synchronized, i.e. a step call will return number_envs observations)
  num_steps: 1000 # Total number of training steps
  max_episode_steps: 3 # Maximum number of steps in a single episode
  frames_per_proc: 40 # The number of collected transitions to perform a PPO update will be frames_per_proc*number_envs
  discount: 0.99 # Discount factor used in PPO
  lr: 1e-6 # Learning rate used to finetune the LLM
  beta1: 0.9 # PPO's hyperparameter
  beta2: 0.999 # PPO's hyperparameter
  gae_lambda: 0.99 # PPO's hyperparameter
  entropy_coef: 0.01 # PPO's hyperparameter
  value_loss_coef: 0.5 # PPO's hyperparameter
  max_grad_norm: 0.5 # Maximum grad norm when updating the LLM's parameters
  adam_eps: 1e-5 # Adam's hyperparameter
  clip_eps: 0.2 # Epsilon used in PPO's losses clipping
  epochs: 4 # Number of PPO epochs performed on each set of collected trajectories
  batch_size: 16 # Minibatch size
  action_space: ["turn_left","turn_right","go_forward","pick_up","drop","toggle"] # Possible actions for the agent
  saving_path_logs: ??? # Where to store logs
  name_experiment: 'llm_mtrl' # Useful for logging
  name_model: 'T5small' # Useful for logging
  saving_path_model: ??? # Where to store the finetuned model
  name_environment: 'BabyAI-MixedTestLocal-v0' # BabyAI-Text's environment
  load_embedding: true # Whether trained embedding layers should be loaded (useful when lm_args.pretrained=False). Setting both this and use_action_heads to True (lm_args.pretrained=False) creates our NPAE agent.
  use_action_heads: false # Whether action heads should be used instead of scoring. Setting both this and load_embedding to true (lm_args.pretrained=False) creates our NPAE agent.
  template_test: 1 # Which prompt template to use to log the evolution of actions' probabilities (Section C of our paper). Choices are [1, 2].
  nbr_obs: 3 # Number of past observations used in the prompt
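
Entries set to ??? are mandatory values (Hydra/OmegaConf syntax) and must be provided, either in the YAML itself or as command-line overrides at launch. A sketch with placeholder paths:
python -m lamorel_launcher.launch --config-path /absolute/path/to/experiments/configs --config-name your_config \
  rl_script_args.path=/absolute/path/to/experiments/train_language_agent.py \
  rl_script_args.saving_path_logs=/where/to/store/logs \
  rl_script_args.saving_path_model=/where/to/store/models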

For the config entries related to the Language Model itself, please refer to Lamorel's documentation.

Evaluating performance on test episodes

To evaluate the performance of an agent (e.g. a trained LLM, BabyAI's bot...) on test tasks, use post-training_tests.py and set the following config entries:

rl_script_args:
  seed: 1
  number_envs: 2 # Number of parallel envs to launch (steps will be synchronized, i.e. a step call will return number_envs observations)
  max_episode_steps: 3 # Maximum number of steps in a single episode
  action_space: ["turn_left","turn_right","go_forward","pick_up","drop","toggle"] # Possible actions for the agent
  saving_path_logs: ??? # Where to store logs
  name_experiment: 'llm_mtrl' # Useful for logging
  name_model: 'T5small' # Useful for logging
  saving_path_model: ??? # Where to store the finetuned model
  name_environment: 'BabyAI-MixedTestLocal-v0' # BabyAI-Text's environment
  load_embedding: true # Whether trained embedding layers should be loaded (useful when lm_args.pretrained=False). Setting both this and use_action_heads to True (lm_args.pretrained=False) creates our NPAE agent.
  use_action_heads: false # Whether action heads should be used instead of scoring. Setting both this and load_embedding to true (lm_args.pretrained=False) creates our NPAE agent.
  nbr_obs: 3 # Number of past observations used in the prompt
  number_episodes: 10 # Number of test episodes
  language: 'english' # Useful to perform the French experiment (Section H4)
  zero_shot: true # Whether the zero-shot LLM (i.e. without finetuning) should be used
  modified_action_space: false # Whether a modified action space (e.g. different from the one seen during training) should be used
  new_action_space: #["rotate_left","rotate_right","move_ahead","take","release","switch"] # Modified action space
  im_learning: false # Whether an LLM produced with Behavioral Cloning should be used
  im_path: "" # Path to the LLM learned with Behavioral Cloning
  bot: false # Whether BabyAI's bot agent should be used
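
As with training, evaluation runs through Lamorel's launcher; a sketch with placeholder paths and config name:
python -m lamorel_launcher.launch --config-path /absolute/path/to/experiments/configs --config-name your_eval_config \
  rl_script_args.path=/absolute/path/to/experiments/post-training_tests.py \
  rl_script_args.saving_path_logs=/where/to/store/logs \
  rl_script_args.saving_path_model=/path/to/trained/model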
