LLM-with-RL-papers

A collection of papers on combining large language models (LLMs) with reinforcement learning (RL), covering instruction following, reasoning, decision making, continuous improvement, self-improvement, and related topics.

Review

[1] Yang S, Nachum O, Du Y, et al. Foundation Models for Decision Making: Problems, Methods, and Opportunities[J]. arXiv preprint arXiv:2303.04129, 2023. [link]

RL without Human Feedback

[1] Le H, Wang Y, Gotmare A D, et al. CodeRL: Mastering code generation through pretrained models and deep reinforcement learning[J]. Advances in Neural Information Processing Systems, 2022, 35: 21314-21328. [link]

[2] Shojaee P, Jain A, Tipirneni S, et al. Execution-based Code Generation using Deep Reinforcement Learning[J]. arXiv preprint arXiv:2301.13816, 2023. [link]

[3] Uesato J, Kushman N, Kumar R, et al. Solving math word problems with process- and outcome-based feedback[J]. arXiv preprint arXiv:2211.14275, 2022. [link]

[4] Deng M, Wang J, Hsieh C P, et al. RLPrompt: Optimizing discrete text prompts with reinforcement learning[J]. arXiv preprint arXiv:2205.12548, 2022. [link]

[5] Carta T, Romac C, Wolf T, et al. Grounding large language models in interactive environments with online reinforcement learning[J]. arXiv preprint arXiv:2302.02662, 2023. [link]

[6] Mezghani L, Bojanowski P, Alahari K, et al. Think Before You Act: Unified Policy for Interleaving Language Reasoning with Actions[J]. arXiv preprint arXiv:2304.11063, 2023. [link]

[7] Dubois Y, Li X, Taori R, et al. AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback[J]. arXiv preprint arXiv:2305.14387, 2023. [link]

[8] Laskin M, Wang L, Oh J, et al. In-context reinforcement learning with algorithm distillation[J]. arXiv preprint arXiv:2210.14215, 2022. [link]

RLHF (RL with Human Feedback)

[1] Christiano P F, Leike J, Brown T, et al. Deep reinforcement learning from human preferences[J]. Advances in Neural Information Processing Systems, 2017, 30. [link]

[2] Ziegler D M, Stiennon N, Wu J, et al. Fine-tuning language models from human preferences[J]. arXiv preprint arXiv:1909.08593, 2019. [link]

[3] Stiennon N, Ouyang L, Wu J, et al. Learning to summarize with human feedback[J]. Advances in Neural Information Processing Systems, 2020, 33: 3008-3021. [link]

[4] Ouyang L, Wu J, Jiang X, et al. Training language models to follow instructions with human feedback[J]. Advances in Neural Information Processing Systems, 2022, 35: 27730-27744. [link]

[5] Nakano R, Hilton J, Balaji S, et al. WebGPT: Browser-assisted question-answering with human feedback[J]. arXiv preprint arXiv:2112.09332, 2021. [link]

[6] Bai Y, Jones A, Ndousse K, et al. Training a helpful and harmless assistant with reinforcement learning from human feedback[J]. arXiv preprint arXiv:2204.05862, 2022. [link]

[7] Bai Y, Kadavath S, Kundu S, et al. Constitutional AI: Harmlessness from AI Feedback[J]. arXiv preprint arXiv:2212.08073, 2022. [link]

[8] Ganguli D, Askell A, Schiefer N, et al. The capacity for moral self-correction in large language models[J]. arXiv preprint arXiv:2302.07459, 2023. [link]

[9] Zhu B, Jiao J, Jordan M I. Principled Reinforcement Learning with Human Feedback from Pairwise or $K$-wise Comparisons[J]. arXiv preprint arXiv:2301.11270, 2023. [link]

Prompt-based but RL-related

[1] Madaan A, Tandon N, Gupta P, et al. Self-Refine: Iterative refinement with self-feedback[J]. arXiv preprint arXiv:2303.17651, 2023. [link]

[2] Yao S, Zhao J, Yu D, et al. ReAct: Synergizing reasoning and acting in language models[J]. arXiv preprint arXiv:2210.03629, 2022. [link]

[3] Liu H, Sferrazza C, Abbeel P. Languages are rewards: Hindsight finetuning using human feedback[J]. arXiv preprint arXiv:2302.02676, 2023. [link]

[4] Zhang T, Liu F, Wong J, et al. The Wisdom of Hindsight Makes Language Models Better Instruction Followers[J]. arXiv preprint arXiv:2302.05206, 2023. [link]

[5] Chen X, Lin M, Schärli N, et al. Teaching Large Language Models to Self-Debug[J]. arXiv preprint arXiv:2304.05128, 2023. [link]

[6] Liu R, Yang R, Jia C, et al. Training Socially Aligned Language Models in Simulated Human Society[J]. arXiv preprint arXiv:2305.16960, 2023. [link]

[7] Chen L, Wang L, Dong H, et al. Introspective Tips: Large Language Model for In-Context Decision Making[J]. arXiv preprint arXiv:2305.11598, 2023. [link]

Code

[1] DeepSpeed Chat RLHF [link]

[2] trlX [link]

[3] PKU-Beaver [link]

[4] ColossalAI [link]
