🚀 Leaderboard
Leaderboard (sorted by HumanEval Pass@1)
Model | Params | HumanEval | MBPP | HF | Source |
---|---|---|---|---|---|
GPT-4 + Reflexion | ? | 91.0 | 77.1 | | paper |
GPT-4 (latest) | ? | 84.1 | 80.0 | | github |
DeepSeek-Coder-Instruct | 33B | 79.3 | 70.0 | ckpt | github |
DeepSeek-Coder-Instruct | 7B | 78.6 | 65.4 | ckpt | github |
GPT-3.5-Turbo (latest) | ? | 76.2 | 70.8 | | github |
Code-Llama | 34B | 62.2 | 61.2 | | paper |
PanGu-Coder2 | 15B | 61.6 | - | | paper |
WizardCoder-15B | 15B | 57.3 | 51.8 | ckpt | paper |
Code-Davinci-002 | ? | 47.0 | - | | paper |
StarCoder-15B (Prompted) | 15B | 40.8 | 49.5 | ckpt | paper |
PaLM 2-S | ? | 37.6 | 50.0 | | paper |
PaLM-Coder-540B | 540B | 36.0 | 47.0 | | paper |
InstructCodeT5+ | 16B | 35.0 | - | | paper |
StarCoder-15B | 15B | 33.6 | 52.7 | ckpt | paper |
Code-Cushman-001 | ? | 33.5 | 45.9 | | paper |
CodeT5+ | 16B | 30.9 | - | | paper |
LLaMA2-70B | 70B | 29.9 | - | ckpt | paper |
CodeGen-16B-Mono | 16B | 29.3 | 35.3 | | paper |
PaLM-540B | 540B | 26.2 | 36.8 | | paper |
LLaMA-65B | 65B | 23.7 | 37.7 | | paper |
CodeGeeX | 13B | 22.9 | 24.4 | | paper |
LLaMA-33B | 33B | 21.7 | 30.2 | | paper |
CodeGen-16B-Multi | 16B | 18.3 | 20.9 | | paper |
AlphaCode | 1.1B | 17.1 | - | | paper |
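The leaderboard ranks models by HumanEval Pass@1. As a minimal sketch, assuming the unbiased pass@k estimator described in "Evaluating Large Language Models Trained on Code" (listed under Pre-Training): for a problem with n sampled completions, of which c pass all unit tests, pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The function names `pass_at_k` and `mean_pass_at_k` are hypothetical, and this table does not state whether each reported number was obtained with greedy decoding or with sampling.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (hypothetical helper).

    n: number of sampled completions
    c: number of completions that passed all unit tests
    k: evaluation budget, with 1 <= k <= n
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Fewer than k failing samples: every size-k subset contains a passing one.
    if n - c < k:
        return 1.0
    # Probability that a random size-k subset contains at least one passing sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: Sequence[tuple[int, int]], k: int) -> float:
    """Average pass@k over a benchmark, given one (n, c) pair per problem."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Two problems, 10 samples each: 3 correct on the first, none on the second.
    print(mean_pass_at_k([(10, 3), (10, 0)], k=1))
```

With k = 1 the estimator reduces to c / n, so Pass@1 is simply the per-problem fraction of passing samples averaged over the benchmark; larger k rewards models that solve a problem in at least one of several attempts.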
💡 Toolkit
- bigcode-evaluation-harness: A framework for the evaluation of autoregressive code generation language models.
- multilingual-code-evals: Multilingual Code Models Evaluation.
📚 Paper
▶️ Pre-Training
- Evaluating Large Language Models Trained on Code
Preprint
[Paper] Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, et al. 2021.07
- CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
ICLR23
[Paper] Erik Nijkamp, Bo Pang, Hiroaki Hayashi, Lifu Tu, Huan Wang, Yingbo Zhou, Silvio Savarese, Caiming Xiong. 2022.03
- SantaCoder: don't reach for the stars!
Preprint
[Paper] Loubna Ben Allal, Raymond Li, Denis Kocetkov, Chenghao Mou, Christopher Akiki, Carlos Munoz Ferrandis, Niklas Muennighoff, et al. 2023.01
- CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Evaluations on HumanEval-X
Preprint
[Paper] Qinkai Zheng, Xiao Xia, Xu Zou, Yuxiao Dong, Shan Wang, Yufei Xue, Zihan Wang, Lei Shen, Andi Wang, Yang Li, Teng Su, Zhilin Yang, Jie Tang. 2023.03
- CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
ICLR23
[Paper] Erik Nijkamp, Hiroaki Hayashi, Caiming Xiong, Silvio Savarese, Yingbo Zhou. 2023.05
- StarCoder: may the source be with you!
Preprint
[Paper] Raymond Li, Loubna Ben Allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, et al. 2023.05
- CodeT5+: Open Code Large Language Models for Code Understanding and Generation
Preprint
[Paper] Yue Wang, Hung Le, Akhilesh Deepak Gotmare, Nghi D.Q. Bui, Junnan Li, Steven C.H. Hoi. 2023.05
- Textbooks Are All You Need
Preprint
[Paper] Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, et al. 2023.06
- Code Llama: Open Foundation Models for Code
Preprint
[Paper] Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, et al. 2023.08
▶️ Instruction Tuning
- WizardCoder: Empowering Code Large Language Models with Evol-Instruct
Preprint
[Paper] Ziyang Luo, Can Xu, Pu Zhao, Qingfeng Sun, Xiubo Geng, Wenxiang Hu, Chongyang Tao, Jing Ma, Qingwei Lin, Daxin Jiang. 2023.07
- OctoPack: Instruction Tuning Code Large Language Models
Preprint
[Paper][Repo] Niklas Muennighoff, Qian Liu, Armel Zebaze, Qinkai Zheng, Binyuan Hui, Terry Yue Zhuo, Swayam Singh, Xiangru Tang, Leandro von Werra, Shayne Longpre. 2023.08
▶️ Alignment with Feedback
- CodeRL: Mastering Code Generation through Pretrained Models and Deep Reinforcement Learning
NeurIPS22
[Paper] Hung Le, Yue Wang, Akhilesh Deepak Gotmare, Silvio Savarese, Steven C.H. Hoi. 2022.07
- Execution-based Code Generation using Deep Reinforcement Learning
TMLR23
[Paper] Parshin Shojaee, Aneesh Jain, Sindhu Tipirneni, Chandan K. Reddy. 2023.01
- RLTF: Reinforcement Learning from Unit Test Feedback
Preprint
[Paper] Jiate Liu, Yiqin Zhu, Kaiwen Xiao, Qiang Fu, Xiao Han, Wei Yang, Deheng Ye. 2023.07
- PanGu-Coder2: Boosting Large Language Models for Code with Ranking Feedback
Preprint
[Paper] Bo Shen, Jiaxin Zhang, Taihong Chen, Daoguang Zan, Bing Geng, An Fu, Muhan Zeng, Ailun Yu, Jichuan Ji, Jingyang Zhao, Yuenan Guo, Qianxiang Wang. 2023.07
▶️ Prompting
- CodeT: Code Generation with Generated Tests
ICLR23
[Paper] Bei Chen, Fengji Zhang, Anh Nguyen, Daoguang Zan, Zeqi Lin, Jian-Guang Lou, Weizhu Chen. 2022.07
- Coder Reviewer Reranking for Code Generation
ICML23
[Paper] Tianyi Zhang, Tao Yu, Tatsunori B Hashimoto, Mike Lewis, Wen-tau Yih, Daniel Fried, Sida I Wang. 2022.11
- LEVER: Learning to Verify Language-to-Code Generation with Execution
ICML23
[Paper] Ansong Ni, Srini Iyer, Dragomir Radev, Ves Stoyanov, Wen-tau Yih, Sida I. Wang, Xi Victoria Lin. 2023.02
- Teaching Large Language Models to Self-Debug
Preprint
[Paper] Xinyun Chen, Maxwell Lin, Nathanael Schärli, Denny Zhou. 2023.06
- Demystifying GPT Self-Repair for Code Generation
Preprint
[Paper] Theo X. Olausson, Jeevana Priya Inala, Chenglong Wang, Jianfeng Gao, Armando Solar-Lezama. 2023.06
- SelfEvolve: A Code Evolution Framework via Large Language Models
Preprint
[Paper] Shuyang Jiang, Yuhao Wang, Yu Wang. 2023.06
▶️ Evaluation & Benchmark
- Measuring Coding Challenge Competence With APPS
NeurIPS21
Introduces the APPS benchmark
[Paper][Repo] Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, Ethan Guo, Collin Burns, Samir Puranik, Horace He, Dawn Song, Jacob Steinhardt. 2021.05
- Program Synthesis with Large Language Models
Preprint
Introduces the MBPP benchmark
[Paper] Jacob Austin, Augustus Odena, Maxwell Nye, Maarten Bosma, Henryk Michalewski, David Dohan, Ellen Jiang, Carrie Cai, Michael Terry, Quoc Le, Charles Sutton. 2021.08
- DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
ICML23
[Paper] Yuhang Lai, Chengxi Li, Yiming Wang, Tianyi Zhang, Ruiqi Zhong, Luke Zettlemoyer, Scott Wen-tau Yih, Daniel Fried, Sida Wang, Tao Yu. 2022.11
- RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems
Preprint
[Paper] Tianyang Liu, Canwen Xu, Julian McAuley. 2023.06
- Can ChatGPT replace StackOverflow? A Study on Robustness and Reliability of Large Language Model Code Generation
Preprint
[Paper] Li Zhong, Zilong Wang. 2023.08
▶️ Using LLMs While Coding
- Awesome-DevAI: A list of resources about using LLMs while building software
Awesome
[Repo] Ty Dunn, Nate Sesti. 2023.10
🙌 Contributors
This is an active repository and your contributions are always welcome! If you have any questions about this opinionated list, do not hesitate to contact me by email.
Cite as
@software{awesome-code-llm,
author = {Binyuan Hui},
title = {An awesome and curated list of best code-LLM for research},
howpublished = {\url{https://github.com/huybery/Awesome-Code-LLM}},
year = 2023,
}
Acknowledgement
This project is inspired by Awesome-LLM.