  • Stars: 164
  • Rank: 225,063 (Top 5%)
  • Language: Python
  • License: Other
  • Created almost 2 years ago
  • Updated about 2 months ago

Repository Details

A multi-programming-language benchmark for evaluating the performance of large language models of code.

Multi-Programming Language Evaluation of Large Language Models of Code (MultiPL-E)

MultiPL-E is a system for translating unit test-driven neural code generation benchmarks to new languages. We have used MultiPL-E to translate two popular Python benchmarks (HumanEval and MBPP) to 18 other programming languages.
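To make the idea of translating a unit test concrete, here is a minimal sketch in Python. It is not MultiPL-E's actual translator: the function names `translate_value` and `translate_assert`, the JavaScript-style `assert.deepEqual` output, and the restriction to simple `assert f(args) == expected` tests are all assumptions made for illustration.

```python
import ast
import json


def translate_value(node: ast.expr) -> str:
    """Render a small subset of Python expressions as JavaScript-style text.

    Hypothetical helper: handles only constants, lists, and calls by name.
    """
    if isinstance(node, ast.Constant):
        value = node.value
        if isinstance(value, bool):  # check bool before numbers
            return "true" if value else "false"
        if value is None:
            return "null"
        if isinstance(value, str):
            return json.dumps(value)  # double-quoted, escaped string
        return repr(value)  # ints and floats
    if isinstance(node, ast.List):
        return "[" + ", ".join(translate_value(e) for e in node.elts) + "]"
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        args = ", ".join(translate_value(a) for a in node.args)
        return f"{node.func.id}({args})"
    raise NotImplementedError(ast.dump(node))


def translate_assert(line: str) -> str:
    """Translate `assert <call> == <expected>` into a JavaScript-style assertion."""
    stmt = ast.parse(line).body[0]
    is_simple_eq = (
        isinstance(stmt, ast.Assert)
        and isinstance(stmt.test, ast.Compare)
        and len(stmt.test.ops) == 1
        and isinstance(stmt.test.ops[0], ast.Eq)
    )
    if not is_simple_eq:
        raise NotImplementedError(line)
    left = translate_value(stmt.test.left)
    right = translate_value(stmt.test.comparators[0])
    return f"assert.deepEqual({left}, {right});"


print(translate_assert("assert candidate([1, 2], True) == 'ab'"))
```

A real translator also has to handle function signatures, type annotations, and per-language value encodings, none of which this sketch attempts.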

Versions

  • Version 0.4.0: Work in progress.

    • New languages: OCaml, MATLAB
    • Using .jsonl instead of .json for prompts
    • Several bugfixes to prompts
  • Version 0.3.0: used to evaluate StarCoder

    • This version corrects several bugs in prompts and test cases that resulted in lower pass@k rates for some of the statically typed languages. The most significant difference is that the pass@k for Java increases by about 2% on HumanEval.
  • Version 0.2.0: used to evaluate SantaCoder
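The version notes above refer to pass@k rates. As a reference point, the following sketch computes the commonly used unbiased pass@k estimator for a single problem, given `n` sampled completions of which `c` pass the tests. The function name `pass_at_k` is chosen here for illustration, and whether MultiPL-E's own scripts compute the metric exactly this way is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem as 1 - C(n - c, k) / C(n, k).

    n: number of completions sampled for the problem
    c: number of those completions that pass all tests
    k: number of attempts allowed (1 <= k <= n)
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if n - c < k:
        # Fewer than k failing samples: every k-subset contains a pass.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# Example: 10 samples, 5 passing, one attempt allowed.
print(pass_at_k(10, 5, 1))
```

A benchmark-level score is then typically the mean of this value over all problems.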

More Repositories

  1. 10PL (878 stars): 10 papers that all PhD students in programming languages ought to know, for some value of 10
  2. Stopify (JavaScript, 168 stars): A JS-to-JS compiler that makes it easier to build Web IDEs and compile to JS.
  3. hopl-s2017 (TeX, 127 stars): History of Programming Languages, Spring 2017
  4. PromiseKeeper (JavaScript, 55 stars): Finding Broken Promises in Asynchronous JavaScript Programs
  5. augur (JavaScript, 44 stars): Performant taint analysis for Node.js
  6. CanItEdit (Python, 34 stars): Can It Edit? Evaluating the Ability of Large Language Models to Follow Code Editing Instructions
  7. Ocelot (TypeScript, 29 stars): An IDE for JavaScript, without the "bad parts".
  8. website (Racket, 16 stars): Source for PRL website
  9. gradual-typing-performance (Racket, 10 stars)
  10. softscheme (Scheme, 6 stars): Andrew Wright's soft type system for R4 Scheme
  11. ElementaryJS (TypeScript, 4 stars): JavaScript without the sharp edges
  12. TypeWeaver (TypeScript, 4 stars): Artifact for the ECOOP 2023 paper: Do Machine Learning Models Produce TypeScript Types that Type Check?
  13. gtp (HTML, 3 stars): NSF grant website
  14. nuprl.github.io-archive (HTML, 3 stars): Build artifacts for prl.ccs.neu.edu - DON'T MAKE CHANGES HERE - go to
  15. tag-sound (Racket, 3 stars): Source for "A Spectrum of Type Soundness and Performance", ICFP 2018
  16. retic_performance (Racket, 2 stars): Performance evaluation of Reticulated Python
  17. jankscripten (Rust, 2 stars)
  18. prl-seminar-junior (2 stars): PRLSeminar, Junior: information, materials, schedule
  19. TypeWhich (Rust, 2 stars): Customizable, solver-based type migration for the gradually-typed lambda calculus.
  20. gfd-oopsla-2019 (TeX, 1 star): Paper, proofs, and code for "Complete Monitors for Gradual Types"
  21. wimpl (Rust, 1 star)
  22. formalizations-in-agda (1 star)
  23. fsp-benchmarks (1 star)
  24. softscheme-web (Scheme, 1 star)
  25. prl-website (Scheme, 1 star): The website renderer for the Programming Research Laboratory at Northeastern University