SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4 or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]

swe-agent.com

Website & Demo | Discord | Paper [coming April 2024]

👋 Overview

SWE-agent turns LMs (e.g. GPT-4) into software engineering agents that can fix bugs and issues in real GitHub repositories.

On SWE-bench, SWE-agent resolves 12.29% of issues, achieving state-of-the-art performance on the full test set.

SWE-agent is built and maintained by researchers from Princeton University.

✨ Agent-Computer Interface (ACI)

We accomplish these results by designing simple LM-centric commands and feedback formats that make it easier for the LM to browse the repository, and to view, edit, and execute code files. We call this an Agent-Computer Interface (ACI), and we built the SWE-agent repository to make it easy to iterate on ACI design for repository-level coding agents.

Just as typical language models require good prompt engineering, agents benefit substantially from good ACI design. As we show in our paper, a baseline agent without a well-tuned ACI does much worse than SWE-agent.

SWE-agent contains features that we discovered to be immensely helpful during the agent-computer interface design process:

  1. We add a linter that runs whenever an edit command is issued, and we do not let the edit go through if the code isn't syntactically correct (see the sketch after this list).
  2. We supply the agent with a purpose-built file viewer, instead of having it just cat files. We found that this viewer works best when it displays just 100 lines per turn. It includes commands for scrolling up and down and for searching within the file.
  3. We supply the agent with a purpose-built command for string search across the full directory. We found it important for this tool to list matches succinctly: we simply list each file that contains at least one match. Showing the model more context about each match proved too confusing.
  4. When a command produces no output, we return the message: "Your command ran successfully and did not produce any output."
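
As a rough illustration of the first feature above, here is a minimal Python sketch of a syntax-gated edit. This is not SWE-agent's actual implementation: guarded_edit is a hypothetical helper, and it only checks Python syntax, whereas a real linter covers more.

import ast

def guarded_edit(path: str, new_content: str) -> str:
    """Hypothetical helper: write the edit only if the new file still parses."""
    try:
        ast.parse(new_content)  # raises SyntaxError on invalid Python
    except SyntaxError as err:
        # Surface a short, LM-friendly error instead of applying a broken edit.
        return f"Edit rejected, file unchanged: syntax error on line {err.lineno}: {err.msg}"
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_content)
    return "Edit applied successfully."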

Read our paper for more details [coming soon!].

@misc{yang2024sweagent,
      title={SWE-agent: Agent Computer Interfaces Enable Software Engineering Language Models}, 
      author={John Yang and Carlos E. Jimenez and Alexander Wettig and Kilian Lieret and Shunyu Yao and Karthik Narasimhan and Ofir Press},
      year={2024},
}

🚀 Setup

๐ŸŽ๏ธ Express Setup + Run

You can run the software directly using Docker.

  1. Install Docker, then start Docker locally.
  2. Run docker pull sweagent/swe-agent:latest
  3. Add your API tokens to a file keys.cfg as explained below

Then run

# NOTE:
# This assumes that keys.cfg is in your current directory (else fix the path below)
# This command is equivalent to the script shown in the quickstart 
docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
  -v $(pwd)/keys.cfg:/app/keys.cfg \
  sweagent/swe-agent-run:latest \
  python run.py --image_name=sweagent/swe-agent:latest \
  --model_name gpt4 \
  --data_path https://github.com/pvlib/pvlib-python/issues/1603 \
  --config_file config/default_from_url.yaml  --skip_existing=False

Tip

  • For more information on the different API keys/tokens, see below.
  • If you're using Docker on Windows, use -v //var/run/docker.sock:/var/run/docker.sock (double slash) to escape it (more information).
  • See the installation issues section for more help if you run into trouble.

๐Ÿ Setup with conda (developer version)

To install the development version:

  1. Install Docker, then start Docker locally.
  2. Clone this repository
  3. Install Miniconda, then create the swe-agent environment with conda env create -f environment.yml
  4. Activate using conda activate swe-agent.
  5. Run ./setup.sh to create the swe-agent docker image.
  6. Create a keys.cfg file at the root of this repository (see below)

Warning

Expect some issues with Windows (we're working on them); in the meantime, simply use Docker (see above). If you want the latest version, you can also build your own swe-agent-run container from the Dockerfile at the root of this repository by running docker build -t sweagent/swe-agent-run:latest .

Tip

If you run into Docker issues, see the installation issues section for more help.

🔑 Add your API keys/tokens

For the conda setup, create a keys.cfg file at the root of this repository and populate it with your API keys.

GITHUB_TOKEN: 'GitHub Token Here (optional)'
OPENAI_API_KEY: 'OpenAI API Key Here if using OpenAI Model (optional)'

If you're using Docker, pass the keys to the container with docker run's -e option instead, as shown below.
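
For example, assuming the container reads OPENAI_API_KEY from its environment, the express-setup command from above becomes (substitute your own key; all other arguments are unchanged):

docker run --rm -it -v /var/run/docker.sock:/var/run/docker.sock \
  -e OPENAI_API_KEY='your-openai-key-here' \
  sweagent/swe-agent-run:latest \
  python run.py --image_name=sweagent/swe-agent:latest \
  --model_name gpt4 \
  --data_path https://github.com/pvlib/pvlib-python/issues/1603 \
  --config_file config/default_from_url.yaml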

🔎 More options for different keys

All keys are optional.

GITHUB_TOKEN: 'GitHub Token Here'
OPENAI_API_KEY: 'OpenAI API Key Here if using OpenAI Model'
ANTHROPIC_API_KEY: 'Anthropic API Key Here if using Anthropic Model'
TOGETHER_API_KEY: 'Together API Key Here if using Together Model'
AZURE_OPENAI_API_KEY: 'Azure OpenAI API Key Here if using Azure OpenAI Model'
AZURE_OPENAI_ENDPOINT: 'Azure OpenAI Endpoint Here if using Azure OpenAI Model'
AZURE_OPENAI_DEPLOYMENT: 'Azure OpenAI Deployment Here if using Azure OpenAI Model'
AZURE_OPENAI_API_VERSION: 'Azure OpenAI API Version Here if using Azure OpenAI Model'
OPENAI_API_BASE_URL: 'LM base URL here if using a local or alternative API endpoint'

See the following links for tutorials on obtaining Anthropic, OpenAI, and GitHub tokens.

More installation tips

If you have issues running Docker:

  • Make sure that you allow the use of the Docker socket. In Docker Desktop, click Settings > Advanced > Allow the default Docker socket to be used (requires password).
  • If your Docker installation uses a different socket, you might have to symlink them (see this command for an example).

Any remaining issues? Please open a GitHub issue!

🔥 Quickstart: Solve real-life GitHub issues!

Using this script, you can run SWE-agent on any GitHub issue!

python run.py --model_name gpt4 \
  --data_path https://github.com/pvlib/pvlib-python/issues/1603 \
  --config_file config/default_from_url.yaml

You can also apply it to a local repository:

python run.py --model_name gpt4 \
  --data_path /path/to/my_issue.md \
  --repo_path /path/to/my/local/repo \
  --config_file config/default_from_url.yaml \
  --apply_patch_locally

Tip

  • Run python run.py --help to see all available options.
  • You can have the agent automatically open a PR if the issue has been solved by supplying the --open_pr flag. Please use this feature responsibly (on your own repositories or after careful consideration).
  • See the scripts/ folder for other useful scripts and details.
  • See the config/ folder for details about how you can define your own configuration!
  • See the sweagent/agent/ folder for details about the logic behind configuration-based workflows.
  • See the sweagent/environment/ folder for details about the SWEEnv environment (interface + implementation).
  • See the trajectories/ folder for details about the output of run.py.

Ollama Support

Models served with an Ollama server can be used by passing --model_name ollama:model_name and setting --host_url to the URL where Ollama is served (http://localhost:11434 by default). See more details about using Ollama here.

python run.py --model_name ollama:deepseek-coder:6.7b-instruct \
  --host_url http://localhost:11434 \
  --data_path https://github.com/pvlib/pvlib-python/issues/1603 \
  --config_file config/default_from_url.yaml

💽 Benchmarking

There are two steps in the SWE-agent pipeline. First, SWE-agent takes an input GitHub issue and returns a pull request that attempts to fix it; we call this step inference. The second step (currently only available for issues in the SWE-bench benchmark) is to evaluate the pull request and verify that it has indeed fixed the issue.

Warning

At the moment, there are known issues with a small number of repositories that don't install properly on arm64 / aarch64 machines. We're working on a fix, but if you'd like to run and evaluate on the entirety of SWE-bench, the easiest way is to use an x86 machine.

๐Ÿ‘ฉโ€๐Ÿ’ป Inference

Inference on any GitHub Issue: See above.

Inference on SWE-bench: Run SWE-agent on SWE-bench Lite and generate patches.

python run.py --model_name gpt4 \
  --per_instance_cost_limit 2.00 \
  --config_file ./config/default.yaml

If you'd like to run on a single issue from SWE-bench, use the --instance_filter option as follows:

python run.py --model_name gpt4 \
  --instance_filter marshmallow-code__marshmallow-1359

🧪 Evaluation

This step is only available for issues from the SWE-bench set. To evaluate generated pull requests:

cd evaluation/
./run_eval.sh <predictions_path>

Replace <predictions_path> with the path to the model's predictions generated in the inference step. The <predictions_path> argument should look like ../trajectories/<username>/<model>-<dataset>-<hyperparams>/all_preds.jsonl

  • See the evaluation/ folder for details about how evaluation works.

💫 Contributions

  • If you'd like to ask questions, learn about upcoming features, and participate in future development, join our Discord community!
  • If you'd like to contribute to the codebase, we welcome issues and pull requests!
  • If you'd like to see a post or tutorial about some topic, please let us know via an issue.

Contact persons: John Yang and Carlos E. Jimenez (email: {jy1682, carlosej}@princeton.edu).

🪪 License

MIT. Check LICENSE.


More Repositories

  1. tree-of-thought-llm (Python, 4,726 stars): [NeurIPS 2023] Tree of Thoughts: Deliberate Problem Solving with Large Language Models
  2. SimCSE (Python, 3,381 stars): [EMNLP 2021] SimCSE: Simple Contrastive Learning of Sentence Embeddings https://arxiv.org/abs/2104.08821
  3. SWE-bench (Python, 1,846 stars): [ICLR 2024] SWE-Bench: Can Language Models Resolve Real-world Github Issues?
  4. MeZO (Python, 1,031 stars): [NeurIPS 2023] MeZO: Fine-Tuning Language Models with Just Forward Passes https://arxiv.org/abs/2305.17333
  5. PURE (Python, 788 stars): [NAACL 2021] A Frustratingly Easy Approach for Entity and Relation Extraction https://arxiv.org/abs/2010.12812
  6. LM-BFF (Python, 714 stars): [ACL 2021] LM-BFF: Better Few-shot Fine-tuning of Language Models https://arxiv.org/abs/2012.15723
  7. SimPO (Python, 672 stars): SimPO: Simple Preference Optimization with a Reference-Free Reward
  8. DensePhrases (Python, 605 stars): [ACL 2021] Learning Dense Representations of Phrases at Scale; [EMNLP 2021] Phrase Retrieval Learns Passage Retrieval, Too https://arxiv.org/abs/2012.12624
  9. LLM-Shearing (Python, 546 stars): [ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
  10. ALCE (Python, 450 stars): [EMNLP 2023] Enabling Large Language Models to Generate Text with Citations https://arxiv.org/abs/2305.14627
  11. LESS (Jupyter Notebook, 354 stars): [ICML 2024] LESS: Selecting Influential Data for Targeted Instruction Tuning
  12. AutoCompressors (Python, 273 stars): [EMNLP 2023] Adapting Language Models to Compress Long Contexts
  13. WebShop (Python, 264 stars): [NeurIPS 2022] 🛒 WebShop: Towards Scalable Real-World Web Interaction with Grounded Language Agents
  14. TRIME (Python, 192 stars): [EMNLP 2022] Training Language Models with Memory Augmentation https://arxiv.org/abs/2205.12674
  15. intercode (Python, 191 stars): [NeurIPS 2023 D&B] Code repository for the InterCode benchmark https://arxiv.org/abs/2306.14898
  16. CoFiPruning (Python, 188 stars): [ACL 2022] Structured Pruning Learns Compact and Accurate Models https://arxiv.org/abs/2204.00408
  17. OptiPrompt (Python, 167 stars): [NAACL 2021] Factual Probing Is [MASK]: Learning vs. Learning to Recall https://arxiv.org/abs/2104.05240
  18. TransformerPrograms (Python, 157 stars): [NeurIPS 2023] Learning Transformer Programs
  19. EntityQuestions (Python, 139 stars): [EMNLP 2021] Simple Entity-centric Questions Challenge Dense Retrievers https://arxiv.org/abs/2109.08535
  20. QuRating (Python, 137 stars): [ICML 2024] Selecting High-Quality Data for Training Language Models
  21. CEPE (Python, 135 stars): [ACL 2024] Long-Context Language Modeling with Parallel Encodings
  22. DinkyTrain (Python, 111 stars): Princeton NLP's pre-training library based on fairseq with DeepSpeed kernel integration 🚃
  23. LLMBar (Python, 108 stars): [ICLR 2024] Evaluating Large Language Models at Evaluating Instruction Following
  24. MQuAKE (Jupyter Notebook, 97 stars): [EMNLP 2023] MQuAKE: Assessing Knowledge Editing in Language Models via Multi-Hop Questions
  25. USACO (Python, 96 stars): Can Language Models Solve Olympiad Programming?
  26. ProLong (Python, 82 stars): Homepage for ProLong (Princeton long-context language models) and the paper "How to Train Long-Context Language Models (Effectively)"
  27. NLProofS (Python, 81 stars): [EMNLP 2022] Generating Natural Language Proofs with Verifier-Guided Search https://arxiv.org/abs/2205.12443
  28. CharXiv (Python, 72 stars): [NeurIPS 2024] CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs
  29. MADE (Python, 70 stars): [EMNLP 2021] Single-dataset Experts for Multi-dataset Question-Answering
  30. LM-Kernel-FT (Python, 68 stars): A Kernel-Based View of Language Model Fine-Tuning https://arxiv.org/abs/2210.05643
  31. c-sts (Python, 66 stars): [EMNLP 2023] C-STS: Conditional Semantic Textual Similarity
  32. calm-textgame (Python, 65 stars): [EMNLP 2020] Keep CALM and Explore: Language Models for Action Generation in Text-based Games
  33. DataMUX (Jupyter Notebook, 58 stars): [NeurIPS 2022] DataMUX: Data Multiplexing for Neural Networks
  34. ShortcutGrammar (Jupyter Notebook, 57 stars): [EMNLP 2022] Finding Dataset Shortcuts with Grammar Induction https://arxiv.org/abs/2210.11560
  35. LitSearch (Python, 54 stars): A Retrieval Benchmark for Scientific Literature Search
  36. Collie (Jupyter Notebook, 52 stars): [ICLR 2024] COLLIE: Systematic Construction of Constrained Text Generation Tasks
  37. EvalConvQA (Python, 45 stars): [ACL 2022] Ditch the Gold Standard: Re-evaluating Conversational Question Answering
  38. HELMET (Python, 42 stars): The HELMET Benchmark
  39. MABEL (Python, 37 stars): [EMNLP 2022] MABEL: Attenuating Gender Bias using Textual Entailment Data https://arxiv.org/abs/2210.14975
  40. LM-Science-Tutor (Python, 34 stars)
  41. rationale-robustness (Python, 26 stars): [NAACL 2022] Can Rationalization Improve Robustness? https://arxiv.org/abs/2204.11790
  42. PTP (Python, 25 stars): Improving Language Understanding from Screenshots https://arxiv.org/abs/2402.14073
  43. corpus-poisoning (Python, 25 stars): [EMNLP 2023] Poisoning Retrieval Corpora by Injecting Adversarial Passages https://arxiv.org/abs/2310.19156
  44. InstructEval (Jupyter Notebook, 23 stars): [NAACL 2024 Findings] Evaluation suite for the systematic evaluation of instruction selection methods
  45. Edge-Pruning (Python, 22 stars): Code and data for the paper "Finding Transformer Circuits with Edge Pruning"
  46. WhatICLLearns (Python, 21 stars): [ACL 2023 Findings] What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning
  47. Cognac (Python, 19 stars): Repo for the paper: Controllable Text Generation with Language Constraints
  48. lwm (Python, 18 stars): We develop world models that can be adapted with natural language. Integrating these models into artificial agents allows humans to effectively control these agents through verbal communication.
  49. ELIZA-Transformer (Python, 18 stars): Representing Rule-based Chatbots with Transformers
  50. semsup (Python, 16 stars): Semantic Supervision: Enabling Generalization over Output Spaces
  51. benign-data-breaks-safety (Python, 16 stars)
  52. SRL-NLC (14 stars): Safe Reinforcement Learning with Natural Language Constraints
  53. datamux-pretraining (Python, 14 stars): MUX-PLMs: Pretraining LMs with Data Multiplexing
  54. XTX (Python, 13 stars): [ICLR 2022 Spotlight] Multi-Stage Episodic Control for Strategic Exploration in Text Games
  55. MultilingualAnalysis (Python, 13 stars): Repository for the paper "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"
  56. dyck-transformer (Python, 12 stars): [ACL 2021] Self-Attention Networks Can Process Bounded Hierarchical Languages
  57. blindfold-textgame (Python, 12 stars): [NAACL 2021] Reading and Acting while Blindfolded: The Need for Semantics in Text Game Agents
  58. align-mlm (Python, 11 stars)
  59. metric-wsd (Python, 10 stars): [NAACL 2021] Non-Parametric Few-Shot Learning for Word Sense Disambiguation
  60. semsup-xc (Jupyter Notebook, 10 stars): SemSup-XC: Semantic Supervision for Extreme Classification
  61. Heuristic-Core (Python, 9 stars): [ACL 2024] The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models https://arxiv.org/abs/2403.03942
  62. CopyCat (Python, 9 stars)
  63. NegotiationToM (Python, 7 stars): Code release for Improving Dialog Systems for Negotiation with Personality Modeling
  64. CARETS (Python, 6 stars)
  65. SPARTAN (Python, 5 stars): SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
  66. il-scaling-in-games (Python, 5 stars): Official code repo of "Scaling Laws for Imitation Learning in Single-Agent Games"
  67. attribute-tagging (Python, 4 stars): [LaReL 2022] Towards an Enhanced, Faithful, and Adaptable Web Interaction Environment
  68. MoQA (Python, 3 stars)