# GitHub Copilot
This is an analysis of the GitHub Copilot extension for Visual Studio Code. Under macOS the VS Code extensions are located in the following directory:

```
~/.vscode/extensions
```

The analysis covers version 1.92.177. For an analysis of Copilot Chat see README_COPILOT_CHAT.md.
## Prompts
The GitHub Copilot extension generates three types of prompts.
### Prompt 1: Single file
We start with the simplest case with only one file `file1.py`.

- filename: `file1.py`
- file content: `# Print hello, world`
If the user presses enter after `# Print hello, world`, the extension generates the following prompt:

```
# Path: file1.py
# Print hello, world
```
The path to the file is part of the prompt.
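A minimal sketch of how such a prompt can be assembled (the function name and structure are illustrative, not the extension's actual code):

```python
def build_single_file_prompt(relative_path: str, content: str) -> str:
    """Prefix the file content with its path, written as a comment."""
    # "#" is the comment marker shown here because the example files are Python.
    return f"# Path: {relative_path}\n{content}"

print(build_single_file_prompt("file1.py", "# Print hello, world"))
# # Path: file1.py
# # Print hello, world
```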
### Prompt 2: Multiple files
Now let's consider a slightly more complex two-file case where file `file2.py` is edited.

- filename: `file1.py`
- file content: `# Print hello, world`
- filename: `file2.py`
- file content: `# Print he`
In this case, the extension generates the following prompt:
```
# Path: file2.py
# Compare this snippet from file1.py:
# # Print hello, world
# Print he
```
Snippets from other files with similar content are also included in the prompt.
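The snippet from the neighboring file is itself commented out, so the combined prompt remains valid Python. A sketch of how this could be assembled (names are illustrative):

```python
def embed_neighbor_snippet(neighbor_path: str, snippet: str) -> str:
    """Wrap a snippet from another file in comment lines."""
    header = f"# Compare this snippet from {neighbor_path}:"
    body = "\n".join("# " + line for line in snippet.splitlines())
    return f"{header}\n{body}"

prompt = "\n".join([
    "# Path: file2.py",
    embed_neighbor_snippet("file1.py", "# Print hello, world"),
    "# Print he",
])
print(prompt)  # reproduces the prompt shown above
```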
### Prompt 3: Fill in the middle
Copilot supports Fill in the Middle. That means the extension sends the code before and after the cursor position to the model.
- filename: `file3.py`
- file content: `# Test prefix\n# Test suffix`
If the user presses enter after `# Test prefix`, the extension generates the prefix

```
# Path: file3.py
# Test prefix
```

and the suffix

```
# Test suffix
```
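A sketch of the prefix/suffix split, assuming the cursor offset is measured in characters (function and variable names are illustrative):

```python
def build_fim_prompt(relative_path: str, source: str, offset: int) -> tuple[str, str]:
    """Split the document at the cursor into a prefix and a suffix.
    The prefix carries the same "# Path: ..." header as the other prompts."""
    prefix = f"# Path: {relative_path}\n{source[:offset]}"
    suffix = source[offset:]
    return prefix, suffix

content = "# Test prefix\n# Test suffix"
cursor = content.index("\n") + 1  # cursor at the start of the second line
prefix, suffix = build_fim_prompt("file3.py", content, cursor)
# prefix: '# Path: file3.py\n# Test prefix\n'
# suffix: '# Test suffix'
```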
## Communication
### Language model
To generate a completion, the extension sends a POST request to the endpoint `https://copilot-proxy.githubusercontent.com/v1/engines/copilot-codex/completions`.
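The exact request is not reproduced here; the following sketch assumes an OpenAI-style completion payload and a bearer token that the extension obtains for the signed-in user. All field values are placeholders:

```python
import requests

COPILOT_TOKEN = "<token issued to the extension>"  # placeholder

payload = {
    # Prompts 1-3 from above go into "prompt" (and "suffix" for fill in the middle).
    "prompt": "# Path: file1.py\n# Print hello, world\n",
    "suffix": "",
    "max_tokens": 50,    # assumed OpenAI-style parameters
    "temperature": 0,
    "stream": True,
}

response = requests.post(
    "https://copilot-proxy.githubusercontent.com/v1/engines/copilot-codex/completions",
    headers={"Authorization": f"Bearer {COPILOT_TOKEN}"},
    json=payload,
    stream=True,
)
```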
After sending the request, the endpoint returns the generated completion in its response.
### Telemetry
The GitHub Copilot extension sends telemetry data to the endpoint `https://dc.services.visualstudio.com`.
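`dc.services.visualstudio.com` is the ingestion endpoint of Azure Application Insights. A sketch of what such a telemetry call could look like, assuming the documented Application Insights envelope format; the event name and all values are placeholders, not the extension's actual payload:

```python
import datetime
import requests

envelope = {
    "name": "Microsoft.ApplicationInsights.Event",
    "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
    "iKey": "<instrumentation key>",  # placeholder
    "data": {
        "baseType": "EventData",
        "baseData": {"name": "<event name>", "properties": {}},
    },
}

requests.post("https://dc.services.visualstudio.com/v2/track", json=[envelope])
```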
## Deeper Analysis
### Vocabulary
The extension contains two vocabulary files:

| Filename | Vocabulary Size | Comment |
|---|---|---|
| vocab_cushman001.bpe | 50,276 | This vocabulary is based on the GPT-2 vocabulary |
| vocab_cushman002.bpe | 100,000 | This vocabulary is new and not based on the GPT-2 vocabulary anymore |
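A small sketch for locating the vocabulary files in the installed extension and counting their lines; the glob pattern and the assumption that one line roughly corresponds to one BPE entry are illustrative and depend on the exact file format:

```python
from pathlib import Path

extensions_dir = Path.home() / ".vscode" / "extensions"  # see the top of this document

# Assumption: the extension folder is named github.copilot-<version>.
for bpe_file in extensions_dir.glob("github.copilot-*/**/*.bpe"):
    num_lines = len(bpe_file.read_text(encoding="utf-8").splitlines())
    print(bpe_file.name, num_lines)
```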
### Min prompt chars
The prompt has to be at least 10 characters long before it is sent to the model.
```js
if ((_ > 0 ? n.length : d) < t.MIN_PROMPT_CHARS)
    return t._contextTooShort;
```
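Restated in Python (the variable names in the minified snippet above are the decompiled ones; this is only an illustration of the guard):

```python
MIN_PROMPT_CHARS = 10

def context_too_short(prompt: str) -> bool:
    # Prompts below the minimum length are rejected before any request is sent.
    return len(prompt) < MIN_PROMPT_CHARS
```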
### File information
The following information is collected about the file being edited:
```js
const m = {
    uri: d.toString(),  // The absolute path of the file
    source: t,          // Content of the file
    offset: n,          // The offset of the cursor
    relativePath: u,    // The relative path of the file
    languageId: p       // The programming language of the file
}
```
### Neighbor Files
The extension remembers the files that have been accessed before. The function `getNeighborFiles` calls the function `truncateDocs`, whose input is the list of files sorted by access time. When the combined size of all files exceeds 200,000, any additional files are disregarded. The function `truncateDocs` returns the truncated list of files.
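A sketch of the described behavior, assuming the 200,000 limit counts characters of file content and that the file crossing the limit is dropped as well (names are illustrative):

```python
MAX_TOTAL_LENGTH = 200_000  # combined size limit observed in the extension

def truncate_docs(files: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """files: (path, content) pairs, already sorted by most recent access."""
    kept, total = [], 0
    for path, content in files:
        total += len(content)
        if total > MAX_TOTAL_LENGTH:
            break  # this file and all later ones are disregarded
        kept.append((path, content))
    return kept
```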
## Copilot Performance
We have evaluated the Copilot model `cushman-ml` with the HumanEval dataset. Out of 164 programming problems, the model solves 56.10% (pass@1).
| Model name | Pass@1 | Date | Comment |
|---|---|---|---|
| code-cushman-001 | 32.93% | 2022-10-23 | https://openai.com/api/ |
| code-davinci-002 | 46.95% | 2022-10-23 | https://openai.com/api/ |
| cushman-ml | 56.10% | 2022-10-23 | Copilot |
Completions of the evaluation run: 2022-10-23-samples-cushman-ml.jsonl
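Pass@1 here is the fraction of the 164 problems for which the generated sample passes the HumanEval unit tests (92 of 164 ≈ 56.10%). The unbiased estimator from the HumanEval paper (Chen et al., 2021) generalizes this to pass@k:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """n = samples per problem, c = correct samples, k = evaluation budget."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single sample per problem, pass@1 reduces to the plain solve rate:
print(92 / 164)  # 0.5609... ≈ 56.10 %
```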