Repository Details

A tool that uses AI to automatically recommend commit messages.

commit-autosuggestions

License: Apache 2.0

This is the implementation of CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model. CommitBERT was accepted at the ACL workshop NLP4Prog. Have you ever hesitated over writing a commit message? Now you can get one from artificial intelligence!

Abstract

CodeBERT: A Pre-Trained Model for Programming and Natural Languages introduces a model pre-trained on a combination of programming language and natural language (PL-NL). It also introduces the task of converting code into natural language (Code Documentation Generation).

diff --git a/test.py b/test.py
new file mode 100644
index 0000000..d13f441
--- /dev/null
+++ b/test.py
@@ -0,0 +1,6 @@
+
+import torch
+import argparse
+
+def add(a, b):
+    return a + b
Recommended Commit Message : Add two arguments .

We can use CodeBERT to build a model that generates a commit message when code is added. However, most code changes do not consist of additions alone; some parts of the code are also deleted.

diff --git a/test.py b/test.py
index d13f441..1b1b82a 100644
--- a/test.py
+++ b/test.py
@@ -1,6 +1,3 @@

-import torch
-import argparse
-
 def add(a, b):
     return a + b
Recommended Commit Message : Remove unused imports

To solve this problem, we use a new embedding called patch_type_embeddings that distinguishes added code from deleted code, just as XLM (Lample et al., 2019) uses language embeddings (1 for added, 2 for deleted).
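As a rough illustration of the idea above, the sketch below assigns a patch type id to every token of a unified diff: 1 for tokens on added lines and 2 for tokens on deleted lines. The function name `patch_type_ids`, the sample diff, and the whitespace tokenization are assumptions made for this example only; in the model itself, ids like these would index the patch_type_embeddings layer, and the tokenization would follow the pre-trained model rather than a plain split.

```python
ADDED, DELETED = 1, 2  # patch type ids as described above


def patch_type_ids(diff_text):
    """Return (tokens, type_ids) for the changed lines of a unified diff.

    Whitespace tokenization is a simplification for this sketch.
    """
    tokens, type_ids = [], []
    for line in diff_text.splitlines():
        # Skip file headers such as "--- a/test.py" and "+++ b/test.py".
        # (A changed line whose content itself starts with "--" or "++"
        # would also be skipped; acceptable for a sketch.)
        if line.startswith(("+++", "---")):
            continue
        if line.startswith("+"):
            kind = ADDED
        elif line.startswith("-"):
            kind = DELETED
        else:
            continue  # context lines, hunk headers, and other metadata
        for token in line[1:].split():
            tokens.append(token)
            type_ids.append(kind)
    return tokens, type_ids


# Hypothetical diff that both deletes and adds code.
example_diff = """\
--- a/test.py
+++ b/test.py
@@ -1,4 +1,3 @@
-import torch
-import argparse
+import os
 def add(a, b):
     return a + b
"""

tokens, type_ids = patch_type_ids(example_diff)
print(list(zip(tokens, type_ids)))
```

Keeping the ids in a sequence parallel to the tokens means they can be looked up in an embedding table and summed with the token embeddings, in the same way XLM handles language ids.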

Language support

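| Language | Added | Diff | Data (Only Diff) | Weights |
| --- | --- | --- | --- | --- |
| Python | ✅ | ✅ | 423k | Link |
| JavaScript | ✅ | ✅ | 514k | Link |
| Go | ⬜ | ⬜ | ⬜ | ⬜ |
| Java | ⬜ | ⬜ | ⬜ | ⬜ |
| Ruby | ⬜ | ⬜ | ⬜ | ⬜ |
| PHP | ⬜ | ⬜ | ⬜ | ⬜ |

  • ✅ - Supported
  • ⬜ - N/A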

We plan to gradually add support for the languages that are not yet covered. However, training models for them requires expensive GPU instances on AWS or GCP, so please consider sponsoring this project! The data for the Added models comes from the CodeSearchNet dataset.

Quick Start

To run this project, you need a Flask-based inference server (GPU) and a client (the commit module). If you don't have a GPU, don't worry: you can run the server through Google Colab.

1. Run the Flask PyTorch server.

Prepare Docker and Nvidia-docker before running the server.

1-a. If you have a GPU machine.

Serve the Flask server with Nvidia Docker. Check the table below for the Docker tag of each programming language.

| Language | Tag |
| --- | --- |
| Python | py |
| JavaScript | js |
| Go | go |
| Java | java |
| Ruby | ruby |
| PHP | php |

$ docker run -it -d --gpus 0 -p 5000:5000 graykode/commit-autosuggestions:{language}
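
To make the `{language}` placeholder concrete, here is a minimal sketch that fills in the tag from the table and assembles the same `docker run` command. The helper name `docker_run_command` and the dictionary are illustrative assumptions, not part of this project; note also that the language support table lists weights only for Python and JavaScript.

```python
# Docker image tags per language, copied from the tag table above.
LANGUAGE_TAGS = {
    "python": "py",
    "javascript": "js",
    "go": "go",
    "java": "java",
    "ruby": "ruby",
    "php": "php",
}


def docker_run_command(language):
    """Build the `docker run` argument list for the given language."""
    tag = LANGUAGE_TAGS[language.lower()]  # KeyError for unknown languages
    return [
        "docker", "run", "-it", "-d", "--gpus", "0",
        "-p", "5000:5000",
        f"graykode/commit-autosuggestions:{tag}",
    ]


print(" ".join(docker_run_command("Python")))
# docker run -it -d --gpus 0 -p 5000:5000 graykode/commit-autosuggestions:py
```

The list form can be passed directly to `subprocess.run` if you prefer to start the container from a script.
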
1-b. If you don't have a GPU machine.

Even without a GPU, you can still serve the Flask server by using the ngrok setting in commit_autosuggestions.ipynb.

2. Start commit autosuggestion with the Python client module named commit.

First, install the package through pip.

$ pip install commit

Set the endpoint of the Flask server configured in step 1 with the commit configure command. For example, if the endpoint is http://127.0.0.1:5000, run commit configure --endpoint http://127.0.0.1:5000.

$ commit configure --help       
Usage: commit configure [OPTIONS]

Options:
  --profile TEXT   unique name for managing each independent settings
  --endpoint TEXT  endpoint address accessible to the server (example :
                   http://127.0.0.1:5000/)  [required]

  --help           Show this message and exit.
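
Before configuring the endpoint, it can help to confirm that something is actually listening there. The sketch below is one possible check using only the Python standard library; the helper names `endpoint_host_port` and `is_reachable` are hypothetical and are not part of the commit client. It only tests that a TCP connection can be opened, not that the server behind it is the inference server.

```python
import socket
from urllib.parse import urlsplit


def endpoint_host_port(endpoint):
    """Split an endpoint URL such as http://127.0.0.1:5000 into (host, port)."""
    parts = urlsplit(endpoint)
    if parts.hostname is None:
        raise ValueError(f"endpoint needs a scheme and host: {endpoint!r}")
    port = parts.port
    if port is None:
        # Fall back to the default port of the URL scheme.
        port = 443 if parts.scheme == "https" else 80
    return parts.hostname, port


def is_reachable(endpoint, timeout=3.0):
    """Return True if a TCP connection to the endpoint can be opened."""
    host, port = endpoint_host_port(endpoint)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(endpoint_host_port("http://127.0.0.1:5000"))
```

If `is_reachable("http://127.0.0.1:5000")` returns False, check that the container from step 1 is running and that port 5000 is published.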

All setup is done! Now you can get a commit message from the AI with the commit command.

$ commit --help          
Usage: commit [OPTIONS] COMMAND [ARGS]...

Options:
  --profile TEXT       unique name for managing each independent settings
  -f, --file FILENAME  patch file containing git diff (e.g. file created by
                       `git add` and `git diff --cached > test.diff`)

  -v, --verbose        print suggested commit message more detail.
  -a, --autocommit     automatically commit without asking if you want to
                       commit

  --help               Show this message and exit.

Commands:
  configure
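
The options in the help output above can also be driven from a script. The following sketch writes the staged changes to a patch file with `git diff --cached` and builds the matching `commit` invocation; the function names `write_staged_diff` and `commit_command` are assumptions introduced for this example, and only the flags shown in the help text are used.

```python
import subprocess


def write_staged_diff(patch_path):
    """Write the staged changes (`git diff --cached`) to patch_path.

    Must be run inside a git repository with staged changes.
    """
    diff = subprocess.run(
        ["git", "diff", "--cached"],
        check=True, capture_output=True, text=True,
    ).stdout
    with open(patch_path, "w", encoding="utf-8") as patch_file:
        patch_file.write(diff)
    return patch_path


def commit_command(patch_path, profile=None, verbose=False, autocommit=False):
    """Build the argument list for the `commit` client from the help options."""
    cmd = ["commit"]
    if profile is not None:
        cmd += ["--profile", profile]
    cmd += ["--file", patch_path]
    if verbose:
        cmd.append("--verbose")
    if autocommit:
        cmd.append("--autocommit")
    return cmd


print(commit_command("test.diff", verbose=True))
# ['commit', '--file', 'test.diff', '--verbose']

# Inside a repository with staged changes, the two steps could be combined as:
#   subprocess.run(commit_command(write_staged_diff("test.diff")), check=True)
```

Leaving `autocommit` off by default keeps the confirmation prompt, which is the safer choice while you are still judging the quality of the suggestions.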

Training detail

Refer to How to train for your lint style. This lets you fine-tune the model again to match your repository's commit lint style.

Contribution

You can contribute anything, even a typo fix or code in the article. Don't hesitate! Versions are managed only within the branch named after each version. After a version is released on PyPI, it is merged into the master branch, and new development proceeds in the branch for the next version.

Author

Tae Hwan Jung (@graykode)

Citation

@article{jung2021commitbert,
  title={CommitBERT: Commit Message Generation Using Pre-Trained Programming Language Model},
  author={Jung, Tae-Hwan},
  journal={arXiv preprint arXiv:2105.14242},
  year={2021}
}
