CodeTrans
CodeTrans provides state-of-the-art pre-trained models for source code. CodeTrans was trained on several Nvidia RTX 8000 GPUs and a couple of Google TPUs using various state-of-the-art transformer models.
Take a look at our paper CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing for more information about our work.
This repository will be updated regularly with new pre-trained models for source code, as part of supporting the software engineering community in general, and source code for COVID-19 research specifically.
Table of Contents
- ⌛️ Models Availability
- 🚀 Usage
- 📊 Expected Results
- ❤️ Community and Contributions
- 📫 Have a question?
- 🤝 Found a bug?
- ✅ Requirements
- 🤵 Team
- 💰 Sponsors
- 📘 License
- ✏️ Citation
⌛️ Models Availability
All original CodeTrans TensorFlow checkpoints can be downloaded from this Dropbox folder, and the PyTorch checkpoints are available on the Hugging Face model hub.
You can download all the datasets used in this research from this Dropbox folder.
🚀 Usage
How to use CodeTrans:
🤖 Feature Extraction (FE):
Coming soon.
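Until the official instructions are published, the sketch below shows one way to pull contextual embeddings from the CodeTrans encoder with the Hugging Face Transformers library. The hub model ID is an assumption for illustration; substitute any CodeTrans checkpoint from the model hub.

```python
import torch
from transformers import AutoTokenizer, T5EncoderModel

# Assumed hub ID; substitute any CodeTrans checkpoint from the model hub.
model_id = "SEBIS/code_trans_t5_small_code_documentation_generation_python"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = T5EncoderModel.from_pretrained(model_id)

# CodeTrans expects pre-tokenized source code, with tokens separated by spaces.
code = "def add ( a , b ) : return a + b"
inputs = tokenizer(code, return_tensors="pt")
with torch.no_grad():
    outputs = encoder(**inputs)

# One contextual embedding per input token.
features = outputs.last_hidden_state  # shape: (1, sequence_length, hidden_size)
```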
💥 Fine Tuning (FT):
Coming soon.
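As a placeholder, here is a minimal fine-tuning sketch in plain PyTorch on a toy code/documentation pair. The hub model ID and the data are illustrative assumptions, not the official training setup from the paper.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed hub ID and toy data, for illustration only.
model_id = "SEBIS/code_trans_t5_small_code_documentation_generation_python"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Toy parallel data: (tokenized source code, target documentation).
pairs = [("def add ( a , b ) : return a + b", "add two numbers")]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
model.train()
for code, doc in pairs:
    batch = tokenizer(code, return_tensors="pt")
    labels = tokenizer(doc, return_tensors="pt").input_ids
    loss = model(input_ids=batch.input_ids,
                 attention_mask=batch.attention_mask,
                 labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```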
🧠 Prediction:
Please check the Prediction section. More information coming soon.
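In the meantime, here is a minimal prediction sketch using the Hugging Face Transformers summarization pipeline; the hub model ID is an assumption, so check the model hub for the checkpoint you need.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Assumed hub ID; check the Hugging Face model hub for the full list.
model_id = "SEBIS/code_trans_t5_base_code_documentation_generation_python"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)

# CodeTrans expects pre-tokenized source code, with tokens separated by spaces.
code = "def add ( a , b ) : return a + b"
print(summarizer(code))
```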
⚗️ Code Sequences Generation:
Coming soon.
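As a stopgap, sequences can be generated directly with `model.generate` and beam search. The hub ID (an API-sequence checkpoint) and the query below are assumptions for illustration.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Assumed hub ID for an API-sequence checkpoint; the query is illustrative.
model_id = "SEBIS/code_trans_t5_base_api_generation"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

query = "parse a json string into a java object"
inputs = tokenizer(query, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=64, num_beams=4, early_stopping=True)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```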
🧐 Visualization:
Coming soon.
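Until then, encoder self-attention can be inspected with the BertViz library inside a Jupyter notebook. This is a sketch under assumptions: the hub ID is illustrative, and only the encoder attentions are visualized.

```python
from bertviz import head_view
from transformers import AutoTokenizer, T5EncoderModel

# Assumed hub ID; any CodeTrans checkpoint should work the same way.
model_id = "SEBIS/code_trans_t5_small_code_documentation_generation_python"
tokenizer = AutoTokenizer.from_pretrained(model_id)
encoder = T5EncoderModel.from_pretrained(model_id, output_attentions=True)

code = "def add ( a , b ) : return a + b"
inputs = tokenizer(code, return_tensors="pt")
outputs = encoder(**inputs)

tokens = tokenizer.convert_ids_to_tokens(inputs.input_ids[0])
head_view(outputs.attentions, tokens)  # renders an interactive view in the notebook
```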
📈 Benchmark:
Coming soon.
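Until the benchmark scripts are released, generated outputs can be scored against references with a standard BLEU implementation such as sacrebleu. This is only a sketch; the exact BLEU variant used for the numbers below may differ.

```python
import sacrebleu

# Toy hypothesis/reference pair; a real evaluation uses the test-set files.
hypotheses = ["add two numbers"]
references = [["adds the two given numbers"]]  # one reference stream

print(sacrebleu.corpus_bleu(hypotheses, references).score)
```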
📊 Expected Results
💻 Function Documentation Generation (BLEU):
Language / Model | Python | Java | Go | PHP | Ruby | JavaScript |
---|---|---|---|---|---|---|
CodeTrans-ST-Small | 17.31 | 16.65 | 16.89 | 23.05 | 9.19 | 13.70 |
CodeTrans-ST-Base | 16.86 | 17.17 | 17.16 | 22.98 | 8.23 | 13.17 |
CodeTrans-TF-Small | 19.93 | 19.48 | 18.88 | 25.35 | 13.15 | 17.23 |
CodeTrans-TF-Base | 20.26 | 20.19 | 19.50 | 25.84 | 14.07 | 18.25 |
CodeTrans-TF-Large | 20.35 | 20.06 | 19.54 | 26.18 | 14.94 | 18.98 |
CodeTrans-MT-Small | 19.64 | 19.00 | 19.15 | 24.68 | 14.91 | 15.26 |
CodeTrans-MT-Base | 20.39 | 21.22 | 19.43 | 26.23 | 15.26 | 16.11 |
CodeTrans-MT-Large | 20.18 | 21.87 | 19.38 | 26.08 | 15.00 | 16.23 |
CodeTrans-MT-TF-Small | 19.77 | 20.04 | 19.36 | 25.55 | 13.70 | 17.24 |
CodeTrans-MT-TF-Base | 19.77 | 21.12 | 18.86 | 25.79 | 14.24 | 18.62 |
CodeTrans-MT-TF-Large | 18.94 | 21.42 | 18.77 | 26.20 | 14.19 | 18.83 |
State of the art | 19.06 | 17.65 | 18.07 | 25.16 | 12.16 | 14.90 |
💻 Source Code Summarization (BLEU):
Language / Model | Python | SQL | C# |
---|---|---|---|
CodeTrans-ST-Small | 8.45 | 17.55 | 19.74 |
CodeTrans-ST-Base | 9.12 | 15.00 | 18.65 |
CodeTrans-TF-Small | 10.06 | 17.71 | 20.40 |
CodeTrans-TF-Base | 10.94 | 17.66 | 21.12 |
CodeTrans-TF-Large | 12.41 | 18.40 | 21.43 |
CodeTrans-MT-Small | 13.11 | 19.15 | 22.39 |
CodeTrans-MT-Base | 13.37 | 19.24 | 23.20 |
CodeTrans-MT-Large | 13.24 | 19.40 | 23.57 |
CodeTrans-MT-TF-Small | 12.10 | 18.25 | 22.03 |
CodeTrans-MT-TF-Base | 10.64 | 16.91 | 21.40 |
CodeTrans-MT-TF-Large | 12.14 | 19.98 | 21.10 |
State of the art | -- | 18.40 | 20.50 |
💻 Code Comment Generation (BLEU):
Language / Model | Java |
---|---|
CodeTrans-ST-Small | 37.98 |
CodeTrans-ST-Base | 38.07 |
CodeTrans-TF-Small | 38.56 |
CodeTrans-TF-Base | 39.06 |
CodeTrans-TF-Large | 39.50 |
CodeTrans-MT-Small | 20.15 |
CodeTrans-MT-Base | 27.44 |
CodeTrans-MT-Large | 34.69 |
CodeTrans-MT-TF-Small | 38.37 |
CodeTrans-MT-TF-Base | 38.90 |
CodeTrans-MT-TF-Large | 39.25 |
State of the art | 38.17 |
💻 Commit Message Generation (BLEU):
Language / Model | Java |
---|---|
CodeTrans-ST-Small | 39.61 |
CodeTrans-ST-Base | 38.67 |
CodeTrans-TF-Small | 44.22 |
CodeTrans-TF-Base | 44.17 |
CodeTrans-TF-Large | 44.41 |
CodeTrans-MT-Small | 36.17 |
CodeTrans-MT-Base | 39.25 |
CodeTrans-MT-Large | 41.18 |
CodeTrans-MT-TF-Small | 43.96 |
CodeTrans-MT-TF-Base | 44.19 |
CodeTrans-MT-TF-Large | 44.34 |
State of the art | 32.81 |
💻 API Sequence Recommendation (BLEU):
Language / Model | Java |
---|---|
CodeTrans-ST-Small | 68.71 |
CodeTrans-ST-Base | 70.45 |
CodeTrans-TF-Small | 68.90 |
CodeTrans-TF-Base | 72.11 |
CodeTrans-TF-Large | 73.26 |
CodeTrans-MT-Small | 58.43 |
CodeTrans-MT-Base | 67.97 |
CodeTrans-MT-Large | 72.29 |
CodeTrans-MT-TF-Small | 69.29 |
CodeTrans-MT-TF-Base | 72.89 |
CodeTrans-MT-TF-Large | 73.39 |
State of the art | 54.42 |
💻 Program Synthesis (Accuracy):
Language / Model | LISP |
---|---|
CodeTrans-ST-Small | 89.43 |
CodeTrans-ST-Base | 89.65 |
CodeTrans-TF-Small | 90.30 |
CodeTrans-TF-Base | 90.24 |
CodeTrans-TF-Large | 90.21 |
CodeTrans-MT-Small | 82.88 |
CodeTrans-MT-Base | 86.99 |
CodeTrans-MT-Large | 90.27 |
CodeTrans-MT-TF-Small | 90.31 |
CodeTrans-MT-TF-Base | 90.30 |
CodeTrans-MT-TF-Large | 90.17 |
State of the art | 85.80 |
❤️ Community and Contributions
The CodeTrans project is an open-source project supported by various partner companies and research institutions. We are committed to sharing all our pre-trained models and knowledge. We would be more than happy if you could help us by sharing new pre-trained models, fixing bugs, proposing new features, improving our documentation, spreading the word, or supporting our project.
📫 Have a question?
We are happy to hear your questions on the CodeTrans issues page! If you have a private question or want to cooperate with us, you can always reach out to us directly via our RostLab email.
🤝 Found a bug?
Feel free to file a new issue with a descriptive title and description in the CodeTrans repository. If you have already found a solution to the problem, we would love to review your pull request!
✅ Requirements
For prediction, the Text-To-Text Transfer Transformer (T5) library is needed. For source code feature extraction or for fine-tuning our pre-trained models, the PyTorch and Transformers libraries from Hugging Face are needed. For model visualization, you need to install the BertViz library.
🤵 Team
- Technical University of Munich: Ahmed Elnaggar, Wei Ding, Florian Matthes, Burkhard Rost
- Google: Llion Jones
- Nvidia: Tom Gibbs, Tamas Feher, Christoph Angerer
💰 Sponsors
- Nvidia
- Software Campus
📘 License
The CodeTrans pre-trained models are released under the terms of the MIT License.
✏️ Citation
If you use this code or our pre-trained models for your publication, please cite the original paper:
@misc{elnaggar2021codetrans,
title={CodeTrans: Towards Cracking the Language of Silicon's Code Through Self-Supervised Deep Learning and High Performance Computing},
author={Ahmed Elnaggar and Wei Ding and Llion Jones and Tom Gibbs and Tamas Feher and Christoph Angerer and Silvia Severini and Florian Matthes and Burkhard Rost},
year={2021},
eprint={2104.02443},
archivePrefix={arXiv},
primaryClass={cs.SE}
}