  • Stars: 266
  • Rank 154,103 (Top 4 %)
  • Language: Python
  • License: MIT License
  • Created almost 4 years ago
  • Updated 20 days ago

Repository Details

NaturalCC: An Open-Source Toolkit for Code Intelligence

NaturalCC is a sequence modeling toolkit that allows researchers and developers to train custom models for many software engineering tasks, e.g., code summarization, code retrieval, code completion, code clone detection and type inference. Our vision is to bridge the gap between programming language and natural language through machine learning techniques.


🔖 News

  • [May 10] We have merged the source code of "What Do They Capture? - A Structural Analysis of Pre-Trained Language Models for Source Code" into NaturalCC.

Features

  • A collection of code corpora with data preprocessing
  • Performance benchmark
  • Mixed precision training
    • Nvidia APEX
    • Automatic Mixed Precision
  • Multi-GPU training
  • Better logging output
  • Various Implementations:
    • tensorflow gradient clipping
    • optimizers and learning-rate schedulers
    • baseline models
    • binary data formats

🚀 Installation

Requirements

  • PyTorch version >= 1.6.0
  • Python version >= 3.6
  • GCC/G++ > 5.0
  • For training new models, you'll also need an NVIDIA GPU, NCCL and the CUDA Toolkit installed.
  • (optional) For faster training, install NVIDIA's apex library.

1. Install prerequisite libraries

git clone https://github.com/CGCL-codes/naturalcc && cd naturalcc
pip install -r requirements.txt

Once you have installed the prerequisite libraries, you can check them with python -m env_test.

2. Build or install NaturalCC

Export your NaturalCC cache directory (data and models will be saved in this directory) as an environment variable in your shell profile (~/.bashrc or ~/.zshrc on Linux, ~/.zsh_profile or ~/.bash_profile on macOS).

# Linux
echo "export NCC=<path_to_store ncc_data>" >> ~/.bashrc
# macOS
echo "export NCC=<path_to_store ncc_data>" >> ~/.bash_profile

Note: PyCharm may not inherit environment variables set in your shell profile, so we recommend registering your NCC variable in ncc/__init__.py.
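
As a rough illustration of reading the NCC variable with a fallback, here is a minimal sketch. The helper name resolve_ncc_dir and the fallback path ~/ncc_data are hypothetical and are not taken from the NaturalCC source; ncc/__init__.py may handle this differently.

```python
import os
from pathlib import Path

# Hypothetical fallback location; this is NOT a documented NaturalCC default.
DEFAULT_NCC_DIR = Path.home() / "ncc_data"


def resolve_ncc_dir(environ=os.environ):
    """Return the cache directory named by NCC, or the fallback if NCC is unset or empty."""
    value = environ.get("NCC", "").strip()
    if value:
        # Expand a leading "~" so values like "~/ncc" work as expected.
        return Path(value).expanduser()
    return DEFAULT_NCC_DIR
```

Calling resolve_ncc_dir() inside an IDE run configuration shows which directory would be used when the shell profile has not been loaded.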

Compile Cython files to accelerate programs and register NaturalCC into your pip list

# compile for debug
# python setup.py build_ext --inplace
# install 
pip install --editable ./

3. Half precision computation (optional)

NaturalCC supports half precision training.

  • If your torch.__version__ < 1.6.0 and nvcc -V is runnable, please install apex.
  • Otherwise, use Automatic Mixed Precision (AMP), which is available now: set amp: 1 in the YAML config file (an example).
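
The rule above can be sketched as a small helper. This is only an illustration: the function names are hypothetical, the version string is passed in rather than read from torch so the snippet runs without PyTorch installed, and it follows the text literally (anything that does not qualify for apex falls back to AMP).

```python
import re
import shutil


def parse_major_minor(version):
    """Extract (major, minor) from a version string such as '1.5.1+cu101'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def choose_half_precision_backend(torch_version, nvcc_available):
    """Return 'apex' for PyTorch < 1.6 with a usable nvcc, otherwise 'amp'."""
    if parse_major_minor(torch_version) < (1, 6) and nvcc_available:
        return "apex"
    return "amp"


def nvcc_on_path():
    """Cheap proxy for 'nvcc -V is runnable': is an nvcc executable on PATH?"""
    return shutil.which("nvcc") is not None
```

In practice you would call choose_half_precision_backend(torch.__version__, nvcc_on_path()) after importing torch.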

4. Install GCC/G++ with conda (if you do not have root permission)

Since NCC is built with Cython, your GCC/G++ version should be greater than 4.9. If you have root permission, update GCC/G++; otherwise, install GCC/G++ with conda.

# install GCC/G++ with conda
conda install -c anaconda gxx_linux-64
conda install -c conda-forge gcc_linux-64
cd ~/anaconda/envs/XXX/bin
ln -s x86_64-conda_cos6-linux-gnu-gcc gcc
ln -s x86_64-conda_cos6-linux-gnu-g++ g++
# check
conda deactivate
conda activate XXX
# then run "gcc -v" and "g++ -v" in a terminal to confirm the versions
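
If you prefer to script the version check, the sketch below parses the "gcc version X.Y.Z" line that gcc -v prints. The helper name gcc_version_ok is hypothetical, and the default minimum of (4, 9) simply mirrors the sentence above; pass minimum=(5, 0) to match the stricter figure in the Requirements list.

```python
import re


def gcc_version_ok(version_output, minimum=(4, 9)):
    """Return True if the reported (major, minor) version is strictly above `minimum`."""
    match = re.search(r"gcc version (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no 'gcc version X.Y' line found")
    return (int(match.group(1)), int(match.group(2))) > minimum


# Example of feeding it real output (gcc -v writes to stderr):
#   import subprocess
#   out = subprocess.run(["gcc", "-v"], capture_output=True, text=True).stderr
#   print(gcc_version_ok(out))
```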

📚 Dataset

Currently, we have processed the following datasets:

🤖 Implementations

Code retrieval (search)

Code completion

Heterogeneous mapping

Code summarization

Structural Analysis of Pre-Trained Language Models for Source Code

📋 Experiments

Code Summarization

Dataset: Python (Wan et al.)

Model            BLEU-4  METEOR  ROUGE-L  Cost     Logs
Seq2Seq+Attn     25.57   14.40   39.41    0.09s/b  click here
Tree2Seq+Attn    23.35   12.59   36.49    0.48s/b  click here
Transformer      30.64   17.65   44.59    0.26s/b  click here
Transformer+RPE  31.57   17.74   45.18    0.27s/b  click here
PLBART           32.71   18.13   46.05    0.80s/b  TBC

Code Retrieval

Dataset: CodeSearchNet (Husain et al.)

MRR       Go     Java   JS     PHP    Python  Ruby   Cost     Logs
NBOW      66.59  59.92  47.15  54.75  63.33   42.86  0.16s/b  click here
ConV1d    70.87  60.49  38.81  61.92  67.29   36.53  0.30s/b  click here
BiRNN     65.80  48.60  23.23  51.36  48.28   19.35  0.74s/b  click here
SelfAttn  78.45  66.55  50.38  65.78  79.09   47.96  0.25s/b  click here
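
For readers unfamiliar with the metric, MRR (mean reciprocal rank) averages 1/rank of the correct snippet over all queries. The sketch below is a generic definition, not NaturalCC's evaluation code; the function name is hypothetical, and the table above appears to report the value scaled by 100.

```python
def mean_reciprocal_rank(ranks):
    """ranks: 1-based rank of the correct item per query, or None if it was not retrieved."""
    if not ranks:
        raise ValueError("ranks must be non-empty")
    # A query whose correct item was never retrieved contributes 0.
    return sum(0.0 if r is None else 1.0 / r for r in ranks) / len(ranks)
```

For example, ranks of 1, 2 and 4 give (1 + 0.5 + 0.25) / 3.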

Code Completion

Dataset: Py150 (official processed) (raw)

MRR        Attr   Num    Name   Param  Tokens  Cost     Logs
LSTM       51.67  47.45  46.52  66.06  73.73   0.31s/b  click here
GPT-2      70.37  62.20  63.84  73.54  82.17   0.43s/b  click here
TravTrans  72.08  68.55  76.33  71.08  83.17   0.43s/b  click here

Type Inference

Dataset: CodeSearchNet-Java (Husain et al.)

Model        Acc@1 (All types)  Acc@5 (All types)  Acc@1 (Any types)  Acc@5 (Any types)  Cost     Logs
DeepTyper    0.52               0.67               0.43               0.67               0.42s/b  TBC
Transformer  0.32               0.64               0.37               0.75               0.85s/b  TBC
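
Acc@k counts a prediction as correct when the true type appears among the model's top k candidates. The following is a generic sketch of that definition with a hypothetical function name; it is not the evaluation script used to produce the numbers above.

```python
def top_k_accuracy(predictions, targets, k):
    """predictions: for each example, candidate labels ordered best first."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and the same length")
    hits = sum(1 for candidates, gold in zip(predictions, targets) if gold in candidates[:k])
    return hits / len(targets)
```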

Heterogeneous Mapping

Dataset: OpenCL (Grewe et al.)

Accuracy        AMD    NVIDIA
Static mapping  58.82  56.91
Decision tree   70.29  74.56
Inst2vec        82.79  81.76
DeepTune        83.24  80.15

🏫 Examples & Tutorials

Run all commands here from the root of the project folder (the path of your naturalcc checkout). For example, in my environment that is /data/wanyao/Dropbox/ghproj-v100/naturalcc.

We also provide more detailed READMEs to help you get started with NaturalCC.

Step 1: Download and process a dataset from datasets, following the instructions in its README.md file.

# ref: dataset/python_wan/README.md
# download dataset
bash dataset/python_wan/download.sh
# clean data
python -m dataset.python_wan.clean
# cast data attributes into different files
python -m dataset.python_wan.attributes_cast

# ref: dataset/python_wan/summarization/README.md
# save code tokens and docstring tokens into MMAP format
python -m dataset.python_wan.summarization.preprocess

Step 2 (optional): Register your self-defined models

  • If you want to create a new model, please add your model at ncc/models and ncc/modules.

  • If your training policy is more complex than we anticipated, update your criterions and training procedure at ncc/criterions and ncc/trainers, respectively.

    Do not forget to register your self-defined module in ncc/XX/__init__.py.

Step 3: Training and inference.

  • Select a task and a model from task list and follow the instructions in its README.md to start your learning.
# ref: run/summarization/transformer/README.md
# train
CUDA_VISIBLE_DEVICES=0,1,2,3 nohup python -m run.summarization.transformer.train -f config/python_wan/python > run/summarization/transformer/config/python_wan/python.log 2>&1 &
# inference
CUDA_VISIBLE_DEVICES=0 python -m run.summarization.transformer.eval -f config/python_wan/python -o run/summarization/transformer/config/python_wan/python.txt
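
The two commands above share a naming pattern (module run.<task>.<model>.<stage>, config passed with -f, log written next to the config). The sketch below assembles that pattern; the helper name is hypothetical, and it assumes other tasks and models follow the same layout, which you should confirm in each task's README.md.

```python
def build_run_command(task, model, stage, config):
    """Return (argv, log_path) following the pattern of the summarization example."""
    if stage not in ("train", "eval"):
        raise ValueError("stage must be 'train' or 'eval'")
    argv = ["python", "-m", f"run.{task}.{model}.{stage}", "-f", config]
    # The training example redirects output to a .log file under the model's run folder.
    log_path = f"run/{task}/{model}/{config}.log"
    return argv, log_path
```

The returned argv can be passed to subprocess.run from the project root, with CUDA_VISIBLE_DEVICES set in the environment to pick GPUs.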

FAQ

Please feel free to contact me if you run into any trouble.

😘 License and Acknowledgement

NaturalCC is MIT-licensed. The license applies to the pre-trained models as well. This project is also highly inspired by Fairseq and AllenNLP.

🔗 Related Links

Paper
NaturalCC-demo
About us: XCodeMind

❤️ Citation

Please cite as:

@inproceedings{wan2022naturalcc,
  author    = {Yao Wan and
               Yang He and
               Zhangqian Bi and
               Jianguo Zhang and
               Yulei Sui and
               Hongyu Zhang and
               Kazuma Hashimoto and
               Hai Jin and
               Guandong Xu and
               Caiming Xiong and
               Philip S. Yu},
  title     = {NaturalCC: An Open-Source Toolkit for Code Intelligence},
  booktitle = {Proceedings of 44th International Conference on Software Engineering, Companion Volume},
  publisher = {ACM},
  year      = {2022}
}
