JPlag - Detecting Software Plagiarism
JPlag is a system that finds similarities among multiple sets of source code files. This way it can detect software plagiarism and collusion in software development. JPlag currently supports various programming languages, EMF metamodels, and natural language text.
Supported Languages
The following table lists all supported languages and their supported language versions. A language can be selected from the command line using the -l <cli argument name> argument.
Language | Version | CLI Argument Name | State | Parser |
---|---|---|---|---|
Java | 17 | java | mature | JavaC |
C/C++ | 11 | cpp | legacy | JavaCC |
C/C++ | 14 | cpp2 | beta | ANTLR 4 |
C# | 8 | csharp | beta | ANTLR 4 |
Go | 1.17 | golang | beta | ANTLR 4 |
Kotlin | 1.3 | kotlin | beta | ANTLR 4 |
Python | 3.6 | python3 | legacy | ANTLR 4 |
R | 3.5.0 | rlang | beta | ANTLR 4 |
Rust | 1.60.0 | rust | beta | ANTLR 4 |
Scala | 2.13.8 | scala | beta | Scalameta |
Scheme | ? | scheme | unknown | JavaCC |
Swift | 5.4 | swift | beta | ANTLR 4 |
EMF Metamodel | 2.25.0 | emf | alpha | EMF |
Text (naive) | - | text | legacy | CoreNLP |
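As a rough sketch of how the language selection fits into an invocation, the following Java snippet assembles a command line that passes a value from the CLI Argument Name column to -l. The jar name jplag.jar and the class and method names are assumptions made for illustration; substitute the file name of the release you actually use. The snippet only builds the argument list and does not start JPlag.

```java
import java.util.ArrayList;
import java.util.List;

/** Builds (but does not run) a JPlag CLI invocation as an argument list. */
final class JPlagCommandSketch {

    // Hypothetical jar name; replace it with the file name of your release.
    static final String JAR_NAME = "jplag.jar";

    /**
     * @param language a value from the "CLI Argument Name" column, e.g. "kotlin"
     * @param rootDir  root directory with the submissions to check
     */
    static List<String> buildCommand(String language, String rootDir) {
        List<String> command = new ArrayList<>();
        command.add("java");
        command.add("-jar");
        command.add(JAR_NAME);
        command.add("-l");       // language selection
        command.add(language);
        command.add(rootDir);    // positional argument
        return command;
    }

    public static void main(String[] args) {
        List<String> command = buildCommand("kotlin", "/path/to/rootDir");
        System.out.println(String.join(" ", command));
        // To actually launch it, one could pass the list to
        // new ProcessBuilder(command).inheritIO().start();
    }
}
```

Keeping the arguments in a list rather than a single string avoids quoting problems when a path contains spaces.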
Download and Installation
You need Java SE 17 to run or build JPlag.
Downloading a release
- Download a released version.
- If you depend on the legacy version of JPlag, use the legacy release v2.12.1 and the legacy branch.
Via Maven
JPlag is released on Maven Central and can be included as follows:
<dependency>
  <groupId>de.jplag</groupId>
  <artifactId>jplag</artifactId>
</dependency>
Building from sources
- Download or clone the code from this repository.
- Run mvn clean package from the root of the repository to compile and build all submodules. Run mvn clean package assembly:single instead if you need the full jar, which includes all dependencies.
- You will find the generated JARs in the subdirectory cli/target.
Usage
JPlag can either be used via the CLI or directly via its Java API. For more information, see the usage information in the wiki. If you are using the CLI, you can display your results via jplag.github.io. No data will leave your computer!
CLI
Note that the legacy CLI differs slightly.
positional arguments:
rootDir Root-directory with submissions to check for plagiarism
named arguments:
-h, --help show this help message and exit
-new NEW [NEW ...] Root-directory with submissions to check for plagiarism (same as the root directory)
-old OLD [OLD ...] Root-directory with prior submissions to compare against
-l {cpp,csharp,emf,go,java,kotlin,python3,rlang,rust,scala,scheme,swift,text}
Select the language to parse the submissions (default: java)
-bc BC Path of the directory containing the base code (common framework used in all
submissions)
-t T Tunes the comparison sensitivity by adjusting the minimum token required to be counted
as a matching section. A smaller <n> increases the sensitivity but might lead to more
false-positives
-n N The maximum number of comparisons that will be shown in the generated report, if set
to -1 all comparisons will be shown (default: 100)
-r R Name of the directory in which the comparison results will be stored (default: result)
Advanced:
-d Debug parser. Non-parsable files will be stored (default: false)
-s S Look in directories <root-dir>/*/<dir> for programs
-p P comma-separated list of all filename suffixes that are included
-x X All files named in this file will be ignored in the comparison (line-separated list)
-m M Comparison similarity threshold [0.0-1.0]: All comparisons above this threshold will
be saved (default: 0.0)
Clustering:
--cluster-skip Skips the clustering (default: false)
--cluster-alg {AGGLOMERATIVE,SPECTRAL}
Which clustering algorithm to use. Agglomerative merges similar submissions bottom up.
Spectral clustering is combined with Bayesian Optimization to execute the k-Means
clustering algorithm multiple times, hopefully finding a "good" clustering
automatically. (default: spectral)
--cluster-metric {AVG,MIN,MAX,INTERSECTION}
The metric used for clustering. AVG is intersection over union, MAX can expose some
attempts of obfuscation. (default: MAX)
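To illustrate how the -m and -n options described above relate to each other, here is a minimal Java sketch that first drops comparisons below a similarity threshold and then limits how many remain, with -1 meaning "all". The Comparison record and the select method are hypothetical and not part of JPlag's API. The sketch also assumes that the threshold is inclusive and that results are ordered by descending similarity; neither is confirmed by the help text.

```java
import java.util.Comparator;
import java.util.List;

/** Sketch of a similarity threshold (-m) combined with a report limit (-n). */
final class ComparisonFilterSketch {

    /** Hypothetical stand-in for one pairwise comparison; not a JPlag type. */
    record Comparison(String first, String second, double similarity) {}

    /**
     * @param threshold minimum similarity in [0.0, 1.0] (assumed inclusive)
     * @param limit     maximum number of results; any negative value keeps all
     */
    static List<Comparison> select(List<Comparison> all, double threshold, int limit) {
        return all.stream()
                .filter(c -> c.similarity() >= threshold)
                // Assumption: the most similar pairs come first.
                .sorted(Comparator.comparingDouble(Comparison::similarity).reversed())
                .limit(limit < 0 ? Long.MAX_VALUE : limit)
                .toList();
    }

    public static void main(String[] args) {
        List<Comparison> all = List.of(
                new Comparison("alice", "bob", 0.9),
                new Comparison("alice", "carol", 0.2),
                new Comparison("bob", "carol", 0.6));
        // Keep pairs with similarity of at least 0.5, without a limit.
        select(all, 0.5, -1).forEach(System.out::println);
    }
}
```

With the defaults from the help text (threshold 0.0, limit 100), this sketch would keep every comparison and then cut the list to the 100 most similar pairs.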
Java API
The new API makes it easy to integrate JPlag's plagiarism detection into external Java projects:
Language language = new de.jplag.java.Language();
Set<File> submissionDirectories = Set.of(new File("/path/to/rootDir"));
File baseCode = new File("/path/to/baseCode");
JPlagOptions options = new JPlagOptions(language, submissionDirectories, Set.of()).withBaseCodeSubmissionDirectory(baseCode);
JPlag jplag = new JPlag(options);
try {
    JPlagResult result = jplag.run();
    // Optional
    ReportObjectFactory reportObjectFactory = new ReportObjectFactory();
    reportObjectFactory.createAndSaveReport(result, "/path/to/output");
} catch (ExitException e) {
    // error handling here
}
Contributing
We're happy to incorporate all improvements to JPlag into this codebase. Feel free to fork the project and send pull requests. Please consider our guidelines for contributions.
Contact
If you encounter bugs or other issues, please report them here. For other purposes, you can contact us at [email protected]. If you are doing research related to JPlag, we would love to know what you are doing. Feel free to contact us!