• Stars
    star
    2,033
  • Rank 22,758 (Top 0.5 %)
  • Language
    Python
  • Created over 9 years ago
  • Updated 4 months ago

Repository Details

🔍 ScanCode detects licenses, copyrights, and dependencies by "scanning code" ... to discover and inventory open source and third-party packages used in your code. Sponsored by the NLnet project https://nlnet.nl/project/vulnerabilitydatabase, the Google Summer of Code, Azure credits, nexB and other generous sponsors!

ScanCode toolkit

A typical software project often reuses hundreds of third-party packages. License, package, dependency, and origin information is not always easy to find and is not normalized: ScanCode discovers and normalizes this data for you.

Read more about ScanCode here: https://scancode-toolkit.readthedocs.io/.

Check out the code at https://github.com/nexB/scancode-toolkit

Build and tests status

We run 30,000+ tests on each commit on multiple CIs to ensure good platform compatibility with multiple versions of Windows, Linux and macOS.

Why use ScanCode?

  • As a standalone command-line tool, ScanCode is easy to install, run, and embed in your CI/CD processing pipeline. It runs on Windows, macOS, and Linux.
  • ScanCode is used by several projects and organizations such as the Eclipse Foundation, OpenEmbedded.org, the FSFE, the FSF, OSS Review Toolkit, ClearlyDefined.io, RedHat Fabric8 analytics, and many more.
  • ScanCode detects licenses, copyrights, package manifests, direct dependencies, and more in both source code and binary files. It is considered the best-in-class reference tool in this domain and is reused as the core tool for software composition data collection by several open source tools.
  • ScanCode provides the most accurate license detection engine and does a full comparison (also known as diff or red line comparison) between a database of license texts and your code instead of relying only on approximate regex patterns or probabilistic search, edit distance or machine learning.
  • Written in Python, ScanCode is easy to extend with plugins to contribute new and improved scanners, data summarization, package manifest parsers, and new outputs.
  • You can save your scan results as JSON, YAML, HTML, CycloneDX or SPDX or even create your own format with Jinja templates.
  • You can also run ScanCode server-side with the companion ScanCode.io web app to organize and store multiple scan projects, including scripted scanning pipelines.
  • ScanCode output data can be easily visualized and analysed using the ScanCode Workbench desktop app.
  • ScanCode is actively maintained and has a growing community of users and contributors.
  • ScanCode is heavily tested with an automated test suite of over 20,000 tests.
  • ScanCode has extensive and growing documentation.
  • ScanCode can process packages, build manifests, and lockfile formats to collect Package URLs and extract metadata: Alpine packages, BUCK files, ABOUT files, Android apps, Autotools, Bazel, JavaScript Bower, Java Axis, MS Cab, Rust Cargo, Cocoapods, Chef, Chrome apps, PHP Composer and composer.lock, Conda, CPAN, Debian, Apple dmg, Java EAR, WAR, JAR, FreeBSD packages, Rubygems gemspec, Gemfile and Gemfile.lock, Go modules, Haxe packages, InstallShield installers, iOS apps, ISO images, Apache IVY, JBoss Sar, R CRAN, Apache Maven, Meteor, Mozilla extensions, MSI installers, JavaScript npm packages, package-lock.json, yarn.lock, NSIS Installers, NuGet, opam, Python PyPI setup.py, setup.cfg, and several related lockfile formats, semi-structured README files such as README.android, README.chromium, README.facebook, README.google, README.thirdparty, RPMs, Shell Archives, Squashfs images, Windows executables and the Windows registry, and a few more. See all available package parsers for the exhaustive list.
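
The "full comparison" idea from the license detection bullet above can be illustrated with a small sketch. This is not ScanCode's actual matching engine: it uses Python's standard difflib on word tokens, and the function names, the sample texts, and the coverage score are all assumptions made for illustration.

```python
import difflib
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_coverage(license_text, code_text):
    """Return the fraction of license tokens aligned, in order, with code_text.

    Hypothetical helper: a diff-style alignment of two token sequences,
    not the algorithm that ScanCode implements.
    """
    license_tokens = tokenize(license_text)
    code_tokens = tokenize(code_text)
    if not license_tokens:
        return 0.0
    matcher = difflib.SequenceMatcher(
        None, license_tokens, code_tokens, autojunk=False
    )
    # Matching blocks are non-overlapping runs of identical tokens.
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(license_tokens)


# Made-up reference text and file header, for illustration only.
reference = "Permission is hereby granted, free of charge, to any person obtaining a copy"
header = (
    "# Copyright (c) Example Authors\n"
    "# Permission is hereby granted, free of charge, to any person obtaining a copy\n"
)
print(match_coverage(reference, header))
```

Unlike a regex pattern, a token-level diff of this kind tolerates changes in comment markers, punctuation, and line wrapping while still reporting how much of the reference text is actually present.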

See our roadmap for upcoming features.

Documentation

The ScanCode documentation is hosted at scancode-toolkit.readthedocs.io.

If you are new to visualization of scancode results data, start with our newcomer page.

If you want to compare output changes between different versions of ScanCode, or want to look at scans generated by ScanCode, review our reference scans.

See also https://aboutcode.org for related companion projects and tools.

Installation

Before installing ScanCode make sure that you have installed the prerequisites properly. This means installing Python 3.8 for x86/64 architectures. We support Python 3.8, 3.9, 3.10 and 3.11.

See prerequisites for detailed information on the supported platforms and Python versions.

There are a few common ways to install ScanCode.

Quick Start

After ScanCode is installed successfully you can run an example scan printed on screen as JSON:

scancode -clip --json-pp - samples

Follow the How to Run a Scan tutorial to perform a basic scan on the samples directory distributed by default with ScanCode.
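
If you want to drive the Quick Start scan from a script, the following minimal sketch may help. It wraps the same command with Python's subprocess module and reads the JSON it prints. The function names are hypothetical, and the sketch assumes that standard output contains only the JSON document and that the document has a top-level "files" list whose entries carry a "path" key; verify both against your own scan output.

```python
import json
import subprocess


def build_scan_command(codebase_path):
    """Build the Quick Start command line for the given path."""
    # -clip and --json-pp come from the Quick Start command above;
    # "-" asks for the JSON to be printed rather than saved to a file.
    return ["scancode", "-clip", "--json-pp", "-", codebase_path]


def list_scanned_paths(scan_json):
    """Return the path of every entry in the scan's "files" list.

    Assumption: a top-level "files" list with a "path" key per entry.
    """
    scan = json.loads(scan_json)
    return [entry["path"] for entry in scan.get("files", [])]


def run_scan(codebase_path):
    """Run the scan and return the scanned paths (needs scancode on PATH)."""
    completed = subprocess.run(
        build_scan_command(codebase_path),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the scan fails
    )
    return list_scanned_paths(completed.stdout)
```

Calling run_scan("samples") would then return the list of paths found in the samples directory, provided the assumptions above hold.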

See more command examples:

scancode --examples

See How to select what will be detected in a scan and How to specify the output format for more information.

You can also refer to the command line options synopsis and an exhaustive list of all available command line options.

Archive extraction

By default ScanCode does not extract files from tarballs, zip files, and other archives as part of the scan. The archives that exist in a codebase must be extracted before running a scan: extractcode is a bundled utility behaving as a mostly-universal archive extractor. For example, this command will recursively extract the mytar.tar.bz2 tarball in the mytar.tar.bz2-extract directory:

./extractcode mytar.tar.bz2

See all extractcode options and how to extract archives for details.
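
As a rough sketch of how extraction could be scripted before a scan, the helper below computes the target directory using the naming convention described above (the archive name with "-extract" appended) and calls the extractcode launcher. The function names and the default launcher location are assumptions for illustration.

```python
import subprocess
from pathlib import Path


def extraction_target(archive_path):
    """Return the directory named after the archive with "-extract" appended."""
    archive = Path(archive_path)
    return archive.with_name(archive.name + "-extract")


def extract_archive(archive_path, extractcode="./extractcode"):
    """Run extractcode on the archive and return the expected target directory.

    Assumption: the extractcode launcher sits in the current directory, as
    in the command above; pass another location if yours differs.
    """
    subprocess.run([extractcode, str(archive_path)], check=True)
    return extraction_target(archive_path)


print(extraction_target("mytar.tar.bz2"))
```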

Support

If you have a problem, a suggestion or found a bug, please enter a ticket at: https://github.com/nexB/scancode-toolkit/issues

For discussions and chats, we have:

  • An official Gitter channel for web-based chats. Gitter is now accessible through Element or an IRC bridge. There are other AboutCode project-specific channels available there too.
  • A discussion channel aimed specifically at users and developers of scancode-toolkit.

Source code and downloads

License

  • Apache-2.0 as the overall license
  • CC-BY-4.0 for reference datasets (initially in the Public Domain).
  • Multiple other secondary permissive or copyleft licenses (LGPL, MIT, BSD, GPL 2/3, etc.) for third-party components and test suite code and data.

See the NOTICE file and the .ABOUT files that document the origin and license of the third-party code used in ScanCode for more details.

More Repositories

1

vulnerablecode

A free and open vulnerabilities database and the packages they impact. And the tools to aggregate and correlate these vulnerabilities. Sponsored by NLnet https://nlnet.nl/project/vulnerabilitydatabase/ for https://www.aboutcode.org/ Chat at https://gitter.im/aboutcode-org/vulnerablecode Docs at https://vulnerablecode.readthedocs.org/
Python
510
star
2

aboutcode

AboutCode project: tools and data to uncover things about code: the provenance, origin, license, and more (packages, security, quality, etc.) of FOSS code
Batchfile
153
star
3

scancode-workbench

📊 ScanCode Workbench is a desktop app to review and conclude license and origin from code scans generated by ScanCode Toolkit.
TypeScript
145
star
4

scancode.io

ScanCode.io is a server to script and automate software composition analysis pipelines with ScanPipe pipelines. This project is sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase/, Google Summer of Code, nexB and other generous sponsors!
Python
94
star
5

aboutcode-toolkit

✅ AboutCode Toolkit provides a simple way to document provenance metadata (origin and license) about third-party code that you use in your project: it includes utilities to generate inventory/BOM or Attribution documentation.
Python
90
star
6

license-expression

Utility library to parse, normalize and compare License expressions for Python using a boolean logic engine. For expressions using SPDX or any other license id scheme.
Python
54
star
7

extractcode

A mostly universal file extraction library and CLI tool to extract almost any archive in a reasonably safe way on Linux, macOS and Windows.
Python
31
star
8

container-inspector

container-inspector is a suite of analysis utilities and command line tools for Docker container images, their layers and how these relate to each other. It can also handle OCI images and Dockerfiles.
Python
30
star
9

python-publicsuffix2

A small Python library to deal with publicsuffix data (includes a bundled PSL as "package data") in a wheel friendly format. Fork and continuation of Tomaž Šolc's "publicsuffix"
Python
29
star
10

purldb

Tools to create and expose a database of purls (Package URLs). This project is sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and nexB for https://www.aboutcode.org/ Chat is at https://gitter.im/aboutcode-org/discuss
HTML
29
star
11

scancode-licensedb

A free and open database of all the licenses, in particular all the open source software licenses
Makefile
27
star
12

univers

Parse and compare all the package versions and all the ranges. From debian, npm, pypi, ruby and more. Process all the version range specs and expressions. This project is sponsored by an NLnet project https://nlnet.nl/project/vulnerabilitydatabase/, the Google Summer of Code, nexB and other generous sponsors!
Python
27
star
13

tracecode-toolkit-strace

Trace software components, packages and files between Development/Source and Deployment/Distribution/Binaries codebases - strace build analysis
Python
25
star
14

python-inspector

Inspect Python code and PyPI package manifests. Resolve Python dependencies.
Python
20
star
15

deltacode

DeltaCode: compare two codebase scans (from ScanCode) to detect significant changes.
Python
19
star
16

scancode-server

This project is no longer maintained. Visit https://github.com/nexB/scancode.io/ instead for a similar, current project.
Python
19
star
17

dejacode

Automate open source license compliance and ensure software supply chain integrity
Python
18
star
18

pip-requirements-parser

A mostly correct pip requirements parsing library
Python
16
star
19

debian-inspector

A python library to parse Debian deb822-style control and copyright files and all related Debian, Ubuntu and Debian-derivative manifest and metadata files, an alternative approach to python-debian.
Python
13
star
20

cwe2

Common weakness enumeration library for Python (maintained fork of https://github.com/Julian-Nash/cwe )
Python
11
star
21

saneyaml

Cleaner, simpler, safer and saner YAML parsing/serialization in Python, for YAML meant to be readable first, on top of PyYAML
Python
9
star
22

fetchcode

A library to reliably fetch code via HTTP, FTP and version control systems. This project is sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase/, Google Summer of Code, nexB and other generous sponsors!
HTML
9
star
23

skeleton

Python
8
star
24

typecode

Python
7
star
25

clearcode-toolkit

ClearCode is a simple tool to fetch and sync all ClearlyDefined data locally.
Python
7
star
26

scancode-analyzer

scancode-results-analyzer
Python
4
star
27

scancode-thirdparty-src

Source code for ScanCode prebuilt dependencies
HTML
4
star
28

nuget-inspector

Inspect and resolve .NET and NuGet package dependencies like dotnet and nuget do. Fetch manifests data. Runs on Linux, Windows and macOS as a standalone application.
C#
4
star
29

purldb-data

A dataset of purl for offline lookup and verification usage. This project is sponsored by NLnet project https://nlnet.nl/project/vulnerabilitydatabase/ and nexB for https://www.aboutcode.org/ Chat is at https://gitter.im/aboutcode-org/discuss
4
star
30

scancode-action

Run ScanCode.io pipelines from your Workflows
4
star
31

commoncode

Python
3
star
32

pkginfo2

Git mirror of http://bazaar.launchpad.net/~tseaver/pkginfo ... with modifications
Python
3
star
33

pygmars

Craft simple regex-based small language lexers and parsers. Build parsers from grammars and accept Pygments lexers as an input. Derived from NLTK.
Python
3
star
34

turbo-spdx

Fast and lightweight Python library for parsing and writing SPDX JSON documents correctly.
Python
2
star
35

scancode-plugins

A set of plugins either delivered as builtin scancode-toolkit or extra plugins
HTML
2
star
36

scancode-toolkit-contrib

Candidate additions and contribution for the ScanCode toolkit
C
2
star
37

dependency-inspector

A general purpose, mostly universal software package dependency resolver.
Go
2
star
38

scancode-toolkit-plugin-cookiecutter

Python
1
star
39

plugincode

Python
1
star
40

jvm-inspector

[WIP] jvm-inspector is a set of tools and utility functions to inspect JVM byte code and source code
Python
1
star
41

sanexml

Python
1
star
42

federatedcode

Python
1
star
43

dejacode-toolkit

[Work in progress] An API client and toolkit with libraries, utilities and helpers to work with the DejaCode API
1
star
44

go-inspector

[WIP] An inspector for Go language-based source, binaries, packages, dependencies and metadata
Python
1
star
45

scancode.io-pipeline-glc_scan

Python
1
star
46

scancode-toolkit-reference-scans

scancode-toolkit-reference-scans
HTML
1
star
47

heritedcode

A software heritage API client
Python
1
star
48

vulnerablecode-data

1
star
49

aboutcode-cyclonedx-taxonomy

AboutCode CycloneDX Property Taxonomy
1
star
50

spdx-licenses

A mirror of http://spdx.org licenses
1
star
51

matchcode-toolkit

Python
1
star
52

attributecode

[Archived] This project was an Attribution generation tool with many content and format options for the input data. All its features have been folded back in the latest AboutCode Toolkit at https://github.com/nexB/aboutcode-toolkit
Python
1
star