A Python library to parse, validate and create SPDX documents.

CI status (Linux, macOS and Windows): Install and Test

Current state, please read!

This repository recently underwent a major refactoring to prepare for the upcoming SPDX v3.0 release. We therefore encourage you to report any issues you find at https://github.com/spdx/tools-python/issues.
If you are looking for the source code of the current PyPI release, check out the v0.7 branch. Note, though, that this branch will receive only bug fixes and no new features.

We encourage you to use the new, refactored version (on the main branch) if you

  • want to use the soon-to-be-released SPDX v3.0 in the future
  • want to perform full validation of your SPDX documents against the v2.2 and v2.3 specification
  • want to use the RDF format of SPDX with all v2.3 features.

If you are planning to migrate from v0.7.x of these tools, please have a look at the migration guide.

Information

This library implements SPDX parsers, converters, validators and handlers in Python.

License

Apache-2.0

Features

  • API to create and manipulate SPDX v2.2 and v2.3 documents
  • Parse, convert, create and validate SPDX files
  • Supported formats: Tag/Value, RDF, JSON, YAML, XML
  • Visualize the structure of an SPDX document by creating an AGraph. Note: This is an optional feature and requires the installation of additional optional dependencies

Experimental support for SPDX 3.0

  • Create v3.0 elements and payloads
  • Convert v2.2/v2.3 documents to v3.0
  • Serialize to JSON-LD

See Quickstart to SPDX 3.0 below.
The implementation is based on the descriptive markdown files in the repository https://github.com/spdx/spdx-3-model (latest commit: a5372a3c145dbdfc1381fc1f791c68889aafc7ff).

Installation

We recommend working in a virtual environment (venv). To install a local clone of this repository, run yourenv/bin/pip install . from the clone's root directory. To install from PyPI, check for the newest release and pin it, for example yourenv/bin/pip install spdx-tools==0.8.0a2. On Windows, the directory is Scripts instead of bin.
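
As a minimal sketch of the bin versus Scripts distinction mentioned above, the following uses Python's standard venv module to create an environment and locate its pip executable. The helper names pip_executable and create_env_and_install are illustrative assumptions and not part of spdx-tools.

```python
import subprocess
import sys
import venv
from pathlib import Path


def pip_executable(env_dir, windows=None):
    """Return the path to pip inside a virtual environment directory.

    On Windows the executables live in "Scripts", elsewhere in "bin".
    """
    if windows is None:
        windows = sys.platform == "win32"
    if windows:
        return Path(env_dir) / "Scripts" / "pip.exe"
    return Path(env_dir) / "bin" / "pip"


def create_env_and_install(env_dir, requirement):
    """Create a virtual environment and install the given requirement into it."""
    venv.create(env_dir, with_pip=True)
    subprocess.run([str(pip_executable(env_dir)), "install", requirement], check=True)


# Example (not executed here, since it needs network access):
# create_env_and_install("yourenv", "spdx-tools==0.8.0a2")
```

Passing "." as the requirement from the root of a local clone would install the clone instead of the PyPI release.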

How to use

Command-line usage

  1. PARSING/VALIDATING (for parsing any format):
  • Use pyspdxtools -i <filename> where <filename> is the location of the file. The input format is inferred automatically from the file ending.

  • If you are using a source distribution, try running:
    pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json

  2. CONVERTING (for converting one format to another):
  • Use pyspdxtools -i <input_file> -o <output_file> where <input_file> is the location of the file to be converted and <output_file> is the location of the output file. The input and output formats are inferred automatically from the file endings.

  • If you are using a source distribution, try running:
    pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json -o output.tag

  • If you want to skip the validation process, provide the --novalidation flag, like so:
    pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json -o output.tag --novalidation
    (use this with caution: note that undetected invalid documents may lead to unexpected behavior of the tool)

  • For help use pyspdxtools --help

  3. GRAPH GENERATION (optional feature)
  • This feature generates a graph representing all elements in the SPDX document and their connections based on the provided relationships. The graph can be rendered as an image (example rendering for the file tests/data/SPDXJSONExample-v2.3.spdx.json: SPDXJSONExample-v2.3.spdx.png).
  • Make sure you install the optional dependencies networkx and pygraphviz. To do so run pip install ".[graph_generation]".
  • Use pyspdxtools -i <input_file> --graph -o <output_file> where <output_file> is an output file name with a format that pygraphviz supports (see the pygraphviz documentation).
  • If you are using a source distribution, try running pyspdxtools -i tests/data/SPDXJSONExample-v2.3.spdx.json --graph -o SPDXJSONExample-v2.3.spdx.png to generate a png with an overview of the structure of the example file.
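
If you prefer to drive the command-line tool from a script, the invocations above can be assembled programmatically. The following is a minimal sketch that uses only the flags documented above (-i, -o, --novalidation, --graph). The helper names build_pyspdxtools_command and run_pyspdxtools are hypothetical wrappers, not part of spdx-tools, and the sketch assumes that pyspdxtools is available on your PATH.

```python
import subprocess


def build_pyspdxtools_command(input_file, output_file=None, validate=True, graph=False):
    """Assemble a pyspdxtools argument list from the documented flags."""
    # The documented graph usage always pairs --graph with -o, so this
    # sketch (an assumption, not library behavior) insists on an output file.
    if graph and output_file is None:
        raise ValueError("graph generation needs an output file")
    command = ["pyspdxtools", "-i", input_file]
    if graph:
        command.append("--graph")
    if output_file is not None:
        command += ["-o", output_file]
    if not validate:
        command.append("--novalidation")
    return command


def run_pyspdxtools(*args, **kwargs):
    """Run pyspdxtools; subprocess raises CalledProcessError on a non-zero exit."""
    command = build_pyspdxtools_command(*args, **kwargs)
    return subprocess.run(command, check=True, capture_output=True, text=True)
```

For example, build_pyspdxtools_command("tests/data/SPDXJSONExample-v2.3.spdx.json", "output.tag", validate=False) yields the same arguments as the --novalidation example above.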

Library usage

  1. DATA MODEL
  • The spdx_tools.spdx.model package constitutes the internal SPDX v2.3 data model (v2.2 is simply a subset of this). All relevant classes for SPDX document creation are exposed in the package's __init__.py.
  • SPDX objects are implemented via @dataclass_with_properties, a custom extension of @dataclass.
    • Each class starts with a list of its properties and their possible types. When no default value is provided, the property is mandatory and must be set during initialization.
    • Using the type hints, type checking is enforced when initializing a new instance or setting/getting a property on an instance (wrong types will raise ConstructorTypeError or TypeError, respectively). This makes it easy to catch invalid properties early and only construct valid documents.
    • Note: in-place manipulations like list.append(item) will circumvent the type checking (a TypeError will still be raised when reading list again). We recommend using list = list + [item] instead.
  • The main entry point of an SPDX document is the Document class from the document.py module, which links to all other classes.
  • For license handling, the license_expression library is used.
  • Note on documentDescribes and hasFiles: These fields will be converted to relationships in the internal data model. As they are deprecated, these fields will not be written in the output.
  2. PARSING
  • Use parse_file(file_name) from the parse_anything.py module to parse an arbitrary file with one of the supported file endings.
  • Successful parsing will return a Document instance. Unsuccessful parsing will raise SPDXParsingError with a list of all encountered problems.
  3. VALIDATING
  • Use validate_full_spdx_document(document) to validate an instance of the Document class.
  • This will return a list of ValidationMessage objects, each consisting of a string describing the problem and a ValidationContext to pinpoint the source of the validation error.
  • Validation depends on the SPDX version of the document. Note that only versions SPDX-2.2 and SPDX-2.3 are supported by this tool.
  4. WRITING
  • Use write_file(document, file_name) from the write_anything.py module to write a Document instance to the specified file. The serialization format is determined from the filename ending.
  • Validation is performed by default before writing, and writing is cancelled if the document is invalid. You can skip the validation via write_file(document, file_name, validate=False). Caution: Only valid documents can be serialized reliably; serialization of invalid documents is not supported.
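
The note about in-place list manipulation in the DATA MODEL item above can be illustrated without the library. The sketch below uses a hypothetical stand-in class, TypedFiles, whose property setter checks types on assignment. It is not the actual @dataclass_with_properties implementation, and it checks only on assignment, whereas the library also checks when a property is read.

```python
class TypedFiles:
    """Hypothetical stand-in: a property that type-checks a list of str on assignment."""

    def __init__(self, files):
        self.files = files  # goes through the setter below

    @property
    def files(self):
        return self._files

    @files.setter
    def files(self, value):
        if not isinstance(value, list) or not all(isinstance(item, str) for item in value):
            raise TypeError("files must be a list of str")
        self._files = value


doc = TypedFiles(["a.py"])

# In-place append never calls the setter, so the wrong type slips in unnoticed.
doc.files.append(42)
slipped_in = 42 in doc.files

# Re-assignment goes through the setter, so the wrong type is rejected.
doc.files = ["a.py"]
try:
    doc.files = doc.files + [42]
    rejected = False
except TypeError:
    rejected = True
```

This is why the pattern list = list + [item] is recommended: the new list is assigned through the property, so the check runs at the point where the invalid item is introduced.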

Example

Here are some examples of possible use cases to quickly get you started with spdx-tools. If you want a more comprehensive example of how to create an SPDX document from scratch, have a look here.

import logging

from license_expression import get_spdx_licensing

from spdx_tools.spdx.model import (Checksum, ChecksumAlgorithm, File, 
                                   FileType, Relationship, RelationshipType)
from spdx_tools.spdx.parser.parse_anything import parse_file
from spdx_tools.spdx.validation.document_validator import validate_full_spdx_document
from spdx_tools.spdx.writer.write_anything import write_file

# read in an SPDX document from a file
document = parse_file("spdx_document.json")

# change the document's name
document.creation_info.name = "new document name"

# define a file and a DESCRIBES relationship between the file and the document
checksum = Checksum(ChecksumAlgorithm.SHA1, "71c4025dd9897b364f3ebbb42c484ff43d00791c")

file = File(name="./fileName.py", spdx_id="SPDXRef-File", checksums=[checksum], 
            file_types=[FileType.TEXT], 
            license_concluded=get_spdx_licensing().parse("MIT and GPL-2.0"),
            license_comment="licenseComment", copyright_text="copyrightText")

relationship = Relationship("SPDXRef-DOCUMENT", RelationshipType.DESCRIBES, "SPDXRef-File")

# add the file and the relationship to the document 
# (note that we do not use "document.files.append(file)" as that would circumvent the type checking)
document.files = document.files + [file]
document.relationships = document.relationships + [relationship]

# validate the edited document and log the validation messages
# (depending on your use case, you might also want to utilize the validation_message.context)
validation_messages = validate_full_spdx_document(document)
for validation_message in validation_messages:
    logging.warning(validation_message.validation_message)

# if there are no validation messages, the document is valid 
# and we can safely serialize it without validating again
if not validation_messages:
    write_file(document, "new_spdx_document.rdf", validate=False)
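
The example above assumes that parsing succeeds. As a hedged sketch of handling failures as well, the helper below (load_valid_document is a hypothetical name, not part of spdx-tools) catches SPDXParsingError and logs each reported problem. It assumes that the exception is importable from spdx_tools.spdx.parser.error and exposes its problem list via get_messages(). The imports sit inside the function only so that the snippet can be loaded where spdx-tools is not installed.

```python
import logging


def load_valid_document(file_name):
    """Parse and validate an SPDX file; return the Document, or None on any problem."""
    # Assumed import locations; adjust if your installed version differs.
    from spdx_tools.spdx.parser.error import SPDXParsingError
    from spdx_tools.spdx.parser.parse_anything import parse_file
    from spdx_tools.spdx.validation.document_validator import validate_full_spdx_document

    try:
        document = parse_file(file_name)
    except SPDXParsingError as error:
        # Assumption: get_messages() returns the list of encountered problems.
        for message in error.get_messages():
            logging.error("parsing problem: %s", message)
        return None

    validation_messages = validate_full_spdx_document(document)
    for validation_message in validation_messages:
        # The context helps to pinpoint which element caused the message.
        logging.warning("%s (context: %s)",
                        validation_message.validation_message,
                        validation_message.context)
    return document if not validation_messages else None
```

A caller can then treat None as "do not write this document", which mirrors the check on validation_messages in the example above.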

Quickstart to SPDX 3.0

In contrast to SPDX v2, all elements are now subclasses of the central Element class. This includes packages, files, snippets, relationships, annotations, but also SBOMs, SpdxDocuments, and more.
For serialization purposes, all Elements that are to be serialized into the same file are collected in a Payload. This is just a dictionary that maps each Element's SpdxId to itself. Use one of the write_payload() functions to serialize a payload. There are currently two options:

  • The spdx_tools.spdx3.writer.json_ld.json_ld_writer module generates a JSON-LD file of the payload.
  • The spdx_tools.spdx3.writer.console.payload_writer module prints a debug output to console. Note that this is not an official part of the SPDX specification and will probably be dropped as soon as a better standard emerges.

You can convert an SPDX v2 document to v3 via the spdx_tools.spdx3.bump_from_spdx2.spdx_document module. The bump_spdx_document() function will return a payload containing an SpdxDocument Element and one Element for each package, file, snippet, relationship, or annotation contained in the v2 document.
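
To make the payload idea concrete, here is a minimal sketch of a mapping from SpdxId to Element. The Element, File and Payload classes below are simplified, hypothetical stand-ins written for illustration. They are not the classes shipped in spdx_tools.spdx3, which carry many more fields, and the identifiers used are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Element:
    """Stand-in for the central Element class: every element has an spdx_id."""
    spdx_id: str
    name: str = ""


@dataclass
class File(Element):
    """Stand-in subclass; files, packages, relationships etc. are all Elements."""


@dataclass
class Payload:
    """Collects all Elements that should be serialized into the same file."""
    elements: Dict[str, Element] = field(default_factory=dict)

    def add_element(self, element: Element) -> None:
        # The payload maps each Element's spdx_id to the Element itself.
        self.elements[element.spdx_id] = element


payload = Payload()
payload.add_element(Element("SPDXRef-DOCUMENT", name="example document"))
payload.add_element(File("SPDXRef-File", name="./fileName.py"))
```

Because every kind of object is an Element, a writer only has to iterate over the values of one mapping to serialize the whole payload.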

Contributing

Contributions are very welcome! See CONTRIBUTING.md for instructions on how to contribute to the codebase.

History

This is the result of an initial GSoC contribution by @ah450 (or https://github.com/a-h-i) and is maintained by a community of SPDX adopters and enthusiasts. To prepare for the release of SPDX v3.0, the repository underwent a major refactoring between November 2022 and March 2023.
