PyPI generator, backed entirely by static files

dumb-pypi

dumb-pypi is a simple read-only PyPI index server generator, backed entirely by static files. It is ideal for internal use by organizations that have a bunch of their own packages which they'd like to make available.

You can view an example generated repo.

A rant about static files (and why you should use dumb-pypi)

The main difference between dumb-pypi and other PyPI implementations is that dumb-pypi has no server component. It's just a script that, given a list of Python package names, generates a bunch of static files which you can serve from any webserver, or even directly from S3.

There's something magical about being able to serve a package repository entirely from a tree of static files. It's incredibly easy to make it fast and highly-available when you don't need to worry about running a bunch of application servers (which are serving a bunch of read-only queries that could have just been pre-generated).

Linux distributions have been doing this right for decades. Debian has a system of hundreds of mirrors, and the whole thing is powered by some fancy rsync commands.

For the maintainer of a PyPI repository, dumb-pypi has some nice properties:

  • File serving is extremely fast. nginx can serve your static files faster than you'd ever need. In practice, there are almost no limits on the number of packages or number of versions per package.

  • It's very simple. There's no complicated WSGI app to deploy, no databases, and no caches. You just need to run the script whenever you have new packages, and your index server is ready in seconds.

For more about why this design was chosen, see the detailed RATIONALE.md in this repo.

Usage

To use dumb-pypi, you need two things:

  • A script which generates the index. (That's this project!)

  • A generic webserver to serve the generated index.

    This part is up to you. For example, you might sync the built index into an S3 bucket and serve it directly from S3. Or you might run nginx locally against the built index.

My recommended high-availability (but still quite simple) deployment is:

  • Store all of the packages in S3.

  • Have a cronjob (or equivalent) which rebuilds the index based on the packages in S3. This is incredibly fast; it would not be unreasonable to do it every sixty seconds. After building the index, sync it into a separate S3 bucket.

  • Have a webserver (or set of webservers behind a load balancer) running nginx (with the config provided below), with the source being that second S3 bucket.

Generating static files

First, install dumb-pypi somewhere (e.g. into a virtualenv).

By design, dumb-pypi does not require you to have the packages available when building the index. You only need a list of filenames, one per line. For example:

dumb-init-1.1.2.tar.gz
dumb_init-1.2.0-py2.py3-none-manylinux1_x86_64.whl
ocflib-2016.10.31.0.40-py2.py3-none-any.whl
pre_commit-0.9.2.tar.gz

You should also know a URL to access these packages (if you serve them from the same host as the index, it can be a relative URL). For example, it might be https://my-pypi-packages.s3.amazonaws.com/ or ../../pool/.

You can then invoke the script:

$ dumb-pypi \
    --package-list my-packages \
    --packages-url https://my-pypi-packages.s3.amazonaws.com/ \
    --output-dir my-built-index

The built index will be in my-built-index. It's now up to you to figure out how to serve that with a webserver (nginx is a good option; details below!).
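If your packages happen to live in a local directory, the package list can be produced with a few lines of Python. This is only a sketch: the function name `write_package_list` and the set of file suffixes are assumptions made for illustration, not part of dumb-pypi.

```python
from pathlib import Path

# File suffixes treated as distributions here; an assumption for this sketch.
DIST_SUFFIXES = ('.tar.gz', '.zip', '.whl')


def write_package_list(package_dir, list_path):
    """Write one distribution filename per line and return the names written."""
    names = sorted(
        p.name for p in Path(package_dir).iterdir()
        if p.is_file() and p.name.endswith(DIST_SUFFIXES)
    )
    Path(list_path).write_text(''.join(name + '\n' for name in names))
    return names
```

The resulting file can then be passed as `--package-list`, as in the invocation above.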

Additional options for packages

You can extend the capabilities of your registry using the extended JSON input syntax when providing your package list to dumb-pypi. Instead of using the format listed above of one filename per line, format your file with one JSON object per line, like this:

{"filename": "dumb-init-1.1.2.tar.gz", "hash": "sha256=<hash>", "requires_python": ">=3.6", "uploaded_by": "ckuehl", "upload_timestamp": 1512539924, "yanked_reason": null, "core_metadata": "sha256=<hash>"}

| Key | Required? | Description |
|---|---|---|
| filename | Yes | Name of the file |
| hash | No | Hash of the file in the format <hashalgo>=<hashvalue> |
| requires_python | No | Python requirement string for the package (PEP 345) |
| core_metadata | No | Either string "true" or a string in the format <hashalgo>=<hashvalue> to indicate metadata is available for this file by appending .metadata to the file URL (PEP 658, PEP 714) |
| uploaded_by | No | Freeform text to indicate an uploader of the package; only shown on web UI |
| upload_timestamp | No | UNIX timestamp to indicate upload time of the package |
| yanked_reason | No | Freeform text to indicate the package is yanked for the given reason (PEP 592) |
| requires_dist | No | (Deprecated) Array of requires_dist dependencies (PEP 345), used only in the JSON API; consider using core_metadata instead |

The filename key is required. All other keys are optional and will be used to provide additional information in your generated repository. This extended information can be useful to determine, for example, who uploaded a package. (Most of this information is useful to humans in the web UI, not to pip.)

Where should you get information about the hash, uploader, etc.? That's up to you: dumb-pypi isn't in the business of storing or calculating this data. If you're using S3, one easy option is to store it at upload time as S3 metadata.
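As one possible way to produce this data yourself, the sketch below computes a sha256 hash for files on local disk and writes one JSON object per line. The helper names (`package_entry`, `write_json_package_list`) are hypothetical, and it reads each file fully into memory, which is fine for a sketch but may not suit very large packages.

```python
import hashlib
import json
from pathlib import Path


def package_entry(path, uploaded_by=None):
    """Build one JSON-serializable entry for a distribution file on disk."""
    path = Path(path)
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    # "hash" uses the <hashalgo>=<hashvalue> format from the table above.
    entry = {'filename': path.name, 'hash': f'sha256={digest}'}
    if uploaded_by is not None:
        entry['uploaded_by'] = uploaded_by
    return entry


def write_json_package_list(paths, list_path):
    """Write one JSON object per line, sorted by path."""
    with open(list_path, 'w') as f:
        for path in sorted(paths):
            f.write(json.dumps(package_entry(path)) + '\n')
```

Only `filename` and `hash` are filled in by default; any of the other optional keys could be added to the entry in the same way.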

Partial rebuild support

If you want to avoid rebuilding your entire registry constantly, you can pass the --previous-package-list (or --previous-package-list-json) argument to dumb-pypi, pointing to the list you used the last time you called dumb-pypi. Only the files relating to changed packages will be rebuilt, saving you time and unnecessary I/O.

The previous package list, in JSON form, is available in the output as packages.json.
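To get a feel for what "changed packages" means, here is a rough sketch that compares two filename lists and reports which packages differ. It is not dumb-pypi's actual implementation: `package_name` is a hypothetical helper that guesses the project name from a filename with a simple heuristic, and real filenames can be messier than it assumes.

```python
import re


def normalize(name):
    return re.sub(r'[-_.]+', '-', name).lower()


def package_name(filename):
    """Guess the project name from a distribution filename (heuristic)."""
    if filename.endswith('.whl'):
        # Wheel filenames start with the escaped project name, then '-'.
        return normalize(filename.split('-')[0])
    for suffix in ('.tar.gz', '.tar.bz2', '.zip'):
        if filename.endswith(suffix):
            filename = filename[:-len(suffix)]
            break
    # For sdists, assume everything before the last '-' is the name.
    return normalize(filename.rsplit('-', 1)[0])


def changed_packages(previous, current):
    """Return normalized names of packages whose file sets differ."""
    def group(filenames):
        groups = {}
        for filename in filenames:
            groups.setdefault(package_name(filename), set()).add(filename)
        return groups

    old, new = group(previous), group(current)
    return {
        name for name in old.keys() | new.keys()
        if old.get(name) != new.get(name)
    }
```

A package shows up in the result if it gained or lost any file between the two lists, including packages that were added or removed entirely.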

Recommended nginx config

You can serve the packages from any static webserver (including directly from S3), but for compatibility with old versions of pip, it's necessary to do a tiny bit of URL rewriting (see RATIONALE.md for full details about the behavior of various pip versions).

In particular, if you want to support old pip versions, you need to apply this logic to package names (taken from PEP 503):

import re

def normalize(name):
    return re.sub(r'[-_.]+', '-', name).lower()

Here is an example nginx config which supports all versions of pip and easy_install:

server {
    location / {
        root /path/to/index;
        set_by_lua $canonical_uri "return string.gsub(string.lower(ngx.var.uri), '[-_.]+', '-')";
        try_files $uri $uri/index.html $canonical_uri $canonical_uri/index.html =404;
    }
}

If you don't care about easy_install or versions of pip prior to 8.1.2, you can omit the canonical_uri hack.
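To make the lookup order in that config concrete, the sketch below lists the paths the `try_files` line names for a given request URI, in the order it names them. The function `try_files_candidates` is hypothetical and only mirrors the string manipulation; it does not model how nginx treats files versus directories.

```python
import re


def try_files_candidates(uri):
    """List the paths the example config would check, in order."""
    # Same transformation as the Lua snippet: lowercase, then collapse
    # runs of '-', '_', and '.' into a single '-'.
    canonical_uri = re.sub(r'[-_.]+', '-', uri.lower())
    return [
        uri,
        uri + '/index.html',
        canonical_uri,
        canonical_uri + '/index.html',
    ]
```

For example, a request for `/simple/Dumb_Init` would try that path as-is first and fall back to `/simple/dumb-init` and `/simple/dumb-init/index.html`. Note that the transformation applies to the whole URI, not just the package name.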

Using your deployed index server with pip

When running pip, pass -i https://my-pypi-server/simple or set the environment variable PIP_INDEX_URL=https://my-pypi-server/simple.

Known incompatibilities with public PyPI

We try to maintain compatibility with the standard PyPI interface, but there are currently some incompatibilities that are hard to fix due to dumb-pypi's design:

  • While both JSON API endpoints are supported, many keys in the JSON API are not present since they require inspecting packages which dumb-pypi can't do. Some of these, like requires_python and requires_dist, can be passed in as JSON.

  • The per-version JSON API endpoint only includes data about the current requested version and not all versions, unlike public PyPI. In other words, if you access /pypi/<package>/1.0.0/json, you will only see the 1.0.0 release under the releases key and not every release ever made. The regular non-versioned API route (/pypi/<package>/json) will have all releases.
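The two routes described above can be sketched as a small URL helper. This assumes the JSON API is served from the root of your index server (the base URL you pass in is your own) and that the response holds a mapping keyed by version string under `releases`, as on public PyPI; both function names are hypothetical.

```python
def json_api_url(base_url, package, version=None):
    """Build the JSON API URL for a package, optionally for one version."""
    base = base_url.rstrip('/')
    if version is None:
        return f'{base}/pypi/{package}/json'
    return f'{base}/pypi/{package}/{version}/json'


def release_versions(payload):
    """Return the version strings listed under the 'releases' key."""
    # Plain string sort; this is not a proper version ordering.
    return sorted(payload.get('releases', {}))
```

With a per-version response, `release_versions` would return just the requested version; with the non-versioned route it would return every release.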

Contributing

Thanks for contributing! To get started, run make venv and then . venv/bin/activate to source the virtualenv. You should now have a dumb-pypi command on your path using your checked-out version of the code.

To run the tests, call make test. To run an individual test, you can do pytest -k name_of_test tests (with the virtualenv activated).
