  • Stars
    847
  • Rank 53,812 (Top 2 %)
  • Language
    Python
  • License
    Creative Commons ...
  • Created over 12 years ago
  • Updated about 1 year ago

Repository Details

Public domain data collectors for the work of Congress, including legislation, amendments, and votes.

unitedstates/congress

This is a community-run project to develop Python tools to collect data about the bills, amendments, roll call votes, and other core data about the U.S. Congress into simple-to-use structured data files.

The tools include:

  • A downloader for the official bulk bill status data from Congress, the official source of information on the life and times of legislation, which converts the data to an easier-to-use format.

  • Scrapers for House and Senate roll call votes.

  • A document fetcher for GovInfo.gov, which holds bill text, bill status, and other official documents. The fetcher downloads only newly updated files.

  • A defunct THOMAS scraper for presidential nominations in Congress.

Read about the contents and schema in the documentation in the GitHub project wiki.

This repository was originally developed by GovTrack.us and the Sunlight Foundation in 2013 (see Eric's blog post) and is currently maintained by GovTrack.us and other contributors. For more information about data in Congress, see the Congressional Data Coalition.

Setting Up

This project is tested using Python 3.

System dependencies

On Ubuntu, you'll need wget, pip, and some support packages:

sudo apt-get install git python3-dev libxml2-dev libxslt1-dev libz-dev python3-pip python3-venv

On OS X, you'll need the developer tools (Xcode) installed, plus wget:

brew install wget

Python dependencies

It's recommended you use a virtualenv (virtual environment) for development. Create a virtualenv for this project:

python3 -m venv env
source env/bin/activate

Finally, with your virtual environment activated, install the package, which will automatically pull in the Python dependencies:

pip install .

Collecting the data

The general form to start the scraping process is:

usc-run <data-type> [--force] [other options]

where data-type names the collector to run, such as govinfo or bills.

To get data for bills, resolutions, and amendments, run:

usc-run govinfo --bulkdata=BILLSTATUS
usc-run bills

The bills script writes bulk data into a top-level data directory, organized by Congress number, then bill type, then bill number. Two output files are generated for each bill: a JSON version (data.json) and an XML version (data.xml).
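As a rough illustration of consuming that output, the sketch below walks a data directory and loads every data.json it finds. It assumes only that the files are named data.json somewhere under the data directory. The helper name iter_bill_records is hypothetical and not part of this project, and no JSON field names are assumed (see the project wiki for the actual schema).

```python
import json
from pathlib import Path


def iter_bill_records(data_root="data"):
    """Yield (path, parsed JSON) for every data.json under data_root.

    Hypothetical helper: it relies only on the data.json file name, not on
    the exact directory nesting or on any field inside the file.
    """
    for json_path in sorted(Path(data_root).rglob("data.json")):
        with json_path.open(encoding="utf-8") as handle:
            yield json_path, json.load(handle)


if __name__ == "__main__":
    for path, record in iter_bill_records():
        # Show where each record lives and which top-level keys it carries.
        keys = sorted(record) if isinstance(record, dict) else []
        print(path, keys)
```

Searching recursively rather than hard-coding the nesting keeps the sketch usable even if the directory layout differs from what you expect.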

Common options

Debugging messages are hidden by default. To include them, run with --log=info or --debug. To hide even warnings, run with --log=error.

To get emailed with errors, copy config.yml.example to config.yml and fill in the SMTP options. The script will automatically use the details when a parsing or execution error occurs.

The --force flag applies to all data types and suppresses use of the cache for network-retrieved resources.
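If you drive the collectors from another Python script, the command lines above can be assembled programmatically. The following is a minimal sketch, not part of this project: build_usc_command and run_steps are hypothetical helper names, and only flags mentioned in this README (--force, --log, --bulkdata) are used. It defaults to a dry run that prints the commands instead of executing them.

```python
import shlex
import subprocess


def build_usc_command(data_type, force=False, log=None, extra_options=()):
    """Return an argv list for usc-run (hypothetical helper)."""
    command = ["usc-run", data_type]
    if force:
        command.append("--force")  # bypass the cache for network resources
    if log is not None:
        command.append(f"--log={log}")  # e.g. "info" or "error"
    command.extend(extra_options)
    return command


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for step in steps:
        print(shlex.join(step))
        if not dry_run:
            # Requires usc-run to be installed and on PATH.
            subprocess.run(step, check=True)


if __name__ == "__main__":
    run_steps([
        build_usc_command("govinfo", extra_options=["--bulkdata=BILLSTATUS"]),
        build_usc_command("bills", log="info"),
    ])
```

Passing the arguments as a list avoids shell quoting problems. Call run_steps with dry_run=False once the printed commands look right.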

Data Output

The script will cache downloaded pages in a top-level cache directory, and output bulk data in a top-level data directory.

Two bulk data output files will be generated for each object: a JSON version (data.json) and an XML version (data.xml). The XML version attempts to maintain backwards compatibility with the XML bulk data that GovTrack.us has provided for years. Add the --govtrack flag to get fully backward-compatible output using GovTrack IDs (otherwise the source IDs for legislators are used).
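Because each object should end up with both files, a quick consistency check after a run can be useful. The sketch below is an illustration only: find_incomplete_outputs is a hypothetical helper, and it assumes nothing beyond the two file names given above.

```python
from pathlib import Path

EXPECTED_FILES = ("data.json", "data.xml")


def find_incomplete_outputs(data_root="data"):
    """Map each output directory to the expected files it is missing.

    Hypothetical helper: a directory counts as an output directory if it
    contains at least one of the expected files.
    """
    root = Path(data_root)
    object_dirs = {
        path.parent for name in EXPECTED_FILES for path in root.rglob(name)
    }
    missing = {}
    for directory in sorted(object_dirs):
        absent = [
            name for name in EXPECTED_FILES if not (directory / name).is_file()
        ]
        if absent:
            missing[directory] = absent
    return missing


if __name__ == "__main__":
    for directory, absent in find_incomplete_outputs().items():
        print(f"{directory}: missing {', '.join(absent)}")
```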

See the project wiki for documentation on the output format.

Contributing

Pull requests with patches are awesome. Unit tests are strongly encouraged (example tests).

The best way to file a bug is to open a ticket.

Running tests

To run this project's unit tests:

./test/run

Public domain

This project is dedicated to the public domain. As spelled out in CONTRIBUTING:

The project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.

More Repositories

1. congress-legislators (Python, 1,927 stars)
   Members of the United States Congress, 1789-Present, in YAML/JSON/CSV, as well as committees, presidents, and vice presidents.

2. contact-congress (Python, 630 stars)
   Sending electronic written messages to members of Congress by reverse engineering their contact forms.

3. python-us (Python, 479 stars)
   A package for easily working with US and state metadata

4. districts (260 stars)
   GeoJSON and other shape files for the federal legislative districts of the US.

5. citation (JavaScript, 213 stars)
   Legal citation extractor, via command line, JavaScript, or HTTP. See a live example at:

6. images (Python, 173 stars)
   Public domain photos of Members of the United States Congress

7. congressional-record (HTML, 119 stars)
   A parser for the Congressional Record.

8. inspectors-general (Python, 104 stars)
   Collecting reports from Inspectors General across the US federal government.

9. uscode (Python, 101 stars)
   A working parser for the US Code's hierarchy, and a work-in-progress parser for the full content.

10. APIs (CSS, 59 stars)
    A Hub of US Government APIs

11. bill-nicknames (56 stars)
    Table of popular nicknames and keywords for bills, curated manually.

12. uslaw.link (JavaScript, 54 stars)
    A legal citation resolver.

13. unitedstates.github.io (CSS, 50 stars)
    Simple homepage for this organization.

14. glossary (Ruby, 42 stars)
    A glossary for the United States.

15. acronym (39 stars)
    A library of government acronyms

16. orgchart (37 stars)
    An organization chart for the government of the United States.

17. federal_spending (Python, 34 stars)
    Importer for US Spending data

18. congress-votes-servo (HTML, 33 stars)
    Tracking changes to the official U.S. House and Senate roll call votes XML data files. Monitored hourly-ish by @GovTrack/@JoshData.

19. data-seal (HTML, 32 stars)
    Data Seal is a lightweight, UELMA-compliant data authentication service.

20. licensing (HTML, 27 stars)
    Best practices language for making open government data "license-free".

21. rtyaml (Python, 27 stars)
    All the annoying stuff we had to do to make YAML usable.

22. congress-data (19 stars)
    Legislative data from the congress repository

23. complaints (17 stars)
    An index of formal complaint systems

24. wish-list (16 stars)
    A wish list for this organization, open an Issue to discuss what we can add. Derived from a News Foo session.

25. domains (16 stars)
    Organizing and publishing the web domains of the US federal government

26. petitions (Python, 15 stars)
    White House petition crawler.

27. data-releases (15 stars)
    A listing of public data releases by federal agencies

28. BillMap (JavaScript, 14 stars)
    Utilities and applications for the FlatGov project by Demand Progress

29. legisworks-historical-statutes (Python, 14 stars)
    Metadata and per-statute PDFs for the U.S. Statutes at Large through volume 64 (1789-1951).

30. am_mem_law (Python, 12 stars)
    Documentation & data for the Library of Congress American Memory Century of Lawmaking collection.

31. agency-regions (11 stars)
    A collection of data about how federal agencies divide their agency coverage geospatially

32. scotus-bound-volumes (11 stars)

33. chaplains (Python, 11 stars)
    Text of prayers delivered by guest chaplains to House

34. reports (HTML, 10 stars)
    Storage space for public US reports which need a place to go.

35. statements-of-administration-policy (Python, 9 stars)
    An archive and scraper of White House Statements of Administration Policy

36. nabors (Python, 8 stars)
    Bill numbers for early American statutes based on Nabors's Legislative Reference Checklist book.

37. congress-publish (Python, 8 stars)
    Script to publish bill and amendment data as a JSON API.

38. congress-calendar (6 stars)
    A calendar of Congressional events, like committee meetings and votes

39. data-issues (3 stars)
    (NO LONGER USED.)