Legal citation extractor, via command line, JavaScript, or HTTP. See a live example at:

Citation

A fast, stand-alone legal citation extractor.

Currently supports:

  • usc: US Code
  • law: US Slip Laws (public and private laws)
  • stat: US Statutes at Large
  • cfr: US Code of Federal Regulations
  • dc_code: DC Code
  • dc_register: DC Register
  • dc_law: DC Slip Law

With limited, opt-in support for:

  • judicial: US court opinions, via walverine (see Court opinions)

As you can see, Citation is currently US-only, but we'd love for that to change. There are lots more citation types out there, and it's easy to contribute, so please help us grow!

Compatible in-browser with modern browsers, including IE 9+.

Use

Citation can be used:

  1. In JavaScript, in browser or in Node. This method supports the most options, including passing in JavaScript functions as callbacks.
  2. Over HTTP, via GET or POST. Supports JSON and JSONP. Options that require function callbacks are not supported (it won't eval JavaScript).
  3. On the command line or through Unix pipes, with results written to STDOUT. Options that require function callbacks are not supported (it won't eval JavaScript).

But one way or another, you pass in text:

Citation.find("pursuant to 5 U.S.C. 552(a)(1)(E) and");

And you get back data about matched citations:

[{
  "match": "5 U.S.C. 552(a)(1)(E)",
  "citation": "5 U.S.C. 552(a)(1)(E)",
  "type": "usc",
  "index": "0",
  "usc": {
    "title": "5",
    "section": "552",
    "subsections": ["a", "1", "E"],
    "id": "usc/5/552/a/1/E",
    "section_id": "usc/5/552"
  }
}]
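
Because each match carries a type and a type-specific object stored under a key of the same name, results are easy to post-process. The following is a minimal sketch that assumes results shaped like the example above; idsByType is a hypothetical helper and not part of Citation.

```javascript
// Sample results, copied from the output shown above.
const results = [{
  match: "5 U.S.C. 552(a)(1)(E)",
  citation: "5 U.S.C. 552(a)(1)(E)",
  type: "usc",
  index: "0",
  usc: {
    title: "5",
    section: "552",
    subsections: ["a", "1", "E"],
    id: "usc/5/552/a/1/E",
    section_id: "usc/5/552"
  }
}];

// Hypothetical helper (not part of Citation): collect the ids of all
// matches, grouped by citation type.
function idsByType(matches) {
  const grouped = {};
  for (const cite of matches) {
    // The type-specific details live under a key named after the type.
    const details = cite[cite.type] || {};
    if (!grouped[cite.type]) grouped[cite.type] = [];
    grouped[cite.type].push(details.id);
  }
  return grouped;
}

console.log(idsByType(results));
```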

In the browser

Check out /browser for browser-ready compressed and uncompressed versions of the library.

Loading any of them with a <script> tag will result in a global Citation object being available for immediate use.

In Node

Install Node.js and NPM, then install Citation globally (may require sudo):

npm install -g citation

Or install it locally to a node_modules directory with npm install citation.

JavaScript API

Citation.find(text, options)

Check a block of text for citations of a given type, returning an array of matches with citations broken out into fields.

options can include:

  • types: (string | string array) Limit citation types to those given. e.g. ["usc", "law"]
  • excerpt: (integer) Return an excerpt of the surrounding text for each detected cite, with the given number of characters on either side.
  • parents: (boolean) For any cite, return any "parent" cites alongside it. For example, matching "5 USC 552(b)(3)" would return 3 results - one for the parent section, one for (b), and one for (b)(3).
  • filter: (string) Enable Filtering.
  • replace: (function | object) Enable Replacement.
  • links: (boolean) Include Links.
  • Also: see Cite-specific options to pass in options for a particular citation type.
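
To make the parents option more concrete, the sketch below derives the chain of parent ids from a US Code id. It only illustrates the idea, under the assumption that ids follow the usc/title/section/subsection layout seen in the results; parentIds is a hypothetical helper, not the library's implementation.

```javascript
// Hypothetical helper: list the section-level id and each successive
// subsection id for a US Code id such as "usc/5/552/b/3".
function parentIds(id) {
  const parts = id.split("/");
  const ids = [];
  // The first three parts ("usc", title, section) form the section id;
  // every additional part adds one more level of subsection.
  for (let end = 3; end <= parts.length; end++) {
    ids.push(parts.slice(0, end).join("/"));
  }
  return ids;
}

console.log(parentIds("usc/5/552/b/3"));
```

For "usc/5/552/b/3" this yields three ids, matching the three results described for "5 USC 552(b)(3)": the section, (b), and (b)(3).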

Some examples:

Citation.find("pursuant to 5 U.S.C. 552(a)(1)(E) and");

// Yields:

[{
  "match": "5 U.S.C. 552(a)(1)(E)",
  "citation": "5 U.S.C. 552(a)(1)(E)",
  "type": "usc",
  "index": "0",
  "usc": {
    "title": "5",
    "section": "552",
    "subsections": ["a", "1", "E"],
    "id": "usc/5/552/a/1/E",
    "section_id": "usc/5/552"
  }
}]

Citation.find("that term in section 5362(5) of title 31, United States Code.", {
  excerpt: 10
})

// Yields:

[{
  "match": "section 5362(5) of title 31",
  "citation": "31 U.S.C. 5362(5)",
  "excerpt": "t term in section 5362(5) of title 31, United S",
  // ... more details ...
}]
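
The excerpt in that result can be reproduced by slicing the input around the match. This is a sketch of the arithmetic only, not the library's code; excerptAround is a hypothetical helper.

```javascript
// Hypothetical helper: return the match plus up to `size` characters
// of context on each side, clamped to the bounds of the text.
function excerptAround(text, index, matchLength, size) {
  const start = Math.max(0, index - size);
  const end = Math.min(text.length, index + matchLength + size);
  return text.slice(start, end);
}

const text = "that term in section 5362(5) of title 31, United States Code.";
const match = "section 5362(5) of title 31";

// Prints: t term in section 5362(5) of title 31, United S
console.log(excerptAround(text, text.indexOf(match), match.length, 10));
```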

HTTP API

Start the API on a given port (defaults to 3000):

cite-server [port]

GET or POST to /citation/find with a text parameter:

curl "http://localhost:3000/citation/find?text=5+U.S.C.+552%28a%29%281%29%28E%29"

curl -XPOST "http://localhost:3000/citation/find" -d "text=5 U.S.C. 552(a)(1)(E)"

This returns the results of running Citation.find() on the block of text, under a results key:

{
  "results": [
    {
      "match": "5 U.S.C. 552(a)(1)(E)",
      "citation": "5 U.S.C. 552(a)(1)(E)",
      "type": "usc",
      "index": "0",
      "usc": {
        "title": "5",
        "section": "552",
        "subsections": ["a", "1", "E"],
        "id": "usc/5/552/a/1/E",
        "section_id": "usc/5/552"
      }
    }
  ]
}

Supported options

Some HTTP-specific parameters:

  • callback: a function name to use as a JSONP callback.
  • pretty: prettify (indent) output.

And some of the options that the JavaScript API supports:

  • text: required, text to extract citations from.
  • options[excerpt]: include an excerpt with up to this many characters on either side of each cite.
  • options[types]: limit citation types to a comma-separated list (e.g. "usc,law")
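
A request URL carrying these parameters can be assembled with the standard URL API. The sketch below is illustrative: buildFindUrl is a hypothetical helper, and the base address assumes a local cite-server on its default port.

```javascript
// Hypothetical helper: build a GET URL for /citation/find.
// The default base is an assumption (a local server on port 3000).
function buildFindUrl(text, options = {}, base = "http://localhost:3000") {
  const url = new URL("/citation/find", base);
  url.searchParams.set("text", text);
  for (const [name, value] of Object.entries(options)) {
    // Arrays (e.g. types) are sent as a comma-separated list.
    const encoded = Array.isArray(value) ? value.join(",") : String(value);
    url.searchParams.set(`options[${name}]`, encoded);
  }
  return url.toString();
}

console.log(buildFindUrl("5 U.S.C. 552(a)(1)(E)", { excerpt: 10, types: ["usc", "law"] }));
```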

Server deployment

See etc/ for an example upstart script to keep cite-server running in production.

Command line API

The shell command can accept a string to parse as an argument or through STDIN, and outputs results to STDOUT as indented JSON.

cite "section 5362(5) of title 31"

echo "section 5362(5) of title 31" | cite

cite "pursuant to 5 U.S.C. 552(a)(1)(E) and" > results.json

Options

Pass any options the library takes, using dot operators to pass nested options.

For example, searching among types:

cite --types=usc,law "section 5362(5) of title 31"

Passing nested options:

cite --dc_code.source=dc_code "and then § 3-101.01 happened"

Opt in to using walverine to search judicial cites with --judicial:

cite --judicial "Smith v. Hardibble, 111 Cal.2d 222, 555, 558, 333 Cal.3d 444 (1988)"

Add --links to include links in the output.
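
The dot notation maps onto the nested options object that the JavaScript API takes. The sketch below shows one way such flags could be turned into that object; parseFlags is a hypothetical function written for illustration and is not the cite command's own argument parser.

```javascript
// Hypothetical parser: turn flags such as "--dc_code.source=dc_code"
// into a nested options object.
function parseFlags(args) {
  const options = {};
  for (const arg of args) {
    if (!arg.startsWith("--")) continue; // positional text, not a flag
    const eq = arg.indexOf("=");
    const path = (eq === -1 ? arg.slice(2) : arg.slice(2, eq)).split(".");
    // A flag without "=value" (e.g. --links) is treated as boolean true.
    const value = eq === -1 ? true : arg.slice(eq + 1);
    let target = options;
    // Walk (and create) the intermediate objects named by the dotted path.
    for (const key of path.slice(0, -1)) {
      const own = Object.prototype.hasOwnProperty.call(target, key);
      if (!own || typeof target[key] !== "object") target[key] = {};
      target = target[key];
    }
    target[path[path.length - 1]] = value;
  }
  return options;
}

console.log(parseFlags(["--types=usc,law", "--dc_code.source=dc_code", "--links", "some text"]));
```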

Filters

Instead of treating the input text as just a blob of text that matches citations at a string index, you can apply a "filter" that will parse the input text and provide more precise context.

Lines

For each citation, return the line number and the relative character index of the match inside that line.

Example:

cite --pretty --filter=lines "I once met a cite named nancy
whose 5 usc 552 was awfully fancy
and then the poem ended"
{
  "citations": [
    {
      "type": "usc",
      "match": "5 usc 552",
      "index": 6,
      "citation": "5 U.S.C. 552",
      "usc": {
        "title": "5",
        "section": "552",
        "subsections": [],
        "id": "usc/5/552"
      },
      "line": 2
    }
  ]
}
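
The line and index values above relate to an absolute character offset in a simple way. The sketch below shows that conversion; locateInLines is a hypothetical helper, not the filter's actual implementation.

```javascript
// Hypothetical helper: convert an absolute character index into a
// 1-based line number and an index relative to the start of that line.
function locateInLines(text, absoluteIndex) {
  const before = text.slice(0, absoluteIndex);
  const lineStart = before.lastIndexOf("\n") + 1; // 0 when on the first line
  return {
    line: before.split("\n").length,
    index: absoluteIndex - lineStart
  };
}

const poem = "I once met a cite named nancy\n" +
  "whose 5 usc 552 was awfully fancy\n" +
  "and then the poem ended";

// Prints: { line: 2, index: 6 }
console.log(locateInLines(poem, poem.indexOf("5 usc 552")));
```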

XPath

For each citation, return an XPath statement identifying the match's specific node in the input document, and the relative character index of the match inside that node.

Example:

cite --pretty --filter=xpath_xml "
<?xml>
<document>
  <title>Best Bill of 2012</title>
  <bill>
    <introduction>Bill to enforce happiness amongst all the children</introduction>
    <closing>All information releasable through 5 U.S.C. 552 is now banned</closing>
    <footer>(c) Congress</footer>
  </bill>
</document>
"
{
  "citations": [
    {
      "type": "usc",
      "match": "5 U.S.C. 552",
      "index": 35,
      "citation": "5 U.S.C. 552",
      "usc": {
        "title": "5",
        "section": "552",
        "subsections": [],
        "id": "usc/5/552"
      },
      "xpath": "/document[1]/bill[1]/closing[1]/text()[1]"
    }
  ]
}

Replacement

You can perform a "find-and-replace" with detected citations, by providing a replace callback to be executed on each citation, that returns the string to replace that citation.

When you pass a replace callback, the returned object includes a top-level text field containing the processed text.

Citation.find("click on 5 USC 552 to read more", {
  replace: function(cite) {
    var url = "http://www.law.cornell.edu/uscode/text/" + cite.usc.title + "/" + cite.usc.section;
    return "<a href=\"" + url + "\">" + cite.match + "</a>";
  }
});

The response will have a text field containing:

click on <a href="http://www.law.cornell.edu/uscode/text/5/552">5 USC 552</a> to read more

This feature is only available in the JavaScript API.
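
For readers who receive results from the HTTP or command line APIs, a similar find-and-replace can be applied afterwards. The sketch below assumes each cite has a numeric index and a match string; replaceCitations is a hypothetical helper, not part of Citation.

```javascript
// Hypothetical helper: replace each cited span with the callback's output.
// Assumes each cite has a numeric `index` and the matched text in `match`.
function replaceCitations(text, cites, replacer) {
  // Work from the last match to the first so earlier indexes stay valid.
  const ordered = [...cites].sort((a, b) => b.index - a.index);
  let result = text;
  for (const cite of ordered) {
    const end = cite.index + cite.match.length;
    result = result.slice(0, cite.index) + replacer(cite) + result.slice(end);
  }
  return result;
}

const input = "click on 5 USC 552 to read more";
const cites = [{
  match: "5 USC 552",
  index: input.indexOf("5 USC 552"),
  usc: { title: "5", section: "552" }
}];

console.log(replaceCitations(input, cites, function (cite) {
  const url = "http://www.law.cornell.edu/uscode/text/" + cite.usc.title + "/" + cite.usc.section;
  return "<a href=\"" + url + "\">" + cite.match + "</a>";
}));
```

Replacing from the end of the text backwards avoids recomputing offsets after each substitution.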

Include Links

With the links option, each matched citation will include URLs to access the content of the citation on the web. For:

Citation.find("pursuant to 5 U.S.C. 552(a)(1)(E) and", { links: true });

you will get back an extended object with permalinks:

[{
  "match": "5 U.S.C. 552(a)(1)(E)",
  "type": "usc",
  ...
  "usc": {
    "id": "usc/5/552/a/1/E",
    ...
    "links": {
      "usgpo": {
        "source": {
          "name": "U.S. Government Publishing Office",
          "abbreviation": "US GPO",
          "link": "http://www.gpo.gov",
          "authoritative": true,
          "note": "2014 edition. Sub-section citation is not reflected in the link."
        },
        "pdf": "http://api.fdsys.gov/link?collection=uscode&year=2014&title=5&section=552&type=usc",
        "html": "http://api.fdsys.gov/link?collection=uscode&year=2014&title=5&section=552&type=usc&link-type=html",
        "landing": "http://api.fdsys.gov/link?collection=uscode&year=2014&title=5&section=552&type=usc&link-type=contentdetail"
      },
      "cornell_lii": {
        "source": {
          "name": "Cornell Legal Information Institute",
          "abbreviation": "Cornell LII",
          "link": "https://www.law.cornell.edu/uscode/text",
          "authoritative": false,
          "note": "Link is to most current version of the US Code, as available at law.cornell.edu."
        },
        "landing": "https://www.law.cornell.edu/uscode/text/5/552#a_1_E"
      }
    }
  }
}]

The links object maps sources to one or more renditions. The rendition types are pdf, html (for raw HTML content), landing (a landing page, i.e. a website, about the document referred to by the citation), and mods (US GPO MODS XML files).
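
Since a citation can carry several sources and renditions, callers usually need a rule for choosing one URL. The sketch below shows one possible rule: authoritative sources first, then a preferred order of renditions. pickLink and its default ordering are assumptions made for illustration.

```javascript
// Hypothetical helper: choose one URL from a `links` object, preferring
// authoritative sources and then the first available rendition in `order`.
function pickLink(links, order = ["html", "pdf", "landing", "mods"]) {
  const sources = Object.values(links).sort(
    // Authoritative sources sort first.
    (a, b) => Number(b.source.authoritative) - Number(a.source.authoritative)
  );
  for (const source of sources) {
    for (const rendition of order) {
      if (source[rendition]) return source[rendition];
    }
  }
  return null;
}

// Trimmed copy of the links object shown above.
const links = {
  cornell_lii: {
    source: { name: "Cornell Legal Information Institute", authoritative: false },
    landing: "https://www.law.cornell.edu/uscode/text/5/552#a_1_E"
  },
  usgpo: {
    source: { name: "U.S. Government Publishing Office", authoritative: true },
    html: "http://api.fdsys.gov/link?collection=uscode&year=2014&title=5&section=552&type=usc&link-type=html"
  }
};

console.log(pickLink(links));
```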

Cite-specific options

You can pass arbitrary options to individual citators, if that citator supports them.

If you pass a key that is the name of a citator, e.g. usc or dc_code, that citator's processors will receive the value of that key as an argument.

Example: DC Code relative cites

For example, the dc_code citator accepts a source option, to indicate what the text source is. If the value of source is itself "dc_code", then the citator will apply a looser pattern to detect internal cites.

That looks like this:

Citation.find("required under § 3-101.01(13)(e), the Commission shall perform the", {
  dc_code: {source: "dc_code"}
})

That will match § 3-101.01(13)(e), because the dc_code citator assumes it's processing the text of the DC Code itself, and internal references are unambiguous.
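
The effect of such an option can be pictured as a switch between a strict and a loose pattern. The sketch below is illustrative only: both regular expressions and the findDcCode function are hypothetical, and the dc_code citator's real patterns are not reproduced here.

```javascript
// Hypothetical patterns, for illustration only. "\u00a7" is the section
// sign. The strict form requires an explicit "D.C. Official Code" prefix;
// the loose form accepts a bare section reference.
const STRICT = /D\.C\. Official Code\s+\u00a7\s*(\d+)-(\d+(?:\.\d+)?)/g;
const LOOSE = /\u00a7\s*(\d+)-(\d+(?:\.\d+)?)/g;

// Hypothetical citator: receives only its own options, i.e. the value
// passed under the dc_code key.
function findDcCode(text, options = {}) {
  const pattern = options.source === "dc_code" ? LOOSE : STRICT;
  return Array.from(text.matchAll(pattern), (m) => ({
    match: m[0],
    title: m[1],
    section: m[2],
    index: m.index
  }));
}

const sample = "required under \u00a7 3-101.01(13)(e), the Commission shall perform the";
const allOptions = { dc_code: { source: "dc_code" } };

console.log(findDcCode(sample).length);                      // strict pattern
console.log(findDcCode(sample, allOptions.dc_code).length);  // loose pattern
```

Without the source option the bare reference is ignored; with it, the loose pattern picks up the internal cite.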

Court opinions

Citation can integrate with walverine to detect and return results for US court opinions.

To use walverine, you may need to "opt-in" to including judicial-type citations.

In JavaScript:

Citation.types.judicial = require("./citations/judicial");

In CLI:

cite --judicial "Text to scan"

The HTTP server, cite-server, loads judicial cites by default, since the performance penalty is absorbed at start-up.

Unsupported features

walverine's support for extra features is limited. When detecting judicial-type cites, there is no support for:

  • Returning parent citations
  • Replacing detected text
  • A character index of detected citations

Tests

This project is tested with nodeunit.

To run tests, you'll need to install this project from source and install its node dependencies:

git clone git@github.com:unitedstates/citation.git
cd citation
npm install
npm test

Test cases are stored in the test directory. Each test case covers a subsection of the code and ensures that citations are correctly detected: for instance, see test/stat.js.

To run all tests:

nodeunit test

To run a specific test:

nodeunit test/usc.js

Public domain

This project is dedicated to the public domain. As spelled out in CONTRIBUTING:

The project is in the public domain within the United States, and copyright and related rights in the work worldwide are waived through the CC0 1.0 Universal public domain dedication.

All contributions to this project will be released under the CC0 dedication. By submitting a pull request, you are agreeing to comply with this waiver of copyright interest.
