  • Stars: 62
  • Rank 488,249 (Top 10 %)
  • Language
  • License
    BSD 2-Clause "Sim...
  • Created over 8 years ago
  • Updated over 8 years ago

Repository Details

NICAR 2016 talk about PDFs!

More Repositories

1

990-xml-reader

IRSx: Turn the IRS' versioned XML 990 nonprofit annual tax returns into standardized Python objects, JSON, or human-readable text with original line number and description.
Python
118
star
2

whatwordwhere

Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard.
HTML
84
star
3

covid_hospitals_demographics

COVID-19 relevant data on hospital location / capacity, nursing home location / capacity, county demographics
HTML
24
star
4

990-xml-database

Django app to consume and store 990 data and metadata
Python
22
star
5

pdf17

NICAR 17: advanced PDF manipulation
17
star
6

irsx_cookbook

IRSX Cookbook
Jupyter Notebook
16
star
7

pdf_bbox_utils

Helpers to create .csv files of word-level bounding boxes from text-based PDFs, or from hOCR output.
Python
7
star
8

990-xml-metadata

metadata describing the 990 xml release, to be used by 990-xml-reader and related projects
7
star
9

plpython_textmatch

Add some fuzzy string match operations to PostgreSQL
7
star
10

pdf20

Advanced PDF manipulation with pdfplumber for NICAR 2020 / New Orleans
Jupyter Notebook
6
star
11

doc-wrangler

Noodle with DocumentCloud
Python
5
star
12

texas_rrc

some railroad commission oil / gas production files
5
star
13

reconcile-legislators

Test OpenRefine reconciliation service to match legislators' names
Python
5
star
14

paper_fec

Parse the OCR'ed paper FEC filings (as well as the electronic ones)
Python
5
star
15

nicar-nonprofit-datarelease

Documentation for nonprofit data released at NICAR 2020
5
star
16

easy-stats-113

Data from the census bureau's "easy stats" site--the first available on the 113th Congress.
Python
4
star
17

freefcc

Python
4
star
18

house_disbursements

muck with Sunlight house disbursement CSVs
Python
3
star
19

senate_disbursements

process--partially--the senate clerk's report on spending.
Python
2
star
20

inspectfile

like inspectdb, but for files
Python
2
star
21

irs_527

process 527 data to CSVs
Python
2
star
22

legacy_0809_acs_exporter

Legacy export of ACS processing from 2008 3-year ACS for R and PostgreSQL
Python
2
star
23

990-xml-admin

Keep tabs on 990 filings
HTML
1
star
24

fec_ftp

another bucket of scripts for grabbing the FEC's FTP data etc. for Django + Postgres
Python
1
star