Jacob Fenton (@jsfenfen)

Top repositories

1

990-xml-reader

IRSx: Turn the IRS' versioned XML 990 nonprofit annual tax returns into standardized Python objects, JSON, or human-readable text with original line number and description.
Python
118
star
2

whatwordwhere

Tooling to extract data from scanned paper forms OCR-ed by Tesseract using the HOCR standard.
HTML
84
star
3

parsing-prickly-pdfs

NICAR 2016 talk about PDFs!
62
star
4

covid_hospitals_demographics

COVID-19 relevant data on hospital location / capacity, nursing home location / capacity, county demographics
HTML
24
star
5

990-xml-database

Django app to consume and store 990 data and metadata
Python
22
star
6

pdf17

NICAR 17: advanced PDF manipulation
17
star
7

irsx_cookbook

IRSX Cookbook
Jupyter Notebook
16
star
8

pdf_bbox_utils

Helpers to create .csv files of word-level bounding boxes from text-based PDFs, or from hOCR output.
Python
7
star
9

990-xml-metadata

metadata describing the 990 xml release, to be used by 990-xml-reader and related projects
7
star
10

plpython_textmatch

Add some fuzzy string match operations to PostgreSQL
7
star
11

pdf20

Advanced PDF manipulation with pdfplumber for NICAR 2020 / New Orleans
Jupyter Notebook
6
star
12

doc-wrangler

Noodle with DocumentCloud
Python
5
star
13

texas_rrc

some railroad commission oil / gas production files
5
star
14

reconcile-legislators

Test OpenRefine reconciliation service to match legislators' names
Python
5
star
15

paper_fec

Parse the OCR'ed paper FEC filings (as well as the electronic ones)
Python
5
star
16

nicar-nonprofit-datarelease

Documentation for nonprofit data released at NICAR 2020
5
star
17

easy-stats-113

Data from the Census Bureau's "easy stats" site--the first available on the 113th Congress.
Python
4
star
18

freefcc

Python
4
star
19

house_disbursements

muck with Sunlight House disbursement CSVs
Python
3
star
20

senate_disbursements

process--partially--the senate clerk's report on spending.
Python
2
star
21

inspectfile

like inspectdb, but for files
Python
2
star
22

irs_527

process 527 data to CSVs
Python
2
star
23

legacy_0809_acs_exporter

Legacy export of ACS processing from 2008 3-year ACS for R and PostgreSQL
Python
2
star
24

990-xml-admin

Keep tabs on 990 filings
HTML
1
star
25

fec_ftp

another bucket of scripts for grabbing the FEC's FTP data, etc., for Django + Postgres
Python
1
star