• Stars
    star
    120
  • Rank 295,924 (Top 6 %)
  • Language
    Python
  • License
    Other
  • Created about 12 years ago
  • Updated almost 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Python workers that collect tweets from the twitter streaming api and track deletions

Install Beanstalkd

http://kr.github.com/beanstalkd/download.html

Requires installing the libevent-dev package on apt-based systems.

Install Python dependencies

Install pip if you don't already have it then run:

pip install -r requirements.txt

Edit config file

First:

cp conf/tweets-client.ini.example conf/tweets-client.ini

In the [tweets-client] section, add your Twitter account's username and password. This account will be authenticated against to make all API requests.

In the [beanstalk] section, change "tweets_tube" and "screenshot_tube". The values don't matter much, they just need to be unique.

In the [database] section, update the "host", "port", "username", "password", and "database" sections with your own details, if the defaults are not appropriate.

In the [aws] section, add your access key, secret access key, bucket name, and any path prefix inside the bucket you want to use. This is for archiving images and screenshots of tweeted links.

Running

Run tweets-client.py to start streaming items from Twitter into the beanstalk queue. Append the lib directory to the PYTHONPATH, either persistently or as part of the command:

PYTHONPATH=$PYTHONPATH:`pwd`/lib ./bin/tweets-client.py

Then run politwoops-worker.py to start pulling the tweets out of beanstalk and loading them into MySQL:

PYTHONPATH=$PYTHONPATH:`pwd`/lib ./bin/politwoops-worker.py --images

Finally, if you ran politwoops-worker.py with the images option turned on, run screenshot-worker.py to grab screenshots of webpages and mirror images linked in tweets.

PYTHONPATH=$PYTHONPATH:`pwd`/lib ./bin/screenshot-worker.py

These three scripts all accept the following options:

  • --loglevel - Sets the verbosity of logging.
  • --output - Destination for log files.
  • --restart - Restart if the script encounters an error that cannot be handled.

More Repositories

1

upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
HTML
1,614
star
2

guides

ProPublica's News App and Data Style Guides
1,163
star
3

compas-analysis

Data and analysis for 'Machine Bias'
Jupyter Notebook
600
star
4

weepeople

A typeface of people sillhouettes, to make it easy to build web graphics featuring little people instead of dots.
489
star
5

stateface

A typeface of U.S. state shapes to use in web apps.
HTML
359
star
6

timeline-setter

A tool to create HTML timelines from spreadsheets of events.
JavaScript
328
star
7

nyc-dna-software

The source code, acquired by ProPublica, for New York City's Forensic Statistical Tool.
C#
318
star
8

facebook-political-ads

Monitoring Facebook Political Ads
HTML
237
star
9

daybreak

A simple-dimple key value store for ruby.
HTML
236
star
10

sunlight-congress

The Sunlight Foundation's Congress API. Shut down on Oct. 1, 2017.
Ruby
169
star
11

landline

Simple SVG maps that work everywhere.
HTML
166
star
12

qis

Quick Instagram search tool
HTML
158
star
13

column-setter

Custom responsive grids in Sass that work in older browsers.
SCSS
130
star
14

Capitol-Words

Scraping, parsing and indexing the daily Congressional Record to support phrase search over time, and by legislator and date
Python
121
star
15

simple-tiles

Simple tile generation for maps.
C
106
star
16

django-collaborative

ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.
Python
99
star
17

transcribable

Drop in crowdsourcing for your Rails app. Extracted from Free the Files.
Ruby
84
star
18

schooner-tk

A collection of (hopefully) useful utilities for working with satellite images.
C++
71
star
19

newsappmodel

Conceptual Model for News Applications
58
star
20

table-setter

Easy Peasy CSV to HTML
JavaScript
57
star
21

ilcampaigncash

Load Illinois political contribute and spending data efficiently
TSQL
57
star
22

congress-api-docs

Documentation for the ProPublica Congress API
HTML
54
star
23

campaign_cash

A Ruby client for interacting with ProPublica Campaign Finance API
Ruby
52
star
24

politwoops_sunlight

Politwoops web front end
CSS
44
star
25

data-institute-2019

Materials for the ProPublica Data Institute 2019
43
star
26

table-fu

A utility for spreadsheet-style handling of arrays (e.g. filtering, formatting, and sorting)
Ruby
35
star
27

fakenator

PHP
27
star
28

staffers

Interactive and searchable House staffer directory, based on House disbursement data.
HTML
26
star
29

data-institute-2018

For students of https://projects.propublica.org/graphics/ida-propublica-data-institute
26
star
30

vid-skim

Transcripts and commentary for long boring videos on YouTube!
Ruby
26
star
31

simpler-tiles

Ruby bindings for Simple Tiles
HTML
25
star
32

cookcountyjail2

A new version of the cook county jail scraper, inspired by the Supreme Chi-Town Coding Crew
HTML
23
star
33

disbursements

Data and scripts relating to the publishing of the House expenditure reports, and hopefully the Senate's in future.
Ruby
23
star
34

pixel-pong

Interface and data-collection backend for PixelPing.
JavaScript
22
star
35

propertyassessments

Analysis behind the "How the Cook County Assessor Failed Taxpayers"
R
22
star
36

data-nicar-2019

Nicar ML/NLP workshop by J Kao
Jupyter Notebook
19
star
37

thinner

Slow purges for varnish useful on app deploys.
Ruby
17
star
38

data-institute-2021

13
star
39

transcript-audio-sync

Tools for synchronizing audio and text on a webpage
JavaScript
12
star
40

pac-donor-similarity

Cosine similarity scores for PAC donors to federal candidates
10
star
41

capitol_words_nlp

Experimenting with parsing the congressional record using NLP techniques and tools
Python
9
star
42

redactor

Tool to remove email addresses, person entities, and phone numbers from a text
Python
9
star
43

auditData

data and scripts for https://projects.propublica.org/graphics/eitc-audit
R
9
star
44

il-tickets-notebooks

Explore Chicago ticket data.
Jupyter Notebook
9
star
45

il-ticket-loader

Load and analyze Chicago parking and camera ticket data
Jupyter Notebook
7
star
46

fbpac-api-public

API supporting more complex queries on the database of ads gathered by github.com/propublica/facebook-political-ads
Ruby
6
star
47

data-institute-2022

6
star
48

collaborative-playbook

5
star
49

northern-il-federal-gun-cases

Jupyter Notebook
5
star
50

d4dPartD-analysis

analysis of doctors' promotional payments from drug companies and their prescribing behavior
R
4
star
51

table-setter-generator

A rails generator for table-setter
JavaScript
4
star
52

institute-files

Data Institute Lessons
4
star
53

pentagon

CartoCSS
3
star
54

campaign-finance-api-docs

Documentation for campaign finance API
3
star
55

collaborative-playbook-pt

Collaborative Playbook in Portuguese
2
star
56

vital-signs-hackathon

1
star
57

political-ad-collector

web landing page for propublica's political ad collector
CSS
1
star