• This repository has been archived on 03/Mar/2022
  • Stars
    star
    237
  • Rank 169,854 (Top 4 %)
  • Language
    HTML
  • License
    MIT License
  • Created over 7 years ago
  • Updated over 2 years ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Monitoring Facebook Political Ads

Facebook Political Ad Collector

This repository is no longer maintained and exists for archival purposes only.

For a more recent project on this topic, see Ad Observer.

See archival README information.

This is the source code behind our project to collect political ads on Facebook. You can browse the American ads we've collected at ProPublica, and the Australian ads over on the Guardian's website.

We're asking our readers to use this extension when they are browsing Facebook. While they are on Facebook a background script runs to collect ads they see. The extension shows those ads to users and asks them to decide whether or not a particular ad is political. Serverside, we use those ratings to train a naive bayes classifier that then automatically rates the other ads we've collected. The extension also asks the server for the most recent ads that the classifier thinks are political so that users can see political ads they haven't seen. We're careful to protect our user's privacy by not sending identifying information to our backend server.

We're open sourcing this project because we'd love your help. Collecting these ads is challenging, and the more eyes on the problem the better.

Run it on your own

You can find instructions on how to set up your own full-fledged version of the Facebook Political Ad Collector in INSTALLATION.md

There is an explanation of all the moving parts in ARCHITECTURE.md

Stories

Where We Need Your Help

In general, the project needs more tests. We've written a couple of tests for parsing the Facebook timeline in the extension directory, and a few for the tricky bits in the server, but any help here would be great!

Also, the rust backend needs a bit of love and care, and there is a bit of a mess in backend/server/src/server.rs that could use cleaning up.

Types of ads the collector doesn't collect

  • mobile ads
  • pre-roll, midstream video ads
  • video from ads in the stream
  • Instagram-only ads (Note that many ads are set to run on Facebook and Instagram with the same creative)

TODOs to Consider:

  • considering triggering the ad parsing routine only on scroll, to mitigate the clicking-off problems.
  • consider retaining utm params (i.e. a whitelisted set of parameters in links that are added by advertisers, not by FB and ipso facto are not personally identifiable, e.g. utm_content, etc., since those sometimes include useful metadata about the ad.)
  • consider turning off the panelist_ads table, etc.
  • consider seeding the partisanship model in new languages with political tweets.

More Repositories

1

upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
HTML
1,614
star
2

guides

ProPublica's News App and Data Style Guides
1,163
star
3

compas-analysis

Data and analysis for 'Machine Bias'
Jupyter Notebook
600
star
4

weepeople

A typeface of people sillhouettes, to make it easy to build web graphics featuring little people instead of dots.
489
star
5

stateface

A typeface of U.S. state shapes to use in web apps.
HTML
359
star
6

timeline-setter

A tool to create HTML timelines from spreadsheets of events.
JavaScript
328
star
7

nyc-dna-software

The source code, acquired by ProPublica, for New York City's Forensic Statistical Tool.
C#
318
star
8

daybreak

A simple-dimple key value store for ruby.
HTML
236
star
9

sunlight-congress

The Sunlight Foundation's Congress API. Shut down on Oct. 1, 2017.
Ruby
169
star
10

landline

Simple SVG maps that work everywhere.
HTML
166
star
11

qis

Quick Instagram search tool
HTML
158
star
12

column-setter

Custom responsive grids in Sass that work in older browsers.
SCSS
130
star
13

Capitol-Words

Scraping, parsing and indexing the daily Congressional Record to support phrase search over time, and by legislator and date
Python
121
star
14

politwoops-tweet-collector

Python workers that collect tweets from the twitter streaming api and track deletions
Python
120
star
15

simple-tiles

Simple tile generation for maps.
C
106
star
16

django-collaborative

ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.
Python
99
star
17

transcribable

Drop in crowdsourcing for your Rails app. Extracted from Free the Files.
Ruby
84
star
18

schooner-tk

A collection of (hopefully) useful utilities for working with satellite images.
C++
71
star
19

newsappmodel

Conceptual Model for News Applications
58
star
20

table-setter

Easy Peasy CSV to HTML
JavaScript
57
star
21

ilcampaigncash

Load Illinois political contribute and spending data efficiently
TSQL
57
star
22

congress-api-docs

Documentation for the ProPublica Congress API
HTML
54
star
23

campaign_cash

A Ruby client for interacting with ProPublica Campaign Finance API
Ruby
52
star
24

politwoops_sunlight

Politwoops web front end
CSS
44
star
25

data-institute-2019

Materials for the ProPublica Data Institute 2019
43
star
26

table-fu

A utility for spreadsheet-style handling of arrays (e.g. filtering, formatting, and sorting)
Ruby
35
star
27

fakenator

PHP
27
star
28

staffers

Interactive and searchable House staffer directory, based on House disbursement data.
HTML
26
star
29

data-institute-2018

For students of https://projects.propublica.org/graphics/ida-propublica-data-institute
26
star
30

vid-skim

Transcripts and commentary for long boring videos on YouTube!
Ruby
26
star
31

simpler-tiles

Ruby bindings for Simple Tiles
HTML
25
star
32

cookcountyjail2

A new version of the cook county jail scraper, inspired by the Supreme Chi-Town Coding Crew
HTML
23
star
33

disbursements

Data and scripts relating to the publishing of the House expenditure reports, and hopefully the Senate's in future.
Ruby
23
star
34

pixel-pong

Interface and data-collection backend for PixelPing.
JavaScript
22
star
35

propertyassessments

Analysis behind the "How the Cook County Assessor Failed Taxpayers"
R
22
star
36

data-nicar-2019

Nicar ML/NLP workshop by J Kao
Jupyter Notebook
19
star
37

thinner

Slow purges for varnish useful on app deploys.
Ruby
17
star
38

data-institute-2021

13
star
39

transcript-audio-sync

Tools for synchronizing audio and text on a webpage
JavaScript
12
star
40

pac-donor-similarity

Cosine similarity scores for PAC donors to federal candidates
10
star
41

capitol_words_nlp

Experimenting with parsing the congressional record using NLP techniques and tools
Python
9
star
42

redactor

Tool to remove email addresses, person entities, and phone numbers from a text
Python
9
star
43

auditData

data and scripts for https://projects.propublica.org/graphics/eitc-audit
R
9
star
44

il-tickets-notebooks

Explore Chicago ticket data.
Jupyter Notebook
9
star
45

il-ticket-loader

Load and analyze Chicago parking and camera ticket data
Jupyter Notebook
7
star
46

fbpac-api-public

API supporting more complex queries on the database of ads gathered by github.com/propublica/facebook-political-ads
Ruby
6
star
47

data-institute-2022

6
star
48

collaborative-playbook

5
star
49

northern-il-federal-gun-cases

Jupyter Notebook
5
star
50

d4dPartD-analysis

analysis of doctors' promotional payments from drug companies and their prescribing behavior
R
4
star
51

table-setter-generator

A rails generator for table-setter
JavaScript
4
star
52

institute-files

Data Institute Lessons
4
star
53

pentagon

CartoCSS
3
star
54

campaign-finance-api-docs

Documentation for campaign finance API
3
star
55

collaborative-playbook-pt

Collaborative Playbook in Portuguese
2
star
56

vital-signs-hackathon

1
star
57

political-ad-collector

web landing page for propublica's political ad collector
CSS
1
star