  • Stars: 119
  • Rank 297,930 (Top 6 %)
  • Language: Python
  • License: MIT License
  • Created over 6 years ago
  • Updated 20 days ago

Repository Details

Automated training for Privacy Badger. Badger Sett automates browsers to visit websites to produce fresh Privacy Badger tracker data.

Badger Sett

A sett or set is a badger's den which usually consists of a network of tunnels and numerous entrances. Setts incorporate larger chambers used for sleeping or rearing young.

This script is designed to raise young Privacy Badgers by teaching them about the trackers on popular sites. Every day, crawler.py visits thousands of the top sites from the Tranco List with the latest version of Privacy Badger, and saves its findings in results.json.

See the following EFF.org blog post for more information: Giving Privacy Badger a Jump Start.

Setup

  1. Prerequisites: install Docker. Make sure your user is part of the docker group so that you can build and run Docker images without sudo. You can add yourself to the group with:

    $ sudo usermod -aG docker $USER
    
  2. Clone the repository

    $ git clone https://github.com/efforg/badger-sett
    
  3. Run a scan

    $ ./runscan.sh
    

    This will run a scan with the latest version of Privacy Badger's master branch and won't commit the results.

    To run the script with a different branch of Privacy Badger, set the PB_BRANCH variable. For example:

    $ PB_BRANCH=my-feature-branch ./runscan.sh
    

    You can also pass arguments to crawler.py, the Python script that does the actual crawl. Any arguments passed to runscan.sh are forwarded to crawler.py. To control the number of sites that the crawler visits, use the --num-sites argument (the default is 2000). For example:

    $ ./runscan.sh --num-sites 10
    

    To exclude any sites with a given top-level domain (TLD) from the scan, pass the --exclude argument followed by the TLD suffix you want to exclude. For example, to exclude all sites with a .gov TLD:

    $ ./runscan.sh --exclude .gov
    

    To exclude multiple TLDs from a scan, pass the TLDs as a comma-separated list with no spaces. For example, to exclude all sites with .org and .net TLDs:

    $ ./runscan.sh --exclude .org,.net
    

    You can load another extension to run in parallel to Privacy Badger during a scan. Use the --load-extension flag and pass along the filepath for the .crx or .xpi file that you want to load. For example:

    $ ./runscan.sh --load-extension parallel-extensions/ublock.crx
    
  4. Monitor the scan

    To have the scan print verbose output about which sites it's visiting, use the --log-stdout argument.

    If you don't use that argument, all output will still be logged to docker-out/log.txt, beginning after the script outputs "Running scan in Docker..."
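
The way --num-sites and --exclude interact can be illustrated with a short Python sketch. This is not the code in crawler.py: the function name select_sites and the sample data are hypothetical, and the sketch assumes the site list is in the Tranco CSV layout of one "rank,domain" pair per line. It also assumes excluded sites do not count toward the site limit, which the text above does not state.

```python
import csv
import io


def select_sites(tranco_csv, num_sites=2000, exclude=None):
    """Return up to num_sites domains, skipping excluded TLD suffixes.

    Hypothetical helper for illustration only; not taken from crawler.py.
    """
    # An --exclude value such as ".org,.net" is a comma-separated list with no spaces.
    suffixes = tuple(s for s in (exclude or "").split(",") if s)
    selected = []
    for row in csv.reader(io.StringIO(tranco_csv)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        domain = row[1].strip().lower()
        if suffixes and domain.endswith(suffixes):
            continue  # domain ends with one of the excluded suffixes
        selected.append(domain)
        if len(selected) == num_sites:
            break
    return selected


# Made-up sample in "rank,domain" form.
sample = "1,example.com\n2,example.org\n3,agency.gov\n4,example.net\n5,another.com\n"
print(select_sites(sample, num_sites=2, exclude=".org,.net"))
# prints ['example.com', 'agency.gov']
```

In this sketch, example.org is skipped and the next eligible site takes its place, so the result still contains two sites. Whether the real crawler tops up the list in the same way should be checked against crawler.py itself.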

Automatic crawling

To set up the script to run periodically and automatically update the repository with its results:

  1. Create a new SSH key with ssh-keygen. Give it a name unique to the repository.

    $ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/USER/.ssh/id_rsa): /home/USER/.ssh/id_rsa_badger_sett
    
  2. Add the new key as a deploy key with read/write access to the repo on GitHub. See https://developer.github.com/v3/guides/managing-deploy-keys/

  3. Add an SSH host alias for GitHub that uses the new key pair. Create or open ~/.ssh/config and add the following:

    Host github-badger-sett
      HostName github.com
      User git
      IdentityFile /home/USER/.ssh/id_rsa_badger_sett
    
  4. Configure git to connect to the remote over SSH. Edit .git/config:

    [remote "origin"]
      url = ssh://git@github-badger-sett:/efforg/badger-sett
    

    This will have git connect to the remote using the new SSH keys by default.

  5. Create a cron job to call runscan.sh once a day. Set the environment variable RUN_BY_CRON=1 to turn off TTY forwarding to docker run (which would break the script in cron), and set GIT_PUSH=1 to have the script automatically commit and push results.json when the scan finishes. Here's an example crontab entry:

    0 0 * * *  RUN_BY_CRON=1 GIT_PUSH=1 /home/USER/badger-sett/runscan.sh
    
  6. If everything has been set up correctly, the script should push a new version of results.json after each crawl. Soon, whenever you make a new version of Privacy Badger, it will pull the latest version of the crawler's data and ship it with the new version of the extension.
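
To make the structure of the example crontab entry in step 5 explicit, here is a small Python sketch that splits it into its schedule, its environment variables, and its command. The function parse_crontab_entry is hypothetical and deliberately simplified: it handles only whitespace-separated fields like the ones in the example and is not how cron itself parses entries.

```python
def parse_crontab_entry(line):
    """Split a simple crontab line into (schedule, env, command).

    Hypothetical helper for illustration; it does not handle quoting,
    special strings such as @daily, or other crontab syntax.
    """
    fields = line.split()
    names = ("minute", "hour", "day_of_month", "month", "day_of_week")
    schedule = dict(zip(names, fields[:5]))

    # Leading NAME=value words are environment assignments that the shell
    # applies to the command that follows them.
    env = {}
    rest = fields[5:]
    while rest and "=" in rest[0] and not rest[0].startswith("/"):
        key, value = rest.pop(0).split("=", 1)
        env[key] = value
    return schedule, env, rest


entry = "0 0 * * *  RUN_BY_CRON=1 GIT_PUSH=1 /home/USER/badger-sett/runscan.sh"
schedule, env, command = parse_crontab_entry(entry)
print(schedule["minute"], schedule["hour"])  # prints: 0 0
print(env)      # prints: {'RUN_BY_CRON': '1', 'GIT_PUSH': '1'}
print(command)  # prints: ['/home/USER/badger-sett/runscan.sh']
```

The schedule fields 0 0 * * * mean minute 0 of hour 0 on every day, which matches the "once a day" requirement, and the two assignments are the RUN_BY_CRON and GIT_PUSH settings described in step 5.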
