  • Stars: 102
  • Rank: 323,617 (Top 7 %)
  • Language: Python
  • License: MIT License
  • Created about 6 years ago
  • Updated 3 months ago

Repository Details

Browser automation for Privacy Badger. Used to pre-train new Badgers before every release.

Badger Sett

A sett or set is a badger's den which usually consists of a network of tunnels and numerous entrances. Setts incorporate larger chambers used for sleeping or rearing young.

This script is designed to raise young Privacy Badgers by teaching them about the trackers on popular sites. Every day, crawler.py visits thousands of the top sites from the Tranco List with the latest version of Privacy Badger, and saves its findings in results.json.

See the following EFF.org blog post for more information: Giving Privacy Badger a Jump Start.
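As a rough illustration of the "visit the top sites" step described above, the sketch below picks the first N domains from ranked-list text. It assumes the list is `rank,domain` CSV, which is the format the Tranco List is distributed in; the function name `top_sites` is hypothetical and is not taken from crawler.py.

```python
import csv
import io


def top_sites(csv_text, num_sites=2000):
    """Return up to `num_sites` domains from rank,domain CSV text.

    Hypothetical helper for illustration; crawler.py may do this differently.
    """
    domains = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(domains) >= num_sites:
            break  # collected enough sites
        if len(row) < 2:
            continue  # skip blank or malformed lines
        domains.append(row[1].strip())
    return domains


# Tiny made-up sample in the assumed rank,domain layout.
sample = "1,example.com\n2,example.org\n3,example.net\n"
print(top_sites(sample, num_sites=2))
```

The default of 2000 mirrors the documented default of the --num-sites argument.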

Setup

  1. Prerequisites: have Docker installed. Make sure your user is part of the docker group so that you can build and run Docker images without sudo. You can add yourself to the group with:

    $ sudo usermod -aG docker $USER
    
  2. Clone the repository

    $ git clone https://github.com/efforg/badger-sett
    
  3. Run a scan

    $ ./runscan.sh
    

    This will run a scan with the latest version of Privacy Badger's master branch and won't commit the results.

    To run the script with a different branch of Privacy Badger, set the PB_BRANCH variable. For example:

    $ PB_BRANCH=my-feature-branch ./runscan.sh
    

    You can also pass arguments to crawler.py, the Python script that does the actual crawl. Any arguments passed to runscan.sh will be forwarded to crawler.py. To control the number of sites that the crawler visits, use the --num-sites argument (the default is 2000). For example:

    $ ./runscan.sh --num-sites 10
    

    To exclude any sites with a given top-level domain (TLD) from the scan, pass in the --exclude argument followed by the TLD suffix you want to exclude. For example, if you wanted to exclude all sites with a .gov TLD:

    $ ./runscan.sh --exclude .gov
    

    To exclude multiple TLDs from a scan, pass in each TLD separated by a comma, with no space between. For example, if you wanted to exclude all sites with .org and .net TLDs:

    $ ./runscan.sh --exclude .org,.net
    

    You can load another extension to run in parallel to Privacy Badger during a scan. Use the --load-extension flag and pass along the filepath for the .crx or .xpi file that you want to load. For example:

    $ ./runscan.sh --load-extension parallel-extensions/ublock.crx
    
  4. Monitor the scan

    To have the scan print verbose output about which sites it's visiting, use the --log-stdout argument.

    If you don't use that argument, all output will still be logged to docker-out/log.txt, beginning after the script outputs "Running scan in Docker...".
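To make the --num-sites and --exclude behavior from step 3 concrete, here is a minimal sketch of how comma-separated TLD suffixes could be parsed and applied to a site list. Only the two flag names and the default of 2000 come from this README; the functions `parse_args` and `exclude_suffixes` are hypothetical and do not claim to match how crawler.py is implemented.

```python
import argparse


def parse_args(argv):
    """Parse an illustrative subset of the crawler's documented flags."""
    parser = argparse.ArgumentParser(description="Illustrative subset of crawler options")
    parser.add_argument("--num-sites", type=int, default=2000)
    parser.add_argument("--exclude", default="")
    return parser.parse_args(argv)


def exclude_suffixes(domains, exclude):
    """Drop domains ending in any of the comma-separated suffixes."""
    suffixes = tuple(s for s in exclude.split(",") if s)
    if not suffixes:
        return list(domains)
    # str.endswith accepts a tuple and matches any of its members.
    return [d for d in domains if not d.endswith(suffixes)]


args = parse_args(["--num-sites", "10", "--exclude", ".org,.net"])
sites = ["example.com", "example.org", "example.net", "example.gov"]
print(exclude_suffixes(sites, args.exclude)[: args.num_sites])
```

Splitting on commas without stripping whitespace matches the instruction to pass the TLDs with no space between them.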

Automatic crawling

To set up the script to run periodically and automatically update the repository with its results:

  1. Create a new SSH key with ssh-keygen. Give it a name unique to the repository.

    $ ssh-keygen
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/USER/.ssh/id_rsa): /home/USER/.ssh/id_rsa_badger_sett
    
  2. Add the new key as a deploy key with read/write access to the repo on GitHub. See https://developer.github.com/v3/guides/managing-deploy-keys/

  3. Add an SSH host alias for GitHub that uses the new key pair. Create or open ~/.ssh/config and add the following:

    Host github-badger-sett
      HostName github.com
      User git
      IdentityFile /home/USER/.ssh/id_rsa_badger_sett
    
  4. Configure git to connect to the remote over SSH. Edit .git/config:

    [remote "origin"]
      url = ssh://git@github-badger-sett:/efforg/badger-sett
    

    This will have git connect to the remote using the new SSH keys by default.

  5. Create a cron job to call runscan.sh once a day. Set the environment variable RUN_BY_CRON=1 to turn off TTY forwarding to docker run (which would break the script in cron), and set GIT_PUSH=1 to have the script automatically commit and push results.json when the scan finishes. Here's an example crontab entry:

    0 0 * * *  RUN_BY_CRON=1 GIT_PUSH=1 /home/USER/badger-sett/runscan.sh
    
  6. If everything has been set up correctly, the script should push a new version of results.json after each crawl. Soon, whenever you make a new version of Privacy Badger, the build will pull the latest version of the crawler's data and ship it with the new version of the extension.
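If you would rather start the daily scan from a Python wrapper than directly from the crontab line in step 5, the same environment can be assembled as sketched below. The variable names RUN_BY_CRON and GIT_PUSH come from this README; the helper `scan_invocation` and the example path are assumptions for illustration, and the sketch only builds the command rather than running it.

```python
import os


def scan_invocation(repo_dir, push=True, extra_args=()):
    """Build the command and environment for an unattended runscan.sh call.

    Hypothetical helper; it does not start the scan itself.
    """
    env = dict(os.environ)
    env["RUN_BY_CRON"] = "1"  # turn off TTY forwarding to docker run
    if push:
        env["GIT_PUSH"] = "1"  # commit and push results.json when done
    else:
        env.pop("GIT_PUSH", None)
    command = [f"{repo_dir.rstrip('/')}/runscan.sh", *extra_args]
    return command, env


command, env = scan_invocation("/home/USER/badger-sett", extra_args=["--num-sites", "10"])
print(command)
# To actually start the scan, pass both values to subprocess.run:
#   subprocess.run(command, env=env, check=True)
```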
