

Repository Details

URL testing lists intended for discovering website censorship

Usage

What Is It?

This repository contains URL testing lists, divided by country code, that are intended to help test for URL censorship. In addition to these local lists, a global list covers a wide range of internationally relevant and popular websites, including sites whose content may be perceived as provocative or objectionable. Most of the websites on the global list are in English. In contrast, regional experts design the local lists individually for each country. The local lists cover a wide range of categories at the local and regional levels and include content in local languages. In countries where Internet censorship has been reported, the local lists also include many of the sites alleged to have been blocked.
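A test run for a single country usually draws on both the global list and that country's local list. The sketch below shows one way to combine them. It assumes the following:

- Each list has already been parsed into rows with `url` and `category_code` columns. These column names are assumed for illustration.
- A URL that appears on both lists should keep its local categorization.

`normalize_url` and `merge_lists` are hypothetical helpers, not part of the repository.

```python
import csv
import io
from urllib.parse import urlsplit


def normalize_url(url):
    """Lower-case the scheme and host and strip trailing slashes so near-duplicates compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def merge_lists(global_rows, local_rows):
    """Combine global and local entries; a local entry replaces a global one with the same URL."""
    merged = {}
    for row in global_rows:
        merged[normalize_url(row["url"])] = dict(row, source_list="global")
    for row in local_rows:
        # Same normalized key -> the local row overwrites the global one.
        merged[normalize_url(row["url"])] = dict(row, source_list="local")
    return list(merged.values())


# Hypothetical sample data in an assumed two-column CSV layout.
GLOBAL_CSV = """url,category_code
https://www.example.com/,NEWS
https://example.org/,HUMR
"""
LOCAL_CSV = """url,category_code
https://EXAMPLE.org,POLR
https://example.net/page,REL
"""

global_rows = list(csv.DictReader(io.StringIO(GLOBAL_CSV)))
local_rows = list(csv.DictReader(io.StringIO(LOCAL_CSV)))
test_set = merge_lists(global_rows, local_rows)
for entry in test_set:
    print(entry["url"], entry["category_code"], entry["source_list"])
```

Letting local entries win is a design choice: regional experts may classify a site differently from the global list.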

Categories are divided among four broad themes:

  • Political: This category focuses primarily on websites that express views in opposition to those of the current government. Content more broadly related to human rights, freedom of expression, minority rights, and religious movements is also included here.

  • Social: This group covers material related to sexuality, gambling, and illegal drugs and alcohol, as well as other topics that may be socially sensitive or perceived as offensive.

  • Conflict/Security: This category includes content related to armed conflicts, border disputes, separatist movements, and militant groups.

  • Internet Tools: This category groups websites that provide e-mail, Internet hosting, search, translation, Voice-over Internet Protocol (VoIP) telephone service, and circumvention methods.
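To summarize results by the four themes above, each entry's category code has to be mapped to a theme. The mapping in this sketch is illustrative only. The codes shown (`POLR`, `GAMB`, `ANON`, and so on) and their assignment to themes are assumptions, not the project's official grouping, and `group_by_theme` is a hypothetical helper.

```python
from collections import defaultdict

# Illustrative mapping from category codes to the four themes.
# ASSUMPTION: these codes and their theme assignments are for this sketch only,
# not the project's official grouping.
THEME_BY_CATEGORY = {
    "POLR": "Political",          # political criticism
    "HUMR": "Political",          # human rights
    "REL": "Political",           # religion
    "PORN": "Social",
    "GAMB": "Social",             # gambling
    "ALDR": "Social",             # alcohol and drugs
    "MILX": "Conflict/Security",  # militant groups
    "COMT": "Internet Tools",     # communication tools
    "HOST": "Internet Tools",     # hosting and blogging
    "SRCH": "Internet Tools",     # search engines
    "ANON": "Internet Tools",     # circumvention tools
}


def group_by_theme(entries):
    """Bucket entry URLs by theme; codes missing from the mapping go under 'Unmapped'."""
    groups = defaultdict(list)
    for entry in entries:
        theme = THEME_BY_CATEGORY.get(entry["category_code"], "Unmapped")
        groups[theme].append(entry["url"])
    return dict(groups)


# Hypothetical entries; "ZZZZ" stands in for a code the mapping does not cover.
entries = [
    {"url": "https://example.org/opposition", "category_code": "POLR"},
    {"url": "https://example.net/casino", "category_code": "GAMB"},
    {"url": "https://example.com/vpn", "category_code": "ANON"},
    {"url": "https://example.com/other", "category_code": "ZZZZ"},
]
groups = group_by_theme(entries)
print(groups)
```

Keeping an explicit "Unmapped" bucket makes gaps in the mapping visible instead of silently dropping entries.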

More information about testing methodology can be found here.

The only testing list that applies regionally (that is, to more than one country) is the CIS testing list, which is intended for testing former Commonwealth of Independent States (CIS) nations.
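Choosing which lists to test for a given country can be sketched as follows:

1. Always include the global list.
2. Add any regional list whose membership includes the country, such as the CIS list.
3. Add the country's own local list.

The `lists/<name>.csv` layout and the `lists_for_country` helper are assumptions for illustration. The caller supplies the regional membership, and the one used below is a hypothetical example.

```python
from pathlib import PurePosixPath

# ASSUMPTION: one CSV per country code, plus global and regional lists, in one directory.
LISTS_DIR = PurePosixPath("lists")


def lists_for_country(country_code, regional_members):
    """Return list paths for a country: global first, then matching regional lists, then local.

    regional_members maps a regional list name (e.g. "cis") to the set of
    lower-case country codes it covers; the caller provides this membership.
    """
    cc = country_code.lower()
    paths = [LISTS_DIR / "global.csv"]
    for region, members in sorted(regional_members.items()):
        if cc in members:
            paths.append(LISTS_DIR / f"{region}.csv")
    paths.append(LISTS_DIR / f"{cc}.csv")
    return [str(p) for p in paths]


# Hypothetical membership, for illustration only.
regions = {"cis": {"kz", "ru"}}
kz_lists = lists_for_country("KZ", regions)
br_lists = lists_for_country("br", regions)
print(kz_lists)
print(br_lists)
```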

Lists are available in both CSV and JSON format.
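Because the same entries are published in both formats, a loader can normalize either one into plain dictionaries. This minimal sketch assumes the JSON form is an array of objects whose keys match the CSV header. Check the actual files before relying on that assumption. `load_entries` is a hypothetical helper.

```python
import csv
import io
import json


def load_entries(text, fmt):
    """Parse list text in CSV or JSON form into a list of dicts.

    ASSUMPTION: the JSON form is an array of objects keyed like the CSV header.
    """
    if fmt == "csv":
        return [dict(row) for row in csv.DictReader(io.StringIO(text))]
    if fmt == "json":
        data = json.loads(text)
        if not isinstance(data, list):
            raise ValueError("expected a JSON array of entries")
        return [dict(item) for item in data]
    raise ValueError(f"unsupported format: {fmt!r}")


# Hypothetical one-entry list in both formats.
csv_text = "url,category_code\nhttps://example.org/,HUMR\n"
json_text = '[{"url": "https://example.org/", "category_code": "HUMR"}]'
print(load_entries(csv_text, "csv") == load_entries(json_text, "json"))  # True
```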

Please note that these files are not all of the testing lists ever produced. For each unique country code, only the newest list is included.
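If you keep older versions of the lists alongside the current ones, selecting the newest list per country code is a simple reduction. The sketch assumes each version is described by a (country code, date, path) tuple. The archive paths below are hypothetical, and `newest_per_country` is an illustrative helper.

```python
from datetime import date


def newest_per_country(versions):
    """Keep only the most recent list version for each country code.

    versions: iterable of (country_code, list_date, path) tuples. How the dates
    are obtained (file names, commit history, ...) is left open here.
    """
    newest = {}
    for cc, list_date, path in versions:
        key = cc.lower()
        if key not in newest or list_date > newest[key][0]:
            newest[key] = (list_date, path)
    # Return country code -> path of its newest version, sorted by code.
    return {cc: path for cc, (_, path) in sorted(newest.items())}


# Hypothetical archive of dated versions.
versions = [
    ("ir", date(2014, 3, 1), "archive/ir-2014-03-01.csv"),
    ("ir", date(2016, 7, 12), "archive/ir-2016-07-12.csv"),
    ("cn", date(2015, 1, 20), "archive/cn-2015-01-20.csv"),
]
latest = newest_per_country(versions)
print(latest)
```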

Contributing URLs

To learn how to contribute URLs for testing, see: https://ooni.org/get-involved/contribute-test-lists/

Citation

If you use this dataset in a publication, please cite it with the following BibTeX entry:

@misc{testlist,
  title={URL testing lists intended for discovering website censorship},
  author={Citizen Lab and Others},
  year={2014},
  url={https://github.com/citizenlab/test-lists},
  note={\href{https://github.com/citizenlab/test-lists}{https://github.com/citizenlab/test-lists}}
}

An example Chicago-style citation is included below:

Citizen Lab and Others. 2014. URL Testing Lists Intended for Discovering Website Censorship. https://github.com/citizenlab/test-lists.

License

All data is provided under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. The full license text is available here, and a summary is available here.
