  • Stars: 160
  • Rank: 234,703 (Top 5%)
  • Language: Python
  • Created over 7 years ago
  • Updated over 1 year ago

Repository Details

A reference implementation, in Python, of a simple ads.txt crawler.

Synopsis

An example crawler that fetches the ads.txt files for a given list of URLs or domains and saves their contents to a SQLite database table.

Usage Example

Usage: adstxt_crawler.py [options]

Options:
  -h, --help            show this help message and exit
  -t FILE, --targets=FILE
                        list of domains to crawler ads.txt from
  -d FILE, --database=FILE
                        Database to dump crawlered data into
  -v, --verbose         Increase verbosity (specify multiple times for more)
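The help text above appears to follow the layout produced by Python's standard optparse module (`-t FILE, --targets=FILE`). A minimal sketch of how these three options could be declared is shown below; the `dest` names (`target_filename`, `target_database`, `verbose`) are hypothetical and not taken from adstxt_crawler.py.

```python
from optparse import OptionParser


def build_parser() -> OptionParser:
    """Declare the three documented options (dest names are hypothetical)."""
    parser = OptionParser(usage="%prog [options]")
    parser.add_option("-t", "--targets", dest="target_filename", metavar="FILE",
                      help="list of domains to crawl ads.txt from")
    parser.add_option("-d", "--database", dest="target_database", metavar="FILE",
                      help="Database to dump crawled data into")
    # action="count" adds 1 per occurrence, so "-v -v" or "-vv" gives 2.
    parser.add_option("-v", "--verbose", dest="verbose", action="count", default=0,
                      help="Increase verbosity (specify multiple times for more)")
    return parser
```

Calling `build_parser().parse_args(["-t", "target_domains.txt", "-d", "adstxt.db", "-vv"])` returns an options object whose `verbose` attribute is 2. The count action is what makes "specify multiple times for more" work.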

Targets File

The targets file can contain a list of domains, URLs, etc. For each line, the crawler extracts the full hostname, validates it, and requests http://HOSTNAME/ads.txt.

$ cat target_domains.txt 
#https://chicagotribune.com
#http://latimes.com/sports
#washingtonpost.com
#http://nytimes.com/index.html
localhosttribune.com
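The per-line step described above (extract the hostname, validate it, build the ads.txt URL) can be sketched with the standard library as follows. This is an illustration under assumptions, not the project's code: it assumes lines starting with `#` are comments (as the example file suggests), and the hostname regex in `_HOST_RE` is a hypothetical stand-in for whatever validation the crawler actually performs.

```python
import re
from typing import Iterable, List, Optional
from urllib.parse import urlparse

# Hypothetical validation rule: dot-separated labels of letters, digits and
# hyphens, ending in an alphabetic top-level label.
_HOST_RE = re.compile(
    r"^(?=.{1,253}$)(?:[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def extract_hostname(line: str) -> Optional[str]:
    """Return the validated hostname for one targets-file line, or None."""
    line = line.strip()
    if not line or line.startswith("#"):  # blank line or commented-out target
        return None
    if "//" not in line:  # bare domains like washingtonpost.com have no scheme
        line = "http://" + line
    host = urlparse(line).hostname  # lowercased, with any port removed
    if host and _HOST_RE.match(host):
        return host
    return None


def adstxt_urls(lines: Iterable[str]) -> List[str]:
    """Map targets-file lines to http://HOSTNAME/ads.txt URLs."""
    urls = []
    for line in lines:
        host = extract_hostname(line)
        if host:
            urls.append(f"http://{host}/ads.txt")
    return urls
```

Prepending `http://` to bare domains matters because `urlparse("washingtonpost.com")` treats the whole string as a path and reports no hostname.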

Installation

The project requires the following libraries and programs to be installed:

  • Python 2 or later
  • sqlite3
  • See requirements.txt for all Python packages to install

Run this command to create the database table:

$ sqlite3 adstxt.db < adstxt_crawler.sql
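If you would rather create the table from Python (for example in a setup or test script), the standard sqlite3 module can run the same schema file. This is a sketch, not part of the project; the function name `install_schema` is hypothetical.

```python
import sqlite3
from pathlib import Path


def install_schema(db_path: str = "adstxt.db", sql_path: str = "adstxt_crawler.sql") -> None:
    """Run every statement in the schema file against db_path (created if missing)."""
    script = Path(sql_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    try:
        conn.executescript(script)  # executes all statements in the file
        conn.commit()
    finally:
        conn.close()
```

`executescript` accepts several semicolon-separated statements at once, which is what piping a .sql file into the sqlite3 shell does.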

Running

In typical usage, you pass the name of a file of target URLs and the name of the SQLite database file:

$ ./adstxt_crawler.py -t target_domains.txt -d adstxt.db
Wrote 3 records from 1 URLs to adstxt.db
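The core step, turning a fetched ads.txt body into database rows, could look roughly like the sketch below. It follows the record layout of the IAB ads.txt specification (advertising system domain, seller account ID, DIRECT or RESELLER, optional certification authority ID; `#` starts a comment). Dropping anything after `;`, uppercasing the account type, and the table `adstxt_demo` with its columns are assumptions made for illustration; the project's real schema lives in adstxt_crawler.sql and may differ.

```python
import sqlite3
from typing import List, Optional, Tuple

# (exchange domain, seller account ID, account type, certification authority ID)
Record = Tuple[str, str, str, Optional[str]]


def parse_adstxt(text: str) -> List[Record]:
    """Extract data records from an ads.txt body, skipping comments and variables."""
    records: List[Record] = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]           # strip comments
        line = line.split(";", 1)[0].strip()  # assumption: ignore extension fields
        if not line:
            continue
        fields = [f.strip() for f in line.split(",")]
        if len(fields) not in (3, 4):
            continue  # variable lines such as contact=... and malformed lines
        domain, account_id, account_type = fields[0].lower(), fields[1], fields[2].upper()
        if not domain or not account_id or account_type not in ("DIRECT", "RESELLER"):
            continue
        cert_id = fields[3] if len(fields) == 4 and fields[3] else None
        records.append((domain, account_id, account_type, cert_id))
    return records


def store_records(conn: sqlite3.Connection, site_domain: str, records: List[Record]) -> int:
    """Insert records into a hypothetical demo table and return how many were written."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS adstxt_demo ("
        "site_domain TEXT, exchange_domain TEXT, seller_account_id TEXT, "
        "account_type TEXT, tag_id TEXT)"
    )
    conn.executemany(
        "INSERT INTO adstxt_demo VALUES (?, ?, ?, ?, ?)",
        [(site_domain, *r) for r in records],
    )
    conn.commit()
    return len(records)
```

Keeping parsing separate from storage lets the parser be tested on plain strings without a network or a database.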

Each run writes a sequence of entries to adstxt_crawler.log.
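One way to wire the `-v` count to a log file like this is the standard logging module. The level mapping below (none: WARNING, one: INFO, two or more: DEBUG) and the helper names are hypothetical; the crawler's actual defaults may differ.

```python
import logging


def verbosity_to_level(verbose: int) -> int:
    """Hypothetical mapping from the number of -v flags to a logging level."""
    if verbose >= 2:
        return logging.DEBUG
    if verbose == 1:
        return logging.INFO
    return logging.WARNING


def configure_logging(verbose: int, log_path: str = "adstxt_crawler.log") -> None:
    """Append log entries to log_path; basicConfig is a no-op if handlers already exist."""
    logging.basicConfig(
        filename=log_path,
        filemode="a",
        level=verbosity_to_level(verbose),
        format="%(asctime)s %(levelname)s %(message)s",
    )
```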

You can examine the DB records created as follows:

$ echo "select * from adstxt;" | sqlite3 adstxt.db

You can clear the DB records as follows:

$ echo "delete from adstxt;" | sqlite3 adstxt.db
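The same two queries can be run from Python with the standard sqlite3 module, which can be handy in scripts or tests. This is a minimal sketch; `dump_records` and `clear_records` are hypothetical helper names, while the table name adstxt is the one used in the commands above.

```python
import sqlite3
from typing import List, Tuple


def dump_records(db_path: str = "adstxt.db") -> List[Tuple]:
    """Equivalent of: select * from adstxt;"""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute("SELECT * FROM adstxt").fetchall()
    finally:
        conn.close()


def clear_records(db_path: str = "adstxt.db") -> None:
    """Equivalent of: delete from adstxt; (the table itself is kept)."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("DELETE FROM adstxt")
        conn.commit()  # the DELETE is not persisted without a commit
    finally:
        conn.close()
```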

Warnings

This is a prototype example crawler, suitable only for very modest production use. It lacks many features of a production crawler, such as parallel HTTP downloading and parsing of the data files, stateful recovery when target servers are down, and use of a real production database server.
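As an illustration of the first missing feature, parallel downloading can be sketched with a thread pool from the standard library. This is not part of the project; `fetch_adstxt` and `fetch_all` are hypothetical names, and a failed fetch is simply recorded as None rather than retried, so it does not address stateful recovery.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, Optional
from urllib.request import urlopen


def fetch_adstxt(url: str, timeout: float = 10.0) -> str:
    """Download one ads.txt file and decode it as UTF-8."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_adstxt,
    max_workers: int = 8,
) -> Dict[str, Optional[str]]:
    """Fetch URLs concurrently; map each URL to its body, or None on any error."""
    results: Dict[str, Optional[str]] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for fut in as_completed(futures):
            url = futures[fut]
            try:
                results[url] = fut.result()
            except Exception:  # e.g. URLError or a timeout: record the failure, keep going
                results[url] = None
    return results
```

Threads suit this workload because the time is spent waiting on the network; the `fetch` parameter also lets the function be tested without network access.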

Contributors

Maintainer: Neal Richter

Contributors (GitHub account names): iantri, jhpacker, brk212, bradlucas, nag4, AntoineJac, markparolisi, sean-mcmann, Breza, miyaichi

License

This project is released under the 2-clause BSD license.
