• Stars
    18
  • Rank 1,208,065 (Top 24 %)
  • Language
    Ruby
  • License
    GNU General Publi...
  • Created about 10 years ago
  • Updated about 6 years ago

Repository Details

Scripts for managing scrapers

More Repositories

1

LookingGlass

Intuitive and configurable search interface for document archives.
Ruby
197
star
2

NSA-Data

NSA documents in machine readable form
Ruby
90
star
3

ICWATCH-Data

Resume data and scripts for managing it
Ruby
89
star
4

Harvester

Web crawling and document processing through a usable interface.
JavaScript
71
star
5

TransparencyToolkit

Main repository for Transparency Toolkit
41
star
6

LinkedInData

Scrapes all LinkedIn profiles including search terms.
Ruby
39
star
7

JSONToNetworkGraph

Generates network graphs from a JSON.
JavaScript
30
star
8

generalscraper

Scrapes all pages on any site you specify for keywords.
Ruby
24
star
9

DocManager

Universal backend for indexing, storing, and querying documents.
Ruby
24
star
10

IndeedScraper

Scraper for Indeed
Go
16
star
11

ArchivePile

A read-only theme for publishing email archives using Mailpile
CSS
11
star
12

Transparency-Toolkit-Prototype

Analysis system for Transparency Toolkit.
Ruby
11
star
13

Twiddler

A user-friendly tool for text processing, light NLP, and keyword extraction
JavaScript
11
star
14

LinkedinCrawler

Crawls public LinkedIn profiles
Ruby
10
star
15

dataspec-sii

Dataspec for SII
Ruby
10
star
16

LinkedinParser

A parser for LinkedIn profiles
Ruby
7
star
17

Catalyst

Text mining framework.
Ruby
7
star
18

Surveillance-Research-Data

Raw data and scripts for Surveillance Research Archive
Ruby
6
star
19

TwitterCrawler

A crawler for Twitter
Ruby
6
star
20

IndeedCrawler

Crawler for the resume website Indeed
Ruby
6
star
21

NameToEmail

Gets a list of potential emails from a JSON with names.
Ruby
6
star
22

JSONToMap

Converts a JSON with locations into a map with points.
Ruby
5
star
23

Thumbtack

An open narrative mapping tool to corroborate narratives across multiple sources and formats
Ruby
5
star
24

DesignAssets

A collection of branding, interfaces, and other visual resources!
HTML
5
star
25

JSONToChoropleth

Generates choropleth maps from JSONs.
Ruby
4
star
26

theme-snowden

A theme for LookingGlass for Snowden doc search
CSS
4
star
27

FacebookCrawler

A crawler for Facebook data from public web and Graph API
Ruby
4
star
28

EmailParser

A crawler for converting email files on disk to JSON
Ruby
4
star
29

ParseFile

OCRs document and extracts metadata
Ruby
3
star
30

ExtractPatterns

Extracts terms matching certain patterns. For finding new codewords and tracking mentions of known ones.
Ruby
3
star
31

TSJobCrawler

Collects listings for jobs that require security clearance.
Ruby
3
star
32

EntityExtractor

Extracts entities and terms matching certain patterns.
Ruby
3
star
33

IndeedParser

Parser for Indeed resumes
Ruby
3
star
34

transparencytoolkit.github.io

A styleguide site for Transparency Toolkit
CSS
3
star
35

CrawlerManager

API for calling crawlers
Ruby
3
star
36

dataspec-LinkedinCrawl

A LookingGlass dataspec file for data scraped from LinkedIn.com
3
star
37

PiplCollector

Request info from Pipl for all items in dataset
Ruby
2
star
38

DirCrawl

Runs block of code on every file in directory
Ruby
2
star
39

PiplRequest

Request profiles from Pipl
Ruby
2
star
40

JSONToChart

Converts JSONs to pretty charts
Ruby
2
star
41

dataspec-IndeedCrawl

A LookingGlass dataspec file for data scraped from Indeed.com
2
star
42

wlsearchscraper

Gets a list of results from the WikiLeaks search.
Ruby
2
star
43

Archiver

Archives URLs
Ruby
2
star
44

theme-pi

A theme for Privacy International collaborations
CSS
1
star
45

UploadConvert

Tools for converting documents uploaded to Transparency Toolkit to properly formatted JSONs.
Ruby
1
star
46

dataspec-GoogleCrawl

A dataspec for the Google crawler
1
star
47

dataspec-template

A starter template for LookingGlass JSON files
1
star
48

RequestManager

Manages scraper HTTP requests
Ruby
1
star
49

federalregisterscraper

Scraper for the Federal Register
Ruby
1
star
50

month-names

Names of months in multiple languages
1
star
51

dataspec-fbidhs

1
star
52

JSONCombiner

Combines JSONs.
Ruby
1
star
53

Test-Data

Test data for Transparency Toolkit development
HTML
1
star
54

ArchiveAdministrator

Archive administration system. Handles archive creation and user authentication.
CSS
1
star
55

DocUpload

Upload application for documents in archiving service.
CSS
1
star
56

dataspec-EmailCrawl

Dataspec for emails
1
star
57

IC-Company-Data

Intelligence contractors
Ruby
1
star
58

wordcloud

Changes word sizes in a document based on the number of times they occur.
Ruby
1
star
59

JSONCrossreference

Cross-references JSONs and returns the matching data.
Ruby
1
star
60

OCRServer

OCR server for hosted archiving service
Ruby
1
star
61

ansible-role-lookingglass

Automates deployment of LookingGlass instances
Shell
1
star
62

NetworkGraph

Neo4j network graph generator prototype
1
star
63

CountryConvert

Converts 2-char ISO country codes to 3-char codes.
Ruby
1
star
64

classification-sensation

Parse classification-related information
Python
1
star
65

dataspec-LoadFiles

Dataspec for plain files loaded in via Harvester/DirCrawl.
1
star
66

dataspec-snowden

A dataspec for Snowden documents
1
star