• Stars: 1
• Language
• License: GNU General Publi...
• Created about 8 years ago
• Updated about 8 years ago

Repository Details

A dataspec for the Google crawler

More Repositories

1

LookingGlass

Intuitive and configurable search interface for document archives.
Ruby
197
star
2

NSA-Data

NSA documents in machine readable form
Ruby
90
star
3

ICWATCH-Data

Resume data and scripts for managing it
Ruby
89
star
4

Harvester

Web crawling and document processing through a usable interface.
JavaScript
71
star
5

TransparencyToolkit

Main repository for Transparency Toolkit
41
star
6

LinkedInData

Scrapes all LinkedIn profiles including search terms.
Ruby
39
star
7

JSONToNetworkGraph

Generates network graphs from a JSON.
JavaScript
30
star
8

generalscraper

Scrapes all pages on any site you specify for keywords.
Ruby
24
star
9

DocManager

Universal backend for indexing, storing, and querying documents.
Ruby
24
star
10

UtilityScripts

Scripts for managing scrapers
Ruby
18
star
11

IndeedScraper

Scraper for Indeed
Go
16
star
12

ArchivePile

A read-only theme for publishing email archives using Mailpile
CSS
11
star
13

Transparency-Toolkit-Prototype

Analysis system for Transparency Toolkit.
Ruby
11
star
14

Twiddler

A user friendly tool for text processing, light NLP, and keyword extraction
JavaScript
11
star
15

LinkedinCrawler

Crawls public LinkedIn profiles
Ruby
10
star
16

dataspec-sii

Dataspec for SII
Ruby
10
star
17

LinkedinParser

A parser for LinkedIn profiles
Ruby
7
star
18

Catalyst

Text mining framework.
Ruby
7
star
19

Surveillance-Research-Data

Raw data and scripts for Surveillance Research Archive
Ruby
6
star
20

TwitterCrawler

A crawler for Twitter
Ruby
6
star
21

IndeedCrawler

Crawler for the resume website Indeed
Ruby
6
star
22

NameToEmail

Gets a list of potential emails from a JSON with names.
Ruby
6
star
23

JSONToMap

Converts a JSON with locations into a map with points.
Ruby
5
star
24

Thumbtack

An open narrative mapping tool to corroborate narratives across multiple sources and formats
Ruby
5
star
25

DesignAssets

A collection of branding, interfaces, and other visual resources!
HTML
5
star
26

JSONToChoropleth

Generates choropleth maps from JSONs.
Ruby
4
star
27

theme-snowden

A theme for LookingGlass for Snowden doc search
CSS
4
star
28

FacebookCrawler

A crawler for Facebook data from public web and Graph API
Ruby
4
star
29

EmailParser

A crawler for converting email files on disk to JSON
Ruby
4
star
30

ParseFile

OCRs documents and extracts metadata
Ruby
3
star
31

ExtractPatterns

Extracts terms matching certain patterns. For finding new codewords and tracking mentions of known ones.
Ruby
3
star
32

TSJobCrawler

Collects listings for jobs that require security clearance.
Ruby
3
star
33

EntityExtractor

Extracts entities and terms matching certain patterns.
Ruby
3
star
34

IndeedParser

Parser for Indeed resumes
Ruby
3
star
35

transparencytoolkit.github.io

A styleguide site for Transparency Toolkit
CSS
3
star
36

CrawlerManager

API for calling crawlers
Ruby
3
star
37

dataspec-LinkedinCrawl

A LookingGlass dataspec file for data scraped from LinkedIn.com
3
star
38

PiplCollector

Request info from Pipl for all items in dataset
Ruby
2
star
39

DirCrawl

Runs block of code on every file in directory
Ruby
2
star
40

PiplRequest

Request profiles from Pipl
Ruby
2
star
41

JSONToChart

Converts JSONs to pretty charts
Ruby
2
star
42

dataspec-IndeedCrawl

A LookingGlass dataspec file for data scraped from Indeed.com
2
star
43

wlsearchscraper

Gets a list of results from the WikiLeaks search.
Ruby
2
star
44

Archiver

Archives URLs
Ruby
2
star
45

theme-pi

A theme for Privacy International collaborations
CSS
1
star
46

UploadConvert

Tools for converting documents uploaded to Transparency Toolkit to properly formatted JSONs.
Ruby
1
star
47

dataspec-template

A starter template for LookingGlass JSON files
1
star
48

RequestManager

Manages scraper HTTP requests
Ruby
1
star
49

federalregisterscraper

Scraper for the Federal Register
Ruby
1
star
50

month-names

Names of months in multiple languages
1
star
51

dataspec-fbidhs

1
star
52

JSONCombiner

Combines JSONs.
Ruby
1
star
53

Test-Data

Test data for Transparency Toolkit development
HTML
1
star
54

ArchiveAdministrator

Archive administration system. Handles archive creation and user authentication.
CSS
1
star
55

DocUpload

Upload application for documents in archiving service.
CSS
1
star
56

dataspec-EmailCrawl

Dataspec for emails
1
star
57

IC-Company-Data

Intelligence contractors
Ruby
1
star
58

wordcloud

Changes word sizes in a document based on the number of times they occur.
Ruby
1
star
59

JSONCrossreference

Crossreferences JSONs and returns the matching data.
Ruby
1
star
60

OCRServer

OCR server for hosted archiving service
Ruby
1
star
61

ansible-role-lookingglass

Automates deployment of LookingGlass instances
Shell
1
star
62

NetworkGraph

Neo4j network graph generator prototype
1
star
63

CountryConvert

Converts 2-char ISO country codes to 3-char codes.
Ruby
1
star
64

classification-sensation

Parse classification-related information
Python
1
star
65

dataspec-LoadFiles

Dataspec for plain files loaded in via Harvester/DirCrawl.
1
star
66

dataspec-snowden

A dataspec for Snowden documents
1
star