  • Stars: 197
  • Rank: 197,722 (Top 4 %)
  • Language: Ruby
  • License: GNU General Publi...
  • Created: over 10 years ago
  • Updated: about 5 years ago

Repository Details

Intuitive and configurable search interface for document archives.

LookingGlass

Search, filter, and browse any set of documents. LookingGlass provides full text search, category filters, and date queries through a single search interface backed by Elasticsearch. It also supports customizable themes and flexible document view pages for browsing and embedding a variety of document types.

LookingGlass requires DocManager so that it can interact with Elasticsearch. LookingGlass can be used in combination with Harvester for crawling, parsing, and loading documents and automatically turning them into a searchable archive. However, it also works well as a standalone archiving tool.

Installation

Dependencies

  • DocManager and all of its dependencies
  • ruby 2.4.1
  • rails 5
  • (optionally) Harvester
  • libmagic-dev

Setup Instructions

  1. Install the dependencies
  2. Get LookingGlass
     • Clone repo: git clone --recursive git@github.com:TransparencyToolkit/LookingGlass.git
     • Go into the LookingGlass directory: cd LookingGlass
     • Install the Rubygems LookingGlass uses: bundle install
     • Generate simple form data: rails generate simple_form:install --bootstrap
     • Precompile assets: rake assets:precompile
  3. Run LookingGlass
     • Start DocManager: Follow the instructions on the DocManager repo
     • Configure project: Edit the file in config/initializers/project_config so that PROJECT_INDEX is set to the name of the index, as defined in the DocManager project config, that LookingGlass should use
     • Start LookingGlass: Run rails server -p 3001
     • Use LookingGlass: Go to http://0.0.0.0:3001 in your browser
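
As a rough illustration of the "Configure project" step, the initializer might look something like the sketch below. Only the PROJECT_INDEX name comes from the instructions above; the LOOKINGGLASS_PROJECT_INDEX environment variable and the "example_archive" default are assumptions made for this example, and the file shipped with LookingGlass may contain other settings.

```ruby
# Hypothetical sketch of the project config initializer.
# PROJECT_INDEX is the name used in the setup instructions; the
# environment variable and the default value are illustrative only.
PROJECT_INDEX = ENV.fetch("LOOKINGGLASS_PROJECT_INDEX", "example_archive")

# Fail early if the value is blank, since it has to name an index
# defined in the DocManager project config.
raise ArgumentError, "PROJECT_INDEX must not be empty" if PROJECT_INDEX.strip.empty?
```

The value has to match the index name in the DocManager project config exactly.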

Features

LookingGlass is a frontend for searchable document archives. Previously, it also included the backend that interacted with Elasticsearch, but this has since been split out into DocManager. The key features are described below.

Display of Document Sets

LookingGlass shows document sets from multiple data sources. It displays a list of documents on the main page. The fields displayed for each document on the index page and the order the documents are displayed in (sorted by date or another numerical field) are customizable in DocManager's data source config files.

Each individual document set is then displayed on its own page for easy reading. The document page includes a sidebar with the document's categorical fields and a customizable set of tabs that can display the document text, embed the document itself (which is stored remotely, locally, or on DocumentCloud), offer document downloads, or load links.

Search

LookingGlass allows full text search of document sets using the Elasticsearch backend. It can be used to search documents in most languages. LookingGlass supports searching all fields or individual fields, as well as a variety of non-text fields like dates. Results are sorted by relevance, with text matching the query highlighted.
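
To make the search behavior more concrete, here is a minimal sketch of an Elasticsearch request body that combines a full text query, an optional date range, and highlighting. It uses standard Elasticsearch query DSL (query_string, bool, range, highlight), but the build_search_body helper and the field names are assumptions for illustration. LookingGlass talks to Elasticsearch through DocManager, so the requests it actually sends may look different.

```ruby
require "json"

# Build an Elasticsearch request body for a full text search.
# fields: nil searches all fields; otherwise only the listed ones.
# date_field/from/to add an optional date range filter.
# (Helper name and parameters are hypothetical, not LookingGlass API.)
def build_search_body(query, fields: nil, date_field: nil, from: nil, to: nil)
  text_query = { query_string: { query: query } }
  text_query[:query_string][:fields] = fields if fields

  filters = []
  if date_field && (from || to)
    range = {}
    range[:gte] = from if from
    range[:lte] = to if to
    filters << { range: { date_field => range } }
  end

  {
    # No explicit sort, so Elasticsearch orders hits by relevance score.
    query: { bool: { must: [text_query], filter: filters } },
    # Ask for highlighted fragments in every field that matched.
    highlight: { fields: { "*" => {} } }
  }
end

body = build_search_body("surveillance", fields: ["title", "text"],
                         date_field: "date", from: "2013-01-01")
puts JSON.pretty_generate(body)
```

Putting the date range in the filter clause keeps it from affecting the relevance score, so ordering depends on the text match alone.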

Categorical Filters

Many document sets have categorical fields that are common across documents, either present in the original data or extracted with a tool like Catalyst. For example, countries mentioned in a document, file format, hashtags, and topic-specific keywords are common types of categories. LookingGlass allows filtering document sets by one or more categories by clicking links on the sidebar, for instance to get all the documents about a particular country.

The category sidebar also displays, for each value in each category, the number of documents matching the current query. This gives a quick overview of the content in the document set.
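
The idea of filtering by category and counting documents per value can be sketched in plain Ruby over an in-memory list. In LookingGlass the counts come from the Elasticsearch backend, so this is only an illustration of the concept. The helper names, field names, and sample documents are all made up for the example.

```ruby
# Keep only documents that carry every selected value in each
# selected category field. `selected` maps field name => value(s).
def filter_docs(docs, selected)
  docs.select do |doc|
    selected.all? do |field, values|
      (Array(values) - Array(doc[field])).empty?
    end
  end
end

# Count how many documents carry each value of a category field.
def category_counts(docs, field)
  docs.each_with_object(Hash.new(0)) do |doc, counts|
    Array(doc[field]).uniq.each { |value| counts[value] += 1 }
  end
end

docs = [
  { "title" => "A", "countries" => ["Germany", "France"], "format" => "pdf" },
  { "title" => "B", "countries" => ["Germany"], "format" => "html" },
  { "title" => "C", "countries" => ["Brazil"], "format" => "pdf" }
]

# Filter to documents about Germany, then compute sidebar-style counts
# for the remaining documents.
matching = filter_docs(docs, { "countries" => "Germany" })
p category_counts(matching, "format")
p category_counts(matching, "countries")
```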

Document View Templates

On both the search results/document index and individual document pages, the way the document is displayed is highly customizable. It is possible to add new templates to display different types of data sources however you want and even thread together multiple documents when needed (in email datasets, for example).

These view templates are defined in app/views/docs/show/tabs/panes (for the document view page) and app/views/docs/index/results/result_templates (for the index/result view). The field to use as a thread ID and the view templates to use are specified per-source in the DocManager data source config files.
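
Threading documents together by a thread ID field can be pictured with the small sketch below. It is not LookingGlass's implementation: the group_into_threads helper, the "thread_id" and "date" field names, and the sample emails are assumptions chosen for illustration.

```ruby
require "date"

# Group documents into threads and order each thread chronologically.
# thread_field and date_field stand in for per-source settings of the
# kind described above; the names used here are hypothetical.
def group_into_threads(docs, thread_field:, date_field:)
  docs
    .group_by { |doc| doc[thread_field] }
    .transform_values do |thread|
      thread.sort_by { |doc| Date.parse(doc[date_field]) }
    end
end

emails = [
  { "subject" => "Re: Budget", "thread_id" => "t1", "date" => "2015-03-02" },
  { "subject" => "Lunch",      "thread_id" => "t2", "date" => "2015-03-01" },
  { "subject" => "Budget",     "thread_id" => "t1", "date" => "2015-03-01" }
]

threads = group_into_threads(emails, thread_field: "thread_id", date_field: "date")
threads.each do |id, docs|
  puts "#{id}: #{docs.map { |d| d['subject'] }.join(' -> ')}"
end
```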

Version Tracking

LookingGlass can be used to track which documents change over time and how. Changed documents are flagged in categories on the sidebar, and the document view page has a tool that lets users view the exact differences between two versions of a document.

The fields used to check if a document has changed are specified per-source in the DocManager data source config files.
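
Checking a configured set of fields for changes can be sketched as follows. This is a simplified illustration, not the code LookingGlass or DocManager uses: the helper names and sample documents are hypothetical, and the line comparison is far cruder than a real diff tool.

```ruby
# Return the tracked fields whose values differ between two versions.
def changed_fields(old_doc, new_doc, tracked_fields)
  tracked_fields.reject { |field| old_doc[field] == new_doc[field] }
end

# Very rough line-level comparison: lines only in the old text count as
# removed, lines only in the new text count as added. A real diff tool
# would also track line order and position.
def line_changes(old_text, new_text)
  old_lines = old_text.lines.map(&:chomp)
  new_lines = new_text.lines.map(&:chomp)
  { removed: old_lines - new_lines, added: new_lines - old_lines }
end

v1 = { "title" => "Policy", "text" => "Data is kept for 30 days.\nContact us by mail." }
v2 = { "title" => "Policy", "text" => "Data is kept for 90 days.\nContact us by mail." }

# Only fields listed as tracked are compared, mirroring the per-source
# configuration described above.
fields = changed_fields(v1, v2, ["title", "text"])
p fields
p line_changes(v1["text"], v2["text"]) unless fields.empty?
```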

Custom Themes

LookingGlass supports custom theming. The color scheme, fonts, logo, text, and links are all entirely customizable.

Some of these settings, like the theme used, project title, and logo, are defined in the DocManager project config file. The colors and fonts can then be set by creating a theme.

More Repositories

  1. NSA-Data (Ruby, 90 stars): NSA documents in machine readable form
  2. ICWATCH-Data (Ruby, 89 stars): Resume data and scripts for managing it
  3. Harvester (JavaScript, 71 stars): Web crawling and document processing through a usable interface.
  4. TransparencyToolkit (41 stars): Main repository for Transparency Toolkit
  5. LinkedInData (Ruby, 39 stars): Scrapes all LinkedIn profiles including search terms.
  6. JSONToNetworkGraph (JavaScript, 30 stars): Generates network graphs from a JSON.
  7. generalscraper (Ruby, 24 stars): Scrapes all pages on any site you specify for keywords.
  8. DocManager (Ruby, 24 stars): Universal backend for indexing, storing, and querying documents.
  9. UtilityScripts (Ruby, 18 stars): Scripts for managing scrapers
  10. IndeedScraper (Go, 16 stars): Scraper for Indeed
  11. ArchivePile (CSS, 11 stars): A read-only theme for publishing email archives using Mailpile
  12. Transparency-Toolkit-Prototype (Ruby, 11 stars): Analysis system for Transparency Toolkit.
  13. Twiddler (JavaScript, 11 stars): A user friendly tool for text processing, light NLP, and keyword extraction
  14. LinkedinCrawler (Ruby, 10 stars): Crawls public LinkedIn profiles
  15. dataspec-sii (Ruby, 10 stars): Dataspec for SII
  16. LinkedinParser (Ruby, 7 stars): A parser for LinkedIn profiles
  17. Catalyst (Ruby, 7 stars): Text mining framework.
  18. Surveillance-Research-Data (Ruby, 6 stars): Raw data and scripts for Surveillance Research Archive
  19. TwitterCrawler (Ruby, 6 stars): A crawler for Twitter
  20. IndeedCrawler (Ruby, 6 stars): Crawler for the resume website Indeed
  21. NameToEmail (Ruby, 6 stars): Gets a list of potential emails from a JSON with names.
  22. JSONToMap (Ruby, 5 stars): Converts a JSON with locations into a map with points.
  23. Thumbtack (Ruby, 5 stars): An open narrative mapping tool to corroborate narratives across multiple sources and formats
  24. DesignAssets (HTML, 5 stars): A collection of branding, interfaces, and other visual resources!
  25. JSONToChoropleth (Ruby, 4 stars): Generates choropleth maps from JSONs.
  26. theme-snowden (CSS, 4 stars): A theme for LookingGlass for Snowden doc search
  27. FacebookCrawler (Ruby, 4 stars): A crawler for Facebook data from public web and Graph API
  28. EmailParser (Ruby, 4 stars): A crawler for converting email files on disk to JSON
  29. ParseFile (Ruby, 3 stars): OCRs document and extracts metadata
  30. ExtractPatterns (Ruby, 3 stars): Extracts terms matching certain patterns. For finding new codewords and tracking mentions of known ones.
  31. TSJobCrawler (Ruby, 3 stars): Collects listings for jobs that require security clearance.
  32. EntityExtractor (Ruby, 3 stars): Extracts entities and terms matching certain patterns.
  33. IndeedParser (Ruby, 3 stars): Parser for Indeed resumes
  34. transparencytoolkit.github.io (CSS, 3 stars): A styleguide site for Transparency Toolkit
  35. CrawlerManager (Ruby, 3 stars): API for calling crawlers
  36. dataspec-LinkedinCrawl (3 stars): A LookingGlass dataspec file for data scraped from LinkedIn.com
  37. PiplCollector (Ruby, 2 stars): Request info from Pipl for all items in dataset
  38. DirCrawl (Ruby, 2 stars): Runs block of code on every file in directory
  39. PiplRequest (Ruby, 2 stars): Request profiles from Pipl
  40. JSONToChart (Ruby, 2 stars): Converts JSONs to pretty charts
  41. dataspec-IndeedCrawl (2 stars): A LookingGlass dataspec file for data scraped from Indeed.com
  42. wlsearchscraper (Ruby, 2 stars): Gets a list of results from the WikiLeaks search.
  43. Archiver (Ruby, 2 stars): Archives URLs
  44. theme-pi (CSS, 1 star): A theme for Privacy International collaborations
  45. UploadConvert (Ruby, 1 star): Tools for converting documents uploaded to Transparency Toolkit to properly formatted JSONs.
  46. dataspec-GoogleCrawl (1 star): A dataspec for the Google crawler
  47. dataspec-template (1 star): A starter template for LookingGlass json files
  48. RequestManager (Ruby, 1 star): Manages scraper HTTP requests
  49. federalregisterscraper (Ruby, 1 star): Scraper for the Federal Register
  50. month-names (1 star): Names of months in multiple languages
  51. dataspec-fbidhs (1 star)
  52. JSONCombiner (Ruby, 1 star): Combines JSONs.
  53. Test-Data (HTML, 1 star): Test data for Transparency Toolkit development
  54. ArchiveAdministrator (CSS, 1 star): Archive administration system. Handles archive creation and user authentication.
  55. DocUpload (CSS, 1 star): Upload application for documents in archiving service.
  56. dataspec-EmailCrawl (1 star): Dataspec for emails
  57. IC-Company-Data (Ruby, 1 star): Intelligence contractors
  58. wordcloud (Ruby, 1 star): Changes word sizes in a document based on the number of times they occur.
  59. JSONCrossreference (Ruby, 1 star): Crossreferences JSONs and returns the matching data.
  60. OCRServer (Ruby, 1 star): OCR server for hosted archiving service
  61. ansible-role-lookingglass (Shell, 1 star): Automates deployment of LookingGlass instances
  62. NetworkGraph (1 star): Neo4j network graph generator prototype
  63. CountryConvert (Ruby, 1 star): Converts 2-char ISO country codes to 3-char codes.
  64. classification-sensation (Python, 1 star): Parse classification-related information
  65. dataspec-LoadFiles (1 star): Dataspec for plain files loaded in via Harvester/DirCrawl.
  66. dataspec-snowden (1 star): A dataspec for Snowden documents