  • Language
    Ruby
  • License
    GNU General Publi...
  • Created over 12 years ago
  • Updated 4 months ago

Lumen Database

The Lumen Database collects and analyzes legal complaints and requests for removal of online materials, helping Internet users to know their rights and understand the law. These data enable us to study the prevalence of legal threats and let Internet users see the source of content removals.

Automated Submissions and Search Using the API

The main Lumen Database instance has an API that allows individuals and organizations that receive large numbers of notices to submit them without using the web interface. The API also provides an easy way for researchers to search the database. Members of the public can test the database, but will likely need to request an API key from the Lumen team to receive a token that provides full access. To learn about the capabilities of the API, you can consult the API documentation.

Development

Stack

  • Ruby 3.0.6
  • PostgreSQL 13.6
  • Elasticsearch 7.17.x
  • Java Runtime Environment (OpenJDK works fine)
  • Piwik Tracking (only used in prod)
  • Mail server (SMTP, Sendmail)
  • ChromeDriver (used only by test runner)

Using Docker

The easiest way to start is to use Docker. Make sure you have the Docker Engine and docker-compose installed.

Clone the repository.

cp config/database.yml.docker config/database.yml
cp .env.docker .env
docker-compose up
docker-compose exec website bash
rake db:drop db:create db:migrate
rake comfy:cms_seeds:import[lumen_cms,lumen_cms]
rake db:seed
rails s -b 0.0.0.0

Lumen will be available at http://localhost:8282.

Manual setup

By default, the app will try to connect to Elasticsearch at http://localhost:9200. To use a different host, set the ELASTICSEARCH_URL environment variable.
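As a rough illustration of this lookup (a sketch only, not Lumen's actual configuration code), the URL can be resolved with a fallback to the documented default. The helper name elasticsearch_url and the constant are hypothetical.

```ruby
# Hypothetical helper, not taken from the Lumen codebase.
# ELASTICSEARCH_URL wins when it is set; otherwise the documented default is used.
DEFAULT_ELASTICSEARCH_URL = 'http://localhost:9200'

def elasticsearch_url(env = ENV)
  env.fetch('ELASTICSEARCH_URL', DEFAULT_ELASTICSEARCH_URL)
end

puts elasticsearch_url
```

For a one-off run you could prefix the command instead, e.g. ELASTICSEARCH_URL=http://es.internal:9200 rails s (the host name here is a placeholder).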

bundle install
cp config/database.yml.example config/database.yml

(edit database.yml as you wish)
(ensure PostgreSQL and Elasticsearch are running)

rails db:setup
rails lumen:set_up_cms

Running the app

rails s

Viewing the app

$BROWSER 'http://localhost:3000'

You can customize behavior during seeding (db:setup) with a couple of environment variables:

  • NOTICE_COUNT=10 will generate 10 (or any number you pass it) notices instead of the default 500
  • SKIP_FAKE_DATA=1 will skip generating fake seed data entirely.
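The effect of these two variables can be sketched roughly as follows. This is an illustration under assumptions, not the contents of seeds.rb: the helper name seed_notice_count is hypothetical, and treating any value of SKIP_FAKE_DATA as "skip" is an assumption.

```ruby
# Hypothetical sketch; the real seeding logic may differ.
DEFAULT_NOTICE_COUNT = 500

# Returns how many fake notices a seed run would generate.
def seed_notice_count(env = ENV)
  # Assumption: any value of SKIP_FAKE_DATA disables fake data entirely.
  return 0 if env['SKIP_FAKE_DATA']

  # Environment values are strings, so convert explicitly.
  Integer(env.fetch('NOTICE_COUNT', DEFAULT_NOTICE_COUNT))
end

puts seed_notice_count
```

On the command line this corresponds to, for example, NOTICE_COUNT=10 rails db:setup.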

Sample user logins

The seed data creates logins of the following form:

Username: {username}@lumendatabase.org
Password: password

username is one of {user, submitter, redactor, publisher, admin, super_admin}, with corresponding privileges.

If you seeded your database with an older version of seeds.rb, your username may use chillingeffects.org rather than lumendatabase.org.
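To make the naming scheme concrete, the snippet below lists the logins it implies. It is only a sketch that builds strings; it does not touch the database, and the helper name sample_logins is hypothetical.

```ruby
# Roles listed above; each one maps to a seeded login.
ROLES = %w[user submitter redactor publisher admin super_admin].freeze

# Builds the username/password pairs for a given email domain.
def sample_logins(domain = 'lumendatabase.org')
  ROLES.map { |role| { username: "#{role}@#{domain}", password: 'password' } }
end

sample_logins.each { |login| puts "#{login[:username]} / #{login[:password]}" }
```

Passing 'chillingeffects.org' as the domain yields the older form of the usernames.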

Running Tests

$ rspec

The integration tests are quite slow; during development you may find it more convenient to run bundle exec rspec spec/ --exclude-pattern="spec/integration/*".

If elasticsearch isn't on your $PATH, set ENV['TEST_ES_CLUSTER_COMMAND']=/path/to/elasticsearch, and make sure permissions are set correctly for your test suite to run it.

If you're running a subset of tests that you know don't require Elasticsearch, you can run them without setting it up via TEST_WITH_ELASTICSEARCH=0 rspec path/to/tests.

Parallelizing Tests

You can speed up tests by running them in parallel: $ rake parallel:spec

You will need to do some setup before the first time you run this:

  • alter config/database.yml so that the test database is yourproject_test<%= ENV['TEST_ENV_NUMBER'] %>
  • run rake parallel:setup

It will default to using the number of processors parallel_tests believes to be available, but you can change this by setting ENV['PARALLEL_TEST_PROCESSORS'] to the desired number.
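The database.yml change works because the file is rendered through ERB, so each test process gets its own database name. The sketch below renders just that one line with Ruby's standard ERB library. The helper name is hypothetical, and it assumes parallel_tests' usual convention of an empty TEST_ENV_NUMBER for the first process and "2", "3", and so on for the others.

```ruby
require 'erb'

# The database name line from config/database.yml, as described above.
TEMPLATE = "database: yourproject_test<%= ENV['TEST_ENV_NUMBER'] %>"

# Hypothetical helper: renders the line as a given test process would see it.
# Note that it sets TEST_ENV_NUMBER in the current process as a side effect.
def test_database_line(env_number)
  ENV['TEST_ENV_NUMBER'] = env_number
  ERB.new(TEMPLATE).result
end

puts test_database_line('')   # database: yourproject_test
puts test_database_line('2')  # database: yourproject_test2
```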

Linting

Use rubocop and leave the code at least as clean as you found it. If you make linting-only changes, it's considerate to your code reviewer to keep them in their own commit.

Profiling

  • mini-profiler
    • available in dev by default
    • in use on prod, visible only to super_admins
    • in-depth memory profiling, stacktracing, and SQL queries; good for granular analysis
  • oink
    • memory usage, allocations
    • runs in dev by default; can run anywhere by setting ENV['USE_OINK'] (ok to run in production)
    • logs to log/oink.log

Environment variables

These are all the environment variables that Lumen recognizes. Search for them in the code to see how each one is used.

Environment variables should be set in .env and are managed by the dotenv gem. .env is not version-controlled so you can safely write secrets to it (but will also need to set these on all servers).

Unless setting an environment variable on the command line in the context of a command-line process, environment variables should ONLY be set in .env.

Most of these are optional and have sensible defaults (which may vary by environment).

Variable name                 Description
BATCH_SIZE                    Batch size of model items indexed during each run of Elasticsearch re-indexing
BUNDLE_GEMFILE                Custom Gemfile location
BROWSER_VALIDATIONS           Enable user HTML5 browser form validations
DEFAULT_SENDER                Default mailer sender
ELASTICSEARCH_URL             Elasticsearch host, e.g. https://127.0.0.1:9200
EMAIL_DOMAIN                  Default email domain in Action Mailer
ES_INDEX_SUFFIX               Can be used to specify a suffix for the name of Elasticsearch indexes
FILE_NAME                     Name of CSV file to import as blog entries
GOOGLE_CUSTOM_BLOG_SEARCH_ID  Custom Google search ID used in the CMS
LOG_ELASTICSEARCH             Enables logging of Elasticsearch calls; only used in tests
LOG_TO_LOGSTASH_FORMAT        Set to true if you want to log in the Logstash format
USE_OINK                      Enable the oink gem in the production environment
MAILER_DELIVERY_METHOD        Sets the delivery method for emails sent by the application
NOTICE_COUNT                  How many fake notices to create when seeding the db
RACK_ENV                      Don't use this; it's overridden by RAILS_ENV
RAILS_ENV                     Rails environment
RAILS_LOG_LEVEL               Log level for all the application loggers
RAILS_SERVE_STATIC_FILES      If present (with any value) will enable Rails to serve static files
RECAPTCHA_SITE_KEY            reCAPTCHA public key
RECAPTCHA_SECRET_KEY          reCAPTCHA private key
RETURN_PATH                   Default mailer return path
SEARCH_SLEEP                  Used in specs only, time out of Elasticsearch searches
SECRET_KEY_BASE               The Rails secret token; required in prod
SERVER_TIME_ZONE              Name of the server's timezone, e.g. Eastern Time (US & Canada)
SITE_HOST                     Site host, used in mailer templates
SKIP_FAKE_DATA                Don't generate fake data when seeding the database
SMTP_ADDRESS                  SMTP server address
SMTP_DOMAIN                   SMTP server domain
SMTP_USERNAME                 SMTP server username
SMTP_PASSWORD                 SMTP server password
SMTP_PORT                     SMTP server port
SMTP_VERIFY_MODE              Value of the openssl_verify_mode option of the SMTP client
TEST_ES_CLUSTER_COMMAND       Path to an Elasticsearch binary used during a test suite run
USER_CRON_EMAIL               For use in sending reports of court order files; can be a string or a list (in a JSON.parse-able format)
USER_CRON_MAGIC_DIR           Directory used in the court order reporter cron job
WEB_CONCURRENCY               Number of Unicorn workers
WEB_TIMEOUT                   Unicorn timeout
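USER_CRON_EMAIL is the one variable above that accepts two shapes: a plain string or a JSON list. One way such a value could be normalized is sketched below. This is an assumption about a reasonable approach, not the code Lumen uses, and the helper name cron_recipients is hypothetical.

```ruby
require 'json'

# Hypothetical helper: turns a USER_CRON_EMAIL value into an array of addresses.
def cron_recipients(raw)
  return [] if raw.nil? || raw.strip.empty?

  # A JSON list such as '["a@example.org", "b@example.org"]' parses to an array.
  Array(JSON.parse(raw)).map(&:to_s)
rescue JSON::ParserError
  # A bare address such as a@example.org is not valid JSON; treat it as one recipient.
  [raw]
end

p cron_recipients('["a@example.org", "b@example.org"]')
p cron_recipients('a@example.org')
```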

Email setup

The application requires a mail server. In development, it's best to use a local SMTP server that catches all outgoing emails; Mailcatcher is a good option.

Blog custom search

The /blog_entries page can contain a Google custom search engine that searches the Lumen blog. To enable it, create a custom search engine restricted to the path the blog lives at, for instance https://www.lumendatabase.org/blog_entries/*. Extract the "cx" id from the JavaScript embed code and put it in the GOOGLE_CUSTOM_BLOG_SEARCH_ID environment variable. The blog search will appear once this variable has been configured.

Lumen API

You can search the database and, if you have a contributor token, add to the database using our API.

The Lumen API is documented in our GitHub Wiki: https://github.com/berkmancenter/lumendatabase/wiki/Lumen-API-Documentation
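As a starting point, a search request could be assembled with Ruby's standard library roughly as below. Treat every specific here as an assumption to check against the API documentation: the base URL, the /notices/search path, the term parameter, and the X-Authentication-Token header are not confirmed by this README, and the helper name is hypothetical. The snippet only builds the request; it does not send it.

```ruby
require 'net/http'
require 'uri'

# ASSUMPTIONS (verify against the API documentation): the base URL, the
# /notices/search path, the `term` parameter, and the token header name.
def build_search_request(term, token: nil, base: 'https://lumendatabase.org')
  uri = URI.join(base, '/notices/search')
  uri.query = URI.encode_www_form(term: term)

  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'application/json'
  # Only attach the token header when a token was supplied.
  request['X-Authentication-Token'] = token if token
  request
end

request = build_search_request('star wars')
puts request.uri
```

To actually send it, pass the request to Net::HTTP.start(request.uri.hostname, request.uri.port, use_ssl: true) { |http| http.request(request) } and parse the response body as JSON.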

License

Lumen Database is licensed under GPLv2. See LICENSE.txt for more information.

Copyright

Copyright (c) 2016 President and Fellows of Harvard College
