Media Cloud (@mediacloud)

Top repositories

1. backend
Media Cloud is an open-source, open-data platform that allows researchers to answer quantitative questions about the content of online media.
Python · 280 stars

2. sentence-splitter
Text-to-sentence splitter using the heuristic algorithm by Philipp Koehn and Josh Schroeder.
Python · 223 stars

3. cliff-annotator
A lightweight server that accepts HTTP requests for the Stanford Named Entity Recognizer and a heavily modified CLAVIN geoparser.
Java · 119 stars

4. api-client
Public client for consuming content from the Media Cloud Online News Archive & Directory.
Python · 68 stars

5. web-tools
The shared repository for Media Cloud web apps (Explorer, Source Manager, Topic Mapper).
JavaScript · 63 stars

6. date_guesser
A library to extract a publication date from a web page, along with a measure of its accuracy.
Python · 42 stars

7. nyt-news-labeler
Tag news stories based on models trained on the NYT corpus.
Python · 39 stars

8. api-tutorial-notebooks
A set of Jupyter notebooks demonstrating how to use the Media Cloud API.
Jupyter Notebook · 33 stars

9. feed_seeker
Find RSS, Atom, XML, and RDF feeds on web pages.
Python · 31 stars

10. metadata-lib
Media Cloud's approach to extracting metadata from online news stories.
Python · 12 stars

11. web-search
Code that drives the public web-based tools for the Media Cloud Online News Archive and Directory.
JavaScript · 9 stars

12. copy-kvs
Copy a lot of objects between various key-value stores (MongoDB GridFS, PostgreSQL BLOBs, Amazon S3).
Perl · 8 stars

13. rss-fetcher
Intelligently fetch lists of URLs from a large collection of RSS Feeds as part of the Media Cloud Directory.
Python · 5 stars

14. cliff-api-client
A Python client for the CLIFF geoparsing tool.
Python · 5 stars

15. email-templates
Templates for emails that Media Cloud sends.
HTML · 4 stars

16. wayback-news-client
A client library to access the Wayback Machine news archive search.
Python · 4 stars

17. word-embeddings-server
A helpful microservice that returns results from word2vec models.
Python · 2 stars

18. glimpse
Get a glimpse of attention to a topic on social media.
Python · 2 stars

19. docker-compose-just-quieter
Docker Compose CLI utility wrapper which makes `docker-compose` quieter.
Python · 2 stars

20. postgresql-citus-aws-graviton2
PostgreSQL built for AWS Graviton2.
2 stars

21. sitemap-tools
A simple toolkit for consuming sitemaps.
Python · 2 stars

22. fernandos-csv-randomizer
Fernando's CSV randomizer: reads a CSV file, picks a specified number of random rows, and writes them to a separate file.
Python · 1 star

23. cliff-homepage
A simple homepage for the CLIFF project.
HTML · 1 star

24. hausastemmer
Hausa language stemmer (Bimba et al., 2015).
Python · 1 star

25. clavin-build-geonames-index
Builds and releases the CLAVIN GeoNames.org index as a binary.
1 star

26. sous-chef
A configurable data analytics pipeline.
Python · 1 star

27. news-search-api
Internal API server that offers search access to the Media Cloud Online News Archive (in Elasticsearch).
Python · 1 star

28. story-indexer
The core pipeline used to ingest online news stories in the Media Cloud archive.
Python · 1 star
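The sentence-splitter entry refers to a heuristic splitter by Philipp Koehn and Josh Schroeder. The sketch below illustrates the general nonbreaking-prefix idea behind that kind of splitter, in heavily simplified form. It is not the library's code or API. The function name `split_sentences` is hypothetical, and the prefix list is a tiny illustrative assumption rather than the per-language lists a real splitter would load.

```python
# Tiny, illustrative abbreviation list (assumption); real splitters load
# much larger per-language nonbreaking-prefix lists.
NONBREAKING_PREFIXES = {"Mr", "Mrs", "Ms", "Dr", "Prof", "St", "vs", "e.g", "i.e"}

CLOSERS = "\"')]"  # closing quotes/brackets that may follow final punctuation
OPENERS = "\"'(["  # opening quotes/brackets that may precede a capital letter


def split_sentences(text):
    """Split text into sentences with a simplified nonbreaking-prefix heuristic.

    Hypothetical helper, not the sentence-splitter library's API. Whitespace
    inside each returned sentence is normalized to single spaces.
    """
    tokens = text.split()
    sentences, current = [], []
    for i, tok in enumerate(tokens):
        current.append(tok)
        if i + 1 == len(tokens):
            break                      # last token: flushed after the loop
        nxt = tokens[i + 1]
        core = tok.rstrip(CLOSERS)
        if not core or core[-1] not in ".?!":
            continue                   # no sentence-final punctuation
        if not nxt.lstrip(OPENERS)[:1].isupper():
            continue                   # next word must start with a capital
        if core.endswith("."):
            word = core[:-1]
            if word in NONBREAKING_PREFIXES or (len(word) == 1 and word.isupper()):
                continue               # abbreviation or initial, not a boundary
        sentences.append(" ".join(current))
        current = []
    if current:
        sentences.append(" ".join(current))
    return sentences
```

For example, `split_sentences("Mr. Smith went to Washington. He arrived at 5 p.m. on Monday! Was it late?")` yields three sentences. The "Mr." is kept attached as a known prefix, and "p.m." does not end a sentence because the next word is lowercase.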
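The feed_seeker entry describes finding RSS, Atom, XML, and RDF feeds on web pages. One common first step for this task is HTML feed autodiscovery: scanning `<link rel="alternate">` tags whose `type` is a feed MIME type. The sketch below covers only that step, using Python's standard `html.parser`. `find_feed_links` and the `FEED_TYPES` set are hypothetical and are not feed_seeker's API, which may rely on additional heuristics.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types commonly used to advertise feeds (illustrative assumption).
FEED_TYPES = {
    "application/rss+xml",
    "application/atom+xml",
    "application/rdf+xml",
    "application/xml",
    "text/xml",
}


class FeedLinkParser(HTMLParser):
    """Collect hrefs of <link rel="alternate"> tags whose type looks like a feed."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}   # valueless attributes -> ""
        rels = a.get("rel", "").lower().split()
        ftype = a.get("type", "").lower().split(";")[0].strip()
        href = a.get("href", "")
        if "alternate" in rels and ftype in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def find_feed_links(html, base_url):
    """Return absolute URLs of feeds advertised in the page's <link> tags."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.feeds
```

Self-closing tags such as `<link ... />` are handled too, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.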
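The fernandos-csv-randomizer entry describes reading a CSV file, picking a specified number of random rows, and writing them to a separate file. Below is a minimal sketch of that procedure using the standard `csv` and `random` modules. `sample_csv_rows` is a hypothetical name. Treating the first row as a header is an assumption, not documented behavior of the tool.

```python
import csv
import io
import random


def sample_csv_rows(src, dst, n, seed=None):
    """Write the header and n randomly chosen data rows from src to dst.

    Hypothetical helper, not the tool's API. Assumes the first row of src
    is a header; src and dst are text file-like objects.
    """
    reader = csv.reader(src)
    header = next(reader)            # assumption: first row is a header
    rows = list(reader)              # loads all data rows into memory
    rng = random.Random(seed)        # seedable for reproducible samples
    picked = rng.sample(rows, min(n, len(rows)))  # sampling without replacement
    writer = csv.writer(dst)
    writer.writerow(header)
    writer.writerows(picked)
    return len(picked)


if __name__ == "__main__":
    src = io.StringIO("id,name\n1,a\n2,b\n3,c\n4,d\n")
    dst = io.StringIO()
    sample_csv_rows(src, dst, 2, seed=42)
    print(dst.getvalue())
```

With real files, open both the input and output with `newline=""`, as the `csv` module documentation recommends. Because the sketch loads every row into memory, a very large file would call for a streaming approach such as reservoir sampling instead.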