  • Stars: 7
  • Rank 2,235,480 (Top 46 %)
  • Created almost 7 years ago
  • Updated 6 months ago

Repository Details

Ansible playbook for deploying a Storm cluster

More Repositories

1. storm-crawler (HTML, 842 stars)
   A scalable, mature and versatile web crawler based on Apache Storm

2. behemoth (Java, 282 stars)
   Behemoth is an open source platform for large scale document analysis based on Apache Hadoop.

3. TextClassification (Java, 48 stars)
   A Text Classification API in Java originally developed by DigitalPebble Ltd. The API is independent from the ML implementations used and can be used as a front end to various ML algorithms. libSVM and liblinear are currently embedded.

4. textclassification-examples (Java, 10 stars)
   Use cases for DigitalPebble's TextClassification API

5. stormcrawlerfight (Shell, 9 stars)
   Crawl configurations for benchmarking / testing StormCrawler

6. stormcrawler-docker (Dockerfile, 7 stars)
   Resources for running StormCrawler with Docker services

7. TextClassificationPlugin (Java, 5 stars)
   GATE Processing Resource wrapping DigitalPebble's TextClassification API

8. ngrams-api (Java, 4 stars)
   Java API for querying a N-Grams corpus. Uses Lucene for searching and indexing from the Google Web-1T format

9. behemoth-commoncrawl (Java, 4 stars)
   Support for old (pre 2013) CommonCrawl dataset in Behemoth

10. NutchFight (Java, 4 stars)
    Resources for comparison between 1.8 and 2.x of Apache Nutch

11. tescobank (Java, 4 stars)
    Setup for crawling tescobank with SC

12. sc-warc (2 stars)
    WARC resources for StormCrawler

13. behemoth-textclassification (Java, 1 star)
    Module for classifying Behemoth documents with a model from our Text Classification API

14. crawlurlfrontier (FLUX, 1 star)
    Crawl config used to test URL Frontier on a large scale and produce WARCs for CommonCrawl.

15. behemoth-elasticsearch (Java, 1 star)
    ElasticSearch module for Behemoth

16. urlfrontier-client (Rust, 1 star)
    URLFrontier client written in Rust (mostly as a way of learning Rust)