  • Stars: 2
  • Created over 10 years ago
  • Updated over 10 years ago

Repository Details

Packaged CRX distribution for Internet Archive "Save a Page" Plug-In

More Repositories

1

MapReduceAlgorithms

Data-Intensive Text Processing with MapReduce
TeX
620
star
2

guide

The Student's Guide to @lintool
280
star
3

Cloud9

Cloud9 is a Hadoop toolkit for working with big data
Java
236
star
4

twitter-tools

Twitter Tools
Java
217
star
5

warcbase

Warcbase is an open-source platform for managing and analyzing web archives
Java
161
star
6

Mr.LDA

Scalable Topic Modeling using Variational Inference in MapReduce
Java
149
star
7

bespin

Reference implementations of data-intensive algorithms in MapReduce and Spark
Java
81
star
8

Ivory

A Hadoop toolkit for web-scale information retrieval research
Java
79
star
9

bigdata-2018w

CS 451/651 431/631 Data-Intensive Distributed Computing (Winter 2018) at the University of Waterloo
HTML
71
star
10

bigcows

Scrapes citation statistics from Google Scholar
JavaScript
59
star
11

UMD-courses

Course homepages for courses that I've taught at the University of Maryland
HTML
53
star
12

IR-Reproducibility

Open-Source Information Retrieval Reproducibility Challenge
Shell
50
star
13

my-data-is-bigger-than-your-data

My data is bigger than your data!
HTML
39
star
14

SparkTutorial

Spark Tutorial at the University of Maryland
38
star
15

bigdata-2016w

CS 489/698 Big Data Infrastructure (Winter 2016) at the University of Waterloo
HTML
38
star
16

wikiclean

A Java Wikipedia markup to plain text converter
Java
37
star
17

clueweb

Hadoop tools for manipulating ClueWeb collections
Java
26
star
18

chrome-archive-this-page

Internet Archive "Save a Page" Plug-In for Chrome
JavaScript
23
star
19

bigdata-2018f

CS 451/651 Data-Intensive Distributed Computing (Fall 2018) at the University of Waterloo
HTML
23
star
20

tools

Lintools: tools by @lintool
Java
22
star
21

art-science-empirical-cs-2022f

The Art and Science of Empirical Computer Science (Fall 2022)
20
star
22

bigdata-2017w

CS 489/698 Big Data Infrastructure (Winter 2017) at the University of Waterloo
HTML
15
star
23

TweetAnalysisWithSpark

Tweet Analysis with Spark
Scala
15
star
24

robust04-analysis

Meta-Analysis of Robust04 Papers (Yang et al., SIGIR 2019)
Python
12
star
25

JScene

A proof-of-concept in-browser JavaScript-based search engine
JavaScript
12
star
26

JASS

Anytime Ranking for Impact-Ordered Indexes
C
12
star
27

GrimmerSenatePressReleases

Grimmer's Senate Press Releases
Python
10
star
28

Enron2mbox

Converting the Enron email collection to mbox format
Python
10
star
29

OptTrees

Source code for: Nima Asadi, Jimmy Lin, and Arjen P. de Vries. Runtime Optimizations for Tree-Based Machine Learning Models. IEEE Transactions on Knowledge and Data Engineering, 26(9):2281-2292, 2014.
C
9
star
30

non-blind-review

My proposal for non-blind reviewing at *ACL
6
star
31

art-science-empirical-cs-2023f

The Art and Science of Empirical Computer Science (Fall 2023)
6
star
32

IR-Reproducibility2

The Replicability of IR Replicability Experiments
Shell
5
star
33

UROC-projects

Undergraduate Research Opportunities Conference sponsored by the University of Waterloo
5
star
34

ClueWeb09-TREC-LTR

Learning-to-rank dataset extracted from ClueWeb09 using TREC judgments
5
star
35

Cassovary-vs-GraphJet

Performance comparison between Cassovary and GraphJet
5
star
36

bespin-data

Datasets for Bespin
Python
4
star
37

Tweets2013-stats

4
star
38

robust04-analysis-papers

4
star
39

AnseriniMaven

Maven repo for some Anserini dependencies.
3
star
40

nyt-covid-map

HTML
3
star
41

c-bfscan

Implementations of brute force scans for document retrieval in C
C
3
star
42

MSMARCO-Document-Ranking-Archive.test

CSS
2
star
43

GiraphTutorial

Giraph Tutorial
2
star
44

MSMARCO-Document-Ranking-Archive

Python
2
star
45

Zambezi

Real-time indexer and search engine
C
2
star
46

NSF-projects

NSF project homepages
CSS
2
star
47

bfscan

Document retrieval using brute force scans
Java
2
star
48

wiki-tools

Collection of tools for working with Wikipedia
Java
2
star
49

msmarco-docker

Dockerfile
2
star
50

tools-javadoc

HTML
2
star
51

hadoop1-data

1
star
52

IR-Reproducibility-exp

Experimental runs from the Open-Source Information Retrieval Reproducibility Challenge.
MAXScript
1
star
53

TweetTap

Simple program to tap the Twitter sample stream
Java
1
star
54

chrome-scholar-search-extension

Google Scholar Search Extension for Chrome
JavaScript
1
star
55

trec-mb-vis

Visualization of TREC Microblog Track relevance judgments
JavaScript
1
star
56

clueweb09en01-webgraph

Webgraph for ClueWeb09 Category B
1
star
57

cs-big-cows

List of people with great achievements in Computer Science
Python
1
star