  • Stars: 4
  • Rank 3,304,323 (Top 66 %)
  • Language
  • Created over 11 years ago
  • Updated over 11 years ago

More Repositories

1

MapReduceAlgorithms

Data-Intensive Text Processing with MapReduce
TeX
620
star
2

guide

The Student's Guide to @lintool
280
star
3

Cloud9

Cloud9 is a Hadoop toolkit for working with big data
Java
236
star
4

twitter-tools

Twitter Tools
Java
217
star
5

warcbase

Warcbase is an open-source platform for managing and analyzing web archives
Java
161
star
6

Mr.LDA

Scalable Topic Modeling using Variational Inference in MapReduce
Java
149
star
7

bespin

Reference implementations of data-intensive algorithms in MapReduce and Spark
Java
81
star
8

Ivory

A Hadoop toolkit for web-scale information retrieval research
Java
79
star
9

bigdata-2018w

CS 451/651 431/631 Data-Intensive Distributed Computing (Winter 2018) at the University of Waterloo
HTML
71
star
10

bigcows

Scrapes citation statistics from Google Scholar
JavaScript
59
star
11

UMD-courses

Course homepages for courses that I've taught at the University of Maryland
HTML
53
star
12

IR-Reproducibility

Open-Source Information Retrieval Reproducibility Challenge
Shell
50
star
13

my-data-is-bigger-than-your-data

My data is bigger than your data!
HTML
39
star
14

SparkTutorial

Spark Tutorial at the University of Maryland
38
star
15

bigdata-2016w

CS 489/698 Big Data Infrastructure (Winter 2016) at the University of Waterloo
HTML
38
star
16

wikiclean

A Java Wikipedia markup to plain text converter
Java
37
star
17

clueweb

Hadoop tools for manipulating ClueWeb collections
Java
26
star
18

chrome-archive-this-page

Internet Archive "Save a Page" Plug-In for Chrome
JavaScript
23
star
19

bigdata-2018f

CS 451/651 Data-Intensive Distributed Computing (Fall 2018) at the University of Waterloo
HTML
23
star
20

tools

Lintools: tools by @lintool
Java
22
star
21

art-science-empirical-cs-2022f

The Art and Science of Empirical Computer Science (Fall 2022)
20
star
22

bigdata-2017w

CS 489/698 Big Data Infrastructure (Winter 2017) at the University of Waterloo
HTML
15
star
23

TweetAnalysisWithSpark

Tweet Analysis with Spark
Scala
15
star
24

robust04-analysis

Meta-Analysis of Robust04 Papers (Yang et al., SIGIR 2019)
Python
12
star
25

JScene

A proof-of-concept in-browser JavaScript-based search engine
JavaScript
12
star
26

JASS

Anytime Ranking for Impact-Ordered Indexes
C
12
star
27

GrimmerSenatePressReleases

Grimmer's Senate Press Releases
Python
10
star
28

Enron2mbox

Converting the Enron email collection to mbox format
Python
10
star
29

OptTrees

Source code for: Nima Asadi, Jimmy Lin, and Arjen P. de Vries. Runtime Optimizations for Tree-Based Machine Learning Models. IEEE Transactions on Knowledge and Data Engineering, 26(9):2281-2292, 2014.
C
9
star
30

non-blind-review

My proposal for non-blind reviewing at *ACL
6
star
31

art-science-empirical-cs-2023f

The Art and Science of Empirical Computer Science (Fall 2023)
6
star
32

IR-Reproducibility2

The Replicability of IR Replicability Experiments
Shell
5
star
33

UROC-projects

Undergraduate Research Opportunities Conference sponsored by the University of Waterloo
5
star
34

ClueWeb09-TREC-LTR

learning-to-rank dataset extracted from ClueWeb09 using TREC judgments
5
star
35

Cassovary-vs-GraphJet

Performance comparison between Cassovary and GraphJet
5
star
36

bespin-data

Datasets for Bespin
Python
4
star
37

robust04-analysis-papers

4
star
38

AnseriniMaven

Maven repo for some Anserini dependencies.
3
star
39

nyt-covid-map

HTML
3
star
40

c-bfscan

Implementations of brute force scans for document retrieval in C
C
3
star
41

MSMARCO-Document-Ranking-Archive.test

CSS
2
star
42

GiraphTutorial

Giraph Tutorial
2
star
43

MSMARCO-Document-Ranking-Archive

Python
2
star
44

Zambezi

Real-time indexer and search engine
C
2
star
45

chrome-archive-this-page-crx

Packaged CRX distribution for Internet Archive "Save a Page" Plug-In
2
star
46

NSF-projects

NSF project homepages
CSS
2
star
47

bfscan

Document retrieval using brute force scans
Java
2
star
48

wiki-tools

Collection of tools for working with Wikipedia
Java
2
star
49

msmarco-docker

Dockerfile
2
star
50

tools-javadoc

HTML
2
star
51

hadoop1-data

1
star
52

IR-Reproducibility-exp

Experimental runs from the Open-Source Information Retrieval Reproducibility Challenge.
MAXScript
1
star
53

TweetTap

Simple program to tap the Twitter sample stream
Java
1
star
54

chrome-scholar-search-extension

Google Scholar Search Extension for Chrome
JavaScript
1
star
55

trec-mb-vis

Visualization of TREC Microblog Track relevance judgments
JavaScript
1
star
56

clueweb09en01-webgraph

Webgraph for ClueWeb09 Category B
1
star
57

cs-big-cows

List of people with great achievements in Computer Science
Python
1
star