MapReduceAlgorithms
Data-Intensive Text Processing with MapReduce

guide
The Student's Guide to @lintool

Cloud9
Cloud9 is a Hadoop toolkit for working with big data

twitter-tools
Twitter Tools

warcbase
Warcbase is an open-source platform for managing and analyzing web archives

Mr.LDA
Scalable Topic Modeling using Variational Inference in MapReduce

bespin
Reference implementations of data-intensive algorithms in MapReduce and Spark

Ivory
A Hadoop toolkit for web-scale information retrieval research

bigdata-2018w
CS 451/651 431/631 Data-Intensive Distributed Computing (Winter 2018) at the University of Waterloo

bigcows
Scrapes citation statistics from Google Scholar

UMD-courses
Course homepages for courses that I've taught at the University of Maryland

IR-Reproducibility
Open-Source Information Retrieval Reproducibility Challenge

my-data-is-bigger-than-your-data
My data is bigger than your data!

SparkTutorial
Spark Tutorial at the University of Maryland

bigdata-2016w
CS 489/698 Big Data Infrastructure (Winter 2016) at the University of Waterloo

wikiclean
A Java Wikipedia markup to plain text converter

clueweb
Hadoop tools for manipulating ClueWeb collections

chrome-archive-this-page
Internet Archive "Save a Page" Plug-In for Chrome

bigdata-2018f
CS 451/651 Data-Intensive Distributed Computing (Fall 2018) at the University of Waterloo

tools
Lintools: tools by @lintool

bigdata-2017w
CS 489/698 Big Data Infrastructure (Winter 2017) at the University of Waterloo

TweetAnalysisWithSpark
Tweet Analysis with Spark

robust04-analysis
Meta-Analysis of Robust04 Papers (Yang et al., SIGIR 2019)

JScene
A proof-of-concept in-browser JavaScript-based search engine

JASS
Anytime Ranking for Impact-Ordered Indexes

GrimmerSenatePressReleases
Grimmer's Senate Press Releases

Enron2mbox
Converting the Enron email collection to mbox format

OptTrees
Source code for: Nima Asadi, Jimmy Lin, and Arjen P. de Vries. Runtime Optimizations for Tree-Based Machine Learning Models. IEEE Transactions on Knowledge and Data Engineering, 26(9):2281-2292, 2014.

non-blind-review
My proposal for non-blind reviewing at *ACL

art-science-empirical-cs-2023f
The Art and Science of Empirical Computer Science (Fall 2023)

IR-Reproducibility2
The Replicability of IR Replicability Experiments

UROC-projects
Undergraduate Research Opportunities Conference sponsored by the University of Waterloo

ClueWeb09-TREC-LTR
Learning-to-rank dataset extracted from ClueWeb09 using TREC judgments

Cassovary-vs-GraphJet
Performance comparison between Cassovary and GraphJet

bespin-data
Datasets for Bespin

Tweets2013-stats

robust04-analysis-papers

AnseriniMaven
Maven repo for some Anserini dependencies.

nyt-covid-map

c-bfscan
Implementations of brute force scans for document retrieval in C

MSMARCO-Document-Ranking-Archive.test

GiraphTutorial
Giraph Tutorial

MSMARCO-Document-Ranking-Archive

Zambezi
Real-time indexer and search engine

chrome-archive-this-page-crx
Packaged CRX distribution for Internet Archive "Save a Page" Plug-In

NSF-projects
NSF project homepages

bfscan
Document retrieval using brute force scans

wiki-tools
Collection of tools for working with Wikipedia

msmarco-docker

tools-javadoc

hadoop1-data

IR-Reproducibility-exp
Experimental runs from the Open-Source Information Retrieval Reproducibility Challenge.

TweetTap
Simple program to tap the Twitter sample stream

chrome-scholar-search-extension
Google Scholar Search Extension for Chrome

trec-mb-vis
Visualization of TREC Microblog Track relevance judgments

clueweb09en01-webgraph
Webgraph for ClueWeb09 Category B

cs-big-cows
List of people with great achievements in Computer Science