USC Information Retrieval & Data Science (@USCDataScience)

Top repositories

1. sparkler
Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
Java · 404 stars

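The description compares sparkler to Apache Nutch, whose crawl cycle is to inject seed URLs, generate a fetch list, fetch and parse pages, and update the crawl database. Below is a minimal, single-process Python sketch of that cycle. It is not sparkler's code and does not use Spark. The names `crawl`, `fetch_links`, `rounds`, and `top_n` are hypothetical, and real HTTP fetching is replaced by a caller-supplied function.

```python
from typing import Callable, Dict, List


def crawl(
    seeds: List[str],
    fetch_links: Callable[[str], List[str]],
    rounds: int = 3,
    top_n: int = 10,
) -> Dict[str, str]:
    """Run a toy Nutch-style crawl cycle and return the crawl database.

    The crawl database maps each known URL to "unfetched" or "fetched".
    `fetch_links` is a hypothetical stand-in for fetching and parsing a page;
    it returns the page's outlinks.
    """
    # Inject: seed the crawl database.
    crawl_db: Dict[str, str] = {url: "unfetched" for url in seeds}
    for _ in range(rounds):
        # Generate: pick up to top_n unfetched URLs for this round.
        fetch_list = [url for url, status in crawl_db.items() if status == "unfetched"][:top_n]
        if not fetch_list:
            break  # Nothing left to fetch.
        for url in fetch_list:
            # Fetch + parse: obtain the outlinks of the page.
            outlinks = fetch_links(url)
            crawl_db[url] = "fetched"
            # Update: add newly discovered URLs without resetting known ones.
            for link in outlinks:
                crawl_db.setdefault(link, "unfetched")
    return crawl_db


# A tiny in-memory link graph stands in for the web.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["e"]}
print(crawl(["a"], lambda url: graph.get(url, []), rounds=2))
# {'a': 'fetched', 'b': 'fetched', 'c': 'fetched', 'd': 'unfetched'}
```

Each round only fetches URLs discovered in earlier rounds, so `rounds` acts as a depth limit and `top_n` caps the work per round.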
2. supervising-ui
Web UI for labelling datasets for supervised learning.
Python · 78 stars

3. Image-Similarity-Deep-Ranking
Deep Ranking-based image similarity, to be developed as a plugin for ImageSpace. Paper: https://users.eecs.northwestern.edu/~jwa368/pdfs/deep_ranking.pdf
Python · 36 stars

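As we read the linked Deep Ranking paper, the model learns an image embedding with a hinge loss on triplets (query, more-similar image, less-similar image): the loss is zero once the less-similar image is farther from the query than the more-similar one by at least a gap `g`, with squared Euclidean distance between embeddings. The sketch below shows only that loss and the resulting ranking step on plain Python vectors. It is not this repository's code; the embeddings would come from a trained network, and the function names and `gap=1.0` default are assumptions.

```python
from typing import List, Sequence


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_hinge_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    gap: float = 1.0,
) -> float:
    """Hinge loss on a (query, positive, negative) triplet of embeddings.

    Zero once the negative is farther from the query than the positive
    by at least `gap`; positive otherwise.
    """
    return max(0.0, gap + squared_euclidean(query, positive) - squared_euclidean(query, negative))


def rank_by_similarity(query: Sequence[float], candidates: List[Sequence[float]]) -> List[int]:
    """Return candidate indices ordered from most to least similar to the query."""
    return sorted(range(len(candidates)), key=lambda i: squared_euclidean(query, candidates[i]))


print(triplet_hinge_loss([0.0, 0.0], [1.0, 0.0], [0.5, 0.0]))  # 1.75
print(rank_by_similarity([0.0, 0.0], [[3.0, 0.0], [1.0, 0.0], [2.0, 0.0]]))  # [1, 2, 0]
```

During training the loss is minimized over many sampled triplets; at query time only the distance-based ranking is needed.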
4. SentimentAnalysisParser
Combines Apache OpenNLP and Apache Tika and provides facilities for automatically deriving sentiment from text.
31 stars

5. dl4j-kerasimport-examples
Deeplearning4j examples for importing and using models trained in Keras.
Java · 27 stars

6. NLTKRest
A REST server endpoint built with Flask and Python.
Java · 24 stars

7. tika-dockers
A suite of machine learning / deep learning Dockerfiles that allow Apache Tika to extract objects from, and produce textual captions for, images and video.
21 stars

8. polar.usc.edu
Polar USC: activities at the University of Southern California related to the NSF Polar CyberInfrastructure program.
HTML · 15 stars

9. AgePredictor
Age classification from text using data from PAN16, blogs, Fisher Callhome, and Cancer Forum.
Java · 15 stars

10. polar-deep-insights
Conceptual, temporal, and spatial analysis of the TREC Polar dataset.
JavaScript · 10 stars

11. hadoop-pot
A scalable Apache Hadoop-based implementation of the Pooled Time Series video similarity algorithm, based on the CVPR 2015 paper by M. Ryoo et al.
Java · 10 stars

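Pooled Time Series (PoT), as we understand the Ryoo et al. paper, turns a video into a fixed-length descriptor by treating each per-frame feature as a time series and pooling it over a temporal pyramid. For every feature and window it records sum pooling, max pooling, and the sums and counts of positive and negative frame-to-frame changes. The pure-Python sketch below illustrates only that pooling step. It is not hadoop-pot's code: the per-frame features (which the paper derives from video frames) are given as plain numbers, the function names and pyramid depth are assumptions, and the distance used to compare two descriptors is left out.

```python
from typing import List


def pool_window(series: List[float]) -> List[float]:
    """Pool one feature's values over one temporal window."""
    # Frame-to-frame changes of this feature inside the window.
    grads = [b - a for a, b in zip(series, series[1:])]
    positive = [g for g in grads if g > 0]
    negative = [-g for g in grads if g < 0]
    return [
        float(sum(series)),    # sum pooling
        float(max(series)),    # max pooling
        float(sum(positive)),  # total positive change
        float(len(positive)),  # number of positive changes
        float(sum(negative)),  # total negative change (as a magnitude)
        float(len(negative)),  # number of negative changes
    ]


def pooled_time_series(frames: List[List[float]], levels: int = 2) -> List[float]:
    """Build a descriptor from per-frame feature vectors using a temporal pyramid.

    frames[t][i] is the value of feature i at frame t. Level k of the pyramid
    splits the video into 2**k roughly equal windows.
    """
    n_frames = len(frames)
    if n_frames < 2 ** (levels - 1):
        raise ValueError("too few frames for the requested pyramid depth")
    n_features = len(frames[0])
    descriptor: List[float] = []
    for level in range(levels):
        n_windows = 2 ** level
        for w in range(n_windows):
            start = w * n_frames // n_windows
            end = (w + 1) * n_frames // n_windows
            for i in range(n_features):
                descriptor.extend(pool_window([frames[t][i] for t in range(start, end)]))
    return descriptor


# One feature observed over four frames, whole-video window only.
print(pooled_time_series([[1.0], [3.0], [2.0], [5.0]], levels=1))
# [11.0, 5.0, 5.0, 2.0, 1.0, 1.0]
```

Because every video yields a descriptor of the same length (6 values per feature per window), descriptors of different videos can be compared directly, which is what makes the representation usable for similarity search.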
12. uscdatascience.github.io
USC Information Retrieval and Data Science Group
HTML · 9 stars

13. parser-indexer-py
Python tools for parsing documents and building an inverted index with enriched metadata. A Java version with slightly different features is available at https://github.com/USCDataScience/parser-indexer.
Jupyter Notebook · 9 stars

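An inverted index maps each term to the documents that contain it, so a query is answered by intersecting posting sets instead of scanning every document. The minimal sketch below also indexes metadata as `field:value` terms to suggest what "enriched metadata" could look like. It is not parser-indexer-py's code, uses no parser or search server, and the document layout, field names, and function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Map each term to the set of document IDs that contain it.

    Each document is {"content": str, "metadata": {field: value}}; metadata is
    indexed as "field:value" terms so it can be searched alongside the text.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc.get("content", "")):
            index[term].add(doc_id)
        for field, value in doc.get("metadata", {}).items():
            index[f"{field}:{str(value).lower()}"].add(doc_id)
    return dict(index)


def search_all(index: Dict[str, Set[str]], terms: List[str]) -> Set[str]:
    """Return the IDs of documents that contain every query term (Boolean AND)."""
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    "d1": {"content": "Polar sea ice measurements", "metadata": {"type": "pdf"}},
    "d2": {"content": "Sea ice images", "metadata": {"type": "jpeg"}},
}
index = build_inverted_index(docs)
print(sorted(search_all(index, tokenize("sea ice"))))  # ['d1', 'd2']
print(search_all(index, tokenize("sea ice") + ["type:pdf"]))  # {'d1'}
```

Keeping metadata in the same index lets a text query and a metadata filter be combined with a single set intersection.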
14. video-recognition
Python · 8 stars

15. TextREST.jl
Language detection REST server using MIT Lincoln Lab’s Text.jl library.
Julia · 7 stars

16. cmu-fg-bg-similarity
CMU Foreground/Background Similarity Server from DARPA MEMEX
C++ · 6 stars

17. img2text
Models and associated helper code for the GSoC 2017 project "TensorFlow Image to Text in Apache Tika".
Python · 6 stars

18. counterfeit-electronics-tesseract
Training Tesseract to better extract serial numbers from images of electronic items
Java · 6 stars

19. svm-classifier-memex
Java · 6 stars

20. ufo.usc.edu
A collection of projects by IRDS students studying unidentified flying objects.
HTML · 6 stars

21. pdi-topics
LDA Topic Modeling for Polar Data Insights
HTML · 5 stars

22. deepsentirank
Deep Learning based Sentiment Ranking for Multimedia
Python · 5 stars

23. file-content-analyzer
A set of Python modules that perform byte frequency analysis, byte frequency correlation, cross-correlation, and FHT analysis on files.
Python · 5 stars

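Byte frequency analysis fingerprints a file by how often each of the 256 possible byte values occurs. The sketch below follows one common formulation: divide every count by the count of the most frequent byte, then apply a companding exponent `1/beta` so that rare byte values still contribute. It is not file-content-analyzer's code. The `beta=1.5` default and the mean-absolute-difference comparison are assumptions chosen for illustration, and the correlation, cross-correlation, and FHT analyses are not shown.

```python
from typing import List


def byte_frequency_distribution(data: bytes, beta: float = 1.5) -> List[float]:
    """Normalized, companded frequency of each of the 256 byte values.

    Counts are divided by the most frequent byte's count (so the peak is 1.0)
    and then passed through x ** (1 / beta), which boosts rare byte values.
    """
    counts = [0] * 256
    for byte in data:  # iterating over bytes yields ints in 0..255
        counts[byte] += 1
    peak = max(counts)
    if peak == 0:  # empty input
        return [0.0] * 256
    return [(c / peak) ** (1.0 / beta) for c in counts]


def mean_abs_difference(a: List[float], b: List[float]) -> float:
    """Average per-byte-value difference between two distributions (0.0 = identical)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


sample = byte_frequency_distribution(b"aab")
print(sample[ord("a")], round(sample[ord("b")], 3))  # 1.0 0.63
```

Averaging such distributions over many files of one type would give a type fingerprint against which an unknown file can be scored.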
24. PersonaExtraction
Java · 4 stars

25. imagecat2
Imagecat Version 2
XSLT · 4 stars

26. nutch-analytics
Nutch crawl analysis: a Spark-based project.
Scala · 4 stars

27. memex-cca-esindex
Python · 3 stars

28. TrojanFootball
Analyses athletes' past performance and workload to support better training.
Java · 2 stars

29. counterfeit-crawling
Focused Crawling and Evaluation of Counterfeit Electronics Sites
Python · 2 stars

30. tika-dl-models
A place to release saved machine learning models for tika-dl
2 stars

31. sparkler-jsdriver
Java · 1 star

32. file-content-visualizer
Visualizations for byte frequency analysis, byte frequency correlation, byte frequency cross-correlation, and FHT.
CSS · 1 star

33. sparkler-ui
JavaScript · 1 star

34. PlanetaryIR
Information Retrieval for Planetary Science using DeepDive
Shell · 1 star