Kavita Ganesan (@kavgan)

Top repositories

1

nlp-in-practice

Starter code for solving real-world text data problems. Includes Gensim Word2Vec, phrase embeddings, text classification with logistic regression, word count with PySpark, simple text preprocessing, pre-trained embeddings, and more.
Jupyter Notebook
1,068 stars
2

ROUGE-2.0

ROUGE automatic summarization evaluation toolkit. Supports ROUGE-[N, L, S, SU], stemming and stopwords in different languages, Unicode text evaluation, and CSV output.
Java
194 stars
3

phrase-at-scale

Detect common phrases in large amounts of text using a data-driven approach. Discovered phrases can be of arbitrary size. Can be used with languages other than English.
Python
125 stars
4

opinosis-summarization

Code and dataset for the Opinosis Summarization Framework.
50 stars
5

OpinRank

OpinRank dataset of user reviews for two kinds of entities, cars and hotels. Contains full reviews from Tripadvisor (~259,000 reviews) and Edmunds (~42,230 reviews).
34 stars
6

clinical-concepts

Discovering Related Clinical Concepts using Large Amounts of Clinical Notes. An unsupervised graphical approach to mining related concepts that leverages the volume of large collections of clinical notes.
22 stars
7

spark-examples

Spark code examples.
Python
9 stars
8

stop-words

Stop word lists
4 stars
9

hashtags_test

Test hashtags
2 stars
10

Micropinion-Generation-Dataset

Dataset for micropinion generation, based on user reviews from CNET. The reviews cover products from various categories such as TVs, cell phones, and GPS devices.
2 stars
11

JavaPractice

Practice, practice, practice. Bubble sort, factorial, powerset, subarray, mergesort, remove duplicates, etc.
Java
1 star