Surendra414/Language-Detection-From-Text---Bi-gram-based

Stars
1
Language
Python
Created over 6 years ago
Updated over 6 years ago

Surendra414

Language-Detection-From-Text---Bi-gram-based

This project uses a bi-gram language model and a bi-gram frequency addition classifier for language identification. It is trained on six languages: German, English, Spanish, French, Italian and Dutch. Both the training and test corpora come from the Wortschatz Leipzig corpora. The training corpus consists of 30000 sentences from the news/web domain, and the test corpus consists of 10000 unseen sentences from the same domain. The six languages were chosen because they are also present in the LIGA Twitter dataset, which consists of 9066 tweets. Note: the directory paths for the train and test corpora in language-test.py, language-train.py and liga_test.py must be set for your environment before running.
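The idea of a bi-gram frequency addition classifier can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes character bi-grams (the description does not say whether bi-grams are over characters or words), relative frequencies as scores, and a tiny made-up corpus. The names char_bigrams, train, classify and toy_corpus are hypothetical.

```python
from collections import Counter


def char_bigrams(text):
    """Return the overlapping character bi-grams of a lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(corpus_by_language):
    """Build one relative-frequency table of bi-grams per language."""
    models = {}
    for language, sentences in corpus_by_language.items():
        counts = Counter()
        for sentence in sentences:
            counts.update(char_bigrams(sentence))
        total = sum(counts.values())
        models[language] = {bg: c / total for bg, c in counts.items()}
    return models


def classify(text, models):
    """Add up the trained frequencies of the text's bi-grams per language
    and return the language with the largest sum."""
    scores = {
        language: sum(freqs.get(bg, 0.0) for bg in char_bigrams(text))
        for language, freqs in models.items()
    }
    return max(scores, key=scores.get)


# Toy data for illustration only (assumption); the project itself trains
# on sentences from the Leipzig corpora.
toy_corpus = {
    "English": ["the weather is nice today", "this is the house of the king"],
    "German": ["das wetter ist heute schön", "ich habe einen kleinen hund"],
}
models = train(toy_corpus)
prediction = classify("the king is in the house", models)
print(prediction)
```

Bi-grams never seen for a language simply contribute zero to that language's score, so no smoothing is needed in this sketch.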

Text-classification-and-clustering

It demonstrates text classification and text clustering using K-NN and K-Means models built on tf-idf features.
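A rough sketch of the classification half, K-NN over tf-idf features, is shown here in plain Python. It is not taken from the repository: the idf formula (log(N/df) + 1), whitespace tokenization, cosine similarity, the toy documents and the function names are all assumptions made for illustration. K-Means clustering is left out to keep the sketch short.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return sparse tf-idf vectors (dicts) and the idf table."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Assumed idf variant: log(N / df) + 1, so no term gets weight zero.
    idf = {term: math.log(n_docs / df) + 1.0 for term, df in doc_freq.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(query, train_vectors, labels, idf, k=3):
    """Label a query by majority vote among its k most similar documents."""
    tokens = query.lower().split()
    tf = Counter(tokens)
    # Terms unseen during training are ignored.
    q = {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}
    ranked = sorted(range(len(train_vectors)),
                    key=lambda i: cosine(q, train_vectors[i]),
                    reverse=True)
    votes = Counter(labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set for illustration only (assumption).
docs = [
    "the team won the football match",
    "the striker scored a late goal",
    "the bank raised interest rates",
    "investors sold shares as markets fell",
]
labels = ["sport", "sport", "finance", "finance"]
vectors, idf = tfidf_vectors(docs)
predicted = knn_predict("a goal in the football match", vectors, labels, idf)
print(predicted)
```

In practice a library such as scikit-learn would usually handle the vectorization and neighbor search; the hand-written version only makes the steps visible.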

Python

Predict-the-Happiness-HackerEarth-Challenge

It uses a two-layer fully connected (dense) neural network to predict whether hotel reviews on the TripAdvisor site express positive or negative sentiment. It is a Python implementation that uses the Keras library for the DNN. The problem statement comes from the HackerEarth challenge "Predict the Happiness". The model achieved an accuracy score of 88% when the prediction file (sample_submisson.csv) was uploaded to the challenge portal. A link to download the corpus/dataset is given in the blog post.

Python

index

My_Site

HTML

Titanic-Sink-Analysis

The project is a statistical analysis in R that predicts survival based on factors such as age, sex ratio, tickets, and male, female and child passengers.