  • Stars: 1
  • Language: Python
  • Created over 6 years ago
  • Updated over 6 years ago

Repository Details

Language-Detection-From-Text---Bi-gram-based

This project identifies the language of a text using a bi-gram language model and a bi-gram frequency addition classifier. It is trained on six languages: German, English, Spanish, French, Italian and Dutch. Both the training and test corpora come from the Wortschatz Leipzig Corpora collection. The training corpus consists of 30000 sentences from the news/web domain, and the test corpus consists of 10000 unseen sentences from the same domain. The six languages were chosen because they are also present in the LIGA Twitter dataset, which consists of 9066 tweets.

Note: The directory paths for the train and test corpora in language-test.py, language-train.py and liga_test.py must be set to match your local setup.
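The bi-gram frequency addition idea described above can be illustrated with a minimal sketch. This is not the repository's code: it assumes character bi-grams (the description does not say whether bi-grams are over characters or words), uses a tiny hypothetical in-memory corpus instead of the Leipzig files, and the names `char_bigrams`, `train` and `identify` are made up for illustration. Each language gets a table of relative bi-gram frequencies, and a sentence is assigned to the language whose frequencies add up to the largest score.

```python
from collections import Counter


def char_bigrams(text):
    """Return the overlapping character bi-grams of the lowercased text."""
    text = text.lower()
    return [text[i:i + 2] for i in range(len(text) - 1)]


def train(corpus_by_language):
    """Build a relative-frequency table of bi-grams for each language."""
    models = {}
    for language, sentences in corpus_by_language.items():
        counts = Counter()
        for sentence in sentences:
            counts.update(char_bigrams(sentence))
        total = sum(counts.values())
        models[language] = {bg: n / total for bg, n in counts.items()}
    return models


def identify(models, sentence):
    """Add up each language's bi-gram frequencies and return the best language."""
    bigrams = char_bigrams(sentence)
    scores = {
        language: sum(freqs.get(bg, 0.0) for bg in bigrams)
        for language, freqs in models.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy corpus; the real project trains on Leipzig corpora sentences.
toy_corpus = {
    "English": ["the weather is nice today", "this is the house of the king"],
    "German": ["das wetter ist heute schön", "dies ist das haus des königs"],
}

models = train(toy_corpus)
print(identify(models, "the king is in the house"))
print(identify(models, "das haus ist schön"))
```

Bi-grams never seen for a language contribute zero to its score, so no smoothing is needed in this additive scheme; whether the original code handles unseen bi-grams the same way is an assumption.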

More Repositories

1

Text-classification-and-clustering

It demonstrates text classification and text clustering using K-NN and K-Means models based on tf-idf features.
Python
2
star
2

Predict-the-Happiness-HackerEarth-Challenge

It uses a 2-layer fully connected (dense) neural network to predict whether hotel reviews on the TripAdvisor site express positive or negative sentiment. It is a Python implementation using the Keras library. The problem statement comes from the HackerEarth challenge "Predict the Happiness". The model achieved an accuracy score of 88% when the prediction file (sample_submisson.csv) was uploaded to the challenge portal. The link for the corpus/dataset download is given in the blog post.
Python
2
star
3

index

My_Site
HTML
1
star
4

Titanic-Sink-Analysis

This project uses statistical analysis in R to predict survival based on age, sex ratio, tickets, male, female, children, etc.
R
1
star
5

ChatBot

This ChatBot is based on Python with NLTK. It's a basic chatbot.
Python
1
star
6

Sentiment-Analysis-using-tf-idf---Polarity-dataset

It uses machine learning models to perform sentiment polarity analysis on movie reviews. In other words, it classifies the opinions expressed in a text review (document) to determine whether the reviewer’s sentiment towards the movie is positive or negative.
Python
1
star
7

Object-recognition

In this blog post, we will demonstrate how to achieve 90% accuracy in an object recognition task on the CIFAR-10 dataset with the help of the following concepts: 1. Deep Network Architecture 2. Data Augmentation 3. Regularization
Python
1
star
8

Mail-Spam-Filtering

It uses machine learning models to predict whether an email is spam or legitimate. The best approach is to follow my blog post for the implementation. The steps to build a spam filter from scratch are described on my blog: https://appliedmachinelearning.wordpress.com/2017/01/23/nlp-blog-post/ It is a Python implementation using the Naive Bayes classifier and Support Vector Machines from the Scikit-learn ML library. The results are shown on two publicly available corpora: the Ling-spam corpus and the Enron-spam corpus. The link for the corpus/dataset download is given in the blog post. Note: The directory paths used for training and testing models in lingspam_filter.py and euron-spamfilter.py must be set to match your local setup.
Python
1
star