Text-classification-and-clustering
Demonstrates text classification and text clustering using K-NN and K-Means models based on tf-idf features.

index
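A minimal sketch of the idea (not the repository's code): tf-idf features feeding a K-NN classifier and a K-Means clusterer via scikit-learn, on a made-up toy corpus.

```python
# Toy sketch: K-NN classification and K-Means clustering on tf-idf
# features with scikit-learn. The documents and labels below are
# hypothetical, chosen only to illustrate the pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

docs = ["the match was won by the home team",
        "a thrilling football game last night",
        "stocks fell sharply on the exchange",
        "markets rallied after the rate cut"]
labels = ["sports", "sports", "finance", "finance"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# classification: nearest neighbour in tf-idf space
knn = KNeighborsClassifier(n_neighbors=1).fit(X, labels)
pred = knn.predict(vec.transform(["the team played a great game"]))

# clustering: group the same documents into two clusters
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

The same vectorizer must transform both training and query text so the feature spaces line up.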
My_Site

Titanic-Sink-Analysis
A statistical analysis project in R that predicts passenger survival from features such as age, sex, ticket, and whether the passenger was male, female, or a child.

ChatBot
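The repository itself is in R; purely for illustration, a comparable survival model can be sketched in Python with logistic regression on a few hypothetical passenger records (none of this data comes from the project).

```python
# Illustrative sketch only: a logistic-regression survival model in
# Python. The rows below are made-up passenger records, not Titanic data.
from sklearn.linear_model import LogisticRegression

# columns: age, sex (0 = male, 1 = female), passenger class (1-3)
X = [[22, 0, 3], [38, 1, 1], [4, 1, 3], [35, 0, 1],
     [54, 0, 2], [27, 1, 2], [2, 0, 3], [58, 1, 1]]
y = [0, 1, 1, 0, 0, 1, 0, 1]  # hypothetical survival outcomes

model = LogisticRegression().fit(X, y)
pred = model.predict([[29, 1, 1]])[0]  # e.g. a 29-year-old woman in 1st class
```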
This ChatBot is based on Python with NLTK. It's a basic chatbot.

Sentiment-Analysis-using-tf-idf---Polarity-dataset
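The rule-matching idea behind such a chatbot can be sketched without the NLTK dependency, using plain regular expressions (the patterns and responses below are hypothetical, not the repository's):

```python
import re

# Minimal rule-based chatbot sketch: try each (pattern, response) rule
# in order and return the first response whose pattern matches.
RULES = [
    (r"hi|hello|hey", "Hello! How can I help you?"),
    (r".*your name.*", "I'm a simple rule-based chatbot."),
    (r".*(bye|goodbye).*", "Goodbye!"),
]

def reply(message):
    for pattern, response in RULES:
        if re.fullmatch(pattern, message.lower()):
            return response
    return "Sorry, I don't understand that yet."
```

NLTK's chat utilities follow the same pattern-to-response scheme, with canned reflections added on top.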
Uses machine learning models to perform sentiment polarity analysis on movie reviews; in other words, to classify the opinions expressed in a text review (document) in order to determine whether the reviewer's sentiment towards the movie is positive or negative.

Object-recognition
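The tf-idf weighting underlying the features can be computed by hand, which makes the intuition concrete: a term's weight grows with its frequency in a review and shrinks when it appears in many reviews. A sketch on toy reviews (not the Polarity dataset):

```python
import math
from collections import Counter

# Hand-rolled tf-idf on a toy corpus: tf = term frequency within a
# document, idf = log(N / document frequency of the term).
reviews = ["a great and moving film",
           "a dull and boring film",
           "great acting and a moving story"]

def tfidf(docs):
    tokenized = [d.split() for d in docs]
    n = len(tokenized)
    df = Counter(t for doc in tokenized for t in set(doc))
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t])
                        for t in tf})
    return vectors

vecs = tfidf(reviews)
```

Words appearing in every review (like "and") get zero weight, while polarity-bearing words like "great" or "boring" keep positive weight; the resulting vectors then feed a classifier.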
In this blog post, we demonstrate how to achieve 90% accuracy on the object recognition task on the CIFAR-10 dataset with the help of the following concepts: 1. deep network architecture, 2. data augmentation, 3. regularization.
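Of the three ingredients, the data augmentation step is easy to sketch in isolation. A common recipe for CIFAR-10 (assumed here, not taken from the repository) is a random horizontal flip plus a random crop from a zero-padded copy of the image:

```python
import numpy as np

# Augmentation sketch for a CIFAR-10-shaped image (32x32x3):
# random horizontal flip, then a random 32x32 crop out of a
# 4-pixel zero-padded copy.
rng = np.random.default_rng(0)

def augment(img, pad=4):
    if rng.random() < 0.5:
        img = img[:, ::-1, :]                   # horizontal flip
    h, w, _ = img.shape
    padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)))
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    return padded[top:top + h, left:left + w, :]  # random crop back to h x w

image = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
out = augment(image)
```

Each training epoch then sees a slightly different view of every image, which acts as a regularizer in its own right.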
Mail-Spam-Filtering
Uses machine learning models to predict whether an email is spam or legitimate. The best approach is to follow my blog post for the implementation; the steps to build a spam filter from scratch are described at https://appliedmachinelearning.wordpress.com/2017/01/23/nlp-blog-post/. It is a Python implementation using the Naive Bayes classifier and Support Vector Machines from the scikit-learn ML library. Results are shown on two publicly available corpora: the Ling-spam corpus and the Euron-spam corpus. Download links for the corpora/datasets are given in the blog post. Note: the directory paths used for training and testing the models in lingspam_filter.py and euron-spamfilter.py need to be set accordingly.
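A toy sketch of the two classifiers named above (Naive Bayes and an SVM from scikit-learn); the real models train on the Ling-spam/Euron-spam corpora, whereas the emails below are invented for illustration:

```python
# Spam-filter sketch: bag-of-words counts into Multinomial Naive Bayes
# and a linear SVM. Toy emails only, not the actual corpora.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

emails = ["win money now claim your free prize",
          "free offer click now to win cash",
          "meeting agenda attached for monday",
          "please review the project report draft"]
labels = [1, 1, 0, 0]  # 1 = spam, 0 = legitimate

vec = CountVectorizer()
X = vec.fit_transform(emails)
query = vec.transform(["claim your free cash prize now"])

nb_pred = MultinomialNB().fit(X, labels).predict(query)[0]
svm_pred = LinearSVC().fit(X, labels).predict(query)[0]
```

With real corpora, the same pipeline just swaps the toy list for the tokenized email files from the training directory.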
Language-Detection-From-Text---Bi-gram-based
Uses a bi-gram language model and a bi-gram frequency addition classifier for the language identification task. Trained on six languages: German, English, Spanish, French, Italian, and Dutch. The original source of the text corpus is the Wortschatz Leipzig corpora; both the training and test corpora were taken from it. The training corpus consists of 30,000 sentences from the news/web domain; the test corpus of 10,000 unseen sentences from the same domain. The six languages were chosen so that the same languages are present in the LIGA Twitter dataset, which consists of 9,066 tweets. Note: the directory paths used for the training and test corpora in language-test.py, language-train.py, and liga_test.py need to be set accordingly.
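The frequency-addition idea can be sketched in a few lines: count character bi-grams per language at training time, then score a sentence by summing each language's training frequencies of the sentence's bi-grams and picking the highest total. The two one-sentence "corpora" below are stand-ins for the 30,000-sentence training sets:

```python
from collections import Counter

# Bi-gram frequency-addition sketch with toy training text
# (placeholder for the per-language Wortschatz Leipzig corpora).
train = {
    "english": "the cat sat on the mat and the dog ran",
    "dutch": "de kat zat op de mat en de hond liep",
}

def bigrams(text):
    return [text[i:i + 2] for i in range(len(text) - 1)]

models = {lang: Counter(bigrams(text)) for lang, text in train.items()}

def detect(sentence):
    # sum each language's training counts over the sentence's bi-grams
    scores = {lang: sum(freq[b] for b in bigrams(sentence))
              for lang, freq in models.items()}
    return max(scores, key=scores.get)

guess = detect("the dog sat")
```

Unseen bi-grams simply contribute zero to a language's total, so no smoothing is needed for this additive scheme.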