• Stars: 1
• Language: Jupyter Notebook
• Created over 4 years ago
• Updated over 4 years ago

Repository Details

Here I took the SMSSpamClassifier data and, using the NLTK library together with the RandomForest and GradientBoosting classifiers from sklearn.ensemble, classified each SMS as spam or ham. First, I cleaned the raw data by removing punctuation and stopwords, tokenizing, and applying stemming and lemmatizing. Next, I tested vectorization with CountVectorizer, N-grams, and the Tf-Idf vectorizer. For feature engineering, I added two new features, text length and the percentage of punctuation in the text, and evaluated whether they were useful for detecting spam and whether a transformation was required. Finally, the features were passed to the Gradient Boosting and Random Forest classifiers, and GridSearchCV was used to evaluate their performance and select better hyperparameters.
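
The cleaning and feature engineering steps described above could be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the notebook's actual code: the function names (clean_text, body_len, punct_pct), the tiny stopword set, and the choice to measure text length without spaces are all assumptions made for the example. The notebook itself would rely on NLTK's stopword list, stemmer, and lemmatizer, and would pass the results on to the sklearn vectorizers and classifiers.

```python
import re
import string

# Tiny illustrative stopword set (an assumption for this sketch);
# the notebook would use NLTK's English stopword list instead.
STOPWORDS = {"a", "an", "the", "is", "to", "you", "i", "in", "for", "of", "at"}


def clean_text(text):
    """Strip punctuation, lowercase, tokenize on non-word characters, drop stopwords."""
    no_punct = "".join(ch for ch in text if ch not in string.punctuation)
    tokens = re.split(r"\W+", no_punct.lower())
    return [tok for tok in tokens if tok and tok not in STOPWORDS]


def body_len(text):
    """Text length in characters, not counting spaces (an assumed definition)."""
    return len(text) - text.count(" ")


def punct_pct(text):
    """Percentage of non-space characters that are punctuation, rounded to 1 decimal."""
    n = body_len(text)
    if n == 0:
        return 0.0  # avoid dividing by zero on empty messages
    count = sum(1 for ch in text if ch in string.punctuation)
    return round(100 * count / n, 1)


# Hypothetical message, not taken from the dataset.
sms = "WINNER!! You have won a free prize. Call now!"
print(clean_text(sms))
print(body_len(sms), punct_pct(sms))
```

The two numeric features can then be placed alongside the vectorized text (for example, as extra columns next to the Tf-Idf matrix) before fitting a classifier.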