SMSSpamClassifier-from-NLP-using-RandomForest-and-GradientBoosting-Classifier
Here I took SMS spam data and, using the NLTK library along with the RandomForest and GradientBoosting classifiers from sklearn.ensemble, classified each SMS as spam or ham. First, I cleaned the raw data by removing punctuation and stopwords and by tokenizing, followed by stemming and lemmatizing. Next, I tested vectorizing the text with CountVectorizer, N-grams and the Tf-Idf vectorizer. Then, for feature engineering, I added two new features: text length and the percentage of punctuation in the text. I evaluated whether these features were useful for detecting spam and whether a transformation was required. Finally, the features were fed to the Gradient Boosting and Random Forest classifiers, and their performance was evaluated with GridSearchCV to select better hyperparameters.
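
The two engineered features described above could be computed roughly as in the sketch below. This is only an illustration, not the exact code from this repository: the function names `text_length` and `punctuation_percent` and the sample messages are hypothetical, and counting length without spaces is an assumption about how the feature is defined.

```python
import string


def text_length(text: str) -> int:
    """Length of the message, not counting spaces (assumed definition)."""
    return len(text) - text.count(" ")


def punctuation_percent(text: str) -> float:
    """Percentage of non-space characters that are punctuation."""
    length = text_length(text)
    if length == 0:
        # Avoid dividing by zero for empty or all-space messages.
        return 0.0
    punct = sum(1 for ch in text if ch in string.punctuation)
    return 100.0 * punct / length


# Hypothetical sample messages, made up for illustration only.
messages = [
    "WINNER!! Claim your FREE prize now!!!",
    "ok see you at lunch",
]

# One (length, punctuation percentage) pair per message.
features = [(text_length(m), punctuation_percent(m)) for m in messages]

for message, (length, pct) in zip(messages, features):
    print(f"{length:>3} chars, {pct:5.1f}% punctuation: {message}")
```

In a pipeline like the one described here, these two columns would typically be placed alongside the vectorized text before fitting the classifiers, so the model can use them together with the word counts.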