Project-on-German-AFD-political-party-
How the vote share of the German AfD political party suddenly changed, and what the main causes of that change in percentage were. To answer this, I identified some significant columns in the given data.
Koorimikiran369
oyo-rooms-project
LinearRegression-on-Boston-Dataset
Multiple-Regression-on-Hospital-Expenditure
In this repository, a multiple linear regression model is built for the given continuous data.
Currency-Notes-Detection-using-OpenCV
A real-world image processing project.
Boosting-Concepts
AdaBoost classifier and gradient boosting.
Loan-Prediction
IRIS-DataSet-Explanation-with-EDA-and-ML-Algorithms
Explains the IRIS dataset through EDA with plots in pandas, followed by all the ML classifier algorithms.
NLP-Project-for-Spam-Detection
Telecom-churn-data
WebSrappingOn-BankBazaar
webscrping
Satistics_Lectures
Web-Scraping-about-data-for-COVID-19
Extracting data from a website using Beautiful Soup.
Multiple-Regression-on-Taxi-Fare-
Market-Basket-Analysis
Cluster-Analysis-for-mall-customers
K-means clustering and agglomerative clustering.
Breast-Cancer-Predictions
Built-in dataset from scikit-learn.
Prediction-on-Median-house-value
Random-Forest
Weather-Data-Clustering-using-k-Means
Naivy-Base-Model
Support-Vector-Machine
Quora-Question-Pairing
Quora Question Pair Similarity

Over 100 million people visit Quora every month, so it's no surprise that many people ask similarly worded questions. Multiple questions with the same intent can cause seekers to spend more time finding the best answer to their question, and make writers feel they need to answer multiple versions of the same question. Quora values canonical questions because they provide a better experience to active seekers and writers, and offer more value to both of these groups in the long term. The main aim of the project is to predict whether a pair of questions are similar or not.

Problem Statement: Identify which questions asked on Quora are duplicates of questions that have already been asked.

Real world/Business Objectives and Constraints:
- The cost of a misclassification can be very high.
- You would want the probability that a pair of questions are duplicates, so that you can choose any threshold.
- No strict latency concerns.
- Interpretability is partially important.

Tasks to perform:
- Import the general libraries, NLP module, and machine learning modules
- Load the dataset
- Text preprocessing: removing HTML tags, removing punctuation, performing stemming, removing stop words, expanding contractions, etc.
- Apply tokenization
- Apply stemming
- Apply POS tagging
- Apply lemmatization
- Apply label encoding
- Feature extraction: apply BOW, apply TF-IDF vectorizer, apply Word2Vec vectorizer, apply GloVe
- Data preprocessing
- Model building
- Evaluate the model: confusion matrix, classification report

Data Overview:
- Data will be in a file Train.csv
- Train.csv contains 5 columns: qid1, qid2, question1, question2, is_duplicate
- Size of Train.csv: 60MB
- Number of rows in Train.csv: 404,290

Mapping the real world problem to an ML problem

Datalink: https://drive.google.com/file/d/10QDGTSI5PEV9e7CTpfzsXRpUwRIsJA-J/view?usp=sharing

Type of Machine Learning Problem: It is a binary classification problem; for a given pair of questions we need to predict whether they are duplicates or not.
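To make the text preprocessing and bag-of-words steps listed above more concrete, here is a minimal sketch using only the Python standard library. It is not the repository's code: the function names `preprocess` and `bow_cosine`, and the tiny `CONTRACTIONS` and `STOP_WORDS` tables, are hypothetical and chosen for illustration. Stemming and lemmatization are left out because they would need an NLP library. The cosine value is a similarity score, not a calibrated probability, so it would serve as one input feature to a classifier rather than as the final prediction.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny lookup tables for illustration only;
# a real project would use much fuller lists.
CONTRACTIONS = {"what's": "what is", "can't": "cannot", "don't": "do not", "i'm": "i am"}
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "to", "of", "in", "how", "what", "i"}


def preprocess(text):
    """Lowercase, strip HTML tags, expand contractions, drop punctuation and stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)  # remove HTML tags
    # Expand known contractions; unknown ones are left unchanged.
    text = re.sub(
        r"[a-z]+'[a-z]+",
        lambda m: CONTRACTIONS.get(m.group(0), m.group(0)),
        text,
    )
    text = re.sub(r"[^a-z0-9\s]", " ", text)  # remove punctuation
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bow_cosine(tokens_a, tokens_b):
    """Cosine similarity between the bag-of-words count vectors of two token lists."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty question shares nothing with any other question
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    q1 = preprocess("<b>What's</b> the best way to learn Python?")
    q2 = preprocess("How do I learn Python in the best way?")
    print(q1, q2, bow_cosine(q1, q2))
```

Because bag-of-words ignores word order, the two example questions end up with identical count vectors even though they are phrased differently. That is useful for catching rewordings, but it also means the feature cannot separate questions that use the same words with different intent, which is one reason the task list goes on to TF-IDF and word embeddings.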