• Stars
    star
    1
  • Language
    Jupyter Notebook
  • Created over 4 years ago
  • Updated over 4 years ago

Repository Details

This repository covers data preprocessing, exploratory data analysis (EDA), and model building.

More Repositories

1

Project-on-German-AFD-political-party-

How the vote share of the German AfD political party suddenly changed and what the main causes of that change were; to answer this, I identify the significant columns in the given data.
Jupyter Notebook
2
star
2

Koorimikiran369

2
star
3

oyo-rooms-project

Jupyter Notebook
2
star
4

LinearRegression-on-Boston-Dataset

Jupyter Notebook
2
star
5

Multiple-Regression-on-Hospital-Expenditure

In this repository, a multiple linear regression model is built for the given continuous data.
Jupyter Notebook
1
star
6

Currency-Notes-Detection-using-OpenCV

A real-world image processing project.
1
star
7

Boosting-Concepts

AdaBoost classifier and Gradient Boosting
Jupyter Notebook
1
star
8

Loan-Prediction

Jupyter Notebook
1
star
9

IRIS-DataSet-Explanation-with-EDA-and-ML-Algorithms

Explaining EDA on the Iris dataset with plots in pandas, followed by the ML classifier algorithms
Jupyter Notebook
1
star
10

NLP-Project-for-Spam-Detection

Jupyter Notebook
1
star
11

Telecom-churn-data

Jupyter Notebook
1
star
12

WebSrappingOn-BankBazaar

Jupyter Notebook
1
star
13

webscrping

1
star
14

Satistics_Lectures

Jupyter Notebook
1
star
15

Web-Scraping-about-data-for-COVID-19

Extracting data from a website using Beautiful Soup
Jupyter Notebook
1
star
16

Multiple-Regression-on-Taxi-Fare-

1
star
17

Market-Basket-Analysis

Jupyter Notebook
1
star
18

Cluster-Analysis-for-mall-customers

K-means clustering and agglomerative clustering
Jupyter Notebook
1
star
19

Breast-Cancer-Predictions

Built-in dataset from scikit-learn
Jupyter Notebook
1
star
20

Prediction-on-Median-house-value

Jupyter Notebook
1
star
21

Random-Forest

Jupyter Notebook
1
star
22

Weather-Data-Clustering-using-k-Means

Jupyter Notebook
1
star
23

Naivy-Base-Model

HTML
1
star
24

Support-Vector-Machine

Jupyter Notebook
1
star
25

Quora-Question-Pairing

Quora Question Pair Similarity

Over 100 million people visit Quora every month, so it's no surprise that many people ask similarly worded questions. Multiple questions with the same intent can cause seekers to spend more time finding the best answer to their question, and make writers feel they need to answer multiple versions of the same question. Quora values canonical questions because they provide a better experience to active seekers and writers, and offer more value to both of these groups in the long term. The main aim of the project is to predict whether a pair of questions are similar or not.

Problem Statement: Identify which questions asked on Quora are duplicates of questions that have already been asked.

Real world/Business Objectives and Constraints:
  • The cost of a misclassification can be very high.
  • You would want the probability that a pair of questions are duplicates, so that you can choose any threshold.
  • No strict latency concerns.
  • Interpretability is partially important.

Tasks to perform:
  • Import the general libraries, NLP module, and machine learning modules
  • Load the dataset
  • Text preprocessing: removing HTML tags, removing punctuation, performing stemming, removing stop words, expanding contractions, etc.
  • Apply tokenization
  • Apply stemming
  • Apply POS tagging
  • Apply lemmatization
  • Apply label encoding
  • Feature extraction
  • Apply BOW
  • Apply TF-IDF vectorizer
  • Apply Word2Vec vectorizer
  • Apply GloVe
  • Data preprocessing
  • Model building
  • Evaluate the model: confusion matrix, classification report

Data Overview:
  • Data will be in a file Train.csv
  • Train.csv contains 5 columns: qid1, qid2, question1, question2, is_duplicate
  • Size of Train.csv - 60MB
  • Number of rows in Train.csv = 404,290

Mapping the real world problem to an ML problem

Datalink: https://drive.google.com/file/d/10QDGTSI5PEV9e7CTpfzsXRpUwRIsJA-J/view?usp=sharing

Type of Machine Learning Problem: It is a binary classification problem; for a given pair of questions we need to predict whether they are duplicates or not.
1
star
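
The text preprocessing and bag-of-words steps listed in the Quora-Question-Pairing description can be illustrated with a minimal, standard-library-only sketch. This is not the repository's code: the `CONTRACTIONS` map, the `STOP_WORDS` set, the function names, and the `0.5` threshold are all hypothetical placeholders chosen for illustration, and the cosine score is a simple similarity feature rather than the calibrated probability a trained classifier would provide.

```python
import html
import math
import re
import string
from collections import Counter

# Deliberately tiny, illustrative resources (assumptions, not from the repository).
CONTRACTIONS = {"what's": "what is", "can't": "cannot", "don't": "do not", "i'm": "i am"}
STOP_WORDS = {"a", "an", "the", "is", "are", "do", "i", "to", "of", "in", "what", "how"}

_CONTRACTION_RE = re.compile(r"\b(" + "|".join(map(re.escape, CONTRACTIONS)) + r")\b")
_PUNCT_TABLE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Strip HTML tags, expand contractions, drop punctuation and stop words."""
    text = html.unescape(text)
    text = re.sub(r"<[^>]+>", " ", text).lower()
    text = _CONTRACTION_RE.sub(lambda m: CONTRACTIONS[m.group(0)], text)
    text = text.translate(_PUNCT_TABLE)
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def bow_cosine(q1, q2):
    """Cosine similarity between the bag-of-words count vectors of two questions."""
    a, b = Counter(preprocess(q1)), Counter(preprocess(q2))
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def looks_duplicate(q1, q2, threshold=0.5):
    """Flag a pair as a likely duplicate when the score reaches the chosen threshold."""
    return bow_cosine(q1, q2) >= threshold
```

For example, `looks_duplicate("How do I learn Python?", "What is the best way to learn Python?")` compares the two questions after cleaning. In a fuller pipeline, a score like this would be one feature among many (TF-IDF, Word2Vec, GloVe) fed to a classifier whose predicted probability is then thresholded.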