Discover @vaitybharati Open Source projects

vaitybharati

Stars
243
Global Rank 105,633 (Top 4 %)
Followers 360
Following 1
Registered about 4 years ago
Most used languages

Jupyter Notebook
89.5 %

R
10.5 %
Location 🇮🇳 India
Country Total Rank 2,763
Country Ranking

R
57

Jupyter Notebook
209

Assignment-04-Simple-Linear-Regression-2

Assignment-04-Simple-Linear-Regression-2. Q2) Salary_hike: build a prediction model for Salary_hike. Build a simple linear regression model by performing EDA, apply the necessary transformations, and select the best model using R or Python. Steps: EDA and data visualization, correlation analysis, model building, model testing, model predictions.

Jupyter Notebook

Assignment-1-Q24-Basic-Statistics-Level-1-

Q24) A Government company claims that an average light bulb lasts 270 days. A researcher randomly selects 18 bulbs for testing. The sampled bulbs last an average of 260 days, with a standard deviation of 90 days. If the CEO's claim were true, what is the probability that 18 randomly selected bulbs would have an average life of no more than 260 days?

Jupyter Notebook
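Q24 reduces to a one-sample t statistic, t = (x̄ - μ) / (s / √n), and a lower-tail probability with n - 1 = 17 degrees of freedom. The usual one-liner is `scipy.stats.t.cdf(t, df)`. The sketch below is a stdlib-only stand-in, so it runs without SciPy: `t_pdf` and `t_cdf` are hypothetical helpers that integrate the t density with Simpson's rule. It is not the notebook's code.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: int, steps: int = 2000) -> float:
    """P(T <= t): Simpson's rule on [0, |t|] plus symmetry of T about 0 (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # P(0 <= T <= |t|)
    return 0.5 + area if t >= 0 else 0.5 - area

# Q24 inputs: claimed mean 270 days, n = 18 bulbs, sample mean 260, sample SD 90.
mu, n, x_bar, s = 270, 18, 260, 90
t_stat = (x_bar - mu) / (s / math.sqrt(n))  # one-sample t statistic
prob = t_cdf(t_stat, df=n - 1)               # lower-tail probability, df = 17
print(f"t = {t_stat:.4f}, P(sample mean <= 260) = {prob:.4f}")
```

Here t is about -0.471 and the probability is about 0.32. That is slightly above the normal-approximation value because the t distribution has heavier tails.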

Assignment-11-Text-Mining-01-Elon-Musk

Assignment-11-Text-Mining-01-Elon-Musk. Perform sentiment analysis on the Elon Musk tweets (Exlon-musk.csv). Text preprocessing: strip leading and trailing characters; remove empty strings (Python treats them as False); join the list into one string; remove Twitter username handles (@usernames); join the list into one string again; remove punctuation; remove https links and URLs; tokenize the text; remove stopwords; normalize the data; stemming (optional); lemmatization. Feature extraction: bag of words with CountVectorizer, CountVectorizer with n-grams (bigrams and trigrams), TF-IDF vectorizer. Then: generate a word cloud, named entity recognition (NER), and emotion mining (sentiment analysis).

Jupyter Notebook
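The cleaning steps listed for the Elon Musk tweets map naturally onto a few regular expressions. The sketch below is a minimal, assumption-laden version. It covers stripping, handle removal, URL removal, punctuation removal, tokenization, and stopword removal. It leaves out stemming, lemmatization, and vectorization. The `STOPWORDS` set and the sample tweets are hypothetical. A real pipeline would more likely use a fuller list such as NLTK's.

```python
import re
import string
from typing import List

# Tiny illustrative stopword set (hypothetical); not the list used in the notebook.
STOPWORDS = {"a", "an", "and", "is", "of", "out", "the", "this", "to"}

def preprocess(tweet: str) -> List[str]:
    """Clean one tweet and return its lowercase tokens with stopwords removed."""
    text = tweet.strip()                                # drop leading/trailing whitespace
    text = re.sub(r"@\w+", " ", text)                   # remove @username handles
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove http(s) links and bare www URLs
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove punctuation
    tokens = text.lower().split()                       # whitespace tokenization; split() also drops empty strings
    return [tok for tok in tokens if tok not in STOPWORDS]

# Hypothetical sample input; the second entry is an empty tweet.
tweets = ["@elonmusk Check this out! https://t.co/abc123 Great stuff", "   "]
cleaned = [preprocess(t) for t in tweets if t.strip()]  # skip empty tweets
print(cleaned)
```

URLs are removed before punctuation on purpose. Stripping punctuation first would break the `https://` pattern and leave fragments such as `httpstcoabc123` behind.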

Assignment-05-Multiple-Linear-Regression-2

Assignment-05-Multiple-Linear-Regression-2. Prepare a prediction model for the profit in the 50_startups data. Apply transformations to get better predictions of profit, and make a table containing the R^2 value for each prepared model. Columns: R&D Spend: research and development spend over the past few years; Administration: spend on administration over the past few years; Marketing Spend: spend on marketing over the past few years; State: state from which the data was collected; Profit: profit of each state over the past few years.

Jupyter Notebook

Assignment-1-Q23-Basic-Statistics-Level-1-

Q23) Calculate the t scores for 95%, 96%, and 99% confidence intervals for a sample size of 25.

Jupyter Notebook
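For Q23 the t scores are two-sided critical values with n - 1 = 24 degrees of freedom. Each one solves P(T ≤ t) = 1 - α/2. In SciPy this is `scipy.stats.t.ppf(1 - alpha / 2, df)`. The stdlib-only sketch below gets the same quantity by bisection on a numerically integrated t CDF. The helper names (`t_pdf`, `t_cdf`, `t_critical`) are hypothetical and do not come from the repository.

```python
import math

def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with `df` degrees of freedom."""
    log_coef = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_coef) * (1 + x * x / df) ** (-(df + 1) / 2)

def t_cdf(t: float, df: int, steps: int = 1000) -> float:
    """P(T <= t) via Simpson's rule on [0, |t|] and symmetry about 0 (steps even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area

def t_critical(confidence: float, df: int, lo: float = 0.0, hi: float = 50.0,
               iters: int = 50) -> float:
    """Two-sided critical value: bisection for P(T <= t) = 1 - (1 - confidence) / 2."""
    target = 1 - (1 - confidence) / 2
    for _ in range(iters):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:  # CDF is increasing, so move the lower bound up
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 25
for level in (0.95, 0.96, 0.99):
    print(f"{level:.0%}: t = {t_critical(level, df=n - 1):.4f}")
```

With df = 24 this gives roughly 2.064 (95%) and 2.797 (99%), with the 96% value in between at about 2.17.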

Assignment-2-Set1-Q1-Basic-Statistic-Level-2-

Plot the data, find the outliers, and find μ, σ, and σ².

Jupyter Notebook

Multi-Linear-Reg

Jupyter Notebook

P27.-Supervised-ML---Multiple-Linear-Regression---Toyoto-Cars

Supervised-ML---Multiple-Linear-Regression---Toyota-Cars. EDA, correlation analysis, model building, model testing, model validation techniques, collinearity problem check, residual analysis, model deletion diagnostics (checking outliers or influencers) using two techniques: 1. Cook's distance and 2. leverage value; improving the model; model rebuild, recheck, and re-improve (round 2); model rebuild, recheck, and re-improve (round 3); final model; model predictions.

Jupyter Notebook

Assignment-1-Q9_a-Basic-Statistics-Level-1-

Q9) Calculate skewness and kurtosis and draw inferences on the following data: car speed and distance. Use Q9_a.csv.

Jupyter Notebook

P24.-Supervised-ML---Simple-Linear-Regression---Newspaper-data

Supervised-ML---Simple-Linear-Regression---Newspaper-data. EDA and Visualization, Correlation Analysis, Model Building, Model Testing, Model predictions.

Jupyter Notebook

Assignment-07-Clustering-Hierarchical-Airlines-

Assignment-07-Clustering-Hierarchical-Airlines. Perform hierarchical clustering on the airlines data to obtain the optimum number of clusters. Draw inferences from the clusters obtained. Data description: the file EastWestAirlines contains information on passengers who belong to an airline’s frequent flier program. For each passenger, the data include information on their mileage history and on the different ways they accrued or spent miles in the last year. The goal is to identify clusters of passengers with similar characteristics in order to target different segments with different types of mileage offers.

Jupyter Notebook

Assignment-1-Q20-Basic-Statistics-Level-1-

Data set: Cars.csv. Calculate the probability of MPG of cars for the cases below, where MPG <- Cars$MPG: a. P(MPG > 38) b. P(MPG < 40) c. P(20 < MPG < 50)

Jupyter Notebook

Assignment-05-Multiple-Linear-Regression-1

Multiple-Linear-Regression-1. Consider only the below columns and prepare a prediction model for predicting Price of Toyota Corolla.

Jupyter Notebook

P36.-Supervised-ML---Decision-Tree---C5.0-Entropy-Iris-Flower-

Supervised-ML-Decision-Tree-C5.0-Entropy-Iris-Flower: classification model using the entropy criterion. Import libraries and data set, EDA, apply label encoding, model building (building/training a decision tree classifier (C5.0) using the entropy criterion), and validation and testing of the decision tree classifier (C5.0) model.

Jupyter Notebook
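The "entropy criterion" ranks candidate splits by how much they reduce the Shannon entropy of the class labels (information gain). The sketch below computes both quantities in plain Python on the Iris class counts, which are 50 samples per species. The setosa-vs-rest split is hypothetical. For reference, scikit-learn's `DecisionTreeClassifier(criterion="entropy")` is a CART-style tree that uses this criterion. It is not a full C5.0 implementation, and this sketch is not either.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(parent, children):
    """Entropy reduction from splitting `parent` into the `children` subsets."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted

# Iris: 50 samples per species; hypothetical split that isolates setosa.
parent = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
left = ["setosa"] * 50
right = ["versicolor"] * 50 + ["virginica"] * 50
print(round(entropy(parent), 4), round(information_gain(parent, [left, right]), 4))
```

The parent node has entropy log2(3), about 1.585 bits. The split leaves a pure left child and a 50/50 right child (1 bit), so the gain is log2(3) - 2/3, about 0.918 bits.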

P23.-EDA-1

EDA (Exploratory Data Analysis) -1: Loading the Datasets, Data type conversions, Removing duplicate entries, Dropping the column, Renaming the column, Outlier Detection, Missing Values and Imputation (Numerical and Categorical), Scatter plot and Correlation analysis, Transformations, Automatic EDA Methods (Pandas Profiling and Sweetviz).

Jupyter Notebook

Assignment-04-Simple-Linear-Regression-1

Assignment-04-Simple-Linear-Regression-1. Q1) Delivery_time: predict delivery time using sorting time. Build a simple linear regression model by performing EDA, apply the necessary transformations, and select the best model using R or Python. Steps: EDA and data visualization, feature engineering, correlation analysis, model building, model testing, and model predictions using simple linear regression.

Jupyter Notebook
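A simple linear regression has closed-form least-squares estimates: slope = Sxy / Sxx and intercept = ȳ - slope · x̄. Comparing R² across transformed inputs is one way to "select the best model". The sketch below shows both steps in plain Python. The `sorting_time` and `delivery_time` values are made-up toy numbers, not the assignment's data. The log transform is just one candidate transformation.

```python
import math

def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def r_squared(x, y, intercept, slope):
    """Share of the variance in y explained by the fitted line."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Made-up toy values, NOT the assignment's delivery_time data.
sorting_time = [1, 2, 3, 4, 5]
delivery_time = [2, 4, 5, 4, 5]

b0, b1 = fit_simple_ols(sorting_time, delivery_time)
print("raw:", round(b0, 3), round(b1, 3), round(r_squared(sorting_time, delivery_time, b0, b1), 3))

# One candidate transformation: regress on log(sorting_time) and compare R^2.
log_x = [math.log(v) for v in sorting_time]
c0, c1 = fit_simple_ols(log_x, delivery_time)
print("log-x R^2:", round(r_squared(log_x, delivery_time, c0, c1), 3))
```

On the toy data the raw fit is delivery_time ≈ 2.2 + 0.6 · sorting_time with R² = 0.6. Whichever transformation gives the higher R² (and sensible residuals) would be kept.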

Tableau-_Basics5

Tableau-_Basics Tutorial 4

Tableau_Basics8

Tableau_Basics Tutorial 8

Tableau_Basics2

Tableau_Basics2 tutorial

Tableau-_Basics3

Tableau-_Basics3 Tutorial

Probabilty-calc-2

Probability Calculation in Python

Jupyter Notebook

Tableau_Basics6

Tableau_Basics Tutorial 6

Tableau-_Basics4

Tableau-_Basics Tutorial 4

Tableau_Basics9

Tableau_Basics Tutorial 9

Survival-Analytics

Applying the KaplanMeierFitter model to time and event data

Jupyter Notebook
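`KaplanMeierFitter` (from the lifelines library, typically used as `KaplanMeierFitter().fit(durations, event_observed=events)`) implements the product-limit estimator S(t) = ∏ (1 - dᵢ / nᵢ). Here dᵢ is the number of events at time tᵢ and nᵢ is the number still at risk just before tᵢ. The sketch below is a dependency-free version of that formula on hypothetical toy data, for intuition only. It is not the repository's code, and it skips confidence intervals.

```python
from collections import Counter

def kaplan_meier(durations, events):
    """Product-limit estimate: list of (time, S(time)) at each distinct event time.

    events[i] is 1 if subject i had the event at durations[i], 0 if censored there.
    """
    deaths = Counter(t for t, e in zip(durations, events) if e)
    leaving = Counter(durations)   # subjects leaving the risk set at each time (events + censored)
    at_risk = len(durations)
    survival, curve = 1.0, []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            survival *= 1 - d / at_risk  # S(t) = prod(1 - d_i / n_i)
            curve.append((t, survival))
        at_risk -= leaving[t]           # censored subjects at t still count as at risk at t
    return curve

# Hypothetical toy data (not from the repository).
durations = [5, 6, 6, 2, 4, 4]
events = [1, 0, 1, 1, 1, 0]
curve = kaplan_meier(durations, events)
for t, s in curve:
    print(t, round(s, 4))
```

On this data the curve steps down to 5/6, 2/3, 4/9, and 2/9 at times 2, 4, 5, and 6.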

vaitybharati

Config files for my GitHub profile.

Tableau_Basics7

Tableau_Basics Tutorial 7

Assignment-1-Q21_b-Basic-Statistics-Level-1-

Check whether Adipose Tissue (AT) and Waist Circumference (Waist) from the wc-at data set follow a normal distribution.

Jupyter Notebook

Assignment-1-Q7-Basic-Statistics-Level-1-

Q7) For Points, Score, and Weigh: find the mean, median, mode, variance, standard deviation, and range, and comment on the values / draw some inferences. Use the Q7.csv file.

Jupyter Notebook

Assignment-1-Q9_b-Basic-Statistics-Level-1-

Q9) Calculate skewness and kurtosis and draw inferences on the following data: SP and weight (WT). Use Q9_b.csv.

Jupyter Notebook

Tableau-Basics

Tableau basics tutorial

Assignment-1-Q12-Basic-Statistics-Level-1-

Below are the scores obtained by a student in tests: 34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56. Find the mean, median, variance, and standard deviation. What can we say about the student's marks?

Jupyter Notebook
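The Q12 summary statistics can be checked directly with Python's built-in `statistics` module. This sketch assumes the sample (n - 1) versions of variance and standard deviation are wanted. Use `statistics.pvariance` and `statistics.pstdev` for the population versions.

```python
import statistics

scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

mean = statistics.mean(scores)          # arithmetic mean
median = statistics.median(scores)      # n = 18, so the average of the 9th and 10th values
variance = statistics.variance(scores)  # sample variance (divides by n - 1)
std_dev = statistics.stdev(scores)      # sample standard deviation

print(mean, median, round(variance, 3), round(std_dev, 3))
```

This gives a mean of 41, a median of 40.5, a sample variance of 434/17 ≈ 25.53, and a standard deviation of about 5.05. The mean sits above the median because the single score of 56 stretches the right tail. Most marks cluster between 38 and 42.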

Assignment-1-Q11-Basic-Statistics-Level-1-

Q11) Suppose we want to estimate the average weight of an adult male in Mexico. We draw a random sample of 2,000 men from a population of 3,000,000 men and weigh them. We find that the average person in our sample weighs 200 pounds, and the standard deviation of the sample is 30 pounds. Calculate the 94%, 98%, and 96% confidence intervals.

Jupyter Notebook
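With n = 2,000 the interval x̄ ± z · s / √n is a reasonable approximation. A t-based interval with 1,999 degrees of freedom would be nearly identical, and the finite-population correction (2,000 out of 3,000,000) is negligible. The sketch below uses `statistics.NormalDist` for the z critical values. The function name `z_interval` is hypothetical.

```python
import math
from statistics import NormalDist

N_SAMPLE, MEAN, SD = 2000, 200, 30  # Q11 inputs (pounds)

def z_interval(level, n=N_SAMPLE, mean=MEAN, sd=SD):
    """Two-sided z-based confidence interval for a mean."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value for the given level
    margin = z * sd / math.sqrt(n)                 # z times the standard error
    return mean - margin, mean + margin

for level in (0.94, 0.98, 0.96):
    lo, hi = z_interval(level)
    print(f"{level:.0%} CI: ({lo:.2f}, {hi:.2f}) pounds")
```

The intervals come out at about (198.74, 201.26) for 94%, (198.44, 201.56) for 98%, and (198.62, 201.38) for 96%.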

Assignment-08-PCA-Data-Mining-Wine-

Assignment-08-PCA-Data-Mining-Wine data. Perform principal component analysis, then perform clustering using the first 3 principal component scores (both hierarchical and k-means clustering, using a scree plot or elbow curve) to obtain the optimum number of clusters. Check whether we obtain the same number of clusters as in the original data (the class column, which we ignored at the beginning, shows that it has 3 clusters).

Jupyter Notebook

Assignment-2-Set2-Q5-Basic-Statistic-Level-2-

Consider a company that has two different divisions. The annual profits from the two divisions are independent and have distributions Profit1 ~ N(5, 3^2) and Profit2 ~ N(7, 4^2), respectively. Both profits are in $ million. Answer the following questions about the total profit of the company in rupees. Assume that $1 = Rs. 45. A. Specify a rupee range (centered on the mean) that contains 95% probability for the annual profit of the company. B. Specify the 5th percentile of profit (in rupees) for the company. C. Which of the two divisions has a larger probability of making a loss in a given year?

Jupyter Notebook
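The key facts for this question are that a sum of independent normals is normal (means add, variances add) and that converting to rupees scales both the mean and the SD by 45. The total is therefore N(12, 5²) in $ million, or N(540, 225²) in Rs million. The sketch below uses `statistics.NormalDist` for the quantiles and the loss probabilities.

```python
from statistics import NormalDist

RATE = 45  # Rs per $ (given)
profit1 = NormalDist(mu=5, sigma=3)  # Division 1, $ million
profit2 = NormalDist(mu=7, sigma=4)  # Division 2, $ million

# Independent normals: means add and variances add.
total_usd = NormalDist(mu=profit1.mean + profit2.mean,
                       sigma=(profit1.variance + profit2.variance) ** 0.5)
# Currency conversion scales both the mean and the standard deviation.
total_inr = NormalDist(mu=total_usd.mean * RATE, sigma=total_usd.stdev * RATE)

low, high = total_inr.inv_cdf(0.025), total_inr.inv_cdf(0.975)  # A. central 95% range
p5 = total_inr.inv_cdf(0.05)                                     # B. 5th percentile
loss1, loss2 = profit1.cdf(0), profit2.cdf(0)                    # C. P(profit < 0)

print(f"A: Rs {low:.1f} to {high:.1f} million")
print(f"B: Rs {p5:.1f} million")
print(f"C: P(loss) division 1 = {loss1:.4f}, division 2 = {loss2:.4f}")
```

The range comes out at roughly Rs 99 to 981 million, and the 5th percentile at about Rs 170 million. Division 1 has the larger chance of a loss: about 4.8% (z = -5/3) versus about 4.0% (z = -1.75).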

Assignment-1-Q22-Basic-Statistics-Level-1-

Q22) Calculate the Z scores for 90%, 94%, and 60% confidence intervals for Adipose Tissue (AT) and Waist Circumference (Waist) from the wc-at data set.

Jupyter Notebook
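A two-sided z critical value depends only on the confidence level: z = Φ⁻¹(1 - α/2). The same three values therefore apply to both the AT and the Waist columns. A minimal sketch with the standard library follows. The helper name `z_critical` is hypothetical. The SciPy equivalent is `scipy.stats.norm.ppf`.

```python
from statistics import NormalDist

def z_critical(confidence: float) -> float:
    """Two-sided standard-normal critical value for a confidence level in (0, 1)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.94, 0.60):
    print(f"{level:.0%}: z = {z_critical(level):.4f}")
```

The results are about 1.645 (90%), 1.881 (94%), and 0.842 (60%).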

Assignment-1-Q21_a-Basic-Statistics-Level-1-

Q21) Check whether the data follows a normal distribution. a) Check whether the MPG of cars follows a normal distribution.

Jupyter Notebook

Assignment-03-Q1-Hypothesis-Testing-

An F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at the 5% significance level. Please state the assumptions and tests that you carried out to check the validity of the assumptions. Data: Cutlets.csv.

Jupyter Notebook

P34.-Unsupervised-ML---t-SNE-Data-Mining-Cancer-

Unsupervised-ML-t-SNE-Data-Mining-Cancer. Import libraries, import dataset, convert data to array format, separate the array into input and output components, t-SNE implementation, cluster visualization.

Jupyter Notebook

NN_Hyperparameter-Tuning

Tuning of hyperparameters: batch size and epochs; learning rate and dropout rate; activation function and kernel initializer; number of neurons in the activation layer. Training the model with the optimum hyperparameter values.

Jupyter Notebook

Assignment-03-Q3-Hypothesis-Testing-

Chi-square contingency test of independence. Null hypothesis H0: the categorical variables are independent (male-female buyer ratios are similar across regions, i.e., they do not vary and are not related). Alternative hypothesis Ha: the categorical variables are dependent (male-female buyer ratios are NOT similar across regions, i.e., they vary and are somewhat/significantly related).

Jupyter Notebook
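The chi-square statistic compares each observed count with the count expected under independence, Eᵢⱼ = (row total × column total) / grand total. It then sums (O - E)² / E, with (rows - 1)(columns - 1) degrees of freedom. The sketch below computes this by hand. The 2 × 4 table of male/female buyers across four regions is hypothetical, not the assignment's data. `scipy.stats.chi2_contingency` would also return a p-value. Here the statistic is compared against the α = 0.05 critical value for 3 degrees of freedom (7.815).

```python
def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand  # count expected under H0
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts: columns are four regions, rows are male and female buyers.
observed = [
    [20, 30, 25, 25],  # male buyers
    [80, 70, 75, 75],  # female buyers
]
stat, dof = chi_square_independence(observed)
CRITICAL_5PCT_DF3 = 7.815  # chi-square critical value for alpha = 0.05, df = 3
print(f"chi2 = {stat:.3f}, df = {dof}, reject H0: {stat > CRITICAL_5PCT_DF3}")
```

For these made-up counts the statistic is 8/3 ≈ 2.667, well below 7.815. We would fail to reject H0 and treat the buyer ratios as similar across regions.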

P25.-Supervised-ML---Simple-Linear-Regression---Waist-Circumference-Adipose-Tissue-Data

Supervised-ML---Simple-Linear-Regression---Waist-Circumference-Adipose-Tissue-Data. EDA and data visualization, Correlation Analysis, Model Building, Model Testing, Model Prediction.

Jupyter Notebook

Assignment-06-Logistic-Regression

Assignment-06-Logistic-Regression. Output variable -> y: whether the client has subscribed a term deposit or not (binomial: "yes" or "no").
Attribute information for the bank dataset.
Input variables:
# bank client data:
1 - age (numeric)
2 - job: type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student","blue-collar","self-employed","retired","technician","services")
3 - marital: marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
4 - education (categorical: "unknown","secondary","primary","tertiary")
5 - default: has credit in default? (binary: "yes","no")
6 - balance: average yearly balance, in euros (numeric)
7 - housing: has housing loan? (binary: "yes","no")
8 - loan: has personal loan? (binary: "yes","no")
# related with the last contact of the current campaign:
9 - contact: contact communication type (categorical: "unknown","telephone","cellular")
10 - day: last contact day of the month (numeric)
11 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12 - duration: last contact duration, in seconds (numeric)
# other attributes:
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")
Output variable (desired target):
17 - y - has the client subscribed a term deposit? (binary: "yes","no")
Missing Attribute Values: None

Jupyter Notebook

Mysql-Students-table

Mysql-date-time

Datascience_python

Python code

Jupyter Notebook

Decision-Tree

Jupyter Notebook

Model-Validation-Methods

Jupyter Notebook

Basics-of-R-1

Basics-of-R-Tutorial 1

Mysql-Data-Manipulation

R Basics Tutorial-1

P03.-Pandas-3

Understanding Pandas, Visualization using Matplotlib, Plotting subplots

Jupyter Notebook

R_basics-homework-5_sept

R_basics - Visualizing Air Quality data

Hypothesis-Test

Hypothesis tests in Python

Jupyter Notebook

Ridge_Lasso_ElasticNet

Model Building and Testing using Ridge, Lasso and ElasticNet Methods

Jupyter Notebook

DB-Scan

Jupyter Notebook

Bagging-boosting-stacking

Jupyter Notebook

A15-Aczel-problems-practice-1-78-1-79-

Solution to Aczel problems practice (1-78, 1-79)

Jupyter Notebook

R_basics-homework

R_basics Functions

Simple-linear-Reg-1

Jupyter Notebook

Mysql-practice-tables

Hierarchical-Clustering

Jupyter Notebook

Visualization-Mat_Seaborn

Visualization using Matplotlib and Seaborn

Jupyter Notebook

R2 - Decision Making statements in R

R3 - Joins and Applying Functions in R

Confidence-Interval

Jupyter Notebook

KNN

K Nearest Neighbours in Python

Jupyter Notebook

R_basics_calc-2

R code 2

Classification_Case_study

Classification Project: Sonar rocks or mines

Jupyter Notebook

R-code-1a

A8-Aczel-problems-practice-1-48-1-51-1-53-

Jupyter Notebook

P14.-Confidence-Interval-for-Stocks

Find confidence intervals for Beml and Glaxo stocks. Confidence Interval Estimate

Jupyter Notebook

A5-Aczel-problems-practice-1-17-1-23-1-35-

Data: 23, 26, 29, 30, 32, 34, 37, 45, 57, 80, 102, 147, 210, 355, 782, 1209

Jupyter Notebook

Hypothesis-testing

Hypothesis Testing in Python

Jupyter Notebook

A17-Aczel-problems-practice-1-82-1-83-

Solution to Aczel problems practice (1-82, 1-83)

Jupyter Notebook

Day-3

R - Joins, Basic functions, and If else statements in R

R-code-2

P04.-Matplotlib-Visualization

Plotting two different categories: box plot, bar plot, histogram. Plotting a single category: pie chart, bar chart. Different plots: scatter plot, histogram, box plot, violin plot.

Jupyter Notebook

P07.-Chebyshev-s-practice

Chebyshev's theorem: at least 3/4 (75%) of observations lie within 2 standard deviations of the mean, i.e., between mean - 2SD and mean + 2SD.

Jupyter Notebook
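Chebyshev's bound says that at least 1 - 1/k² of any distribution lies within k standard deviations of its mean (for k > 1). With k = 2 that gives the 3/4 figure. The sketch below computes the bound and checks it empirically. For illustration it reuses the test scores from the Q12 repository above. The helper names are hypothetical, and the population SD (`pstdev`) is used so the check matches the theorem's setting.

```python
import statistics

def chebyshev_lower_bound(k: float) -> float:
    """Minimum share of any distribution within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def share_within(data, k):
    """Observed share of data points within k population SDs of the mean."""
    mu = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum(abs(x - mu) <= k * sd for x in data) / len(data)

# Scores reused from the Q12 assignment, for illustration only.
scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]
print(chebyshev_lower_bound(2), round(share_within(scores, 2), 3))
```

Here 17 of 18 scores (about 94%) fall within 2 SD; only 56 lies outside. That is comfortably above the guaranteed 75%. Chebyshev's bound is deliberately conservative because it holds for every distribution.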

Basics-of-R3

Basics-of-R Tutorial 3

P29.-Unsupervised-ML---Hierarchical-Clustering-Univ.-

Unsupervised-ML---Hierarchical-Clustering-University Data. Import libraries, Import dataset, Create Normalized data frame (considering only the numerical part of data), Create dendrograms, Create Clusters, Plot Clusters.

Jupyter Notebook

P01.-Pandas-1

Understanding Pandas, Importing datasets, Deriving Attributes, Performing Statistics

Jupyter Notebook

Anova

Jupyter Notebook

Matplotlip

Matplotlib Python code

Jupyter Notebook

P02.-Pandas-2

Understanding Pandas, Groupby Function, Filtering Function

Jupyter Notebook

A4-Aczel-problems-practice-1-16-1-22-1-34-

Following are the numbers of daily bids received by the government of a developing country from firms interested in winning a contract for the construction of a new port facility

Jupyter Notebook

Datascience_R

R code Tutorial

Reviews_Classification_Naive_Bayes

Data cleaning, n-grams, word cloud, applying Naive Bayes for classification, using TF-IDF

Jupyter Notebook

EDA2

Exploratory Data Analysis Part-2

Jupyter Notebook

Normal-Distribution

Jupyter Notebook

Association-Rules

Jupyter Notebook

Probability-Calc

Probability calculations for the normal distribution

Jupyter Notebook

P08.-Box-Plot-Practice

Box plot using a pandas DataFrame; inserting minor and major gridlines; deriving LQ, UQ, IQR, and the upper and lower whisker lengths.

Jupyter Notebook
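The box-plot quantities can be derived with the standard library: lower and upper quartiles, IQR = UQ - LQ, Tukey fences at 1.5 × IQR beyond the quartiles, and whiskers ending at the most extreme points inside the fences. For illustration this sketch reuses the data listed under the A5 Aczel repository below. `method="inclusive"` interpolates the same way as pandas' default `Series.quantile`. Whether the notebook used exactly this convention for "whisker length" is an assumption, since conventions vary.

```python
import statistics

# Data reused from the A5 Aczel practice repository, for illustration only.
data = [23, 26, 29, 30, 32, 34, 37, 45, 57, 80, 102, 147, 210, 355, 782, 1209]

lq, median, uq = statistics.quantiles(data, n=4, method="inclusive")  # linear interpolation
iqr = uq - lq
upper_fence = uq + 1.5 * iqr
lower_fence = lq - 1.5 * iqr
# Tukey convention (matplotlib's default whis=1.5): whiskers stop at the last point inside the fences.
upper_whisker = max(x for x in data if x <= upper_fence)
lower_whisker = min(x for x in data if x >= lower_fence)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"LQ={lq}, median={median}, UQ={uq}, IQR={iqr}")
print(f"whiskers: {lower_whisker} to {upper_whisker}; outliers: {outliers}")
```

This gives LQ = 31.5, UQ = 162.75, and IQR = 131.25. The upper whisker ends at 355, and 782 and 1209 are flagged as outliers.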

Forecasting_Data_Driven_Models

Splitting data, moving average, time series decomposition plot, ACF and PACF plots, evaluation metric MAPE, simple exponential smoothing, Holt's method, Holt-Winters exponential smoothing with additive seasonality and additive trend, Holt-Winters exponential smoothing with multiplicative seasonality and additive trend, final model built by combining train and test data.

Jupyter Notebook
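Simple exponential smoothing updates a level as ℓₜ = α·yₜ + (1 - α)·ℓₜ₋₁ and forecasts that level flat into the future. MAPE = 100/n · Σ |(actual - forecast) / actual| then scores the forecast on the held-out part. The sketch below shows both with plain Python. Initializing the level with the first observation is an assumption: `statsmodels.tsa.holtwinters.SimpleExpSmoothing` can estimate the initial level and α instead. The sales series is hypothetical.

```python
def simple_exponential_smoothing(series, alpha):
    """Final smoothed level; used as the flat forecast for every future step."""
    level = series[0]  # assumption: initialize with the first observation
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
    return level

def mape(actual, forecast):
    """Mean absolute percentage error, in percent (actual values must be nonzero)."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical monthly sales; the last two points are held out as the test set.
sales = [112, 118, 132, 129, 121, 135, 148, 148]
train, test = sales[:-2], sales[-2:]
level = simple_exponential_smoothing(train, alpha=0.5)
print(f"forecast = {level:.2f}, MAPE = {mape(test, [level] * len(test)):.2f}%")
```

Holt's method and Holt-Winters extend the same update with trend and seasonal components. The train/test split and the MAPE scoring stay the same.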

A7-Aczel-problems-practice-1-41-1-42-1-43-1-44-1-45-

Jupyter Notebook

A12-Aczel-problems-practice-1-71-1-72-1-73-

Solution to Aczel problems practice (1-71, 1-72, 1-73)

Jupyter Notebook

R_basics-homework-earthquake

R_basics- Earth Quake data

Inferential-Statistics

Inferential Statistics using Confidence Interval

Jupyter Notebook

Forecasting_Model_based_methods

Splitting data; linear, exponential, quadratic, additive seasonality, additive seasonality quadratic, multiplicative seasonality, and multiplicative additive seasonality models; prediction for a new time period.

Jupyter Notebook


P10.-Probability-Calc-2

Suppose GMAT scores can be reasonably modeled using a normal distribution with mean = 711 and SD = 29. What is P(X <= 680)? What is P(697 <= X <= 740)?

Jupyter Notebook
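Both GMAT questions are normal-CDF evaluations. P(X ≤ 680) = Φ((680 - 711)/29), and an interval probability is the difference of two CDF values. A minimal sketch with `statistics.NormalDist` follows. `scipy.stats.norm(711, 29).cdf` would give the same numbers.

```python
from statistics import NormalDist

gmat = NormalDist(mu=711, sigma=29)  # model stated in the problem

p_at_most_680 = gmat.cdf(680)               # P(X <= 680)
p_between = gmat.cdf(740) - gmat.cdf(697)   # P(697 <= X <= 740)

print(f"P(X <= 680) = {p_at_most_680:.4f}")
print(f"P(697 <= X <= 740) = {p_between:.4f}")
```

This works out to about 0.1425 (z ≈ -1.07) and about 0.5267 (z from about -0.48 to 1.0).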
