Vaitybharati (@vaitybharati)
  • Stars 243
  • Global Rank 104,995 (Top 4 %)
  • Followers 360
  • Following 1
  • Registered about 4 years ago
  • Most used languages R 10.5 %
  • Location 🇮🇳 India
  • Country Total Rank 2,763
  • Country Ranking R 57

Top repositories

1

Assignment-04-Simple-Linear-Regression-2

Assignment-04-Simple-Linear-Regression-2. Q2) Salary_hike -> Build a prediction model for Salary_hike. Build a simple linear regression model by performing EDA, apply the necessary transformations, and select the best model using R or Python. EDA and Data Visualization. Correlation Analysis. Model Building. Model Testing. Model Predictions.
Jupyter Notebook
9
star
2

Assignment-1-Q24-Basic-Statistics-Level-1-

Q 24) A Government company claims that an average light bulb lasts 270 days. A researcher randomly selects 18 bulbs for testing. The sampled bulbs last an average of 260 days, with a standard deviation of 90 days. If the CEO's claim were true, what is the probability that 18 randomly selected bulbs would have an average life of no more than 260 days?
Jupyter Notebook
5
star
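The light-bulb question in Assignment-1-Q24 above reduces to a one-sample t statistic with 17 degrees of freedom. The following is a minimal sketch, not the repository's own notebook: it uses only the standard library and integrates the t density numerically with Simpson's rule. The helper names `t_pdf` and `t_cdf` are illustrative assumptions; with SciPy installed, `scipy.stats.t.cdf` would serve the same purpose.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|], using the symmetry of the density.

    steps must be even for Simpson's rule.
    """
    a = abs(x)
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3  # area under the density between 0 and |x|
    return 0.5 + area if x >= 0 else 0.5 - area


# Values taken from the problem statement.
claimed_mean, sample_mean, sample_sd, n = 270, 260, 90, 18

# t = (x_bar - mu) / (s / sqrt(n)), with n - 1 degrees of freedom.
t_stat = (sample_mean - claimed_mean) / (sample_sd / math.sqrt(n))
p_value = t_cdf(t_stat, df=n - 1)

print(f"t = {t_stat:.4f}, P(T <= t) = {p_value:.4f}")
```

The t statistic is about -0.47, and the resulting probability is roughly 0.32, so a sample mean of 260 days or less would not be unusual if the 270-day claim were true.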
3

Assignment-11-Text-Mining-01-Elon-Musk

Assignment-11-Text-Mining-01-Elon-Musk. Perform sentiment analysis on the Elon Musk tweets (Exlon-musk.csv). Text Preprocessing: remove both the leading and the trailing characters, remove empty strings (because Python treats them as False), join the list into one string/text, remove Twitter username handles from the text (removes @usernames), join the list into one string/text again, Remove Punctuation, Remove https or url within text, Converting into Text Tokens, Tokenization, Remove Stopwords, Normalize the data, Stemming (Optional), Lemmatization, Feature Extraction, Using BoW CountVectorizer, CountVectorizer with N-grams (Bigrams & Trigrams), TF-IDF Vectorizer, Generate Word Cloud, Named Entity Recognition (NER), Emotion Mining - Sentiment Analysis.
Jupyter Notebook
5
star
4

Assignment-05-Multiple-Linear-Regression-2

Assignment-05-Multiple-Linear-Regression-2. Prepare a prediction model for profit of 50_startups data. Do transformations for getting better predictions of profit and make a table containing R^2 value for each prepared model. R&D Spend -- research and development spend in the past few years; Administration -- spend on administration in the past few years; Marketing Spend -- spend on marketing in the past few years; State -- states from which data is collected; Profit -- profit of each state in the past few years.
Jupyter Notebook
4
star
5

Assignment-1-Q23-Basic-Statistics-Level-1-

Q 23) Calculate the t scores of 95% confidence interval, 96% confidence interval, 99% confidence interval for sample size of 25
Jupyter Notebook
3
star
6

Assignment-2-Set1-Q1-Basic-Statistic-Level-2-

Plot the data, find the outliers and find out μ,σ,σ^2
Jupyter Notebook
3
star
7

Multi-Linear-Reg

Multi-Linear-Reg
Jupyter Notebook
3
star
8

P27.-Supervised-ML---Multiple-Linear-Regression---Toyoto-Cars

Supervised-ML---Multiple-Linear-Regression---Toyota-Cars. EDA, Correlation Analysis, Model Building, Model Testing, Model Validation Techniques, Collinearity Problem Check, Residual Analysis, Model Deletion Diagnostics (checking Outliers or Influencers) Two Techniques : 1. Cook's Distance & 2. Leverage value, Improving the Model, Model - Re-build, Re-check and Re-improve - 2, Model - Re-build, Re-check and Re-improve - 3, Final Model, Model Predictions.
Jupyter Notebook
3
star
9

Assignment-1-Q9_a-Basic-Statistics-Level-1-

Q9) Calculate Skewness, Kurtosis & draw inferences on the following data Cars speed and distance Use Q9_a.csv
Jupyter Notebook
3
star
10

P24.-Supervised-ML---Simple-Linear-Regression---Newspaper-data

Supervised-ML---Simple-Linear-Regression---Newspaper-data. EDA and Visualization, Correlation Analysis, Model Building, Model Testing, Model predictions.
Jupyter Notebook
3
star
11

Assignment-07-Clustering-Hierarchical-Airlines-

Assignment-07-Clustering-Hierarchical-Airlines. Perform clustering (hierarchical) for the airlines data to obtain optimum number of clusters. Draw the inferences from the clusters obtained. Data Description: The file EastWestAirlines contains information on passengers who belong to an airline’s frequent flier program. For each passenger the data include information on their mileage history and on different ways they accrued or spent miles in the last year. The goal is to try to identify clusters of passengers that have similar characteristics for the purpose of targeting different segments for different types of mileage offers.
Jupyter Notebook
3
star
12

Assignment-1-Q20-Basic-Statistics-Level-1-

Data set: Cars.csv. Calculate the probability of MPG of Cars for the below cases. MPG <- Cars$MPG a. P(MPG>38) b. P(MPG<40) c. P(20<MPG<50)
Jupyter Notebook
3
star
13

Assignment-05-Multiple-Linear-Regression-1

Multiple-Linear-Regression-1. Consider only the below columns and prepare a prediction model for predicting Price of Toyota Corolla.
Jupyter Notebook
3
star
14

P36.-Supervised-ML---Decision-Tree---C5.0-Entropy-Iris-Flower-

Supervised-ML-Decision-Tree-C5.0-Entropy-Iris-Flower-Using Entropy Criteria - Classification Model. Import Libraries and data set, EDA, Apply Label Encoding, Model Building - Building/Training Decision Tree Classifier (C5.0) using Entropy Criteria. Validation and Testing Decision Tree Classifier (C5.0) Model
Jupyter Notebook
3
star
15

P23.-EDA-1

EDA (Exploratory Data Analysis) -1: Loading the Datasets, Data type conversions, Removing duplicate entries, Dropping the column, Renaming the column, Outlier Detection, Missing Values and Imputation (Numerical and Categorical), Scatter plot and Correlation analysis, Transformations, Automatic EDA Methods (Pandas Profiling and Sweetviz).
Jupyter Notebook
3
star
16

Assignment-04-Simple-Linear-Regression-1

Assignment-04-Simple-Linear-Regression-1. Q1) Delivery_time -> Predict delivery time using sorting time. Build a simple linear regression model by performing EDA and do necessary transformations and select the best model using R or Python. EDA and Data Visualization, Feature Engineering, Correlation Analysis, Model Building, Model Testing and Model Predictions using simple linear regression.
Jupyter Notebook
3
star
17

Tableau-_Basics5

Tableau-_Basics Tutorial 4
2
star
18

Tableau_Basics8

Tableau_Basics Tutorial 8
2
star
19

Tableau_Basics2

Tableau_Basics2 tutorial
2
star
20

Tableau-_Basics3

Tableau-_Basics3 Tutorial
2
star
21

Probabilty-calc-2

Probability Calculation in Python
Jupyter Notebook
2
star
22

Tableau_Basics6

Tableau_Basics Tutorial 6
2
star
23

Tableau-_Basics4

Tableau-_Basics Tutorial 4
2
star
24

Tableau_Basics9

Tableau_Basics Tutorial 9
2
star
25

P25.-Supervised-ML---Simple-Linear-Regression---Waist-Circumference-Adipose-Tissue-Data

Supervised-ML---Simple-Linear-Regression---Waist-Circumference-Adipose-Tissue-Data. EDA and data visualization, Correlation Analysis, Model Building, Model Testing, Model Prediction.
Jupyter Notebook
2
star
26

Survival-Analytics

Applying KaplanMeierFitter model on Time and Events
Jupyter Notebook
2
star
27

vaitybharati

Config files for my GitHub profile.
2
star
28

Tableau_Basics7

Tableau_Basics Tutorial 7
2
star
29

Assignment-1-Q21_b-Basic-Statistics-Level-1-

Check Whether the Adipose Tissue (AT) and Waist Circumference(Waist) from wc-at data set follows Normal Distribution
Jupyter Notebook
2
star
30

Assignment-1-Q7-Basic-Statistics-Level-1-

Q7) For Points,Score,Weigh: Find Mean, Median, Mode, Variance, Standard Deviation, and Range and also Comment about the values/ Draw some inferences. Use Q7.csv file
Jupyter Notebook
2
star
31

Assignment-1-Q9_b-Basic-Statistics-Level-1-

Q9) Calculate Skewness, Kurtosis & draw inferences on the following data SP and Weight(WT) Use Q9_b.csv
Jupyter Notebook
2
star
32

Tableau-Basics

Tableau basics tutorial
2
star
33

Assignment-1-Q12-Basic-Statistics-Level-1-

Below are the scores obtained by a student in tests 34,36,36,38,38,39,39,40,40,41,41,41,41,42,42,45,49,56. Find mean, median, variance, standard deviation. What can we say about the student marks?
Jupyter Notebook
2
star
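The summary statistics asked for in Assignment-1-Q12 above can be sketched with Python's standard `statistics` module. This is an illustrative sketch, not the repository's notebook; it assumes the sample (n - 1) versions of variance and standard deviation are wanted, since the question does not say.

```python
import statistics

# Test scores exactly as listed in the question.
scores = [34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56]

mean = statistics.mean(scores)
median = statistics.median(scores)
variance = statistics.variance(scores)  # sample variance, divides by n - 1 (assumption)
std_dev = statistics.stdev(scores)      # sample standard deviation

print(f"mean={mean}, median={median}, variance={variance:.3f}, std_dev={std_dev:.3f}")
```

The mean (41) sits slightly above the median (40.5), which is consistent with a mild right skew driven by the two highest scores, 49 and 56.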
34

Assignment-1-Q11-Basic-Statistics-Level-1-

Q11) Suppose we want to estimate the average weight of an adult male in Mexico. We draw a random sample of 2,000 men from a population of 3,000,000 men and weigh them. We find that the average person in our sample weighs 200 pounds, and the standard deviation of the sample is 30 pounds. Calculate the 94%, 98%, and 96% confidence intervals.
Jupyter Notebook
2
star
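For the Assignment-1-Q11 question above, the intervals follow from mean ± z · s/√n. The sketch below assumes a normal (z) critical value is acceptable because the sample is large (n = 2,000), and it ignores the finite-population correction; the function name `z_interval` is illustrative, not taken from the repository.

```python
import math
from statistics import NormalDist

# Values taken from the problem statement.
sample_mean, sample_sd, n = 200, 30, 2000
std_error = sample_sd / math.sqrt(n)


def z_interval(confidence):
    """Two-sided confidence interval for the mean using a z critical value."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * std_error
    return sample_mean - margin, sample_mean + margin


intervals = {level: z_interval(level) for level in (0.94, 0.96, 0.98)}
for level, (low, high) in intervals.items():
    print(f"{level:.0%} CI: ({low:.2f}, {high:.2f})")
```

Higher confidence levels give wider intervals, but with a standard error of about 0.67 pounds all three stay within roughly 1.6 pounds of the sample mean.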
35

Assignment-08-PCA-Data-Mining-Wine-

Assignment-08-PCA-Data-Mining-Wine data. Perform principal component analysis, then perform clustering using the first 3 principal component scores (both hierarchical and k-means clustering, using a scree plot or elbow curve) to obtain the optimum number of clusters. Check whether we obtain the same number of clusters as in the original data (the class column, ignored at the beginning, shows that it has 3 clusters).
Jupyter Notebook
2
star
36

Assignment-2-Set2-Q5-Basic-Statistic-Level-2-

Consider a company that has two different divisions. The annual profits from the two divisions are independent and have distributions Profit1 ~ N(5, 3^2) and Profit2 ~ N(7, 4^2) respectively. Both the profits are in $ Million. Answer the following questions about the total profit of the company in Rupees. Assume that $1 = Rs. 45 A. Specify a Rupee range (centered on the mean) such that it contains 95% probability for the annual profit of the company. B. Specify the 5th percentile of profit (in Rupees) for the company C. Which of the two divisions has a larger probability of making a loss in a given year?
Jupyter Notebook
2
star
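The three parts of Assignment-2-Set2-Q5 above can be sketched with `statistics.NormalDist`, which supports adding independent normal variables and scaling by a constant. This is a hedged illustration rather than the repository's solution; the variable names are assumptions, and amounts are in Rs. million because the profits are given in $ million.

```python
from statistics import NormalDist

RUPEES_PER_DOLLAR = 45  # exchange rate stated in the question

# Division profits in $ million: N(5, 3^2) and N(7, 4^2), independent.
profit1 = NormalDist(mu=5, sigma=3)
profit2 = NormalDist(mu=7, sigma=4)

# Sum of independent normals, then converted to Rs. million.
total_rupees = (profit1 + profit2) * RUPEES_PER_DOLLAR

# A. Range centred on the mean that contains 95% probability.
range_95 = (total_rupees.inv_cdf(0.025), total_rupees.inv_cdf(0.975))

# B. 5th percentile of total profit.
fifth_percentile = total_rupees.inv_cdf(0.05)

# C. Probability that each division makes a loss (profit below zero).
loss_prob_1 = profit1.cdf(0)
loss_prob_2 = profit2.cdf(0)

print(f"mean={total_rupees.mean:.0f}, sd={total_rupees.stdev:.0f} (Rs. million)")
print(f"95% range: ({range_95[0]:.1f}, {range_95[1]:.1f})")
print(f"5th percentile: {fifth_percentile:.1f}")
print(f"P(loss) division 1: {loss_prob_1:.4f}, division 2: {loss_prob_2:.4f}")
```

The total profit is N(540, 225^2) in Rs. million. Division 1 has the larger loss probability because its mean is only 5/3 standard deviations above zero, against 7/4 for division 2.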
37

Assignment-1-Q22-Basic-Statistics-Level-1-

Q 22) Calculate the Z scores of 90% confidence interval, 94% confidence interval, 60% confidence interval for Adipose Tissue (AT) and Waist Circumference (Waist) from wc-at data set
Jupyter Notebook
2
star
38

Assignment-03-Q1-Hypothesis-Testing-

An F&B manager wants to determine whether there is any significant difference in the diameter of the cutlet between two units. A randomly selected sample of cutlets was collected from both units and measured. Analyze the data and draw inferences at 5% significance level. Please state the assumptions and tests that you carried out to check validity of the assumptions. Cutlets.csv
Jupyter Notebook
2
star
39

Assignment-1-Q21_a-Basic-Statistics-Level-1-

Q 21) Check whether the data follows normal distribution a) Check whether the MPG of Cars follows Normal Distribution
Jupyter Notebook
2
star
40

P34.-Unsupervised-ML---t-SNE-Data-Mining-Cancer-

Unsupervised-ML-t-SNE-Data-Mining-Cancer. Import Libraries, Import Dataset, Convert data to array format, Separate array into input and output components, TSNE implementation, Cluster Visualization
Jupyter Notebook
2
star
41

NN_Hyperparameter-Tuning

Tuning of Hyperparameters: Batch Size and Epochs. Tuning of Hyperparameters: Learning Rate and Dropout Rate. Tuning of Hyperparameters: Activation Function and Kernel Initializer. Tuning of Hyperparameter: Number of Neurons in activation layer. Training model with optimum values of Hyperparameters.
Jupyter Notebook
2
star
42

Assignment-03-Q3-Hypothesis-Testing-

Chi2 contingency independence test. Assume Null Hypothesis as Ho: Independence of categorical variables (male-female buyer ratios are similar across regions; they do not vary and are not related). Thus Alternate Hypothesis as Ha: Dependence of categorical variables (male-female buyer ratios are NOT similar across regions; they do vary and are somewhat/significantly related).
Jupyter Notebook
2
star
43

Assignment-06-Logistic-Regression

Assignment-06-Logistic-Regression. Output variable -> y. y -> Whether the client has subscribed a term deposit or not. Binomial ("yes" or "no"). Attribute information for bank dataset.
Input variables:
# bank client data:
1 - age (numeric)
2 - job : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student","blue-collar","self-employed","retired","technician","services")
3 - marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed)
4 - education (categorical: "unknown","secondary","primary","tertiary")
5 - default: has credit in default? (binary: "yes","no")
6 - balance: average yearly balance, in euros (numeric)
7 - housing: has housing loan? (binary: "yes","no")
8 - loan: has personal loan? (binary: "yes","no")
# related with the last contact of the current campaign:
9 - contact: contact communication type (categorical: "unknown","telephone","cellular")
10 - day: last contact day of the month (numeric)
11 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec")
12 - duration: last contact duration, in seconds (numeric)
# other attributes:
13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact)
14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted)
15 - previous: number of contacts performed before this campaign and for this client (numeric)
16 - poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success")
Output variable (desired target):
17 - y - has the client subscribed a term deposit? (binary: "yes","no")
Missing Attribute Values: None
Jupyter Notebook
2
star
44

Mysql-Students-table

Mysql-Students-table
1
star
45

Mysql-date-time

Mysql-date-time
1
star
46

Datascience_python

Python code
Jupyter Notebook
1
star
47

Visualization-Mat_Seaborn

Visualization using Matplotlib and Seaborn
Jupyter Notebook
1
star
48

Decision-Tree

Decision-Tree
Jupyter Notebook
1
star
49

Model-Validation-Methods

Model-Validation-Methods
Jupyter Notebook
1
star
50

Basics-of-R-1

Basics-of-R-Tutorial 1
R
1
star
51

Mysql-Data-Manipulation

Mysql-Data-Manipulation
1
star
52

R1

R Basics Tutorial-1
R
1
star
53

P03.-Pandas-3

Understanding Pandas, Visualization using Matplotlib, Plotting subplots
Jupyter Notebook
1
star
54

R_basics-homework-5_sept

R_basics - Visualizing Air Quality data
R
1
star
55

Hypothesis-Test

Hypothesis-Test in python
Jupyter Notebook
1
star
56

Ridge_Lasso_ElasticNet

Model Building and Testing using Ridge, Lasso and ElasticNet Methods
Jupyter Notebook
1
star
57

DB-Scan

DB-Scan
Jupyter Notebook
1
star
58

Bagging-boosting-stacking

Bagging-boosting-stacking
Jupyter Notebook
1
star
59

A15-Aczel-problems-practice-1-78-1-79-

Solution to Aczel problems practice (1-78, 1-79)
Jupyter Notebook
1
star
60

R_basics-homework

R_basics Functions
R
1
star
61

Simple-linear-Reg-1

Simple-linear-Reg-1
Jupyter Notebook
1
star
62

Mysql-practice-tables

Mysql-practice-tables
1
star
63

Hierarchical-Clustering

Hierarchical-Clustering
Jupyter Notebook
1
star
64

R2

R2 - Decision Making statements in R
R
1
star
65

R3

R3 - Joins and Applying Functions in R
R
1
star
66

Confidence-Interval

Confidence-Interval
Jupyter Notebook
1
star
67

KNN

K Nearest Neighbours in Python
Jupyter Notebook
1
star
68

R_basics_calc-2

R code 2
R
1
star
69

Classification_Case_study

Classification Project: Sonar rocks or mines
Jupyter Notebook
1
star
70

R-code-1a

R-code-1a
R
1
star
71

A8-Aczel-problems-practice-1-48-1-51-1-53-

Jupyter Notebook
1
star
72

P14.-Confidence-Interval-for-Stocks

Find confidence intervals for Beml and Glaxo stocks. Confidence Interval Estimate
Jupyter Notebook
1
star
73

A10-Aczel-problems-practice-1-62-1-63-1-64-1-65-

Fortune published a list of the 10 largest “green companies”—those that follow environmental policies. Their annual revenues, in $ billions, are given below. Find the mean, variance, and standard deviation of the annual revenues.
Jupyter Notebook
1
star
74

A5-Aczel-problems-practice-1-17-1-23-1-35-

Data: 23, 26, 29, 30, 32, 34, 37, 45, 57, 80, 102, 147, 210, 355, 782, 1209
Jupyter Notebook
1
star
75

Hypothesis-testing

Hypothesis Testing in Python
Jupyter Notebook
1
star
76

A17-Aczel-problems-practice-1-82-1-83-

Solution to Aczel problems practice (1-82, 1-83)
Jupyter Notebook
1
star
77

Day-3

R - Joins, Basic functions, and If else statements in R
R
1
star
78

R-code-2

R-code-2
R
1
star
79

P04.-Matplotlib-Visualization

Plotting two different categories- box plot, barplot, histogram. Plotting single category- Pie chart, bar chart. Different Plots- Scatter Plot, Histogram, Box Plot, Violin Plot
Jupyter Notebook
1
star
80

P07.-Chebyshev-s-practice

Chebyshev's Theorem: at least 3/4 (75%) of observations lie within 2 standard deviations of the mean, i.e. between mean-2SD and mean+2SD
Jupyter Notebook
1
star
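The Chebyshev bound described for P07.-Chebyshev-s-practice above generalises to 1 - 1/k² for any k > 1. The sketch below is illustrative only: the function names are assumptions, and the sample is the data set listed elsewhere on this page for the A5 Aczel practice problems, reused here purely as a skewed example.

```python
import statistics


def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 1 - 1 / k**2


def fraction_within(data, k):
    """Observed fraction of data within k population standard deviations of the mean."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)
    inside = [x for x in data if abs(x - mean) <= k * sd]
    return len(inside) / len(data)


# Strongly right-skewed illustrative sample.
sample = [23, 26, 29, 30, 32, 34, 37, 45, 57, 80, 102, 147, 210, 355, 782, 1209]

bound = chebyshev_lower_bound(2)
observed = fraction_within(sample, 2)
print(f"guaranteed at least {bound:.0%}, observed {observed:.1%}")
```

The bound holds for any distribution, which is why the observed fraction for this skewed sample still meets the 75% guarantee; for bell-shaped data the observed fraction is typically much higher.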
81

Basics-of-R3

Basics-of-R Tutorial 3
R
1
star
82

P29.-Unsupervised-ML---Hierarchical-Clustering-Univ.-

Unsupervised-ML---Hierarchical-Clustering-University Data. Import libraries, Import dataset, Create Normalized data frame (considering only the numerical part of data), Create dendrograms, Create Clusters, Plot Clusters.
Jupyter Notebook
1
star
83

P01.-Pandas-1

Understanding Pandas, Importing datasets, Deriving Attributes, Performing Statistics
Jupyter Notebook
1
star
84

Anova

Anova
Jupyter Notebook
1
star
85

Matplotlip

Matplotlib Python code
Jupyter Notebook
1
star
86

P02.-Pandas-2

Understanding Pandas, Groupby Function, Filtering Function
Jupyter Notebook
1
star
87

A4-Aczel-problems-practice-1-16-1-22-1-34-

Following are the numbers of daily bids received by the government of a developing country from firms interested in winning a contract for the construction of a new port facility
Jupyter Notebook
1
star
88

Datascience_R

R code Tutorial
R
1
star
89

Reviews_Classification_Naive_Bayes

Data Cleaning, N-gram, WordCloud, Applying naive bayes for classification, Using TFIDF
Jupyter Notebook
1
star
90

EDA2

Exploratory Data Analysis Part-2
Jupyter Notebook
1
star
91

Normal-Distribution

Normal-Distribution
Jupyter Notebook
1
star
92

Association-Rules

Association-Rules
Jupyter Notebook
1
star
93

Probability-Calc

Probability Calculations for Normal distribution
Jupyter Notebook
1
star
94

P08.-Box-Plot-Practice

Box Plot - using dataframe in pandas, Inserting Minor and Major gridlines, Deriving LQ, UQ, IQR, Upper Whisker and Lower Whisker length
Jupyter Notebook
1
star
95

Forecasting_Data_Driven_Models

Splitting data, Moving Average, Time series decomposition plot, ACF plots and PACF plots, Evaluation Metric MAPE, Simple Exponential Method, Holt method, Holt-Winters exponential smoothing with additive seasonality and additive trend, Holt-Winters exponential smoothing with multiplicative seasonality and additive trend, Final Model by combining train and test
Jupyter Notebook
1
star
96

A7-Aczel-problems-practice-1-41-1-42-1-43-1-44-1-45-

Jupyter Notebook
1
star
97

A12-Aczel-problems-practice-1-71-1-72-1-73-

Solution to Aczel problems practice (1-71, 1-72, 1-73)
Jupyter Notebook
1
star
98

R_basics-homework-earthquake

R_basics- Earth Quake data
R
1
star
99

Inferential-Statistics

Inferential Statistics using Confidence Interval
Jupyter Notebook
1
star
100

Forecasting_Model_based_methods

Splitting data into Linear Model, Exponential, Quadratic, Additive Seasonality, Additive Seasonality Quadratic, Multiplicative Seasonality, Multiplicative Additive Seasonality. Prediction for new time period
Jupyter Notebook
1
star