Assignment-04-Simple-Linear-Regression-2
Assignment-04-Simple-Linear-Regression-2. Q2) Salary_hike -> Build a prediction model for Salary_hike. Build a simple linear regression model by performing EDA, applying the necessary transformations, and selecting the best model using R or Python. EDA and Data Visualization, Correlation Analysis, Model Building, Model Testing, Model Predictions.
Assignment-1-Q24-Basic-Statistics-Level-1-
Q 24) A Government company claims that an average light bulb lasts 270 days. A researcher randomly selects 18 bulbs for testing. The sampled bulbs last an average of 260 days, with a standard deviation of 90 days. If the CEO's claim were true, what is the probability that 18 randomly selected bulbs would have an average life of no more than 260 days?
Assignment-11-Text-Mining-01-Elon-Musk
Assignment-11-Text-Mining-01-Elon-Musk. Perform sentiment analysis on the Elon-musk tweets (Exlon-musk.csv). Text Preprocessing: remove leading and trailing characters, remove empty strings (Python treats them as False), join the list into one string/text, remove Twitter username handles (@usernames) from the tweet text, join the list into one string/text again, Remove Punctuation, Remove https or URLs within the text, Converting into Text Tokens, Tokenization, Remove Stopwords, Normalize the data, Stemming (Optional), Lemmatization, Feature Extraction, Using BoW CountVectorizer, CountVectorizer with N-grams (Bigrams & Trigrams), TF-IDF Vectorizer, Generate Word Cloud, Named Entity Recognition (NER), Emotion Mining - Sentiment Analysis.
Assignment-05-Multiple-Linear-Regression-2
Assignment-05-Multiple-Linear-Regression-2. Prepare a prediction model for profit of 50_startups data. Do transformations for getting better predictions of profit and make a table containing the R^2 value for each prepared model. R&D Spend -- research and development spend in the past few years; Administration -- spend on administration in the past few years; Marketing Spend -- spend on marketing in the past few years; State -- states from which data is collected; Profit -- profit of each state in the past few years.
Assignment-1-Q23-Basic-Statistics-Level-1-
Q 23) Calculate the t scores of 95% confidence interval, 96% confidence interval, 99% confidence interval for sample size of 25
Assignment-2-Set1-Q1-Basic-Statistic-Level-2-
Plot the data, find the outliers and find out μ, σ, σ^2
Multi-Linear-Reg
Multi-Linear-Reg
P27.-Supervised-ML---Multiple-Linear-Regression---Toyoto-Cars
Supervised-ML---Multiple-Linear-Regression---Toyota-Cars. EDA, Correlation Analysis, Model Building, Model Testing, Model Validation Techniques, Collinearity Problem Check, Residual Analysis, Model Deletion Diagnostics (checking Outliers or Influencers) using two techniques: 1. Cook's Distance and 2. Leverage value, Improving the Model, Model - Re-build, Re-check and Re-improve - 2, Model - Re-build, Re-check and Re-improve - 3, Final Model, Model Predictions.
Assignment-1-Q9_a-Basic-Statistics-Level-1-
Q9) Calculate Skewness, Kurtosis & draw inferences on the following data: Cars speed and distance. Use Q9_a.csv
P24.-Supervised-ML---Simple-Linear-Regression---Newspaper-data
Supervised-ML---Simple-Linear-Regression---Newspaper-data. EDA and Visualization, Correlation Analysis, Model Building, Model Testing, Model predictions.
Assignment-07-Clustering-Hierarchical-Airlines-
Assignment-07-Clustering-Hierarchical-Airlines. Perform clustering (hierarchical) for the airlines data to obtain the optimum number of clusters. Draw inferences from the clusters obtained. Data Description: The file EastWestAirlines contains information on passengers who belong to an airline's frequent flier program. For each passenger the data include information on their mileage history and on different ways they accrued or spent miles in the last year. The goal is to try to identify clusters of passengers that have similar characteristics for the purpose of targeting different segments for different types of mileage offers.
Assignment-1-Q20-Basic-Statistics-Level-1-
Data set: Cars.csv. Calculate the probability of MPG of Cars for the below cases. MPG <- Cars$MPG a. P(MPG>38) b. P(MPG<40) c. P(20<MPG<50)
Assignment-05-Multiple-Linear-Regression-1
Multiple-Linear-Regression-1. Consider only the below columns and prepare a prediction model for predicting Price of Toyota Corolla.
P36.-Supervised-ML---Decision-Tree---C5.0-Entropy-Iris-Flower-
Supervised-ML-Decision-Tree-C5.0-Entropy-Iris-Flower-Using Entropy Criteria - Classification Model. Import Libraries and data set, EDA, Apply Label Encoding, Model Building - Building/Training Decision Tree Classifier (C5.0) using Entropy Criteria. Validation and Testing Decision Tree Classifier (C5.0) Model
P23.-EDA-1
EDA (Exploratory Data Analysis) -1: Loading the Datasets, Data type conversions, Removing duplicate entries, Dropping the column, Renaming the column, Outlier Detection, Missing Values and Imputation (Numerical and Categorical), Scatter plot and Correlation analysis, Transformations, Automatic EDA Methods (Pandas Profiling and Sweetviz).
Assignment-04-Simple-Linear-Regression-1
Assignment-04-Simple-Linear-Regression-1. Q1) Delivery_time -> Predict delivery time using sorting time. Build a simple linear regression model by performing EDA, applying the necessary transformations, and selecting the best model using R or Python. EDA and Data Visualization, Feature Engineering, Correlation Analysis, Model Building, Model Testing and Model Predictions using simple linear regression.
Tableau-_Basics5
Tableau-_Basics Tutorial 4
Tableau_Basics8
Tableau_Basics Tutorial 8
Tableau_Basics2
Tableau_Basics2 tutorial
Tableau-_Basics3
Tableau-_Basics3 Tutorial
Probabilty-calc-2
Probability Calculation in Python
Tableau_Basics6
Tableau_Basics Tutorial 6
Tableau-_Basics4
Tableau-_Basics Tutorial 4
Tableau_Basics9
Tableau_Basics Tutorial 9
Survival-Analytics
Applying KaplanMeierFitter model on Time and Events
vaitybharati
Config files for my GitHub profile.
Tableau_Basics7
Tableau_Basics Tutorial 7
Assignment-1-Q21_b-Basic-Statistics-Level-1-
Check whether the Adipose Tissue (AT) and Waist Circumference (Waist) from the wc-at data set follow a Normal Distribution
Assignment-1-Q7-Basic-Statistics-Level-1-
Q7) For Points, Score, Weigh: Find Mean, Median, Mode, Variance, Standard Deviation, and Range, and also comment on the values / draw some inferences. Use Q7.csv file
Assignment-1-Q9_b-Basic-Statistics-Level-1-
Q9) Calculate Skewness, Kurtosis & draw inferences on the following data: SP and Weight (WT). Use Q9_b.csv
Tableau-Basics
Tableau basics tutorial
Assignment-1-Q12-Basic-Statistics-Level-1-
Below are the scores obtained by a student in tests: 34, 36, 36, 38, 38, 39, 39, 40, 40, 41, 41, 41, 41, 42, 42, 45, 49, 56. Find mean, median, variance, standard deviation. What can we say about the student marks?
Assignment-1-Q11-Basic-Statistics-Level-1-
Q11) Suppose we want to estimate the average weight of an adult male in Mexico. We draw a random sample of 2,000 men from a population of 3,000,000 men and weigh them. We find that the average person in our sample weighs 200 pounds, and the standard deviation of the sample is 30 pounds. Calculate the 94%, 98%, and 96% confidence intervals.
Assignment-08-PCA-Data-Mining-Wine-
Assignment-08-PCA-Data-Mining-Wine data. Perform Principal Component Analysis and perform clustering using the first 3 principal component scores (both hierarchical and k-means clustering, with a scree plot or elbow curve), obtain the optimum number of clusters, and check whether we have obtained the same number of clusters as in the original data (the class column, which we ignored at the beginning, shows it has 3 clusters)
Assignment-2-Set2-Q5-Basic-Statistic-Level-2-
Consider a company that has two different divisions. The annual profits from the two divisions are independent and have distributions Profit1 ~ N(5, 3^2) and Profit2 ~ N(7, 4^2) respectively. Both the profits are in $ Million. Answer the following questions about the total profit of the company in Rupees. Assume that $1 = Rs. 45. A. Specify a Rupee range (centered on the mean) such that it contains 95% probability for the annual profit of the company. B. Specify the 5th percentile of profit (in Rupees) for the company. C. Which of the two divisions has a larger probability of making a loss in a given year?
Assignment-1-Q22-Basic-Statistics-Level-1-
Q 22) Calculate the Z scores of 90% confidence interval, 94% confidence interval, 60% confidence interval for Adipose Tissue (AT) and Waist Circumference (Waist) from the wc-at data set
Assignment-1-Q21_a-Basic-Statistics-Level-1-
Q 21) Check whether the data follows normal distribution a) Check whether the MPG of Cars follows Normal Distribution
P34.-Unsupervised-ML---t-SNE-Data-Mining-Cancer-
Unsupervised-ML-t-SNE-Data-Mining-Cancer. Import Libraries, Import Dataset, Convert data to array format, Separate array into input and output components, TSNE implementation, Cluster Visualization
NN_Hyperparameter-Tuning
Tuning of Hyperparameters: Batch Size and Epochs. Tuning of Hyperparameters: Learning rate and Drop out rate. Tuning of Hyperparameters: Activation Function and Kernel Initializer. Tuning of Hyperparameter: Number of Neurons in activation layer. Training model with optimum values of Hyperparameters.
Assignment-03-Q3-Hypothesis-Testing-
Chi-square contingency independence test. Assume the Null Hypothesis as Ho: Independence of categorical variables (male-female buyer ratios are similar across regions; they do not vary and are not related). Thus the Alternate Hypothesis is Ha: Dependence of categorical variables (male-female buyer ratios are NOT similar across regions; they do vary and are somewhat/significantly related)
P25.-Supervised-ML---Simple-Linear-Regression---Waist-Circumference-Adipose-Tissue-Data
Supervised-ML---Simple-Linear-Regression---Waist-Circumference-Adipose-Tissue-Data. EDA and data visualization, Correlation Analysis, Model Building, Model Testing, Model Prediction.
Assignment-06-Logistic-Regression
Assignment-06-Logistic-Regression. Output variable -> y. y -> Whether the client has subscribed to a term deposit or not, Binomial ("yes" or "no"). Attribute information for bank dataset. Input variables: # bank client data: 1 - age (numeric) 2 - job : type of job (categorical: "admin.","unknown","unemployed","management","housemaid","entrepreneur","student", "blue-collar","self-employed","retired","technician","services") 3 - marital : marital status (categorical: "married","divorced","single"; note: "divorced" means divorced or widowed) 4 - education (categorical: "unknown","secondary","primary","tertiary") 5 - default: has credit in default? (binary: "yes","no") 6 - balance: average yearly balance, in euros (numeric) 7 - housing: has housing loan? (binary: "yes","no") 8 - loan: has personal loan? (binary: "yes","no") # related with the last contact of the current campaign: 9 - contact: contact communication type (categorical: "unknown","telephone","cellular") 10 - day: last contact day of the month (numeric) 11 - month: last contact month of year (categorical: "jan", "feb", "mar", ..., "nov", "dec") 12 - duration: last contact duration, in seconds (numeric) # other attributes: 13 - campaign: number of contacts performed during this campaign and for this client (numeric, includes last contact) 14 - pdays: number of days that passed by after the client was last contacted from a previous campaign (numeric, -1 means client was not previously contacted) 15 - previous: number of contacts performed before this campaign and for this client (numeric) 16 - poutcome: outcome of the previous marketing campaign (categorical: "unknown","other","failure","success") Output variable (desired target): 17 - y - has the client subscribed to a term deposit? (binary: "yes","no") 8. Missing Attribute Values: None
Mysql-Students-table
Mysql-Students-table
Mysql-date-time
Mysql-date-time
Datascience_python
Python code
Decision-Tree
Decision-Tree
Model-Validation-Methods
Model-Validation-Methods
Basics-of-R-1
Basics-of-R-Tutorial 1
Mysql-Data-Manipulation
Mysql-Data-Manipulation
R1
R Basics Tutorial-1
P03.-Pandas-3
Understanding Pandas, Visualization using Matplotlib, Plotting subplots
R_basics-homework-5_sept
R_basics - Visualizing Air Quality data
Hypothesis-Test
Hypothesis-Test in python
Ridge_Lasso_ElasticNet
Model Building and Testing using Ridge, Lasso and ElasticNet Methods
DB-Scan
DB-Scan
Bagging-boosting-stacking
Bagging-boosting-stacking
A15-Aczel-problems-practice-1-78-1-79-
Solution to Aczel problems practice (1-78, 1-79)
R_basics-homework
R_basics Functions
Simple-linear-Reg-1
Simple-linear-Reg-1
Mysql-practice-tables
Mysql-practice-tables
Hierarchical-Clustering
Hierarchical-Clustering
Visualization-Mat_Seaborn
Visualization using Matplotlib and Seaborn
R2
R2 - Decision Making statements in R
R3
R3 - Joins and Applying Functions in R
Confidence-Interval
Confidence-Interval
KNN
K Nearest Neighbours in Python
R_basics_calc-2
R code 2
Classification_Case_study
Classification Project: Sonar rocks or mines
R-code-1a
R-code-1a
A8-Aczel-problems-practice-1-48-1-51-1-53-
P14.-Confidence-Interval-for-Stocks
Find confidence intervals for Beml and Glaxo stocks. Confidence Interval Estimate
A5-Aczel-problems-practice-1-17-1-23-1-35-
Data: 23, 26, 29, 30, 32, 34, 37, 45, 57, 80, 102, 147, 210, 355, 782, 1209
Hypothesis-testing
Hypothesis Testing in Python
A17-Aczel-problems-practice-1-82-1-83-
Solution to Aczel problems practice (1-82, 1-83)
Day-3
R - Joins, Basic functions, and If else statements in R
R-code-2
R-code-2
P04.-Matplotlib-Visualization
Plotting two different categories - box plot, barplot, histogram. Plotting single category - Pie chart, bar chart. Different Plots - Scatter Plot, Histogram, Box Plot, Violin Plot
P07.-Chebyshev-s-practice
Chebyshev's Theorem: at least 3/4 (75%) of observations lie within 2 standard deviations of the mean, i.e. between mean-2SD and mean+2SD
Basics-of-R3
Basics-of-R Tutorial 3
P29.-Unsupervised-ML---Hierarchical-Clustering-Univ.-
Unsupervised-ML---Hierarchical-Clustering-University Data. Import libraries, Import dataset, Create Normalized data frame (considering only the numerical part of data), Create dendrograms, Create Clusters, Plot Clusters.
P01.-Pandas-1
Understanding Pandas, Importing datasets, Deriving Attributes, Performing Statistics
Anova
Anova
Matplotlip
MatPlotlib Python codes
P02.-Pandas-2
Understanding Pandas, Groupby Function, Filtering Function
A4-Aczel-problems-practice-1-16-1-22-1-34-
Following are the numbers of daily bids received by the government of a developing country from firms interested in winning a contract for the construction of a new port facility
Datascience_R
R code Tutorial
Reviews_Classification_Naive_Bayes
Data Cleaning, N-gram, WordCloud, Applying naive bayes for classification, Using TFIDF
EDA2
Exploratory Data Analysis Part-2
Normal-Distribution
Normal-Distribution
Association-Rules
Association-Rules
Probability-Calc
Probability Calculations for Normal distribution
P08.-Box-Plot-Practice
Box Plot - using a dataframe in pandas, inserting minor and major gridlines, deriving LQ, UQ, IQR, Upper Whisker and Lower Whisker length
Forecasting_Data_Driven_Models
Splitting data, Moving Average, Time series decomposition plot, ACF plots and PACF plots, Evaluation Metric MAPE, Simple Exponential Method, Holt method, Holt-Winters exponential smoothing with additive seasonality and additive trend, Holt-Winters exponential smoothing with multiplicative seasonality and additive trend, Final Model by combining train and test
A7-Aczel-problems-practice-1-41-1-42-1-43-1-44-1-45-
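Three of the steps named in the Forecasting_Data_Driven_Models entry above (moving average, simple exponential smoothing, and the MAPE metric) can be sketched with plain Python. This is a minimal illustration and not the code from that repository: the `sales` series, the window size, and `alpha=0.5` are made-up values, and the function names are hypothetical.

```python
def moving_average(series, window):
    """Trailing moving average: one value per complete window."""
    return [sum(series[i - window:i]) / window
            for i in range(window, len(series) + 1)]


def simple_exp_smoothing(series, alpha):
    """Smoothed level after each observation, starting from the first value."""
    level = series[0]
    levels = [level]
    for y in series[1:]:
        level = alpha * y + (1 - alpha) * level
        levels.append(level)
    return levels


def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    return 100 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Hypothetical series, split into train and test as the entry describes.
sales = [100, 110, 120, 130, 140, 150]
train, test = sales[:4], sales[4:]

ma = moving_average(train, window=2)
levels = simple_exp_smoothing(train, alpha=0.5)

# Simple exponential smoothing gives a flat forecast equal to the last level.
flat_forecast = [levels[-1]] * len(test)
error = mape(test, flat_forecast)

print(ma)               # [105.0, 115.0, 125.0]
print(levels[-1])       # 121.25
print(round(error, 2))  # 16.28
```

The flat forecast misses the upward trend in the made-up series, which is the kind of gap the Holt and Holt-Winters variants listed in the entry are meant to close.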
A12-Aczel-problems-practice-1-71-1-72-1-73-
Solution to Aczel problems practice (1-71, 1-72, 1-73)
R_basics-homework-earthquake
R_basics - Earthquake data
Inferential-Statistics
Inferential Statistics using Confidence Interval
Forecasting_Model_based_methods
Splitting data into Linear Model, Exponential, Quadratic, Additive Seasonality, Additive Seasonality Quadratic, Multiplicative Seasonality, Multiplicative Additive Seasonality. Prediction for new time period
P10.-Probability-Calc-2
Suppose GMAT scores can be reasonably modeled using a normal distribution with mean = 711 and SD = 29. What is P(X<=680)? What is P(697<=X<=740)?
P12.-C.I.E-using-z-values-Confidence-Interval-Estimate-
Credit card launch example: sample mean: 1990, sample SD: 2833, Pop SD: 2500, Pop mean: ?, n=140. Q: Construct a 95% confidence interval for the mean card balance and interpret it.
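The GMAT question (P10.-Probability-Calc-2) and the credit card question (P12.-C.I.E-using-z-values-Confidence-Interval-Estimate-) can both be worked with Python's standard library. The sketch below is one possible approach, not the code from those repositories. For the interval it assumes the stated population SD of 2500 is the one to use, since the entry asks for z values; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# GMAT scores modeled as Normal(mean=711, SD=29), as stated in the question.
gmat = NormalDist(mu=711, sigma=29)
p_le_680 = gmat.cdf(680)                     # P(X <= 680)
p_697_740 = gmat.cdf(740) - gmat.cdf(697)    # P(697 <= X <= 740)

# 95% z-interval for the mean card balance.
# Assumption: the known population SD (2500) is used, not the sample SD (2833).
sample_mean, pop_sd, n = 1990, 2500, 140
z = NormalDist().inv_cdf(0.975)              # two-sided 95% -> 97.5th percentile
margin = z * pop_sd / sqrt(n)
ci = (sample_mean - margin, sample_mean + margin)

print(round(p_le_680, 3))    # roughly 0.143
print(round(p_697_740, 3))   # roughly 0.527
print(tuple(round(x) for x in ci))  # roughly (1576, 2404)
```

Under these assumptions the interval reads: if the sampling were repeated many times, about 95% of intervals built this way would contain the true mean card balance.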