iNeuron_FSDA
This repository contains all the files and folders taught in the class. Make use of it effectively.
analyticswithanand
This repository contains all the codes, PPTs, projects & interview questions which I have used in my LIVE CLASSES on YouTube, along with other relevant documents and assignments related to the course.
End-To-End-Data-Analytics-Project_Banking
This project is built on datasets I created manually over many weeks, covering 1M randomly generated records across 8 tables. A must-have project to showcase on your resume. Just try it out.
anandjha90
E-Commerce-Sales-Dashboard-in-Excel
MASTER-THE-ART-OF-EXTRACT-TRANSFORM-LOAD-WITH-MATILLION
This repository contains all the files and information about mastering the ETL tool Matillion.
Real-estate---PGP
Walmart_Retail_Analysis_Using_R
DESCRIPTION
One of the leading retail stores in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays impact sales on each day, and sales data are available for 45 Walmart stores. The business faces a challenge due to unforeseen demand and sometimes runs out of stock because an inappropriate machine learning algorithm is used. An ideal ML algorithm will predict demand accurately and ingest factors like economic conditions, including CPI, the Unemployment Index, etc.
Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.
Dataset Description
This is the historical data covering sales from 2010-02-05 to 2012-11-01, in the file Walmart_Store_sales. Within this file you will find the following fields:
- Store - the store number
- Date - the week of sales
- Weekly_Sales - sales for the given store
- Holiday_Flag - whether the week is a special holiday week (1 - holiday week, 0 - non-holiday week)
- Temperature - temperature on the day of sale
- Fuel_Price - cost of fuel in the region
- CPI - prevailing consumer price index
- Unemployment - prevailing unemployment rate
Holiday Events
- Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
- Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
- Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
- Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13
Analysis Tasks
Basic statistics tasks:
- Which store has the maximum sales?
- Which store has the maximum standard deviation, i.e., where sales vary a lot? Also find the coefficient of mean to standard deviation.
- Which store(s) has a good quarterly growth rate in Q3'2012?
- Some holidays have a negative impact on sales. Find the holidays which have higher sales than the mean sales in the non-holiday season for all stores together.
- Provide a monthly and semester view of sales in units and give insights.
Statistical model:
- For Store 1, build prediction models to forecast demand (see the sketch after this list).
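The project itself is implemented in R; purely as an illustration, here is a minimal pandas sketch of the basic statistics tasks, assuming the data sits in a file named Walmart_Store_sales.csv with the columns listed above and day-first dates (both assumptions).

```python
import pandas as pd

# Load the historical sales data (file name and date format assumed from the description above).
df = pd.read_csv("Walmart_Store_sales.csv", parse_dates=["Date"], dayfirst=True)

# Store with the maximum total sales.
total_sales = df.groupby("Store")["Weekly_Sales"].sum()
print("Store with maximum sales:", total_sales.idxmax())

# Store with the maximum standard deviation, plus its coefficient of variation (std / mean).
stats = df.groupby("Store")["Weekly_Sales"].agg(["mean", "std"])
most_variable = stats["std"].idxmax()
print("Most variable store:", most_variable,
      "CoV:", stats.loc[most_variable, "std"] / stats.loc[most_variable, "mean"])

# Holiday weeks whose mean sales beat the overall non-holiday mean.
non_holiday_mean = df.loc[df["Holiday_Flag"] == 0, "Weekly_Sales"].mean()
holiday_means = df[df["Holiday_Flag"] == 1].groupby("Date")["Weekly_Sales"].mean()
print(holiday_means[holiday_means > non_holiday_mean])
```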
Women-s-Clothing-E-Commerce-Reviews
- Describe the data: descriptive statistics, data types, etc.
- Analyze the text comments/reviews and share the findings.
- Convert the ratings into 2 classes: class "Bad" when Rating <= 3, class "Good" otherwise.
- Develop a model to predict the rating class created above. Focus on the steps to build the model (a baseline sketch follows below): which algorithm can be used and why?
- Share the findings of your analysis.
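A minimal baseline sketch of the rating-class model, assuming Kaggle-style column names "Review Text" and "Rating"; TF-IDF with logistic regression is just one reasonable starting algorithm, not necessarily the one used in the repository.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

# File and column names are assumptions about the dataset layout.
df = pd.read_csv("Womens Clothing E-Commerce Reviews.csv").dropna(subset=["Review Text"])
df["Class"] = (df["Rating"] > 3).map({True: "Good", False: "Bad"})  # Bad when Rating <= 3

X_train, X_test, y_train, y_test = train_test_split(
    df["Review Text"], df["Class"], test_size=0.2, random_state=42)

# TF-IDF features on the review text, then a simple linear classifier.
vec = TfidfVectorizer(stop_words="english")
model = LogisticRegression(max_iter=1000)
model.fit(vec.fit_transform(X_train), y_train)
print(classification_report(y_test, model.predict(vec.transform(X_test))))
```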
Analyse-the-Federal-Aviation-Authority-Dataset-using-Pandas
DESCRIPTION
Problem: Analyze the Federal Aviation Authority (FAA) dataset using Pandas to do the following:
1. View the aircraft make name, state name, aircraft model name, text information, flight phase, event description type, and fatal flag.
2. Clean the dataset and replace the fatal flag NaN with "No".
3. Find the aircraft types and their occurrences in the dataset.
4. Remove all the observations where aircraft names are not available.
5. Display the observations where the fatal flag is "Yes" (a pandas sketch follows below).
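A minimal pandas sketch of the five steps, assuming a CSV named faa_ai_prelim.csv and the column names shown below (both the file name and the columns are assumptions about the dataset layout).

```python
import pandas as pd

# File and column names are assumptions about the FAA dataset layout.
faa = pd.read_csv("faa_ai_prelim.csv")

# 1. View the columns of interest.
cols = ["ACFT_MAKE_NAME", "LOC_STATE_NAME", "ACFT_MODEL_NAME",
        "RMK_TEXT", "FLT_PHASE", "EVENT_TYPE_DESC", "FATAL_FLAG"]
print(faa[cols].head())

# 2. Replace missing fatal flags with "No".
faa["FATAL_FLAG"] = faa["FATAL_FLAG"].fillna("No")

# 3. Aircraft types and their occurrences.
print(faa["ACFT_MAKE_NAME"].value_counts())

# 4. Drop observations where the aircraft name is not available.
faa = faa.dropna(subset=["ACFT_MAKE_NAME"])

# 5. Observations where the fatal flag is "Yes".
print(faa[faa["FATAL_FLAG"] == "Yes"])
```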
providing_github_knowledge
This repository is meant for all the GitHub resources.
High-value-customers-identification-for-an-E-Commerce-company
Data-Visualization
In this repository I have shown all the necessary graph plots in R that help in the analysis of business data in the real world.
DataVisualizationUsingPython
In this demo, I have done all the data visualization, such as histograms, pie charts, line plots, etc., for the given dataset using the Python programming language (a minimal matplotlib sketch follows below).
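A minimal matplotlib sketch of the three plot types mentioned, using made-up example data rather than the dataset from the demo.

```python
import matplotlib.pyplot as plt
import numpy as np

# Example data, made up purely to illustrate the plot types mentioned above.
values = np.random.normal(50, 10, 200)
shares = [35, 25, 25, 15]
months = range(1, 13)
sales = np.random.randint(100, 200, 12)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(values, bins=20)                      # Histogram
axes[0].set_title("Histogram")
axes[1].pie(shares, labels=["A", "B", "C", "D"])   # Pie chart
axes[1].set_title("Pie Chart")
axes[2].plot(months, sales, marker="o")            # Line plot
axes[2].set_title("Line Plot")
plt.tight_layout()
plt.show()
```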
California-Housing-Price-Prediction
Income-Qualification
DESCRIPTION
Identify the level of income qualification needed for families in Latin America.
Problem Statement Scenario: Many social programs have a hard time ensuring that the right people are given enough aid. It's tricky when a program focuses on the poorest segment of the population, because this segment can't provide the necessary income and expense records to prove that they qualify. In Latin America, a popular method called the Proxy Means Test (PMT) uses an algorithm to verify income qualification. With PMT, agencies use a model that considers a family's observable household attributes, like the material of their walls and ceiling or the assets found in their homes, to classify them and predict their level of need. While this is an improvement, accuracy remains a problem as the region's population grows and poverty declines. The Inter-American Development Bank (IDB) believes that new methods beyond traditional econometrics, based on a dataset of Costa Rican household characteristics, might help improve PMT's performance.
The following actions should be performed:
- Identify the output variable.
- Understand the type of data.
- Check if there are any biases in your dataset.
- Check whether all members of a house have the same poverty level.
- Check if there is a house without a family head.
- Set the poverty level of the members and the head of the house within a family.
- Count how many null values exist in each column.
- Remove null-value rows of the target variable.
- Predict the accuracy using a random forest classifier.
- Check the accuracy using random forest with cross validation (see the sketch after this list).
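A minimal scikit-learn sketch of the last few steps, assuming the training file is train.csv, the output variable is a "Target" column, and only numeric features are used; this is an illustration, not the repository's exact pipeline.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.metrics import accuracy_score

# File name, "Target" column, and numeric-only features are assumptions for this sketch.
df = pd.read_csv("train.csv")
df = df.dropna(subset=["Target"])                                  # remove null target rows
X = df.select_dtypes("number").drop(columns=["Target"]).fillna(0)  # numeric features only
y = df["Target"]

# Hold-out accuracy with a random forest classifier.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
rf = RandomForestClassifier(n_estimators=200, random_state=42)
rf.fit(X_train, y_train)
print("Hold-out accuracy:", accuracy_score(y_test, rf.predict(X_test)))

# Accuracy with 5-fold cross validation.
print("CV accuracy:", cross_val_score(rf, X, y, cv=5).mean())
```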
BITS_WILP
This is the BITS WILP repository for the M.Tech in Data Science & Engineering.
retail_analysis_end_to_end_project
This project involves the entire data analytics workflow using retail data.
Analysing-Ad-Budgets-for-different-media-channels
MASTER-DATA-ANALYTICS-USING-CLOUD-TECHNOLOGIES-ML
This repository contains all the files and information about the Master Data Analytics course.
Building-a-model-to-predict-Diabetes
DiabetesHealthCarePredictionAnalysis
Comcast-Telecom-Consumer-Complaints
DESCRIPTION
Comcast is an American global telecommunication company. The firm has been providing terrible customer service, and they continue to fall short despite repeated promises to improve. Only last month (October 2016) the authority fined them $2.3 million after receiving over 1000 consumer complaints. The existing database will serve as a repository of public customer complaints filed against Comcast and will help pin down what is wrong with Comcast's customer service.
Data Dictionary
- Ticket #: ticket number assigned to each complaint
- Customer Complaint: description of the complaint
- Date: date of the complaint
- Time: time of the complaint
- Received Via: mode of communication of the complaint
- City: customer city
- State: customer state
- Zipcode: customer zip code
- Status: status of the complaint
- Filing on behalf of someone
Analysis Task
To perform these tasks, you can use any of the different Python libraries such as NumPy, SciPy, Pandas, scikit-learn, matplotlib, and BeautifulSoup.
- Import the data into the Python environment.
- Provide the trend chart for the number of complaints at monthly and daily granularity levels.
- Provide a table with the frequency of complaint types. Which complaint types are maximum, i.e., around internet, network issues, or any other domain?
- Create a new categorical variable with values Open and Closed. Open & Pending are to be categorized as Open, and Closed & Solved are to be categorized as Closed.
- Provide a state-wise status of complaints in a stacked bar chart, using the categorized variable created above. Provide insights on which state has the maximum complaints and which state has the highest percentage of unresolved complaints.
- Provide the percentage of complaints resolved till date which were received through the Internet and customer care calls.
The analysis results should be provided with insights wherever applicable (a minimal sketch of a few of these steps follows below).
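A minimal pandas/matplotlib sketch of a few of the tasks, assuming the file name Comcast_telecom_complaints_data.csv and the column names from the data dictionary above (exact spellings are assumptions).

```python
import pandas as pd
import matplotlib.pyplot as plt

# File and column names ("Date", "Status", "State") are assumptions about the dataset.
df = pd.read_csv("Comcast_telecom_complaints_data.csv", parse_dates=["Date"], dayfirst=True)

# Monthly trend of complaint counts.
df["Date"].dt.to_period("M").value_counts().sort_index().plot(kind="line", marker="o")
plt.title("Complaints per month")
plt.show()

# Collapse the four statuses into Open vs Closed.
df["NewStatus"] = df["Status"].replace({"Pending": "Open", "Solved": "Closed"})

# State-wise stacked bar of Open vs Closed complaints.
state_status = df.groupby(["State", "NewStatus"]).size().unstack(fill_value=0)
state_status.plot(kind="bar", stacked=True, figsize=(12, 5))
plt.show()

print("State with maximum complaints:", state_status.sum(axis=1).idxmax())
```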
Analysing-Spam-Collection-Data
DESCRIPTION
Problem: Analyze the given Spam Collection dataset to:
- View information on the spam data.
- View the length of the messages.
- Define a function to eliminate stopwords.
- Apply bag of words.
- Apply a tf-idf transformer.
- Detect spam with a Naïve Bayes model (see the sketch after this list).
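A minimal scikit-learn/NLTK sketch of the listed steps, assuming the classic tab-separated SMSSpamCollection file; the pipeline below is an illustration, not necessarily the repository's exact code.

```python
import nltk
import pandas as pd
from nltk.corpus import stopwords
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

nltk.download("stopwords")

# Tab-separated label/message layout is an assumption about the dataset file.
messages = pd.read_csv("SMSSpamCollection", sep="\t", names=["label", "message"])
messages["length"] = messages["message"].str.len()  # length of each message
print(messages.groupby("label")["length"].describe())

def tokenize_without_stopwords(text):
    """Split a message into words and drop English stopwords."""
    stops = set(stopwords.words("english"))
    return [w for w in text.split() if w.lower() not in stops]

# Bag of words -> tf-idf -> Naive Bayes, chained in a pipeline.
spam_model = Pipeline([
    ("bow", CountVectorizer(analyzer=tokenize_without_stopwords)),
    ("tfidf", TfidfTransformer()),
    ("nb", MultinomialNB()),
])
spam_model.fit(messages["message"], messages["label"])
print(spam_model.predict(["Free entry in a weekly competition, text WIN to claim"]))
```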
Analyse-NewYork-city-fire-department-Dataset
DESCRIPTION
A dataset in CSV format is given for the Fire Department of New York City. Analyze the dataset to determine:
- The total number of fire department facilities in New York City
- The number of fire department facilities in each borough
- The facility names in Manhattan (see the sketch after this list)
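A minimal pandas sketch of the three questions, assuming a file named FDNY.csv with Borough and FacilityName columns (both names are assumptions about the dataset layout).

```python
import pandas as pd

# File and column names are assumptions about the FDNY CSV layout.
fdny = pd.read_csv("FDNY.csv")

# Total number of fire department facilities.
print("Total facilities:", len(fdny))

# Number of facilities in each borough.
print(fdny["Borough"].value_counts())

# Facility names in Manhattan.
print(fdny.loc[fdny["Borough"] == "Manhattan", "FacilityName"])
```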
Sentiment-Analysis-using-NLP
DESCRIPTION
Analyze the Sentiment dataset using NLP to:
- View the observations.
- Verify the length of the messages and add it as a new column.
- Apply a transformer and fit the data in the bag of words.
- Print the shape of the transformed data.
- Check the model for predicted and expected values (see the sketch after this list).
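A minimal scikit-learn sketch of the listed steps, assuming a CSV named sentiment.csv with text and sentiment columns (both assumptions); Naïve Bayes stands in here for whichever model the repository uses.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# File and column names ("text", "sentiment") are assumptions about the dataset layout.
data = pd.read_csv("sentiment.csv")
data["length"] = data["text"].str.len()  # add message length as a new column
print(data.head())

# Fit a bag-of-words transformer and inspect its shape.
bow = CountVectorizer(stop_words="english")
X = bow.fit_transform(data["text"])
print("Bag-of-words shape:", X.shape)

# Compare predicted vs expected labels on a held-out split.
X_train, X_test, y_train, y_test = train_test_split(X, data["sentiment"], random_state=42)
clf = MultinomialNB().fit(X_train, y_train)
predicted = clf.predict(X_test)
print(pd.DataFrame({"expected": y_test.values, "predicted": predicted}).head(10))
```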