• Stars: 3
• Rank: 3,963,521 (Top 79%)
• Language: Jupyter Notebook
• Created: over 4 years ago
• Updated: over 4 years ago

Repository Details

DESCRIPTION Identify the level of income qualification needed for families in Latin America.

Problem Statement Scenario: Many social programs have a hard time ensuring that the right people are given enough aid. It's tricky when a program focuses on the poorest segment of the population, because this segment can't provide the necessary income and expense records to prove that it qualifies. In Latin America, a popular method called the Proxy Means Test (PMT) uses an algorithm to verify income qualification. With PMT, agencies use a model that considers a family's observable household attributes, like the material of their walls and ceiling or the assets found in their homes, to classify them and predict their level of need. While this is an improvement, accuracy remains a problem as the region's population grows and poverty declines. The Inter-American Development Bank (IDB) believes that new methods beyond traditional econometrics, based on a dataset of Costa Rican household characteristics, might help improve PMT's performance.

The following actions should be performed:
- Identify the output variable.
- Understand the type of data.
- Check if there are any biases in your dataset.
- Check whether all members of the house have the same poverty level.
- Check if there is a house without a family head.
- Set the poverty level of the members and the head of the house within a family.
- Count how many null values exist in each column.
- Remove rows where the target variable is null.
- Predict the accuracy using a random forest classifier.
- Check the accuracy using random forest with cross-validation.
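The last few steps above can be sketched with scikit-learn. This is a minimal illustration on an invented miniature dataset; the column names (`wall_material`, `num_rooms`, `has_tablet`, `Target`) are assumptions for demonstration, not the real Costa Rican schema:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split

# Hypothetical miniature of the household dataset; "Target" stands in
# for the poverty-level output variable.
df = pd.DataFrame({
    "wall_material": [1, 2, 1, 3, 2, 1, 3, 2, 1, 2] * 10,
    "num_rooms":     [3, 2, 4, 1, 2, 5, 1, 3, 4, 2] * 10,
    "has_tablet":    [1, 0, 1, 0, 0, 1, 0, 1, 1, 0] * 10,
    "Target":        [4, 2, 4, 1, 2, 4, 1, 3, 4, 2] * 10,
})

# Remove rows where the target variable is null, per the task list
df = df.dropna(subset=["Target"])

X, y = df.drop(columns="Target"), df["Target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

# Predict accuracy with a random forest classifier on a hold-out split
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("Hold-out accuracy:", accuracy_score(y_test, clf.predict(X_test)))

# Check accuracy again with 5-fold cross-validation
scores = cross_val_score(clf, X, y, cv=5)
print("5-fold CV accuracy:", scores.mean())
```

On a real dataset you would also inspect class balance (the bias check above) before trusting accuracy as the metric.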

More Repositories

1

iNeuron_FSDA

This repository contains all the files and folders taught in the class. Make use of it effectively.
82
star
2

analyticswithanand

This repository contains all the code, PPTs, projects, and interview questions I have used in my LIVE CLASS on YouTube, along with any other relevant documents and assignments related to the course.
48
star
3

End-To-End-Data-Analytics-Project_Banking

This project is built on datasets I created manually over several weeks: 1M randomly generated records across 8 tables. A great project to showcase on your resume. Try it out.
PLpgSQL
17
star
4

anandjha90

14
star
5

E-Commerce-Sales-Dashboard-in-Excel

10
star
6

Organization_Exploratory_Data_Analysis

Jupyter Notebook
9
star
7

MASTER-THE-ART-OF-EXTRACT-TRANSFORM-LOAD-WITH-MATILLION

This repository contains all the files and information about Mastering ETL Tool - Matillion
6
star
8

Real-estate---PGP

Jupyter Notebook
4
star
9

Walmart_Retail_Analysis_Using_R

DESCRIPTION One of the leading retail stores in the US, Walmart, would like to predict sales and demand accurately. Certain events and holidays impact sales each day, and sales data are available for 45 Walmart stores. The business faces a challenge because unforeseen demand sometimes causes stock-outs, a result of an inappropriate machine learning algorithm. An ideal ML algorithm will predict demand accurately while ingesting factors such as economic conditions, including CPI, the Unemployment Index, etc. Walmart runs several promotional markdown events throughout the year. These markdowns precede prominent holidays, the four largest of which are the Super Bowl, Labour Day, Thanksgiving, and Christmas. The weeks including these holidays are weighted five times higher in the evaluation than non-holiday weeks. Part of the challenge presented by this competition is modeling the effects of markdowns on these holiday weeks in the absence of complete/ideal historical data. Historical sales data for 45 Walmart stores located in different regions are available.

Dataset Description: This is historical data covering sales from 2010-02-05 to 2012-11-01, in the file Walmart_Store_sales. Within this file you will find the following fields:
- Store: the store number
- Date: the week of sales
- Weekly_Sales: sales for the given store
- Holiday_Flag: whether the week is a special holiday week (1 = holiday week, 0 = non-holiday week)
- Temperature: temperature on the day of sale
- Fuel_Price: cost of fuel in the region
- CPI: prevailing consumer price index
- Unemployment: prevailing unemployment rate

Holiday Events:
- Super Bowl: 12-Feb-10, 11-Feb-11, 10-Feb-12, 8-Feb-13
- Labour Day: 10-Sep-10, 9-Sep-11, 7-Sep-12, 6-Sep-13
- Thanksgiving: 26-Nov-10, 25-Nov-11, 23-Nov-12, 29-Nov-13
- Christmas: 31-Dec-10, 30-Dec-11, 28-Dec-12, 27-Dec-13

Analysis Tasks (Basic Statistics):
- Which store has the maximum sales?
- Which store has the maximum standard deviation, i.e., where do sales vary the most? Also find the coefficient of variation (ratio of standard deviation to mean).
- Which store(s) had a good quarterly growth rate in Q3 2012?
- Some holidays have a negative impact on sales. Find the holidays that have higher sales than the mean sales in the non-holiday season, for all stores together.
- Provide a monthly and semester view of sales in units and give insights.

Statistical Model: For Store 1, build prediction models to forecast demand.
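The repository itself uses R, but the basic-statistics tasks translate directly into pandas. A sketch on synthetic data (3 stores instead of 45; column names follow the dataset description above, the sales numbers are invented):

```python
import numpy as np
import pandas as pd

# Tiny synthetic stand-in for Walmart_Store_sales
rng = np.random.default_rng(0)
dates = pd.date_range("2010-02-05", "2012-11-01", freq="W-FRI")
df = pd.concat(
    [pd.DataFrame({
        "Store": store,
        "Date": dates,
        "Weekly_Sales": rng.normal(1_000_000 * store, 50_000 * store, len(dates)),
    }) for store in (1, 2, 3)],
    ignore_index=True,
)

# Which store has the maximum total sales
total = df.groupby("Store")["Weekly_Sales"].sum()
print("Max total sales:", total.idxmax())

# Which store has the maximum standard deviation, plus the
# coefficient of variation (std / mean) per store
grouped = df.groupby("Store")["Weekly_Sales"]
std = grouped.std()
print("Max std:", std.idxmax())
print("Coefficient of variation:\n", std / grouped.mean())

# Quarterly growth rate in Q3 2012 relative to Q2 2012
quarterly = (df.assign(Quarter=df["Date"].dt.to_period("Q"))
               .groupby(["Store", "Quarter"])["Weekly_Sales"].sum()
               .unstack())
growth = (quarterly[pd.Period("2012Q3")] - quarterly[pd.Period("2012Q2")]) \
    / quarterly[pd.Period("2012Q2")]
print("Q3 2012 growth rate:\n", growth)
```

The same groupby/period pattern extends to the monthly and semester views; in R the equivalents are `aggregate`/`dplyr::group_by` with `lubridate` quarters.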
R
4
star
10

Women-s-Clothing-E-Commerce-Reviews

- Describe the data: descriptive statistics, data types, etc.
- Analyze the text comments/reviews and share the findings.
- Convert the ratings into 2 classes: "Bad" when Rating <= 3, "Good" otherwise.
- Develop a model to predict the rating class created above, focusing on the steps to build the model. Which algorithm can be used, and why?
- Share the findings of your analysis.
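The rating binarization and a simple text model can be sketched as below. The review texts and column names here are invented for illustration, and logistic regression over tf-idf is just one reasonable algorithm choice, not necessarily the one used in the repository:

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Miniature stand-in for the clothing reviews data
df = pd.DataFrame({
    "Review Text": ["love this dress", "terrible fit, returned it",
                    "perfect and comfortable", "fabric feels cheap",
                    "great quality", "worst purchase ever"] * 5,
    "Rating": [5, 2, 5, 3, 4, 1] * 5,
})

# Convert ratings into two classes: Bad when Rating <= 3, Good otherwise
df["Rating_Class"] = df["Rating"].apply(lambda r: "Bad" if r <= 3 else "Good")

# Tf-idf features + logistic regression as a baseline classifier
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(df["Review Text"], df["Rating_Class"])
print(model.predict(["love the quality"]))
```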
Jupyter Notebook
4
star
11

Analyse-the-Federal-Aviation-Authority-Dataset-using-Pandas

DESCRIPTION Problem: Analyze the Federal Aviation Authority (FAA) dataset using Pandas to do the following:
1. View the aircraft make name, state name, aircraft model name, text information, flight phase, event description type, and fatal flag.
2. Clean the dataset and replace the fatal-flag NaN with "No".
3. Find the aircraft types and their occurrences in the dataset.
4. Remove all observations where aircraft names are not available.
5. Display the observations where the fatal flag is "Yes".
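Steps 2 through 5 map onto standard pandas calls. A sketch on a tiny made-up frame (the column names are assumptions; the real FAA file may name them differently):

```python
import numpy as np
import pandas as pd

# Small synthetic frame with the columns the task list mentions
faa = pd.DataFrame({
    "aircraft_make": ["CESSNA", "PIPER", None, "BOEING", "CESSNA"],
    "state": ["CA", "TX", "NY", "WA", "FL"],
    "aircraft_model": ["172", "PA-28", "737", "747", "152"],
    "flight_phase": ["LANDING", "TAKEOFF", "CRUISE", "LANDING", "TAXI"],
    "fatal_flag": ["Yes", np.nan, "No", np.nan, "Yes"],
})

# Step 2: replace the fatal-flag NaN with "No"
faa["fatal_flag"] = faa["fatal_flag"].fillna("No")

# Step 3: aircraft types and their occurrences
print(faa["aircraft_make"].value_counts())

# Step 4: remove observations where the aircraft name is not available
faa = faa.dropna(subset=["aircraft_make"])

# Step 5: observations where the fatal flag is "Yes"
print(faa[faa["fatal_flag"] == "Yes"])
```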
Jupyter Notebook
4
star
12

providing_github_knowledge

This repository is meant for all the GitHub resources.
3
star
13

High-value-customers-identification-for-an-E-Commerce-company

R
3
star
14

Data-Visualization

This repository shows all the essential graph plots in R that help in analyzing real-world business data.
R
3
star
15

DataVisualizationUsingPython

In this demo, I have created data visualizations such as histograms, pie charts, and line plots for the given dataset using the Python programming language.
Jupyter Notebook
3
star
16

California-Housing-Price-Prediction

Jupyter Notebook
3
star
17

BITS_WILP

This is the BITS WILP repository for the M.Tech in Data Science & Engineering.
3
star
18

retail_analysis_end_to_end_project

This project walks through the entire data analytics workflow using retail data.
Jupyter Notebook
3
star
19

Analysing-Ad-Budgets-for-different-media-channels

Jupyter Notebook
2
star
20

MASTER-DATA-ANALYTICS-USING-CLOUD-TECHNOLOGIES-ML

This repository contains all the files and information about Master Data Analytics Course
2
star
21

Building-a-model-to-predict-Diabetes

Jupyter Notebook
2
star
22

DiabetesHealthCarePredictionAnalysis

Jupyter Notebook
2
star
23

Comcast-Telecom-Consumer-Complaints

DESCRIPTION Comcast is an American global telecommunications company. The firm has been providing terrible customer service; they continue to fall short despite repeated promises to improve. Only last month (October 2016), the authority fined them $2.3 million after receiving over 1,000 consumer complaints. The existing database will serve as a repository of public customer complaints filed against Comcast and will help pin down what is wrong with Comcast's customer service.

Data Dictionary:
- Ticket #: ticket number assigned to each complaint
- Customer Complaint: description of the complaint
- Date: date of the complaint
- Time: time of the complaint
- Received Via: mode of communication of the complaint
- City: customer city
- State: customer state
- Zipcode: customer zip code
- Status: status of the complaint
- Filing on Behalf of Someone

Analysis Tasks (you can use any of the Python libraries such as NumPy, SciPy, Pandas, scikit-learn, matplotlib, and BeautifulSoup):
- Import the data into the Python environment.
- Provide a trend chart for the number of complaints at monthly and daily granularity.
- Provide a table with the frequency of complaint types. Which complaint types are most common, i.e., internet, network issues, or other domains?
- Create a new categorical variable with values Open and Closed: Open & Pending are categorized as Open; Closed & Solved are categorized as Closed.
- Provide a state-wise status of complaints in a stacked bar chart, using the categorized variable created above. Provide insights on which state has the maximum complaints and which state has the highest percentage of unresolved complaints.
- Provide the percentage of complaints resolved to date that were received through the Internet and customer care calls.

Analysis results should be provided with insights wherever applicable.
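The monthly trend, the Open/Closed recategorization, and the state-wise table can be sketched in pandas. The ticket rows below are invented; only the field names follow the data dictionary:

```python
import pandas as pd

# Invented mini set of complaint tickets
df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-04-22", "2015-04-23", "2015-05-01",
                            "2015-05-02", "2015-06-10", "2015-06-11"]),
    "State": ["Georgia", "Georgia", "Florida", "Texas", "Georgia", "Florida"],
    "Status": ["Open", "Closed", "Solved", "Pending", "Closed", "Open"],
})

# Complaint trend at monthly granularity
monthly = df["Date"].dt.to_period("M").value_counts().sort_index()
print(monthly)

# Recategorize: Open & Pending -> Open, Closed & Solved -> Closed
df["NewStatus"] = df["Status"].map(
    {"Open": "Open", "Pending": "Open", "Closed": "Closed", "Solved": "Closed"})

# State-wise counts; state_status.plot(kind="bar", stacked=True) would
# draw the stacked bar chart (requires matplotlib)
state_status = pd.crosstab(df["State"], df["NewStatus"])
print(state_status)
print("State with most complaints:", df["State"].value_counts().idxmax())
```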
Jupyter Notebook
2
star
24

Analysing-Spam-Collection-Data

DESCRIPTION Problem: Analyze the given Spam Collection dataset to:
- View information on the spam data
- View the length of the messages
- Define a function to eliminate stopwords
- Apply Bag of Words
- Apply a tf-idf transformer
- Detect spam with a Naïve Bayes model
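The stopwords/Bag-of-Words/tf-idf/Naïve Bayes chain fits naturally into a scikit-learn Pipeline. A sketch on a handful of invented messages (the real SMS Spam Collection is, of course, much larger):

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Tiny invented stand-in for the spam collection
messages = ["win a free prize now", "free cash offer click now",
            "are we still meeting tomorrow", "see you at lunch",
            "claim your free reward", "call me when you get home"]
labels = ["spam", "spam", "ham", "ham", "spam", "ham"]

# Bag of Words (with English stopwords removed) -> tf-idf -> Naive Bayes
pipe = Pipeline([
    ("bow", CountVectorizer(stop_words="english")),
    ("tfidf", TfidfTransformer()),
    ("nb", MultinomialNB()),
])
pipe.fit(messages, labels)
print(pipe.predict(["free prize claim now"]))
```

Here `stop_words="english"` replaces the hand-written stopword-elimination function from the task list; writing it yourself with NLTK's stopword list is the other common approach.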
Jupyter Notebook
2
star
25

Analyse-NewYork-city-fire-department-Dataset

DESCRIPTION Problem: A dataset in CSV format is given for the Fire Department of New York City. Analyze the dataset to determine:
- The total number of fire department facilities in New York City
- The number of fire department facilities in each borough
- The facility names in Manhattan
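All three questions reduce to counts and a filter in pandas. A sketch on a made-up frame (the facility rows and column names `FacilityName`/`Borough` are assumptions, not the real FDNY CSV schema):

```python
import pandas as pd

# Synthetic stand-in for the FDNY facilities CSV
fdny = pd.DataFrame({
    "FacilityName": ["Engine 1", "Ladder 3", "Engine 54", "Squad 1", "Engine 10"],
    "Borough": ["Manhattan", "Manhattan", "Manhattan", "Brooklyn", "Queens"],
})

# Total number of fire department facilities
print("Total facilities:", len(fdny))

# Number of facilities in each borough
print(fdny["Borough"].value_counts())

# Facility names in Manhattan
print(fdny.loc[fdny["Borough"] == "Manhattan", "FacilityName"].tolist())
```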
Jupyter Notebook
2
star
26

Sentiment-Analysis-using-NLP

DESCRIPTION Problem: Analyze the Sentiment dataset using NLP to:
- View the observations
- Verify the length of the messages and add it as a new column
- Apply a transformer and fit the data into the bag of words
- Print the shape of the transformed data
- Check the model for predicted and expected values
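The length column, the bag-of-words shape, and the predicted-vs-expected comparison can be sketched as follows. The texts, labels, and column names are invented for illustration:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

# Invented stand-in for the sentiment dataset (1 = positive, 0 = negative)
df = pd.DataFrame({
    "text": ["loved it", "awful experience", "really great", "so bad",
             "fantastic service", "never again", "highly recommend", "horrible"],
    "sentiment": [1, 0, 1, 0, 1, 0, 1, 0],
})

# Add message length as a new column
df["length"] = df["text"].str.len()

# Fit the bag-of-words transformer and print the shape
bow = CountVectorizer()
X = bow.fit_transform(df["text"])
print("Bag-of-words shape:", X.shape)

# Compare predicted vs expected values on a held-out split
X_train, X_test, y_train, y_test = train_test_split(
    X, df["sentiment"], test_size=0.25, random_state=0)
clf = MultinomialNB().fit(X_train, y_train)
print("Predicted:", clf.predict(X_test).tolist(), "Expected:", y_test.tolist())
```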
Jupyter Notebook
2
star