

This repository contains projects related to data collection, assessment, cleaning, visualization, and analysis.

Data-Analytics-Projects:

Certificate : https://graduation.udacity.com/confirm/KUM3F4AJ

This repository mainly contains the projects I completed in the Udacity Data Analysis Nanodegree.

Udacity's online Data Analyst program prepares me for a career as a data analyst by teaching me to clean and organize data, uncover patterns and insights, draw meaningful conclusions, and clearly communicate critical findings. I am developing proficiency in Python and its data analysis libraries (NumPy, pandas, Matplotlib) and in SQL as I build a portfolio of projects.

Tip: For data science projects in Python, I recommend installing these basic libraries: NumPy, pandas, SciPy, scikit-learn, Matplotlib, and seaborn.
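They can be installed with `pip install numpy pandas scipy scikit-learn matplotlib seaborn` (or the equivalent `conda install`). To check which of them are already available, a small standard-library script is enough. This is a minimal sketch; `missing_packages` is a hypothetical helper name, and note that scikit-learn is imported as `sklearn`.

```python
import importlib.util

# Import names of the recommended libraries (scikit-learn is imported as "sklearn")
RECOMMENDED = ("numpy", "pandas", "scipy", "sklearn", "matplotlib", "seaborn")


def missing_packages(names=RECOMMENDED):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All recommended libraries are installed.")
```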

Part 1 - Intro to Data Analysis

Subjects Covered:

  • Anaconda: Learn to use Anaconda to manage packages and environments for use with Python
  • Jupyter Notebook: Learn to use this open-source web application
  • Data Analysis Process
  • NumPy for 1D and 2D Data
  • pandas Series and DataFrames
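The difference between 1D and 2D data, and the step from plain NumPy arrays to labeled pandas objects, can be illustrated with a small example. The values and labels below are made up for illustration.

```python
import numpy as np
import pandas as pd

# 1D data: a NumPy array supports vectorized arithmetic and summary statistics
temps = np.array([12.5, 14.0, 13.5, 15.0])
print(temps.mean())          # → 13.75
print(temps - temps.mean())  # deviation of each value from the mean

# 2D data: axis=0 aggregates down the columns, axis=1 across the rows
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
print(grid.sum(axis=0))  # column sums → [5 7 9]
print(grid.sum(axis=1))  # row sums → [ 6 15]

# pandas adds labels: a Series is a labeled 1D array, a DataFrame a labeled 2D table
s = pd.Series(temps, index=["2001", "2002", "2003", "2004"])
df = pd.DataFrame(grid, columns=["a", "b", "c"], index=["row1", "row2"])
print(s["2003"])       # label-based lookup → 13.5
print(df["b"].mean())  # mean of one column → 3.5
```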

Project 1: Explore Weather Trends with weather forecast data

In this project, I chose one of Udacity's curated datasets and investigated it using NumPy and pandas. I completed the entire data analysis process, starting by posing a question and finishing by sharing the findings. (This section might be better placed in the README of Project 1.)
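When exploring weather trends, yearly temperatures are noisy, and a moving average is one common way to make the underlying trend visible. A minimal pandas sketch, assuming a table with hypothetical `year` and `avg_temp` columns; the sample values are invented, not real measurements.

```python
import pandas as pd


def moving_average(df, window=3):
    """Add a rolling mean of the (hypothetical) avg_temp column over `window` consecutive years."""
    out = df.sort_values("year").copy()
    # The first window-1 rows have no complete window, so they get NaN
    out["moving_avg"] = out["avg_temp"].rolling(window=window).mean()
    return out


# Illustrative, made-up values
weather = pd.DataFrame({"year": [2000, 2001, 2002, 2003, 2004],
                        "avg_temp": [9.0, 9.3, 9.6, 9.0, 9.9]})
print(moving_average(weather))
```

A larger `window` smooths more but reacts more slowly to real changes, so it is worth trying a few values before drawing conclusions.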

Project 2: Investigate a Dataset (TMDb Movie Data)

I was provided with a dataset reflecting data collected from an experiment. I used statistical techniques to answer questions about the data and reported my conclusions and recommendations.

Part 2 - Practical Statistics

Subjects Covered:

  • Probability
  • Conditional Probability
  • Binomial Distribution
  • Sampling Distribution and Central Limit Theorem
  • Descriptive Statistics
  • Inferential Statistics
  • Confidence Levels and Intervals
  • Hypothesis Testing
  • T-tests and A/B Tests
  • Regression
  • Multiple Linear Regression
  • Logistic Regression

Project 3: Analyze A/B Test Results with the company's ab_data.csv

Using Python, I gathered data from a variety of sources, assessed its quality and tidiness, and then cleaned it. I documented the wrangling efforts in a Jupyter Notebook and showcased them through analyses and visualizations using Python and SQL. I then used A/B testing and regression methods to decide whether the company should launch a new webpage or keep the old one.
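The launch decision can be framed as a hypothesis test on conversion rates. Here is a minimal sketch of a two-proportion z-test using only the standard library. It assumes the data has already been reduced to conversion counts per group, and it assumes a one-sided alternative (new page converts better). The function name and the counts in the usage line are hypothetical. `statsmodels.stats.proportion.proportions_ztest` provides a library version of the same test.

```python
from math import erf, sqrt


def two_proportion_ztest(conv_new, n_new, conv_old, n_old):
    """One-sided z-test of H0: p_new <= p_old against H1: p_new > p_old.

    Uses the pooled conversion rate under the null hypothesis.
    Returns (z statistic, p-value).
    """
    p_new, p_old = conv_new / n_new, conv_old / n_old
    pooled = (conv_new + conv_old) / (n_new + n_old)
    se = sqrt(pooled * (1 - pooled) * (1 / n_new + 1 / n_old))
    z = (p_new - p_old) / se
    # Upper-tail probability of the standard normal: 1 - Phi(z)
    p_value = 1 - 0.5 * (1 + erf(z / sqrt(2)))
    return z, p_value


# Made-up counts: 120 of 1000 users converted on the new page, 100 of 1000 on the old one
z, p = two_proportion_ztest(120, 1000, 100, 1000)
print(f"z = {z:.3f}, p-value = {p:.3f}")
```

A p-value above the chosen significance level (for example 0.05) means the data does not give enough evidence to prefer the new page.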

Part 3 - Data Extraction and Wrangling

Subjects Covered:

  • GATHERING DATA
    • Gather data from multiple sources, including gathering files, programmatically downloading files, web-scraping data, and accessing data from APIs
    • Import data of various file formats into pandas, including flat files (e.g. TSV), HTML files, TXT files, and JSON files
    • Store gathered data in a PostgreSQL database
  • ASSESSING DATA
    • Assess data visually and programmatically using pandas
    • Distinguish between dirty data (content or “quality” issues) and messy data (structural or “tidiness” issues)
    • Identify data quality issues and categorize them using metrics: validity, accuracy, completeness, consistency, and uniformity
  • CLEANING DATA
    • Identify each step of the data cleaning process (defining, coding, and testing)
    • Clean data using Python and pandas
    • Test cleaning code visually and programmatically using Python
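The define, code, and test cycle can be sketched in pandas on a tiny table. The table, its columns, and its two quality issues are invented for illustration.

```python
import pandas as pd

# Hypothetical raw data with two quality issues:
# timestamps stored as strings, and an impossible negative rating
raw = pd.DataFrame({
    "timestamp": ["2017-08-01 16:23:56", "2017-07-31 00:18:03", "2017-07-30 15:58:51"],
    "rating": [13, -1, 12],
})

# Define: convert timestamp to a datetime type; drop rows whose rating is negative

# Code: apply the fixes to a copy so the raw data stays untouched
clean = raw.copy()
clean["timestamp"] = pd.to_datetime(clean["timestamp"])
clean = clean[clean["rating"] >= 0].reset_index(drop=True)

# Test: check programmatically that each defined issue is gone
assert pd.api.types.is_datetime64_any_dtype(clean["timestamp"])
assert (clean["rating"] >= 0).all()
print(clean.dtypes)
```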

Project 4: Wrangle and Analyze WeRateDogs Tweet Data

I collected data from different sources, assessed it visually and programmatically, and cleaned it so that it could later be visualized and analyzed for insights.
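Combining gathered sources usually comes down to joining them on a shared key. A minimal pandas sketch, assuming both sources carry a `tweet_id` column. The two tables and their values are invented stand-ins, not the project's actual data.

```python
import pandas as pd

# Hypothetical tables standing in for two gathered sources, keyed by tweet_id
archive = pd.DataFrame({"tweet_id": [1, 2, 3], "name": ["Rex", "Bo", "Max"]})
api_counts = pd.DataFrame({"tweet_id": [1, 3, 4], "retweet_count": [40, 15, 7]})

# Inner join keeps only tweets present in both sources
merged = archive.merge(api_counts, on="tweet_id", how="inner")
print(merged)

# An outer join with indicator=True records where each row came from,
# which helps when assessing completeness before cleaning
audit = archive.merge(api_counts, on="tweet_id", how="outer", indicator=True)
print(audit["_merge"].value_counts())
```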

Part 4 - Data Visualisation

Subjects Covered:

  • Univariate exploration of data (histograms, bar charts, axis limits, and different scales)
  • Bivariate exploration of data (scatter plots, clustered bar charts, violin and box plots, faceting)
  • Multivariate exploration of data (encodings, plot matrices, feature engineering)
  • Explanatory visualizations (storytelling with data, polishing plots, creating slide decks)
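A univariate view (a histogram on a log scale) and a bivariate view (a scatter plot) can sit side by side in one Matplotlib figure. This is a minimal sketch. The `carat` and `price` variable names assume a diamonds-style dataset, and the data is randomly generated rather than real.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def explore(carat, price):
    """Draw one univariate and one bivariate view side by side and return the figure."""
    fig, (ax_uni, ax_bi) = plt.subplots(1, 2, figsize=(10, 4))

    # Univariate: histogram of a right-skewed variable, log-spaced bins on a log x-axis
    bins = np.logspace(np.log10(price.min()), np.log10(price.max()), 20)
    ax_uni.hist(price, bins=bins)
    ax_uni.set_xscale("log")
    ax_uni.set_xlabel("price (log scale)")
    ax_uni.set_ylabel("count")

    # Bivariate: scatter plot, with transparency to reduce overplotting
    ax_bi.scatter(carat, price, alpha=0.3)
    ax_bi.set_xlabel("carat")
    ax_bi.set_ylabel("price")

    fig.tight_layout()
    return fig


# Synthetic data for illustration only (not the real diamonds dataset)
rng = np.random.default_rng(0)
carat = rng.uniform(0.2, 3.0, 500)
price = np.exp(rng.normal(8.0, 1.0, 500))
fig = explore(carat, price)
```

Log-spaced bins keep the bars evenly wide on the log axis. Linear bins would crowd into the left side of the plot.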

Project 5: Data Visualization with Diamond Data

I applied data visualization techniques to a dataset describing the characteristics of diamonds and their prices.

Project 6: Communicate Data Findings with Ford Go Bike Sharing Data

In this project, I used Python’s data visualization tools to systematically explore the bike dataset for its properties and relationships between variables. Then, I created a presentation that communicates the findings to others.
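An explanatory chart often starts from a derived feature and ends in a single, clearly labeled plot. This is a minimal sketch. It assumes the trip data has a `start_time` column, which is an assumption about the dataset's schema. It uses three made-up trips, and the function names are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]


def trips_by_weekday(trips):
    """Count trips per weekday from the (assumed) start_time column, in Monday..Sunday order."""
    weekday = pd.to_datetime(trips["start_time"]).dt.day_name()  # derived feature
    return weekday.value_counts().reindex(WEEKDAYS, fill_value=0)


def plot_counts(counts):
    """Explanatory bar chart: a descriptive title, labeled axes, and less clutter."""
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.bar(counts.index, counts.values, color="tab:blue")
    ax.set_title("Trips per day of week")
    ax.set_xlabel("Day of week")
    ax.set_ylabel("Number of trips")
    for side in ("top", "right"):
        ax.spines[side].set_visible(False)  # remove non-essential chart borders
    fig.tight_layout()
    return fig


# Three made-up trips: two on Monday 2019-02-04, one on Saturday 2019-02-09
trips = pd.DataFrame({"start_time": ["2019-02-04 08:00", "2019-02-04 17:30", "2019-02-09 10:15"]})
counts = trips_by_weekday(trips)
fig = plot_counts(counts)
```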