  • Stars
    289
  • Rank 143,419 (Top 3 %)
  • Language
    Jupyter Notebook
  • License
    MIT License
  • Created over 4 years ago
  • Updated over 1 year ago

Repository Details

Collections of data science, machine learning, and data visualization projects using pandas, scikit-learn, Matplotlib, TensorFlow 2, Keras, and various ML algorithms such as random forest classifiers and boosting.

Data Science, Machine Learning & Visualization Dojo

Collections of Data Science & ML projects and dojo where I practice Data Science, Machine Learning, Deep Learning and Data Visualization related skills, theories, probability, statistics, etc.

Built with

Machine Learning, Deep Learning, Data Science libraries

  • NumPy - package for scientific computing with Python
  • Pandas - fast, powerful, flexible and easy to use open source data analysis and manipulation tool
  • Pandas Profiling - generate reports from dataframe
  • GeoPandas - adds support for geographic data to pandas objects
  • Scikit-learn - Simple and efficient tools for predictive data analysis
  • TensorFlow - An end-to-end open source machine learning platform
  • Keras - Deep Learning framework
  • NLTK - Natural Language Toolkit
  • dlib - A toolkit for making real world machine learning and data analysis applications in C++
  • Face Recognition - The world's simplest facial recognition api for Python and the command line

Data Visualization libraries

  • Matplotlib - a comprehensive library for creating static, animated, and interactive visualizations in Python
  • Seaborn - statistical data visualization
  • Bokeh - interactive visualization library for modern web browsers
  • Plotly - The front-end for ML and data science models
  • Cufflinks - Productivity Tools for Plotly + Pandas

Turning into Web applications

  • Streamlit - The fastest way to build and share data apps
  • Flask - a micro web framework written in Python

Spark

  • Apache Spark - a unified analytics engine for large-scale data processing.
  • Spark with PySpark - PySpark is the Python API for Apache Spark
  • Databricks - Unified Data Analytics Platform - One cloud platform for massive scale data engineering and collaborative data science.

Tools and Datasources


Projects

Breast Cancer Tumor Diagnostic - Classification Project

Fandango movie ratings - Capstone Project

Data Analysis and Visualization Capstone project from Machine Learning and Datascience Masterclass Course.

  • This is the data behind the story "Be Suspicious Of Online Movie Ratings, Especially Fandango’s"
  • Uses data from 538
  • If you are planning to go out to see a movie, how much can you trust online reviews and ratings, especially when the same company showing the ratings also makes money by selling movie tickets?
  • Does that company have a bias towards rating movies higher than they should be rated?
  • etc.

Supervised Learning Capstone Project - Cohort Analysis & Customer Churn Predictions

  • This project builds a machine learning model to predict whether or not a customer will churn.
  • Includes cohort analysis based on Telco subscribers' contract type, etc.
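
As a rough illustration of the contract-type view mentioned above, the sketch below computes a churn rate per contract type with pandas. The tiny DataFrame and the column names `Contract` and `Churn` are assumptions made up for this example; they are not taken from the project's data or code.

```python
import pandas as pd

# Hypothetical sample; column names and values are assumptions for illustration.
customers = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year", "Two year",
                 "Month-to-month", "One year", "Two year", "Month-to-month"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "Yes", "No", "Yes"],
})

# Turn the Yes/No label into a boolean so that the mean is a churn rate.
customers["churned"] = customers["Churn"].eq("Yes")

# Share of churned customers per contract type, highest first.
churn_rate = (
    customers.groupby("Contract")["churned"]
    .mean()
    .sort_values(ascending=False)
)
print(churn_rate)
```

Grouping by a single categorical column is the simplest cut; a fuller cohort analysis would typically also bucket customers by tenure.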

Predicting Heart Disease - Classification Project

Milestone project from Complete Machine Learning and Data Science - Zero to Mastery course.

Predicting Bulldozer Sale Price - Regression Project

Milestone project from Complete Machine Learning and Data Science - Zero to Mastery course.

Deep Learning ANN Project - Dog breed predictions

Project from Complete Machine Learning and Data Science - Zero to Mastery course.

911 Calls - Data Capstone Project

Data Analysis and Visualization Capstone project from Data Science and Machine Learning Bootcamp Course.

  • analyzing 911 calls data from kaggle
  • top 5 zip codes for 911 calls
  • top 5 townships for 911 calls
  • most common reason for a 911 call
  • different types of visualizations based on the findings
  • etc..
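
The "top 5" questions above can be sketched with pandas `value_counts`. The sample rows and the column names `zip`, `twp` and `title` are assumptions for illustration, and deriving the reason from the text before the colon in `title` is also an assumed convention rather than something stated in this README.

```python
import pandas as pd

# Hypothetical sample standing in for the 911 calls data; names are assumptions.
calls = pd.DataFrame({
    "zip": [19401, 19401, 19464, 19403, 19401, 19464, 19446, 19401, 19002, 19464],
    "twp": ["NORRISTOWN", "NORRISTOWN", "LOWER MERION", "ABINGTON", "NORRISTOWN",
            "LOWER MERION", "POTTSTOWN", "NORRISTOWN", "CHELTENHAM", "LOWER MERION"],
    "title": ["EMS: BACK PAINS", "EMS: FALL VICTIM", "Traffic: VEHICLE ACCIDENT",
              "Fire: GAS-ODOR/LEAK", "EMS: CARDIAC EMERGENCY", "Traffic: DISABLED VEHICLE",
              "EMS: HEAD INJURY", "Fire: FIRE ALARM", "EMS: DIZZINESS",
              "Traffic: ROAD OBSTRUCTION"],
})

# value_counts sorts by frequency (descending), so head(5) gives the top 5.
top_zips = calls["zip"].value_counts().head(5)
top_townships = calls["twp"].value_counts().head(5)

# Assumed convention: the reason is the part of the title before the colon.
calls["reason"] = calls["title"].str.split(":").str[0]
most_common_reason = calls["reason"].value_counts().idxmax()

print(top_zips)
print(top_townships)
print(most_common_reason)
```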

ML App - Random Forest Algorithm - ML Project

  • Machine learning app using streamlit, for building a regression model using the Random Forest algorithm.
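
A minimal sketch of the kind of regression model such an app might train, using scikit-learn's `RandomForestRegressor` on synthetic data. The dataset, hyperparameters and variable names are assumptions for illustration, and the Streamlit UI is left out.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (an assumption; the app would load a user dataset).
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=42)

# Hold out 20% of the rows to evaluate on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Fit a forest of 100 trees; the hyperparameters here are illustrative only.
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

predictions = model.predict(X_test)
score = r2_score(y_test, predictions)
print(f"R^2 on the held-out split: {score:.3f}")
```

In an interactive app, values such as `n_estimators` and the split ratio would usually be exposed as user-adjustable controls.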

Machine Learning & Data Science Projects

Masterclass Projects

Other Projects

Deep Learning Projects

Data Analysis and Visualization Projects

  • Data Visualization with Python - Project: Data analysis and data visualization using Pandas and Matplotlib for countries' GDP, life expectancy comparison across continents, GDP per capita relative growth, population relative growth comparison, etc.
  • Fuel Economy Case Study - Project: Analyzing fuel economy data provided by the EPA for distributions of greenhouse gas score and combined mpg in 2008 and 2018, and for correlations between displacement and combined mpg and between greenhouse gas score and combined mpg. Are more unique models using alternative fuels in 2018 compared to 2008? By how much? How much have vehicle classes improved in fuel economy (increased in mpg)? What are the characteristics of SmartWay vehicles? Have they changed over time? (mpg, greenhouse gas) What features are associated with better fuel economy (mpg)? What is the top vehicle which improved the most in terms of combined mpg from 2008 to 2018?
  • Wine Quality Case Study - Project: Analyzing wine data for the following points for wine businesses to model better wine. Is a certain type of wine (red or white) associated with higher quality? What level of acidity (pH value) receives the highest average rating? Do wines with higher alcoholic content receive better ratings? Do sweeter wines (more residual sugar) receive better ratings? White Vs Red Wine Proportions by Color & Quality
  • TV, Halftime Shows, and the Big Game - Project: Analyzing Super Bowl data and answering questions like - What are the most extreme game outcomes? How does the game affect television viewership? How have viewership, TV ratings, and ad cost evolved over time? Who are the most prolific musicians in terms of halftime show performances?
  • Weather Trend - Project: Analyzing Global weather trends, Singapore weather trends, Comparing Global vs Singapore 10 years Moving Average trends
  • Real-time Insights from Social Media Data - Project: Analyzing Twitter data and answering questions like: What are the global and local trends? Also covers finding the common trends, frequency analysis on tweets and hashtags, etc.
  • Statistics From Stock Data: Analyzing Google, Apple and Amazon stock prices and checking the rolling mean.
  • Android Play Store App Data Analysis - Project: Analyzing Android Play Store data and answering questions like - How many apps are paid? How much money are they making? When were these apps released?
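
Two of the projects above rely on a rolling (moving) average: the stock price rolling mean and the 10-year moving average of weather trends. A minimal sketch with pandas follows; the price values, the dates and the 3-day window are made-up assumptions, not data from these projects.

```python
import pandas as pd

# Hypothetical daily closing prices; values and dates are illustrative only.
prices = pd.Series(
    [10.0, 12.0, 11.0, 13.0, 15.0, 14.0],
    index=pd.date_range("2020-01-01", periods=6, freq="D"),
    name="close",
)

# Mean over a sliding window of 3 observations. The first two entries are NaN
# because a full window is not yet available.
rolling_mean = prices.rolling(window=3).mean()
print(rolling_mean)
```

A larger window smooths the series more strongly but reacts more slowly to recent changes.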

Bootcamps

RL - Practical AI with Python and Reinforcement Learning - JP - On Hold

  • 00. NumPy Crash Course
  • 01. Matplotlib Visualization
  • 02. Pandas and Scikit-learn
  • 03. ANNs
  • 04. CNNs
  • 05. Introduction to gym
  • 06. Classical Q Learning
  • 07. Deep Q Learning
  • 08. Deep Q Learning on Images
  • 09. Creating Custom Open AI Gym Environment

Tensorflow 2.0: Deep Learning and Artificial Intelligence - LP

  • Section 2 - Google Colab
  • Section 3 - Machine Learning and Neurons
  • Section 4 - Feedforward Artificial Neural Networks
  • Section 5 - CNN Convolutional Neural Networks
  • Section 6 - RNN - Recurrent Neural Networks, Time Series, Sequence Data
  • Section 7 - NLP
  • Section 8 - Recommender Systems
  • Section 9 - Transfer Learning for Computer Vision
  • Section 10 - GANs
  • Section 11 - Deep Reinforcement Learning (Theory)
  • Section 12 - Stock Trading Project with DL
  • Section 13 - Advanced Tensorflow Usage
  • Section 14 - Low-Level Tensorflow
  • Section 15 - In-Depth: Loss Functions
  • Section 16 - In-Depth: Gradient Descent
  • Section 17 - 21: Misc

DeepLearning.AI - Course 04.Sequences, Time Series and Predictions in Tensorflow

  • Week 01 - Sequences and Prediction
  • Week 02 - Deep Neural Networks for Time Series
  • Week 03 - Recurrent Neural Networks for Time Series
  • Week 04 - Real-world time series data

DeepLearning.AI - Course 03.Natural Language Processing in Tensorflow

  • Week 01 - Sentiment in Text
  • Week 02 - Word Embeddings
  • Week 03 - Sequence Models
  • Week 04 - Sequence Models and Literature

DeepLearning.AI - Course 02.Convolutional Neural Networks in TensorFlow

  • Week 01 - Exploring a Larger Dataset
  • Week 02 - Augmentation: A technique to avoid overfitting
  • Week 03 - Transfer Learning
  • Week 04 - Multiclass Classification

DeepLearning.AI - Course 01.Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning

  • Week 01 - A New Programming Paradigm
  • Week 02 - Introduction to Computer Vision
  • Week 03 - Enhancing Vision with CNN
  • Week 04 - Using Real-world images

Deep Learning TensorFlow Developer Certificate - ZTM - IN PROGRESS

  • 01. Introduction
  • 02. Deep Learning and Tensorflow Fundamentals
  • 03. Neural Network Regression with Tensorflow
  • 04. Neural Network Classification with Tensorflow
  • 05. Computer Vision and Convolutional Neural Networks in Tensorflow
  • 06. Transfer Learning - Feature Extraction
  • 07. Transfer Learning - Fine Tuning
  • 08. Transfer Learning - Scaling up
  • 09. Milestone Project 1 - Food Vision Big
  • 10. NLP Fundamentals in Tensorflow
  • 11. Milestone Project 2 - SkimLit
  • 12. Time Series Fundamentals + Milestone Project 3 - BitPredict
  • 13. Passing Tensorflow Certificate Exam
  • 15. Appendix - Machine Learning Primer
  • 16. Appendix - Machine Learning Framework
  • 14, 17-19. Misc

Complete Tensorflow 2 and Keras Deep Learning Bootcamp - JP

Machine Learning & Data Science Masterclass - JP

Complete Machine Learning and Data Science - Zero to Mastery

ML - Machine Learning & Data Science A-Z Hands-on Python - NS

  • 03. Preprocessing
  • 04. Machine Learning Types
  • 05. Supervised Learning - Classification
  • 06. Supervised Learning - Regression
  • 07. Unsupervised Learning - Clustering
  • 08. Hyper Parameters Optimization

Data Science and Machine Learning Bootcamp

Complete Data Science Bootcamp - 365

  • Part 1 - The Field of Data Science
  • Part 2 - Probability
  • Part 3 - Statistics (Descriptive & Inferential)
  • Part 4 - Python
  • Part 5 - Advanced Statistical Methods in Python / Machine Learning in Python
  • Part 6 - Mathematics
  • Part 7 - Deep Learning
  • Software Integration
  • Case Study - Absenteeism

Books

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow (in progress)

  • The Fundamentals of Machine Learning
  • The Machine Learning Landscape
  • End-to-End Machine Learning Project
  • Classification
  • Training Models

The Hundred-Page Machine Learning Book

  • Introduction
  • Notation and Definitions
  • Fundamental Algorithms
  • Anatomy of a Learning Algorithm
  • Basic Practice
  • Neural Networks and Deep Learning
  • Problems and Solutions
  • Advanced Practice
  • Unsupervised Learning
  • Unsupervised Learning - in-depth material
  • Other Forms of Learning
  • Conclusion

Advancing Machine Learning & Data Science Journey - (In Progress)

To level up my ML & DS skills in specific areas and topics:

Applied Machine Learning - Ensemble Learning

  • Project: Titanic dataset
  • 01.ML Basic
  • 02.Preparing the Data
  • 03.Ensemble Learning
  • 04.Boosting
  • 05.Bagging
  • 06.Stacking
  • 07.Evaluation and Selection of Models

Applied Machine Learning - Feature Engineering

  • Project: Titanic dataset
  • 01.ML Basic
  • 02.Intro to Feature Engineering
  • 03.Explore Data
  • 04.Create and Clean Features
  • 05.Prepare Features for Modelling
  • 06.Compare and Evaluate Models

Applied Machine Learning - Algorithms

  • Project: Titanic dataset
  • 01.Review of Foundation
  • 02.Logistic Regression
  • 03.Support Vector Machine
  • 04.Multi-layer Perceptron
  • 05.Random Forest
  • 06.Boosting
  • 07.Final Model Selection and Evaluation

Applied Machine Learning - Foundation

  • Project: Titanic dataset
  • 01.ML Basic
  • 02.Exploratory Data Analysis and Data Cleaning
  • 03.Evaluation - Measuring Success
  • 04.Optimizing a Model
  • 05.End to End Pipeline

ML - Mistakes to avoid in Machine Learning

  • Assuming Data is good to go
  • Neglecting to consult subject matter experts
  • Overfitting your models
  • Not standardizing your data
  • Focusing on Wrong Factors
  • Data Leakage
  • Forgetting traditional statistics tools
  • Assuming Deployment is a breeze
  • Assuming Machine Learning is the answer
  • Developing in a silo
  • Not treating for imbalanced sampling
  • Interpreting your coefficients without properly treating for multicollinearity
  • Evaluating by accuracy alone
  • Giving overly technical presentations
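
A few of these pitfalls (not standardizing data, data leakage, imbalanced sampling, evaluating by accuracy alone) can be illustrated together in one short scikit-learn sketch. The synthetic dataset, the choice of logistic regression and all parameter values are assumptions for illustration, not material from the course.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced binary data (roughly 90% / 10%); an assumption.
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.9, 0.1], random_state=0
)

# Stratify so both splits keep the same class proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The scaler is fitted inside the pipeline on the training split only,
# so no test-set statistics leak into training.
clf = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Report more than accuracy, which can look good on imbalanced data
# even when the minority class is mostly missed.
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred)
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f}")
```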

Deep Learning, Machine Learning, AI & Data Science

Data Analysis, Manipulation & Data Visualization

Apache Spark & PySpark

Data Scientist Reading Materials

  • Supervised Learning
    • Lesson 01: Machine Learning Bird's Eye View
    • Lesson 02: Linear Regression
    • Lesson 03: Perceptron Algorithm
    • Lesson 04: Decision Trees
    • Lesson 05: Naive Bayes
    • Lesson 06: Support Vector Machines
    • Lesson 07: Ensemble Methods
    • Lesson 08: Model Evaluation Metrics
    • Lesson 09: Training and Tuning
    • Lesson 10: Finding Donors Project

Kaggle Courses

  • Python
  • Pandas
  • Data Cleaning
  • Introduction to Machine Learning
  • Machine Learning Intermediate
  • Feature Engineering
  • Machine Learning Explainability
  • Data Visualization
  • Intro to Deep Learning
  • Intro to Game AI and Reinforcement Learning
  • Natural Language Processing
  • Micro-challenges
  • Computer Vision
  • Intro to SQL
  • Advanced SQL

Google ML courses

  • ML Crash Course
  • Problem Framing
  • Data Prep
  • Clustering
  • Recommendation
  • Testing and Debugging
  • GANs

Probability & Statistics (in progress)

  • Linear Regression Analysis
  • Multi Regression Analysis
  • Practical Statistics
  • Excel Data Manipulation, Analysis and Visualization

Data Science Math Skills - Duke University

Topics include:

  • Set theory, including Venn diagrams
  • Properties of the real number line
  • etc

License

This project is licensed under the MIT License - see the LICENSE.md file for details

More Repositories

1

SQL-Data-Analysis-and-Visualization-Projects

SQL data analysis & visualization projects using MySQL, PostgreSQL, SQLite, Tableau, Apache Spark and pySpark.
Jupyter Notebook
1,224
star
2

MySQL-Tableau-for-Data-Analytics-and-Business-Intelligence

collection of SQL - Tableau integration projects for Data Analytics and Business Intelligence
TSQL
100
star
3

django-WEB-stock-portfolio-management

stock portfolio management system built with Django and deployed on Heroku
JavaScript
26
star
4

Data-Analysis-for-Digital-Music-Store

helping a Digital Music Store optimize its business practices using PostgreSQL
21
star
5

Python-Projects-Dojo

Collections of Python projects including machine learning projects, image and PDF processing, password checkers, sending emails and SMS, web scraping, Flask web apps, Selenium automation testing, etc.
Jupyter Notebook
21
star
6

Web-Scraping-and-API-in-Python

Web Scraping and API in Python using beautifulsoup, requests, requests-xml, etc. for processing multiple APIs and scraping multiple sites such as YouTube, SoundCloud and many more.
Jupyter Notebook
16
star
7

SQL-for-Data-Analysis-Parch-and-Posey

SQL for Data Analysis using PostgreSQL - analyzing the fictional company Parch&Posey
TSQL
15
star
8

Tableau_2020_A-Z_Hands-On

Tableau Projects for data analysis, data analytics and data visualization on different data sets
12
star
9

ptyadana

Hi, this is my Github Profile readme.
7
star
10

django-WEB-dental-website

dental clinic website with django3 deployed on Heroku
CSS
6
star
11

DV-Data-Visualization-with-Python

Data analysis and data visualization of countries' GDP, life expectancy comparison across continents, GDP per capita relative growth, population relative growth comparison, etc. using Pandas and Matplotlib.
Jupyter Notebook
6
star
12

ml-app

machine learning app using streamlit + scikit-learn for Random Forest algorithm.
Python
5
star
13

Tableau-Nutrition-Analysis

Analyzing Nutrition values and calories consumption of daily meals over period of times.
5
star
14

Whatsapp-Automation

Whatsapp chatbot automation with Python and Selenium for sending messages, photos
Jupyter Notebook
5
star
15

ML-Music-Recommender

Machine Learning Project for recommendations of music genre based on age and gender
Jupyter Notebook
4
star
16

iris-flower-ML-app

iris flower predictions Machine Learning app using Tensorflow, Keras, ScikitLearn, Flask deployed on Heroku
HTML
3
star
17

Tableau-Audiobooks-Sales-Analysis

Visualization of audiobooks sales analysis
2
star
18

django-WEB-flashcards

simple math flashcards created using django3 framework, hosted on Heroku
Python
2
star
19

django3-todo-app

Full Fledged todo app with django 3, python, sqlite
Python
2
star
20

web-projects

collections of HTML, CSS, Bootstrap, JavaScript and jQuery projects
HTML
2
star
21

django-REST-API-books-info

A simple REST API for getting the lists of books information in JSON hosted on herokuapp
Python
2
star
22

join_us_app

Join Us simple web application with Node.js to sign up the mailing list
JavaScript
1
star
23

django-WEB-meeting-planner

simple meeting planner application using django3
Python
1
star
24

django-WEB-simple-todo

simple Todo app to keep track of your daily todo list by allowing you to add or complete tasks.
Python
1
star
25

django-REST-API-user-profiles

Full Fledged User Profile Management - Application + own Backend REST API with Python & Django
Python
1
star
26

django-REST-API-course-info

simple REST API for course info - to retrieve the list of courses and create, update, or delete a course
Python
1
star
27

django-WEB-video-rental

Video rental application and API using django 3 and Tastypie REST framework
Python
1
star
28

django3-password-generator-project

Simple Password Generator using Python and Django 3 framework hosted on pythonanywhere
Python
1
star