God-Level Data Science ML Full Stack
A collection of scientific methods, processes, algorithms, and systems to build stories & models. This roadmap contains 16 chapters and is meant for you whether you are a fresher in the field or an experienced professional who wants to transition into Data Science & AI.
The Roadmap is divided into 16 Sections
Duration: 256 Hours of Learning (8 Months) and many more hours for practice and project building.
- Python Programming and Logic Building
- Data Structure & Algorithms
- Pandas Numpy Matplotlib
- Statistics
- Machine Learning
- ML Operations
- Natural Language Processing
- Computer Vision
- Data Visualization with Tableau
- Structured Query Language (SQL)
- Data Engineering
- Data System Design
- Five Major Capstone Projects
- Interview Preparations
- Git & GitHub
- Personal Branding and portfolio
Resources
Technology Stack
- Python
- Data Structures
- NumPy
- Pandas
- Matplotlib
- Seaborn
- Scikit-Learn
- Statsmodels
- Natural Language Toolkit (NLTK)
- PyTorch
- OpenCV
- Tableau
- Structured Query Language (SQL)
- PySpark
- Azure Fundamentals
- Azure Data Factory
- Databricks
- 5 Major Projects
- Git and GitHub
1 | Python Programming and Logic Building
I recommend the Python programming language. Python is one of the best languages for starting your programming journey. Here is the Python roadmap for logic building.
- Python basics, Variables, Operators, Conditional Statements
- List and Strings
- While Loop, Nested Loops, Loop Else
- For Loop, Break, and Continue statements
- Functions, Return Statement, Recursion
- Dictionary, Tuple, Set
- File Handling, Exception Handling
- Object-Oriented Programming
- Modules and Packages
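A few of the topics above fit into one short script. The sketch below is only an illustration: the function names `factorial` and `first_negative` are made up for this example. It shows recursion, the loop `else` clause (which runs only when the loop finishes without hitting `break`), and exception handling.

```python
def factorial(n):
    """Recursive factorial: n! = n * (n - 1)!, with 0! = 1! = 1."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:
        return 1
    return n * factorial(n - 1)


def first_negative(numbers):
    """Return the first negative number, or None if there is none."""
    for value in numbers:
        if value < 0:
            found = value
            break
    else:
        # This branch runs only when the loop was not ended by break.
        found = None
    return found


print(factorial(5))  # 120
print(first_negative([3, 1, -4, 2]))  # -4
print(first_negative([3, 1, 2]))  # None

try:
    factorial(-1)
except ValueError as error:
    # Exception handling: report the problem instead of crashing.
    print("Error:", error)
```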
2 | Data Structure & Algorithms
Data structures are among the most important things to learn, not only for data scientists but for everyone working in computer science. With data structures, you gain an understanding of how software works internally.
Understand these topics
- Types of Algorithm Analysis
- Asymptotic Notation, Big-O, Omega, Theta
- Stacks
- Queues
- Linked List
- Trees
- Graphs
- Sorting
- Searching
- Hashing
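To make a few of these topics concrete, here is a minimal sketch using only the standard library. The helper names (`binary_search`, `is_balanced`, `serve_in_order`) are hypothetical and chosen for this example. It shows a search that runs in O(log n) time, a stack built on a list, and a queue built on `collections.deque`.

```python
from collections import deque


def binary_search(sorted_items, target):
    """Return the index of target in sorted_items, or -1 if absent. O(log n)."""
    low, high = 0, len(sorted_items) - 1
    while low <= high:
        mid = (low + high) // 2
        if sorted_items[mid] == target:
            return mid
        if sorted_items[mid] < target:
            low = mid + 1  # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return -1


def is_balanced(text):
    """Check bracket balance with a stack (last in, first out)."""
    pairs = {")": "(", "]": "[", "}": "{"}
    stack = []
    for char in text:
        if char in pairs.values():
            stack.append(char)  # push an opening bracket
        elif char in pairs:
            # A closing bracket must match the most recent opening one.
            if not stack or stack.pop() != pairs[char]:
                return False
    return not stack


def serve_in_order(names):
    """Process names with a queue (first in, first out)."""
    queue = deque(names)
    served = []
    while queue:
        served.append(queue.popleft())
    return served


print(binary_search([1, 3, 5, 7, 9], 7))  # 3
print(is_balanced("{[()]}"))  # True
print(serve_in_order(["first", "second"]))  # ['first', 'second']
```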
3 | Pandas Numpy Matplotlib
NumPy adds n-dimensional arrays to Python. For two-dimensional (tabular) data, Pandas is the best library for analysis. Other tools exist, but drag-and-drop tools come with limitations, whereas Pandas can be customized in code to fit the real-life problem at hand.
Numpy
- Vectors, Matrix
- Operations on Matrix
- Mean, Variance, and Standard Deviation
- Reshaping Arrays
- Transpose and Determinant of Matrix
- Diagonal Operations, Trace
- Add, Subtract, Multiply, Dot, and Cross Product.
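The NumPy topics above can be tried out in a few lines. This is a small sketch assuming NumPy is installed; the matrix and vector values are arbitrary and chosen only so the results are easy to check by hand.

```python
import numpy as np

# Arbitrary example values.
matrix = np.array([[2.0, 1.0], [1.0, 3.0]])
vector = np.array([1.0, 2.0])

mean = matrix.mean()  # average of all four entries
variance = matrix.var()  # population variance (divides by n)
std = matrix.std()  # square root of the variance

reshaped = np.arange(6).reshape(2, 3)  # values 0..5 as 2 rows x 3 columns
transposed = reshaped.T  # shape becomes 3 x 2

determinant = np.linalg.det(matrix)  # 2*3 - 1*1
trace = np.trace(matrix)  # sum of the main diagonal
diagonal = np.diag(matrix)  # the main diagonal itself

total = matrix + matrix  # element-wise addition
product = matrix @ vector  # matrix-vector (dot) product
cross = np.cross([1, 0, 0], [0, 1, 0])  # cross product of two 3-D vectors

print(mean, determinant, trace)
print(product)
print(cross)
```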
Pandas
- Series and DataFrames
- Slicing, Rows, and Columns
- Operations on DataFrame
- Different ways to create DataFrame
- Read, Write Operations with CSV files
- Handling Missing values, replace values, and Regular Expression
- GroupBy and Concatenation
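As a rough illustration of the Pandas topics above, the sketch below reads CSV data, fills a missing value, groups, and concatenates. It assumes Pandas is installed. The data is made up for this example and is read from an in-memory string so that no file is needed.

```python
import io

import pandas as pd

# Made-up sales data; the second row has a missing "units" value.
csv_text = """city,product,units
Pune,A,10
Pune,B,
Delhi,A,7
Delhi,B,3
"""

frame = pd.read_csv(io.StringIO(csv_text))

# Handle the missing value by replacing it with 0.
frame["units"] = frame["units"].fillna(0)

# GroupBy: total units per city.
totals = frame.groupby("city")["units"].sum()

# Concatenation: append rows from a second DataFrame.
extra = pd.DataFrame({"city": ["Mumbai"], "product": ["A"], "units": [5.0]})
combined = pd.concat([frame, extra], ignore_index=True)

# Slicing: pick rows by condition and a subset of columns.
pune_rows = combined.loc[combined["city"] == "Pune", ["product", "units"]]

print(totals)
print(pune_rows)
```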
Matplotlib
- Graph Basics
- Format Strings in Plots
- Label Parameters, Legend
- Bar Chart, Pie Chart, Histogram, Scatter Plot
4 | Statistics
Descriptive Statistics
- Measure of Frequency and Central Tendency
- Measure of Dispersion
- Probability Distribution
- Gaussian Normal Distribution
- Skewness and Kurtosis
- Regression Analysis
- Continuous and Discrete Functions
- Goodness of Fit
- Normality Test
- ANOVA
- Homoscedasticity
- Linear and Non-Linear Relationship with Regression
Inferential Statistics
- t-Test
- z-Test
- Hypothesis Testing
- Type I and Type II errors
- t-Test and its types
- One way ANOVA
- Two way ANOVA
- Chi-Square Test
- Implementation of continuous and categorical data
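To connect the descriptive measures with hypothesis testing, here is a minimal sketch of a one-sample t-test statistic using only the standard library. The function name `one_sample_t` and the sample values are made up for illustration. The statistic is t = (sample mean - hypothesized mean) / (s / sqrt(n)), where s is the sample standard deviation.

```python
import math
import statistics


def one_sample_t(sample, hypothesized_mean):
    """Return (t statistic, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    sample_mean = statistics.mean(sample)
    sample_std = statistics.stdev(sample)  # sample std, divides by n - 1
    standard_error = sample_std / math.sqrt(n)
    t_statistic = (sample_mean - hypothesized_mean) / standard_error
    return t_statistic, n - 1


# Made-up measurements; null hypothesis: the true mean is 12.0.
measurements = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3]
t_statistic, degrees_of_freedom = one_sample_t(measurements, 12.0)

print("mean:", statistics.mean(measurements))
print("t statistic:", t_statistic)
print("degrees of freedom:", degrees_of_freedom)
```

To reach a decision, compare the t statistic with the critical value from a t table for the chosen significance level and n - 1 degrees of freedom.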
5 | Machine Learning
The best way to master machine learning algorithms is to work with the Scikit-Learn framework. Scikit-Learn provides ready-made implementations of these algorithms, and you can use one simply by creating an object of its class. These are the algorithms you must know, spanning both Supervised and Unsupervised Machine Learning:
- Linear Regression
- Logistic Regression
- Decision Tree
- Gradient Descent
- Random Forest
- Ridge and Lasso Regression
- Naive Bayes
- Support Vector Machine
- KMeans Clustering
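The "create an object of the class" workflow looks roughly like this. The sketch assumes Scikit-Learn is installed and uses tiny made-up datasets. It shows one supervised model (`LinearRegression`) and one unsupervised model (`KMeans`), both following the same create-then-fit pattern.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

# Supervised learning: made-up points that follow y = 2x + 1.
features = [[1.0], [2.0], [3.0], [4.0]]
targets = [3.0, 5.0, 7.0, 9.0]

regressor = LinearRegression()  # create the object of the class
regressor.fit(features, targets)  # learn from labelled data
prediction = regressor.predict([[5.0]])[0]  # close to 11 for this data

# Unsupervised learning: two obvious groups of made-up points, no labels.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]

clusterer = KMeans(n_clusters=2, n_init=10, random_state=0)
clusterer.fit(points)
labels = clusterer.labels_  # cluster index assigned to each point

print("prediction:", prediction)
print("cluster labels:", labels)
```

Most other estimators in the list above follow the same `fit` and `predict` pattern, so swapping algorithms usually means changing only the class you create.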
Other Concepts and Topics for ML
- Measuring Accuracy
- Bias-Variance Trade-off
- Applying Regularization
- Elastic Net Regression
- Predictive Analytics
- Exploratory Data Analysis
6 | MLOps
You can master any one of the cloud service providers: AWS, GCP, or Azure. Once you understand one of them, switching to another is easy.
We will focus on AWS - Amazon Web Services first
- Deploy ML models using Flask
- Amazon Lex - Natural Language Understanding
- Amazon Polly - Text to Speech
- Amazon Transcribe - Speech to Text
- Amazon Textract - Extract Text
- Amazon Rekognition - Image Applications
- Amazon SageMaker - Building and deploying models
- Working with Deep Learning on AWS
7 | Natural Language Processing
If you are interested in working with text, you should try some of the work an NLP Engineer does and understand how language models work.
- Sentiment analysis
- POS Tagging, Parsing
- Text preprocessing
- Stemming and Lemmatization
- Sentiment classification using Naive Bayes
- TF-IDF, N-gram
- Machine Translation, BLEU Score
- Text Generation, Summarization, ROUGE Score
- Language Modeling, Perplexity
- Building a text classifier
- Identifying the gender
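Two of the topics above, N-grams and TF-IDF, are easy to sketch with the standard library. The function names `ngrams` and `tf_idf` are made up for this example, and the formula used is the plain textbook one: tf = term count / document length and idf = log(N / document frequency). Libraries such as Scikit-Learn use smoothed variants, so their numbers will differ.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(documents):
    """Return one {term: score} dict per document (textbook TF-IDF)."""
    tokenized = [doc.lower().split() for doc in documents]  # basic preprocessing
    doc_count = len(tokenized)

    # Document frequency: in how many documents each term appears.
    document_frequency = Counter()
    for tokens in tokenized:
        document_frequency.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens))
            * math.log(doc_count / document_frequency[term])
            for term, count in counts.items()
        })
    return scores


reviews = ["the movie was great", "the movie was boring"]  # made-up reviews
scores = tf_idf(reviews)

print(ngrams("the movie was great".split(), 2))
print(scores[0])
```

Words that occur in every document, such as "the", score 0, while words that set a document apart, such as "great", score higher.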
8 | Computer Vision
To work on image and video analytics we can master computer vision. To work on computer vision we have to understand images.
- PyTorch Tensors
- Understanding pretrained models like AlexNet and ResNet, and the ImageNet dataset they are trained on
- Neural Networks
- Building a perceptron
- Building a single layer neural network
- Building a deep neural network
- Recurrent neural network for sequential data analysis
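Before moving on to deep networks in PyTorch, it can help to build a perceptron by hand. The sketch below uses plain Python, with hypothetical function names `train_perceptron` and `predict`, and learns the logical AND function using the classic perceptron update rule: weight += learning_rate * (target - predicted) * input.

```python
def predict(weights, bias, inputs):
    """Step activation: output 1 when the weighted sum is positive."""
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    """Train a single perceptron with the classic update rule."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in zip(samples, labels):
            error = target - predict(weights, bias, inputs)
            # Nudge the weights only when the prediction was wrong.
            weights = [
                w + learning_rate * error * x for w, x in zip(weights, inputs)
            ]
            bias += learning_rate * error
    return weights, bias


# Truth table of logical AND, which is linearly separable.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 1]

weights, bias = train_perceptron(samples, labels)
predictions = [predict(weights, bias, inputs) for inputs in samples]
print(predictions)  # [0, 0, 0, 1]
```

A single perceptron can only separate classes with a straight line, which is why problems like XOR need the multi-layer networks listed above.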
Convolutional Neural Networks
- Understanding the ConvNet topology
- Convolution layers
- Pooling layers
- Image Content Analysis
- Operating on images using OpenCV-Python
- Detecting edges
- Histogram equalization
- Detecting corners
- Detecting SIFT feature points
9 | Data Visualization with Tableau
How to use it, Visual Perception
- What is it, How it works, Why Tableau
- Connecting to Data
- Building charts
- Calculations
- Dashboards
- Sharing our work
- Advanced Charts, Calculated Fields, Calculated Aggregations
- Conditional Calculation, Parameterized Calculation
10 | Structured Query Language (SQL)
- Fundamentals of SQL syntax and installation
- Creating Tables, Modifiers
- Inserting and Retrieving Data: SELECT, INSERT, UPDATE, DELETE
- Aggregating Data using Functions, Filtering and RegEX
- Subqueries, retrieve data based on conditions, grouping of Data.
- Practice Questions
- JOINs
- Advanced SQL concepts such as transactions, views, stored procedures, and functions.
- Database Design principles, normalization, and ER diagrams.
- Practice, Practice, Practice: Practice writing SQL queries on real-world datasets, and work on projects to apply your knowledge.
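You can practice most of these SQL topics without installing a database server, because Python ships with SQLite. The sketch below uses an in-memory database; the table names, column names, and rows are made up for illustration. It covers creating tables, INSERT, UPDATE, a JOIN, and aggregation with GROUP BY.

```python
import sqlite3

connection = sqlite3.connect(":memory:")  # temporary database in RAM
cursor = connection.cursor()

# Creating tables.
cursor.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
cursor.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "amount REAL)"
)

# Inserting data; the ? placeholders keep values separate from the SQL text.
cursor.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ravi")],
)
cursor.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 250.0), (1, 100.0), (2, 75.0)],
)

# Updating data.
cursor.execute("UPDATE orders SET amount = 80.0 WHERE customer_id = 2")
connection.commit()

# JOIN plus aggregation: order count and total amount per customer.
rows = cursor.execute(
    """
    SELECT c.name, COUNT(o.id), SUM(o.amount)
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
    """
).fetchall()

print(rows)  # [('Asha', 2, 350.0), ('Ravi', 1, 80.0)]
connection.close()
```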
11 | Data Engineering
BigData
- What is BigData?
- How is BigData applied within Business?
PySpark
- Resilient Distributed Datasets
- Schema
- Lambda Expressions
- Transformations
- Actions
Data Modeling
- Duplicate Data
- Descriptive Analysis on Data
- Visualizations
- ML lib
- ML Packages
- Pipelines
Streaming
- Packaging Spark Applications
12 | Data System Design
What is system design?
- IP and OSI Model
- Domain Name System (DNS)
- Load Balancing
- Clustering
- Caching
- Availability, Scalability, Storage
Databases and DBMS
- SQL databases
- NoSQL databases
- SQL vs NoSQL databases
- Database Replication
- Indexes
- Normalization and Denormalization
- CAP theorem
System Design Interview
- URL Shortener
- Whatsapp, Twitter, Netflix, Uber
13 | Five Major Projects and Git
We follow project-based learning and we will work on all the projects in parallel.
14 | Interview Preparation
15 | Git & GitHub
Git & GitHub Course
- Understanding Git
- Commands and How to commit your first code?
- How to use GitHub?
- How to make your first open-source contribution?
- How to work with a team? - Part 1
- How to create your stunning GitHub profile?
- How to build your own viral repository?
- Building a personal landing page for your Portfolio for FREE
- How to grow followers on GitHub?
- How to work with a team? Part 2 - issues, milestone and projects
16 | Personal Profile & Portfolio
Resources
Datasets
Research Starting Point
Machine Learning
Deep Learning
Reinforcement Learning
Projects
Here is the list of project ideas
Notion Template
Data Science ML Full Stack -> Join the WhatsApp Community Group
https://chat.whatsapp.com/BSUPbYhzzM1BcJplcTTIxb
Socials
Join Telegram for Data Science ML AI Resources:
https://t.me/+sREuRiFssMo4YWJl
Connect with me on these platforms:
LinkedIn: https://www.linkedin.com/in/hemansnation/
YouTube: https://www.youtube.com/@Himanshu-Ramchandani
Twitter: https://twitter.com/hemansnation
GitHub: https://github.com/hemansnation
Instagram: https://www.instagram.com/masterdexter.ai/
AI Jobs LinkedIn Group:
https://www.linkedin.com/groups/12540639/
Medium Blog:
https://medium.com/@hemansnation
Notes on Data, Product, and AI - Newsletter:
https://www.linkedin.com/build-relation/newsletter-follow?entityUrn=7014799989251956736
Any Query?
Email Me Here: [email protected]