Introduction
Hello there. I am Shanmukha Sainath, a fourth-year (final-year) undergraduate student in the Department of Electronics and Electrical Communication Engineering, IIT Kharagpur. I will be joining KLA Corporation as an Associate Analyst in 2023.
Connect with me:
Why I made this
The internet is huge, and so is the pool of resources for learning anything new. There are numerous free and paid resources for learning Machine Learning. Having so many options is confusing, and it is difficult to pick the best one (speaking from experience). So I have collected the best resources for getting started with Machine Learning and continuing a career in this field.
Feedback and suggestions are welcome :)
Prerequisites
- Mathematics
- Linear Algebra
- Matrix Algebra
- Probability and Statistics
- Calculus
- Programming Fundamentals
- Data Structures and Algorithms
- Programming Language
- Python
MIT's 18.06 Linear Algebra course is the best course for learning the basics of Linear Algebra.
The Matrices course by Khan Academy is the best course for learning the basics of Matrix Algebra.
The Statistics and Probability course by Khan Academy is the best course available on the subject.
The Differential Calculus course is the best course for learning the basics of Differential Calculus.
MIT's 6.006 Introduction to Algorithms course covers the basics of Data Structures and Algorithms.
The Python tutorial is the best place to learn the basic syntax of Python.
Machine Learning
- Courses
  - Machine Learning Specialization by Andrew Ng (new course) : Coursera
  - Machine Learning A-Z : Udemy
- Books
  - Pattern Recognition and Machine Learning by Christopher Bishop
  - An Introduction to Statistical Learning by Gareth M. James, Daniela Witten, Trevor Hastie and Robert Tibshirani
  - Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron
Deep Learning
- Courses
  - Deep Learning Specialization by Andrew Ng : Coursera
  - Deep Learning with PyTorch by Yann LeCun : YouTube
  - Deep Learning with fast.ai by Jeremy Howard : fast.ai
- Books
  - Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville
  - Deep Learning with Python by François Chollet
  - Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron
  - Dive into Deep Learning by Amazon scientists
Frameworks/Libraries
"No tutorial/course is better than Documentation :)"
That said, I am sharing other resources for some libraries so that you can learn them quickly. Whenever you get stuck on a function or an implementation, it is always better to refer to the documentation/tutorials/code on the official website.
- NumPy
- Tabular data
- Pandas
- Image data
- OpenCV
- Pillow
- Text data
- NLTK
- spaCy
- Matplotlib
- Seaborn
- Plotly
- Scikit-Learn
- fast.ai
- PyTorch
- TensorFlow
Working with Arrays
NumPy is a library that enables numerical computing in Python. In Machine Learning we almost always work with arrays, and NumPy provides a large number of functions for operating on them.
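As a rough illustration of the kind of array operations meant here, the sketch below standardizes a small feature matrix and computes a matrix-vector product. The data and the weight vector are made up for this example; only standard NumPy calls are used.

```python
import numpy as np

# Hypothetical toy data: 3 samples with 2 features each.
X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# Column-wise statistics (axis=0 reduces over the rows).
col_mean = X.mean(axis=0)
col_std = X.std(axis=0)

# Broadcasting: the shape-(2,) statistics are applied to every row of the (3, 2) array.
X_scaled = (X - col_mean) / col_std

# Matrix-vector product with made-up weights.
w = np.array([0.5, -0.5])
scores = X @ w

print(X_scaled.shape)
print(scores)
```

Vectorized operations like these replace explicit Python loops, which is most of what day-to-day NumPy usage in ML looks like.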
Data Preprocessing
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation tool, built on top of the Python programming language. To learn more about the usage and advantages of Pandas, visit the Package Overview page.
This will help you get used to some frequent Pandas operations.
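For a feel of what such frequent operations look like, here is a minimal sketch on a made-up table (the column names and values are assumptions for illustration only): filling a missing value, filtering rows, and aggregating per group.

```python
import pandas as pd

# Hypothetical toy table; the column names are made up for illustration.
df = pd.DataFrame({
    "city": ["Delhi", "Mumbai", "Delhi", "Chennai"],
    "temp_c": [31.0, None, 35.0, 33.0],
})

# Fill the missing value with the mean of the observed values.
df["temp_c"] = df["temp_c"].fillna(df["temp_c"].mean())

# Boolean filtering: keep only the rows above a threshold.
hot = df[df["temp_c"] > 32.0]

# Group-wise aggregation: mean temperature per city.
mean_by_city = df.groupby("city")["temp_c"].mean()

print(hot)
print(mean_by_city)
```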
OpenCV-Python is a library of Python bindings designed to solve computer vision problems. OpenCV-Python is a Python wrapper for the original OpenCV C++ implementation.
Refer to the official tutorials for more details and implementation.
The Python Imaging Library adds image processing capabilities to the Python interpreter. This library provides extensive file format support, an efficient internal representation, and fairly powerful image processing capabilities.
NLTK is a leading platform for building Python programs to work with human language data. It provides over 50 corpora and lexical resources such as WordNet, along with a suite of text processing functions for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, as well as wrappers for industrial-strength NLP libraries.
This will help you get used to some frequent NLTK operations.
spaCy is an open-source software library for advanced Natural Language Processing, written in the programming languages Python and Cython.
This course by spaCy helps you get started with spaCy.
Data Visualization
Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations in Python.
Refer to the official tutorials for more details and implementation.
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level interface for drawing attractive and informative statistical graphics.
Refer to the official tutorials for more details and implementation. Refer to the gallery to learn about the various types of plots available in seaborn.
Plotly's Python graphing library makes interactive, publication-quality graphs. It can produce line plots, scatter plots, area charts, bar charts, error bars, box plots, histograms, heatmaps, subplots, multiple-axes, polar charts, and bubble charts.
Machine Learning
Scikit-learn is a free software machine learning library for the Python programming language. It features various classification, regression and clustering algorithms, and it is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
Intro to ML with Scikit-Learn and 50 scikit-learn tips, both provided by Data School, are the best freely available courses for learning Scikit-Learn.
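To show how a classification algorithm is typically used with NumPy arrays, here is a minimal sketch. The choice of the bundled Iris dataset, logistic regression, and the split parameters are assumptions made for this example, not recommendations from the courses above.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small bundled dataset; X and y are plain NumPy arrays.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a classifier; most scikit-learn estimators share this fit/predict/score interface.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Mean accuracy on the held-out samples.
accuracy = clf.score(X_test, y_test)
print(f"test accuracy: {accuracy:.2f}")
```

Because estimators share the same interface, swapping `LogisticRegression` for another classifier usually changes only the import and the constructor line.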
Deep Learning
fastai is a deep learning library which provides practitioners with high-level components that can quickly and easily provide state-of-the-art results in standard deep learning domains, and provides researchers with low-level components that can be mixed and matched to build new approaches. Check the About page for more information.
Refer to the official tutorials for more details and implementation.
PyTorch is a Deep Learning framework developed by Meta that enables fast, flexible experimentation and efficient production through a user-friendly front-end, distributed training, and an ecosystem of tools and libraries.
TensorFlow is a Deep Learning framework developed by Google. It is a free and open-source software library for machine learning and artificial intelligence. It can be used across a range of tasks but has a particular focus on training and inference of deep neural networks.
What next?
- Competitions
  - Kaggle
  - ML Contests
  - List of ML hackathon platforms
- Research
  - Papers with Code
Kaggle is the biggest data science community, where one can share their work, participate in competitions, learn from free courses and a lot more.
To get more out of Kaggle, participate in any competition in a field that interests you. Competitions are mainly divided into 3 categories: Tabular, Computer Vision, and NLP. If there are no active competitions, attempt past competitions that interest you. If you get stuck at any point, refer to the publicly available notebooks or post in the discussion forum. There is an enormous number of datasets available on Kaggle. You can also download datasets and start your own project.
The ML Contests website contains a list of ongoing ML competitions across various platforms.
This blog written by Vetrivel PS has a list of Data Science competition platforms.
Papers with Code is a free and open resource with Machine Learning papers, code, datasets, methods and evaluation tables.
Everything in PwC is divided into categories, which makes it easy to find a particular paper. Go to the category or field that interests you (Browse State-of-the-Art). Select a paper based on benchmarked dataset, most implemented, or libraries. You can also find code implementations in various frameworks.
Read the paper. Implement the algorithm/model with your favourite framework. Train it on dummy data to check that it works. It's the best way to get into research.
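One way to do the dummy-data check is sketched below: generate data from parameters you choose yourself, train, and confirm that the implementation recovers them. The linear model and plain gradient descent are only stand-ins for whatever algorithm you implement, and every value (weights, learning rate, step count) is an assumption made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy data generated from known parameters, so the right answer is known in advance.
true_w = np.array([2.0, -3.0])
true_b = 0.5
X = rng.normal(size=(200, 2))
y = X @ true_w + true_b

# Stand-in "model": linear regression trained with plain gradient descent
# on half the mean squared error.
w = np.zeros(2)
b = 0.0
lr = 0.1
for _ in range(500):
    err = X @ w + b - y            # residuals of the current prediction
    w -= lr * (X.T @ err) / len(y)  # gradient step for the weights
    b -= lr * err.mean()            # gradient step for the bias

final_loss = np.mean((X @ w + b - y) ** 2)
print(w, b, final_loss)
```

If the learned parameters do not approach the ones used to generate the data, the bug is in the implementation rather than in the dataset, which is the point of checking on dummy data first.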
Other Resources
- YouTube Channels
- Blogs
- Research Papers
- Datasets
- University Lectures
  - CS231n : Computer Vision
  - CS224n : Natural Language Processing
  - CS224W : Machine Learning with Graphs
  - CS285 : Reinforcement Learning
- Newsletters
- People/Pages to follow
- Medium
- Cloud GPUs
- Join these communities
  - Yannic Kilcher (Discord)
  - CORD.ai (Slack)
  - MLSpace: The Machine Learning Community (Abhishek Thakur) (Discord)