  • Stars: 393
  • Rank 109,518 (Top 3%)
  • Created almost 4 years ago
  • Updated about 2 years ago

Awesome Python Data Science Books

Probably the best curated list of data science books in Python.

Contents

Statistics

Practical Statistics for Data Scientists: 50 Essential Concepts - Peter Bruce & Andrew Bruce
Learn how to apply various statistical methods to data science and how to avoid their misuse. Understand which statistical concepts are important and which are not.
Pattern Recognition and Machine Learning - Christopher M. Bishop
Learn approximate inference algorithms that permit fast approximate answers in situations where exact answers are not feasible. Familiarity with multivariate calculus and basic linear algebra is required.
Think Bayes: Bayesian Statistics in Python - Allen B. Downey
Learn how to solve statistical problems with Python code instead of mathematical notations. Learn how to work with problems involving estimation, prediction, decision analysis, evidence, and hypothesis testing.
Probabilistic Programming & Bayesian Methods for Hackers - Cameron Davidson-Pilon
Learn Bayesian inference from a computational/understanding-first, and mathematics-second, point of view.
An Introduction to Statistical Learning - Gareth James, Daniela Witten, Trevor Hastie, & Rob Tibshirani
Learn key topics in statistical learning. This book is perfect for those who want a gentle introduction to popular machine learning algorithms.

Data Analysis

Storytelling with Data: A Data Visualization Guide for Business Professionals - Cole Nussbaumer Knaflic

Learn how to determine the appropriate type of graph for your situation, eliminate irrelevant information, and direct your audience's attention to the most important parts of your data.
Data Science from Scratch, 2nd Edition - Joel Grus
Learn data science libraries, frameworks, modules, tools and algorithms by implementing them from scratch.

Data Intuition

Head First Data Analysis: A learner's guide to big numbers, statistics, and good decisions - Michael Milton

Learn how to determine which data sources to use for collecting information, distinguish signal from noise, cope with ambiguous information, design experiments to test hypotheses, organize your data using segmentation, and communicate the results of your analysis.
Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management - Gordon S. Linoff & Michael J. A. Berry

Learn how to harness the newest data mining methods and techniques to prepare data for analysis and create the necessary infrastructure for data mining at your company. Learn core data mining techniques, including decision trees, neural networks, collaborative filtering, association rules, link analysis, and survival analysis.
Behind Every Good Decision: How Anyone Can Use Business Analytics to Turn Data into Profitable Insight - Piyanka Jain & Puneet Sharma

Learn how to clarify the business question, lay out a hypothesis-driven plan, convert relevant data to insights, and make decisions that make an impact.
The Book of Why: The New Science of Cause and Effect - Judea Pearl & Dana Mackenzie

Learn how to explore the world that is and the worlds that could have been by understanding causality. Learn to answer hard questions, like whether a drug cured an illness.
Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy - Cathy O'Neil

Learn how the models being used today reinforce discrimination, prop up the lucky, and punish the downtrodden. The book empowers us to ask tough questions, uncover the truth, and demand change.
Business Analytics: The Science of Data-Driven Decision Making - U Dinesh Kumar

Learn the foundations of data science and components of analytics such as descriptive, predictive and prescriptive analytics topics using examples from several industries, as well as nine analytics case studies. The book gives equal importance to theory and practice with examples across industries and the case studies provide a deeper understanding of analytics techniques and deployment of analytics-driven solutions.

Feature Engineering

Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists - Alice Zheng & Amanda Casari
Learn techniques for extracting and transforming features into formats for machine-learning models through practical application with exercises using tools such as NumPy, pandas, scikit-learn, and Matplotlib.
Python Data Science Handbook - Jake VanderPlas
Learn how to manipulate, transform, and clean data; visualize different types of data; and use data to build statistical or machine learning models using IPython, NumPy, Pandas, Matplotlib, Scikit-Learn, and other related tools.
Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython - Wes McKinney
Learn how to manipulate, process, clean, and crunch datasets in Python and how to work with time series data through real-world problems using Jupyter Notebook, NumPy, pandas, and matplotlib.

Machine Learning

The Hundred-Page Machine Learning Book - Andriy Burkov
Learn everything you really need to know about machine learning in a hundred pages.
Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2 - Sebastian Raschka & Vahid Mirjalili
Learn all the essential machine learning techniques in depth. Learn how to use scikit-learn for machine learning and TensorFlow for deep learning.
Machine Learning for Algorithmic Trading: Predictive models to extract signals from market and alternative data for systematic trading strategies with Python - Stefan Jansen
Learn end-to-end machine learning for the trading workflow, from the idea and feature engineering to model optimization, strategy design, and backtesting.
Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems - Aurélien Géron
Learn a range of techniques, starting with simple linear regression and progressing to deep neural networks using concrete examples, minimal theory, and two production-ready Python frameworks—Scikit-Learn and TensorFlow.
Building Machine Learning Powered Applications: Going from Idea to Product - Emmanuel Ameisen
Learn the skills necessary to design, build, and deploy applications powered by machine learning. Learn the tools, best practices, and challenges involved in building a real-world ML application.
Machine Learning Yearning - Andrew Ng
Learn how to align on ML strategies in a team setting, as well as how to set up development (dev) sets and test sets.
Approaching (Almost) Any Machine Learning Problem - Abhishek Thakur
Learn how and what you should use to solve machine learning and deep learning problems. Appropriate for those who have some theoretical knowledge of machine learning and deep learning.
Machine Learning Engineering - Andriy Burkov
Learn best practices and design patterns for building reliable machine learning solutions that scale.
Interpretable Machine Learning - Christoph Molnar
Learn the concepts of interpretability, interpretable models, and general methods for interpreting black box models. Learn in depth the strengths and weaknesses of each method and how their outputs can be interpreted.
Building Machine Learning Pipelines - Hannes Hapke & Catherine Nelson
Learn the steps of automating a machine learning pipeline using the TensorFlow ecosystem.
Introduction to Machine Learning with Python - Andreas C. Müller & Sarah Guido
Learn to create a successful machine-learning application with Python and the scikit-learn library.

Time Series

Introduction to Time Series Forecasting With Python - Jason Brownlee
Learn how to load and prepare data, evaluate model skill, and implement forecasting models for time series data. This book cuts through the math and specialized methods for time series forecasting.
Practical Time Series Analysis - Aileen Nielsen
Learn to solve the most common data engineering and analysis challenges in time series, using both traditional statistical and modern machine learning techniques.

Natural Language Processing

Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit - Steven Bird, Ewan Klein & Edward Loper
Learn how to write Python programs that work with large collections of unstructured text, with applications ranging from predictive text and email filtering to automatic summarization and translation.
Practical Natural Language Processing: A Comprehensive Guide to Building Real-World NLP Systems - Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta & Harshit Surana
Learn how to adapt your solutions for different industry verticals such as healthcare, social media, and retail. Understand tasks and solution approaches within NLP and best practices around deployment for NLP systems.
Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning - Delip Rao & Brian McMahan
Learn the basics of PyTorch, traditional NLP concepts and methods, neural networks, embeddings, sequence prediction, and design patterns for building production NLP systems.
Transformers for Natural Language Processing: Build innovative deep neural network architectures for NLP with Python, PyTorch, TensorFlow, BERT, RoBERTa, and more - Denis Rothman
Learn in detail how to apply deep learning with transformers to machine translation, speech-to-text, text-to-speech, language modeling, question answering, and many more NLP domains.

Deep Learning

Deep Learning for Coders with Fastai and PyTorch: AI Applications Without a PhD - Jeremy Howard & Sylvain Gugger
Learn how to train a model on a wide range of tasks in deep learning with little math background and minimal code using fastai and PyTorch. Written by the creators of fastai.
Deep Learning (Adaptive Computation and Machine Learning series) - Ian Goodfellow, Yoshua Bengio & Aaron Courville
Learn the mathematical and conceptual background of deep learning and the techniques used by practitioners in industry, including deep feedforward networks, regularization, optimization algorithms, convolutional networks, sequence modeling, and practical methodology, as well as other theoretical topics.
Deep Learning with PyTorch - Eli Stevens, Luca Antiga, and Thomas Viehmann
Learn how to create deep learning and neural network systems with PyTorch and learn best practices for the entire deep learning pipeline for advanced projects.
Long Short-Term Memory Networks With Python - Jason Brownlee
Learn what LSTMs are, and how to develop a suite of LSTM models using Keras and TensorFlow 2. This book cuts through the math, research papers and patchwork descriptions about LSTMs.
Practical Deep Learning Book for Cloud, Mobile & Edge: Real-World AI and Computer Vision Projects Using Python, Keras and TensorFlow - Anirudh Koul, Siddha Ganju, & Meher Kasam
Learn how to build practical computer vision based deep learning applications that can be deployed on the cloud, mobile, browsers, or edge devices using a hands-on approach.
Deep Learning Illustrated - Jon Krohn
Learn essential concepts in deep learning through visualization with little math.

Code Optimization

Effective Python: 59 Specific Ways to Write Better Python (Effective Software Development Series) - Brett Slatkin
Learn how to choose the most efficient and effective way to accomplish key tasks when multiple options exist, and how to write Python code that's easier to understand, maintain, and improve.
Python Tricks: A Buffet of Awesome Python Features - Dan Bader
Learn best practices and little-known tricks to round out your Python knowledge.
Python High Performance Programming - Gabriele Lanaro
Learn how to identify and solve the bottlenecks in your applications, write efficient numerical code in NumPy and Cython, and adapt your programs to run on multiple processors with parallel programming.
Python Cookbook - David Beazley & Brian K. Jones
Learn the core Python language as well as tasks common to a wide variety of application domains such as data structures and algorithms, classes and objects, metaprogramming, modules and packages, testing, debugging, and exceptions.

Scraping

Web Scraping with Python: Collecting Data from the Modern Web - Ryan Mitchell
Learn how to query web servers, request data, and parse it to extract the information you need using tools such as requests, BeautifulSoup, Scrapy, APIs and how to store, read, and clean the data you scrape.

Career in Data Science

Build a Career in Data Science - Emily Robinson & Jacqueline Nolis
Learn everything from how to land your first job, to the lifecycle of a data science project, to how to become a manager.

How to Contribute

Contributions are always welcome! If you know some interesting books or other categories that should be here but are not, feel free to contribute! To contribute, follow the four steps below:

  1. Fork the repo
  2. Add new resources using the same markdown format.
  3. Start the book summary with "Learn..."
  4. Submit the pull request

That's it. As soon as I review your pull request, your resources will be added to this page.

Alternatively, you can create an issue with a book recommendation.
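
Before submitting, it can help to sanity-check a new entry against the conventions in the steps above. The sketch below is only an illustration: the function name `check_entry` is hypothetical, and it assumes an entry is a "Title - Author" line followed by a summary, as the entries appear on this page. The exact Markdown used in the repository is not shown here, so adapt it to the real format.

```python
from typing import List


def check_entry(title_line: str, summary: str) -> List[str]:
    """Return a list of problems found in one book entry (empty if none).

    Hypothetical helper: assumes the 'Title - Author' layout seen in this list.
    """
    problems = []
    # Split on the last " - " so that titles containing a dash still parse.
    title, sep, author = title_line.rpartition(" - ")
    if not sep or not title.strip() or not author.strip():
        problems.append("title line should look like 'Title - Author'")
    # Step 3 of the contribution guide: the summary starts with "Learn".
    if not summary.strip().startswith("Learn"):
        problems.append("summary should start with 'Learn'")
    return problems


print(check_entry(
    "Think Bayes: Bayesian Statistics in Python - Allen B. Downey",
    "Learn how to solve statistical problems with Python code.",
))  # prints []
```

An empty list means the entry follows both conventions; otherwise each string names one thing to fix before opening the pull request.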
