Free data science resources
Overview
The goal of this page is to gather resources and learning materials across a broad range of popular data science topics and arrange them thematically. Resources have been selected because they:
- Are high quality
- Are free of charge
- Don’t require readers to sign up
Remember that material offered freely on the web is paid for by the author’s time. If you find a resource particularly useful, consider supporting its author in whatever way they prefer. If you find this page useful, please share it and spread the word! If you find a mistake or broken link, please file an issue or submit a pull request.
Key to resource types
- ๐ = Course
- ๐ = Tutorial or blog post
- ๐ = Book or book chapter
- ▶️ = Video or webinar
- 🎧 = Podcast or audio recording
- 👥 = Community or user forum
- ๐ = Journal or technical article
- 💡 = Cheat sheet
- โ = List
Software & Programming
Getting started with R
๐ Modern Dive: Getting Started by Chester Ismay and Albert Y. Kim.- The very first of first steps. Install R & RStudio and what to do after that.
๐ RYouWithMe: Basic Basics by Lisa Williams, RLadies Sydney.- Tour of RStudio, installing and using packages and getting data into RStudio.
๐ Teacups, Statistics and Giraffes by Hasse Walum and Desirée de Leon.- Accessible introduction to R and statistics with interactive coding exercises.
▶️ A Gentle Introduction to Tidy Statistics in R by Thomas Mock, RStudio.- Webinar covering exploratory data analysis, tidyverse, statistical testing and plotting.
๐ The R Bootcamp by Ted Laderas and Jessica Minnier.- A tidyverse-centric interactive course for data manipulation, graphics, data reshaping, and statistical modelling.
๐ RStudio Primers by RStudio.- Interactive tutorials from RStudio covering data manipulation, visualisation and programming with R.
๐ Swirl: Learn R, in R by Ismael Fernández, Nick Carchedi and Sean Kross.- Learn R with interactive courses in the console.
๐ Using R for Data Journalism by Andrew Ba Tran.- Video supported intro course with emphasis on wrangling and visualisation.
๐ R for Data Science by Garrett Grolemund and Hadley Wickham.- Comprehensive guide to using R programming for data science workflows.
๐ Introduction to Data Science: Data Analysis and Prediction Algorithms with R by Rafael A. Irizarry.- Introduction to data science focused topics in R: visualisation, wrangling, prediction and workflow.
💡 Base R Cheat Sheet by Mhairi McNeill.- Quick overview of basic R functionality.
Advancing with R
๐ Tidynomicon - A Brief Introduction to R for People Who Count From Zero by Greg Wilson.- An introduction to R for Python users.
๐ Hands-on Programming with R by Garrett Grolemund.- A friendly introduction to the R language for non-programmers.
๐ R Cookbook: Proven Recipes for Data Analysis, Statistics, and Graphics by James (JD) Long, Paul Teetor.- Recipes and worked examples for performing core tasks in R.
๐ R package primer: a minimal tutorial by Karl Broman.- Overview of R packages development.
๐ R Packages by Hadley Wickham and Jennifer Bryan.- Comprehensive guide to how R packages work and how to write your own.
๐ Efficient R programming by Colin Gillespie and Robin Lovelace.- Comprehensive introduction to writing faster and more efficient R code.
๐ Advanced R by Hadley Wickham.- Get deeper into R programming fundamentals, object-oriented and functional programming concepts, and a lot more. A must-read for experienced R users!
▶️ RStudio Webinars by RStudio.- Recordings of past RStudio webinars covering a variety of R and data science content.
๐ An Introduction to R by W. N. Venables, D. M. Smith and the R Core Team.- Introduction to R written by the R-Core team.
๐ /๐ Data science for economists by Grant McDermott.- Slides and code examples covering a wide-ranging introduction to data science in R.
๐ /๐ Big Data in Economics by Grant McDermott.- Notes covering the use of R with shell, GitHub, web scraping, Docker and cloud computing.
๐ Handling Strings with R by Gaston Sanchez and Chitra Venkatesh.- Detailed introduction to strings, manipulation, regex and text wrangling.
▶️ R Package Development by John Muschelli.- 6-part video series on the basics of R package development, testing and building a pkgdown site.
Getting started with Python
๐ Install Python and Anaconda by Anaconda.- The most commonly used package and environment manager for Python and how to install it.
๐ Free interactive introduction to Python and pandas by ?.- Beginner’s introduction to Python, pandas and data analysis via an interactive course.
๐ Quick reference to Python in a single script and notebook by Kevin Markham.- Comprehensive reference guides for Python programming via notebooks and script examples.
๐ /▶️ An Introduction to Python and Programming by Alexander Hess.- Python course for aspiring data scientists via notebooks, videos and exercises.
๐ A Whirlwind Tour of Python by Jake VanderPlas.- A fast-paced introduction to essential features of the Python language for those already familiar with another language.
๐ Learn Python by Ron Reiter.- Interactive online courses and tutorials for a wide range of Python topics.
💡 Pandas Cheat Sheet by the Pandas development team.- 2-page quick reference to the most commonly used pandas functions.
๐ Getting Started in pandas by the Pandas development team.- Tutorials and quick start guides from the pandas development team.
Advancing with Python
๐ Python Data Science Handbook by Jake VanderPlas.- Online book with comprehensive coverage of IPython, numpy, pandas, matplotlib and machine learning with scikit-learn.
๐ Python for Everybody: Exploring Data Using Python 3 by Charles R. Severance.- Python ebook with a focus on programming fundamentals. Translations available in several languages.
๐ Python Packaging User Guide by the Python Packaging Authority (PyPA).- A collection of tutorials and references to help you distribute and install Python packages with modern tools.
Shell
๐ Learn Shell by Ron Reiter.- A browser-based interactive Shell tutorial covering basics through to advanced topics.
๐ The Unix Shell by Software Carpentry.- Tutorials and examples of how to use the Unix shell.
๐ Beginners/BashScripting by Ubuntu Documentation.- Introduction to using the shell for OS navigation and scripting.
▶️ How to Write a Shell Script using Bash Shell in Ubuntu by FS Tutorial- Short video showing how to write a first shell script using vim.
๐ /▶️ The Missing Semester of Your CS Education by Anish Athalye, Jon Gjengset and Jose Javier Gonzalez Ortiz- Videos and notes on using shell and version control.
๐ The Art of the Command Line by Joshua Levy- Useful list of bash commands and explanations, all laid out on a single page!
๐ ExplainShell.com by Idan Kamara- Handy utility - type in a shell command and get an explanation of what it does.
Regular expressions
๐ RegexOne: Learn Regular Expressions with simple, interactive exercises. by RegexOne- Simple, browser-based course with interactive exercises.
๐ Regular Expressions 101: Online Regular Expression Tester and Debugger by Firas Dib- Very handy tool to test regular expressions against test strings.
💡 Data Science Cheat Sheet: Python Regular Expressions by Dataquest- PDF cheat-sheet for standard regular expression syntax.
💡 Regular Expressions Cheat Sheet by Dave Child- PDF cheat-sheet for standard regular expression syntax.
Git
๐ Happy Git and GitHub for the useR by Jenny Bryan, the STAT 545 TAs and Jim Hester- If you are an R user and new to git, this is currently the best place to start.
๐ An introduction to Git and how to use it with RStudio by François Michonneau- Conceptual overview of what git is and how to use it, with particular emphasis on GitHub and its use with RStudio.
💡 Git Cheat Sheet by GitHub- A list of the main git shell commands.
๐ Pro Git by Scott Chacon and Ben Straub- Free ebook covering more advanced usage of git - good once you’re confident with the basics.
๐ Oh Shit Git! by Katie Sylor-Miller- Light-hearted troubleshooting guide for when things inevitably go wrong!
๐ Step-by-step guide to contributing on GitHub by Kevin Markham- Detailed guide on how to contribute to open source software projects using git and GitHub.
Spark
💡 PySpark Cheat Sheet by Kevin Schaich
๐ Mastering Spark with R by Javier Luraschi, Kevin Kuo and Edgar Ruiz
▶️ R & Spark: How to Analyze Data Using RStudio’s Sparklyr by Nathan Stephens
๐ A Gentle Introduction to Spark by Databricks
SQL
๐ /๐ The SQL Tutorial for Data Analysis by mode.com.- Tutorials and interactive exercises teaching fundamentals of SQL.
๐ SQLBolt: Learn SQL with simple, interactive exercises.
๐ /๐ SQLZoo: SQL Tutorial.- Wikibook with interactive exercises.
๐ Intro to SQL: Querying and managing data by Khan Academy
๐ LearnSQLOnline by Ron Reiter
Docker
๐ An Introduction to Docker for R Users by Colin Fay
๐ R Docker tutorial by Jemma Stachelek
▶️ Docker and Python: making them play nicely and securely for Data Science and ML by Tania Allard at PyCon 2020
Markdown, LaTeX and publishing
๐ R Markdown: The Definitive Guide by Yihui Xie, J. J. Allaire, Garrett Grolemund
๐ bookdown: Authoring Books and Technical Documents with R Markdown by Yihui Xie
๐ The Not So Short Introduction to LaTeX 2ε by Tobias Oetiker
๐ LaTeX for Beginners by UoE IS Services
Machine Learning
Theory
๐ The Elements of Statistical Learning: Data Mining, Inference, and Prediction by Trevor Hastie, Robert Tibshirani and Jerome Friedman (2017)
๐ Computer Age Statistical Inference: Algorithms, Evidence and Data Science by Bradley Efron and Trevor Hastie (2017).- A statistical approach to data science and machine learning.
๐ Mathematics for Machine Learning by Marc Peter Deisenroth, A. Aldo Faisal, Cheng Soon Ong- Covers the underpinning theory to many ML algorithms, a useful reference for practitioners.
๐ distill.pub by multiple contributors, edited by Shan Carter and Chris Olah- Online scientific journal publishing very high-quality, interactive articles on ML. On hiatus as of 2021.
๐ Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, Jeff Ullman- Book based on Stanford Computer Science course CS246: Mining Massive Datasets.
๐ Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani- ISLR is still one of the most important books for getting started in practical ML.
Interpretability
๐ Interpretable Machine Learning: A Guide for Making Black Box Models Explainable by Christoph Molnar (2022)- A highly practical introduction to IML, required reading if you are new to the topic.
โ Awesome: Machine Learning Interpretability by Patrick Hall- A big list of MLI resources with >2.5k GitHub stars.
Guides, tutorials and courses
๐ Machine Learning Crash Course with TensorFlow APIs by Google- A fast-paced, practical introduction to machine learning, with video lectures, real-world case studies, and hands-on practice exercises.
๐ Tidymodels Tutorials by RStudio- A variety of beginner’s guides to solving common ML tasks with R’s tidymodels.
๐ Supervised Machine Learning Case Studies in R by Julia Silge.- Easy-to-follow in-browser beginner’s guide to using R’s tidymodels for practical ML.
๐ /🎮 Introduction to machine learning with scikit-learn by Kevin Markham- Bite-sized study videos and Python notebooks from Kevin Markham’s Data School.
๐ scikit-learn User Guide by scikit-learn- scikit-learn’s documentation is very thorough and a great standalone learning resource!
๐ Introduction to Machine Learning for Coders by Jeremy Howard.- 24 hours of videos and supporting notes from a Kaggle superstar.
Data Science Practice
Software development
๐ Software development skills for data scientists by Trey Causey
๐ Hidden Technical Debt in Machine Learning Systems
๐ How rOpenSci uses Code Review to Promote Reproducible Science by Noam Ross, Scott Chamberlain, Karthik Ram and Maëlle Salmon
๐ Best Practices for Computational Science: Software Infrastructure and Environments for Reproducible and Extensible Research by Victoria Stodden and Sheila Miguez
๐ Journalism as a Professional Model for Data Science by Brian C. Keegan
๐ Cookiecutter Data Science by drivendata
Hiring and building teams
๐ The Care and Feeding of Data Scientists: How to Build, Manage and Retain a Data Science Team by Michelangelo D’Agostino and Katie Malone
🎧 The Care and Feeding of Data Scientists: Becoming a Data Science Manager on Linear Digressions podcast by Katie Malone and Ben Jaffe
๐ Models for integrating data science teams within companies by Pardis Noorzad
🎧 Building Effective Data Science Teams with Kobi Abayomi, Gregory Berg, Elaine McVey, Jacqueline Nolis, Nasir Uddin and Julia Silge
๐ Building a data team at a mid-stage startup: a short story by Erik Bernhardsson
๐ Hiring a data scientist by Mikhail Popov, Wikimedia
Agile data science
๐ Agile Data Science with R: A workflow by Edwin Thoen
๐ Data Science and Agile (What works, and what doesn’t) by Eugene Yan
๐ Data Science Best Practices: Run your data science team like an engineering team by Leonard Austin
๐ Organizing machine learning projects: project management guidelines by Jeremy Jordan
Ethics and fairness
๐ Ethics of Artificial Intelligence and Robotics by Stanford Encyclopedia of Philosophy
๐ The Responsible Machine Learning Principles: A practical framework to develop AI responsibly by The Institute for Ethical AI & Machine Learning
๐ A Code of Ethics for Data Science by DJ Patil
๐ The Ethical Data Scientist by Cathy O’Neil
๐ An ethics checklist for data scientists by drivendata
๐ Fairness and machine learning: Limitations and Opportunities by Solon Barocas, Moritz Hardt, Arvind Narayanan
๐ Practical Data Ethics by fast.ai
MLOps
๐ MLOps: Continuous delivery and automation pipelines in machine learning by Google Cloud
๐ Using GitHub Actions for MLOps & Data Science by Hamel Husain, The GitHub Blog
๐ Continuous Delivery for Machine Learning: Automating the end-to-end lifecycle of Machine Learning applications by Danilo Sato, Arif Wider and Christoph Windheuser
๐ Monitoring Machine Learning Models in Production: A Comprehensive Guide by Christopher Samiullah
๐ What are Azure Machine Learning pipelines? by Microsoft
๐ Getting started with Kubeflow Pipelines by Amy Unruh, Google Cloud
๐ Continuous Machine Learning (CML) is CI/CD for Machine Learning Projects by DVC.org
๐ Data Science Workflows by David Neuzerling
ML Platforms
๐ The problem with AI developer tools for enterprises (and what IKEA has to do with it) by Clemens Mewald
๐ 5 Reasons Organizations Shouldn’t Build Their Own AI Platforms by dataiku
Style Guides
๐ Udacity Git Commit Message Style Guide by Udacity
๐ The tidyverse style guide by Hadley Wickham
๐ The Google R Style Guide by Google
๐ The Google Python Style Guide by Google
๐ PEP 8 – Style Guide for Python Code by Guido van Rossum, Barry Warsaw, Nick Coghlan
Developing interactive applications
🎮 /๐ Learn Shiny by RStudio
๐ A gRadual intRoduction to Shiny by Ted Laderas and Jessica Minnier
๐ Interactive web-based data visualization with R, plotly, and shiny by Carson Sievert
๐ Dashboards by Yihui Xie, J. J. Allaire, Garrett Grolemund.- Chapter 5 from “R Markdown: The Definitive Guide”.
๐ Leaflet for R by RStudio
๐ Dash User Guide by Plotly
๐ Getting Started with Streamlit by streamlit
Visualisation
๐ Fundamentals of Data Visualization by Claus O. Wilke
๐ ggplot2: Elegant Graphics for Data Analysis by Hadley Wickham
๐ 3D Mapping and Visualization with R and Rayshader by Tyler Morgan-Wall
Time series analysis
๐ Forecasting: Principles and Practice by Rob J Hyndman and George Athanasopoulos
๐ 11 Classical Time Series Forecasting Methods in Python (Cheat Sheet) by Jason Brownlee
Generalised Additive Modelling (GAMs)
๐ GAMs in R by Noam Ross.- Interactive course introducing Generalised Additive Models (GAMs).
๐ Resources for Learning About and Using GAMs in R by Noam Ross
Statistics
๐ Statistical Inference via Data Science: A Modern Dive into R and the tidyverse by Chester Ismay and Albert Y. Kim
๐ Think Stats: Exploratory Data Analysis in Python by Allen B. Downey
๐ Learning statistics with R: A tutorial for psychology students and other beginners by Danielle Navarro
๐ Probabilistic Programming & Bayesian Methods for Hackers by Cameron Davidson-Pilon
๐ From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science by Norm Matloff
๐ Theory of Statistics by James E. Gentle
๐ Core Statistics by Simon Wood
Spatial analysis
๐ Geocomputation with R by Robin Lovelace, Jakub Nowosad, Jannes Muenchow
๐ Spatial Data Science by Edzer Pebesma and Roger Bivand
๐ Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny by Paula Moraga
Data Science community groups
Python groups
👥 PyData Meetup Groups
👥 PyLadies by PyLadies
R groups
👥 Directory of R User Groups by Jumping Rivers
👥 Complete list of R-Ladies groups by R-Ladies Global
👥 R for Data Science Online Learning Community- The R4DS Online Learning Community is a community of R learners at all skill levels working together to improve their skills.
👥 Tidy Tuesday- A weekly podcast and community activity brought to you by the R4DS Online Learning Community.
👥 SatRdays- R-focused conferences that are held on Saturdays.
Natural language processing
๐ Text Mining with R: A Tidy Approach by Julia Silge and David Robinson
๐ Advanced NLP with SpaCy by Ines Montani
๐ 100 Must read papers in NLP by Masato Hagiwara
๐ Stanford CS 124: From Languages to Information by Dan Jurafsky
๐ Natural Language Processing with Python: Analyzing Text with the Natural Language Toolkit by Steven Bird, Ewan Klein, and Edward Loper
๐ A Code-First Intro to Natural Language Processing by fast.ai- The course is taught in Python with Jupyter Notebooks, using libraries such as sklearn, nltk, pytorch, and fastai.
๐ Speech and Language Processing by Dan Jurafsky and James H. Martin
▶️ BERT Research Series by Chris McCormick