Michael's Data Science Curriculum
Michael's Data Science Curriculum by Michael A. Alcorn is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Table of Contents
- Math
- Statistics/Probability Theory
- Econometrics
- Algorithms
- Machine Learning/Artificial Intelligence
- Other Topics
Curriculum
- Michael's Guide to Becoming a Data Scientist
- Math
- Over the course of my math studies, I learned that the French have a reputation for teaching math a particular way from a young age, and I think I'm a fan.
- Calculus
- Textbooks
- Calculus, Vol. 1: One-Variable Calculus, with an Introduction to Linear Algebra (Apostol) - Apostol's textbooks are classics for a reason. They are a great introduction to calculus and do an excellent job of developing intuition for limits, derivatives, and integrals.
- Calculus, Vol. 2: Multi-Variable Calculus and Linear Algebra with Applications to Differential Equations and Probability (Apostol)
- Courses
- MA101: Single-Variable Calculus I (Saylor Academy) - good introduction to calculus. Provides links to different types of learning material from different sources, which I think encourages learning. Seeing/hearing different perspectives on the same material often seems to help crystallize concepts. Really delves into the "why" of differentiation and integration.
- MA102: Single-Variable Calculus II (Saylor Academy) - first two units were entirely redundant with the end of MA101. Unit 3 was fairly painful, with a lot of memorization of integration "tricks". Course felt like somewhat of a hodgepodge of different mathematical concepts without it being entirely clear why they were covered together (e.g., series, differential equations).
- 18.02: Multivariable Calculus (MIT) - enjoyed the first two parts, but I got lazy around double line integrals. The physical applications of concepts like curl and flux are interesting, but I was ready to move on to linear algebra. As an aside, MIT uses a two-course sequence to teach calculus, and I agree that's the way it should be done.
- Multivariable Calculus (Khan Academy) - Khan Academy produces extremely high quality content for a variety of subjects, and its offerings for multivariable calculus are no different.
- Linear Algebra
- Textbooks
- Linear Algebra and Its Applications (Strang) - linear algebra is probably my favorite math course. There is a lot of geometric intuition that is really satisfying, and the fact that linear algebra has tons of applications today doesn't hurt. Strang is a great teacher and writer, and his texts are classics for a reason.
- Linear Algebra Done Right (Axler)
- Courses
- Coding the Matrix: Linear Algebra through Computer Science Applications (Coursera) - an excellent introduction to linear algebra that develops a deep understanding of the subject beyond just matrices and vectors.
- Advanced
- These subjects are only necessary for those who want to achieve a deeper understanding of the math behind machine learning/probability.
- Differential Equations
- Textbooks
- Ordinary Differential Equations (Tenenbaum and Pollard)
- Matrix Differential Equations with Applications in Statistics and Econometrics (Magnus and Neudecker) - if someone were trying to learn differential equations quickly and knew some linear algebra, I would tell them to skip ordinary differential equations and go straight to this book. Almost all interesting problems today involve many variables interacting in complex ways, so partial differential equations are the more relevant differential equations, and, in my opinion, they can be taught entirely independently of ODEs.
- Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry, and Engineering (Strogatz)
- Courses
- Introduction to Differential Equations (edX) - highly enjoyable three-course sequence on differential equations. Differential equations was the only math subject that I didn't initially enjoy. I tried a number of different textbooks, but none of them engaged me. Part of my problem was that coming up with analytical solutions to differential equations involves a number of "tricks" that I didn't find particularly satisfying. I also never felt like I was gaining new insight or intuition when studying the subject in isolation, which is not how I felt about other math subjects. When I first made this curriculum, I suggested that differential equations would be more enjoyable if it were taught in combination with a science (something like a "Differential Equations in Physics" course), especially since most differential equations theory was motivated by the study of a physical system in the first place (as many of the biographical sketches of scientists and mathematicians in the Tenenbaum and Pollard textbook allude to). Indeed, revisiting differential equations after seeing some of the concepts pop up in other courses made the subject much more interesting to me.
- Nonlinear Dynamics: Mathematical and Computational Approaches (Complexity Explorer) - an excellent course for developing some intuition for nonlinear dynamics. The only downside of the course is that there wasn't as much mathematical depth as I would've liked (by design, it seems), so it should be considered a complement to the Strogatz textbook.
- Analysis
- Textbooks
- Principles of Mathematical Analysis (Rudin) - analysis delves into the "why" of math and does so by deriving the existence of various mathematical concepts (e.g., the real and complex numbers, differentiation) from very simple initial building blocks (i.e., sets, addition, multiplication, and limits). Like abstract algebra, it is an interesting subject, but it's probably only practically relevant to individuals pursuing math-y subjects at higher levels. Subjects that use some analysis for derivations/proofs will generally provide the necessary background.
- Topology/Differential Geometry
- Textbooks
- Courses
- What is a Tensor? - an excellent introduction to differential geometry. Not super useful unless you're studying spacetime.
- Tensor Calculus, Multilinear Algebra and Differential Geometry
- Abstract Algebra
- Textbooks
- Algebra (Artin) - this was the first math subject I encountered where it was clear that it was mostly for people who wanted to pursue math at a higher level (just to clarify, the subject I'm discussing here is typically referred to as "abstract algebra", so it's not the same subject you learned in grade school). That is, there was very little that could be taken from this subject and applied in non-mathematical settings. With that being said, the subject is really interesting, and there are some applications in computer science (e.g., monads and monoids, which are concepts in category theory, also come up in functional programming).
- Abstract Algebra: Theory and Applications (Judson) (free!)
- Statistics/Probability Theory
- General
- Textbooks
- All of Statistics: A Concise Course in Statistical Inference (Wasserman)
- OpenIntro Statistics (free!)
- Causal Inference (Hernán and Robins) (draft: free!)
- Forecasting: Principles and Practice (Hyndman and Athanasopoulos) (free!)
- Probability Theory: The Logic of Science (Jaynes) (draft: free!)
- Courses
- Bayesian
- Textbooks
- Bayesian Data Analysis (Gelman et al.)
- Doing Bayesian Data Analysis: A Tutorial with R, JAGS, and Stan (Kruschke)
- Think Bayes (Downey) (free!)
- Probabilistic Programming & Bayesian Methods for Hackers (Davidson-Pilon) (free!)
- Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill)
- Advanced
- Statistical Learning Theory
- Textbooks
- Statistical Learning Theory (Vapnik)
- Theory of Probability
- Courses
- Econometrics
- I used to have the Econometrics section below the Other section, but I decided to move it here because I've come to the conclusion that thinking like an economist is one of the most valuable skills a data scientist can have. Economists, like data scientists, often work with observational data, which can make estimating the causal effect of any particular intervention rather challenging. As a result, economists tend to be very careful when using models to investigate causality—a skill that is sometimes underrated by those in the data science community.
- Textbooks
- Mostly Harmless Econometrics: An Empiricist's Companion (Angrist and Pischke)
- Introductory Econometrics: A Modern Approach (Wooldridge)
- Econometric Analysis of Cross Section and Panel Data (Wooldridge)
- Econometric Analysis (Greene)
- Econometrics (Hayashi)
- Fundamental Methods of Mathematical Economics (Chiang and Wainwright)
- Courses
- Applied Econometrics (New York University)
- Econometric Analysis of Panel Data (New York University)
- Undergraduate Course
- Graduate Course
- Mathematical Methods for Economic Theory (University of Toronto)
- Algorithms
- Textbooks
- Algorithms (Sedgewick and Wayne)
- Introduction to Algorithms (Cormen et al.)
- Courses
- Algorithms Specialization (Coursera)
- Introduction to Algorithms (MIT)
- Bioinformatics Specialization (Coursera) - while the subject matter might not be relevant to everyone, I think most computer scientists would benefit from taking this course as it really gets you thinking about efficient algorithms in a real world context.
- Machine Learning/Artificial Intelligence
- General
- Textbooks
- Pattern Recognition and Machine Learning (Bishop)
- Machine Learning: A Probabilistic Perspective (Murphy)
- An Introduction to Statistical Learning (James et al.)
- Machine Learning (Mitchell)
- Advanced Data Analysis from an Elementary Point of View (Shalizi) (draft: free!)
- The Elements of Statistical Learning (Hastie, Tibshirani, and Friedman) (free!)
- Artificial Intelligence: A Modern Approach (Russell and Norvig)
- Courses
- Machine Learning (Udacity)
- Machine Learning (Coursera)
- Learning From Data (edX) - warning: this one is more on the theoretical side.
- Artificial Intelligence for Robotics (Udacity)
- Artificial Intelligence (edX)
- Natural Language Processing
- Textbooks
- Foundations of Statistical Natural Language Processing (Manning and Schütze)
- Speech and Language Processing (Jurafsky and Martin)
- Courses
- Natural Language Processing (Coursera) - no longer offered. Course notes available here.
- Natural Language Processing (Coursera) (different one) - no longer offered on Coursera, but can be found on the professor's website.
- CS224n: Natural Language Processing with Deep Learning (Stanford)
- Advanced
- Deep Learning
- Textbooks
- Deep Learning (Goodfellow, Bengio, and Courville) (draft: free!)
- Neural Networks and Deep Learning (Nielsen) (free!)
- Courses
- Deep Learning Specialization (Coursera)
- Neural Networks for Machine Learning (Coursera)
- IFT6266 – H2015 Representation Learning (Université de Montréal)
- CS231n: Convolutional Neural Networks for Visual Recognition (Stanford)
- Reinforcement Learning
- Textbooks
- Reinforcement Learning: An Introduction (Sutton and Barto) (draft: free!)
- Courses
- COMPM050/COMPGI13: Reinforcement Learning (University College London)
- Reinforcement Learning (Udacity)
- CS 294: Deep Reinforcement Learning (University of California, Berkeley)
- CS234: Reinforcement Learning (Stanford)
- Probabilistic Graphical Models
- Textbooks
- Probabilistic Graphical Models: Principles and Techniques (Koller and Friedman)
- Courses
- Probabilistic Graphical Models (Coursera)
- Other Topics
- Big Data
- Textbooks
- Mining of Massive Datasets (Leskovec et al.) (free!)
- Courses
- Data Science and Engineering with Spark (edX)
- Introduction to Hadoop and MapReduce (Udacity)
- Algorithms for Big Data (Indiana University)
- CMPSCI 711: More Advanced Algorithms (University of Massachusetts)
- COSC 548: Streaming Algorithms (Georgetown University)
- Functional Programming in Scala Specialization (Coursera)
- Social Networks and Game Theory
- Textbooks
- Networks, Crowds and Markets: Reasoning about a Highly Connected World (Easley and Kleinberg) (draft: free!)
- Courses
- Networks, Crowds and Markets (edX)
- Game Theory (Coursera)
- Advanced
- Optimization
- Textbooks
- An Introduction to Numerical Analysis (Süli and Mayers) - my first applied math subject, it felt like a continuation of linear algebra (in fact, there was a fair amount of overlap). Probably not necessary unless you see yourself implementing numerical solvers in the future.
- Convex Optimization (Boyd and Vandenberghe) (free!)
- Courses
- Discrete Optimization (Coursera) - a challenging but extremely rewarding course.
- Convex Optimization (Stanford Online)
- Machine Learning 10-725: Convex Optimization (Carnegie Mellon University)
- Information Theory
- Textbooks
- Elements of Information Theory (Cover and Thomas)
- Information Theory, Inference, and Learning Algorithms (MacKay) (free!)
- Digital Signal Processing
- Textbooks
- Discrete-Time Signal Processing (Oppenheim and Schafer)
- Foundations of Signal Processing (Vetterli, Kovačević, and Goyal) (draft: free!)
- Courses
- Digital Signal Processing (Coursera)