Python Computing for Data Science
A Graduate Seminar Course at UC Berkeley (AY 250)
Campbell Hall: Monday 4:10 - 7:00 PM SPRING 2022
Synopsis
Python has become the de facto superglue language for modern scientific computing. In this course we will learn Pythonic interactions with databases, image processing, advanced statistical and numerical packages, web frameworks, machine learning, and parallelism. Each week will involve lectures and coding projects. In the final capstone project, students will build a working codebase useful for their own research domain.
This class is for any student who works in a quantitative discipline and is familiar with Python. Those who completed the Python Bootcamp or equivalent will be eligible. You should follow the steps to install the Anaconda 3-2021-* distribution as well as git.
Course Schedule
Date | Content | Reading | Leader |
---|---|---|---|
Jan 24 (online only) | Numpy, Scipy, & Pandas | scipy §§ 1.3, 1.5, 2.2; numpy; skim chap. 4/5 of McKinney | Josh |
Jan 31 | Data visualization (Matplotlib, Bokeh, Altair) | Skim Tufte's Visualization book; colormap talk (Scipy 2015) | Josh |
Feb 7 | Application building and Testing | None | Josh |
Feb 14 | Parallelism (asyncio, dask, ray, jax) | None | Josh |
Feb 21 | Holiday (no class) | | |
Feb 28 | Database interaction (sqlite, postgres, SQLAlchemy), Large datasets (xarray, HDF5) | None | Josh |
Mar 7 | Machine Learning I (sklearn: regression, classification; dask-learn, auto-ml) | None | Josh |
Mar | Machine Learning II (keras [tensorflow]) | Deep Learning with Keras | Josh |
Mar 21 | Spring Break | | |
Apr 1 (Friday, 10 AM - 1 PM) | Web frameworks & RESTful APIs, Flask | None | Josh |
Apr 4 | No lecture | | |
Apr 11 | Bayesian programming & Symbolic math | Probabilistic Programming eBook; install: pip install pymc3 | Josh |
Apr 18 | Image processing (OpenCV, skimage) | None | Stefan van der Walt |
Apr 25 | Speeding it up (Numba, Cython, wrapping legacy code) | None | Josh |
Onward | Final project work | | |
Useful Books
- Elegant Scipy (UC Berkeley Library link)
Sidebar Concepts
Throughout these lectures we will pepper in sidebar concepts:
- Jupyter & JupyterLab
- using git & GitHub
- Docker
- Data science workflows
- reproducible research
- application building
- debugging
- testing
Workflow
Each Monday we will introduce a reasonably self-contained topic in two back-to-back lectures, with a short (~20 minute) breakout coding session in between. Homework assignments will require you to write a large (several hundred line) codebase.
Help sessions will be conducted interactively on the Piazza site for the course. There is also a weekly in-person help session (day and time TBD). Email Josh with any questions.
Contact
Email us at [email protected] or contact the professor directly ([email protected]). You can also contact the GSI, Ellianna Abrahams, at [email protected]. Auditing is not permitted by the University, but those wishing to sit in on a class or two should contact the professor before attending.