Data Analysis at Scale in the Cloud
Course taught at Duke MIDS, Spring 2020-2022 by Noah Gift.
- This is the course syllabus.
- These are the projects in the course.
- This is the week-by-week calendar.
- This is the rubric for grading assignments
- This is the grading for the course
- This is the FAQ
- A complete online book with screencast videos is available here.
- Coursera Course, Building Cloud Computing Solutions at Scale Specialization, can be found here: https://www.coursera.org/specializations/building-cloud-computing-solutions-at-scale
Guest Lecture 2022-Async
GPT-3:
- Book: https://learning.oreilly.com/library/view/gpt-3/9781098113612/
- Interview: https://learning.oreilly.com/videos/52-weeks-of/021822022VIDEOPAIML/
- Shubham Saboo
- Sandra Kublik
Prequel Material
These resources could be helpful before starting this course.
Duke/Coursera: Foundations of Data Engineering Course (Launching early 2022)
Course 1: Python and Pandas for Data Engineering
Course 2: Linux and Bash for Data Engineering
GitHub Repos for Projects in Course
Week 1: Using Linux
Week 2: Using Bash
- Lesson 1: Create and Use .bashrc
- Lesson 2: Sourcing shell variables from a script
- Lesson 3: Using stdout and stdin
Week 3: Building Bash Scripts
- Lesson 1: Build a for loop in Bash
- Lesson 2: Truncate large files with Bash
- Lesson 3: Building a command-line tool for data processing
- Lesson 4: Build Bash CLI with options
Week 4: Composing File and Data Management Solutions with Linux
- Lesson 1: Understand the search commands
- Lesson 2: Setting permissions
- Lesson 3: Using regex to process text from file
- Lesson 4: Search the filesystem with find
Course 3: Python and SQL for Data Engineering
Course 4: Building Data Engineering Solutions with Python for Web Applications, Command-Line Tools and Notebooks
Sequel Material
These resources could be helpful after completing this course.
Duke/Coursera: Applied Data Engineering Course (Launching late 2022)
GitHub Repos Referenced in the Duke Coursera Course
Course 1: Cloud Computing Foundations
- Practice Markdown
- Github Actions-Pytest
- Google App Engine Continuous Delivery
- Hello World Flask
- Hugo Continuous Delivery on AWS
Course 2: Cloud Computing Building Blocks
- Lint Dockerfile
- Flask Change Microservice
Lecture Topics:
Getting Started: [Week 1]
Cloud Computing Foundations: [Week 2]
Virtualization and Containers: [Week 3 & Week 4]
Challenges and Opportunities in Distributed Computing: [Week 5 & Week 6]
Cloud Storage: [Week 7 & Week 8]
Serverless: [Week 9 & Week 10]
MLOps, Big Data and Edge Computer Vision: [Week 11 & Week 12 & Week 13]
General
Student Example Projects
A practical guide to Data Science, Machine Learning Engineering and Data Engineering
Read Cloud Computing for Data Book
Free book Developing-on-AWS-with-CSharp
Next Steps: Take Coursera MLOps Course
- Take the Specialization
- Cloud Computing Foundations
- Cloud Virtualization, Containers and APIs
- Cloud Data Engineering
- Cloud Machine Learning Engineering and MLOps
Text and Code License
The text and code content of notebooks and documents is released under the CC-BY-NC-ND license.