How To Become a Data Engineer
Useful Articles
- The AI Hierarchy of Needs
- The Rise of Data Engineer
- The Downfall of the Data Engineer
- A Beginner’s Guide to Data Engineering
- Functional Data Engineering — a modern paradigm for batch data processing
- How to become a Data Engineer (in Russian and English)
- Introduction to Apache Airflow (in Russian and English)
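The Functional Data Engineering article in this list argues for pure, idempotent tasks that overwrite a whole partition rather than appending to it, so retries and backfills are safe. A minimal sketch of that idea, assuming an in-memory dict stands in for a date-partitioned table; `warehouse`, `transform` and `load_partition` are hypothetical names, not taken from the article:

```python
from datetime import date

# Hypothetical stand-in for a date-partitioned table: partition key -> rows.
warehouse: dict[date, list[dict]] = {}


def transform(raw_rows: list[dict]) -> list[dict]:
    """Pure transformation: the output depends only on the input rows."""
    return [
        {"user_id": r["user_id"], "amount_usd": round(r["amount_cents"] / 100, 2)}
        for r in raw_rows
        if r["amount_cents"] > 0
    ]


def load_partition(ds: date, raw_rows: list[dict]) -> None:
    """Idempotent load: replace the whole partition for `ds` instead of appending."""
    warehouse[ds] = transform(raw_rows)


raw = [
    {"user_id": 1, "amount_cents": 1250},
    {"user_id": 2, "amount_cents": 0},
    {"user_id": 3, "amount_cents": 999},
]
day = date(2024, 1, 1)

load_partition(day, raw)
load_partition(day, raw)  # a retry or backfill leaves the same state, no duplicates
print(warehouse[day])  # [{'user_id': 1, 'amount_usd': 12.5}, {'user_id': 3, 'amount_usd': 9.99}]
```

An append-based load would have doubled the rows on the second call; overwriting the partition is what makes re-running the task harmless.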
Talks
- Data Engineering Principles - Build frameworks not pipelines by Gatis Seja
- Functional Data Engineering - A Set of Best Practices by Maxime Beauchemin
- Advanced Data Engineering Patterns with Apache Airflow by Maxime Beauchemin
- Creating a Data Engineering Culture by Jesse Anderson
- Streaming 101: Hello Streaming by Josh Fischer
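Windowing is a central topic of the Streaming 101 talk. As a rough sketch, assuming events arrive as `(event_time_seconds, key)` tuples and deliberately ignoring watermarks and late data, counting events per key in fixed (tumbling) event-time windows looks like this; the window size and event data are invented for illustration:

```python
from collections import Counter, defaultdict

WINDOW_SECONDS = 60  # assumed window size


def tumbling_window_counts(events):
    """Count events per key in fixed, non-overlapping event-time windows.

    `events` is an iterable of (event_time_seconds, key) tuples. Arrival order
    does not matter because each event is assigned a window by its event time.
    Watermarks and late-data handling are intentionally left out.
    """
    windows = defaultdict(Counter)
    for event_time, key in events:
        window_start = event_time - event_time % WINDOW_SECONDS
        windows[window_start][key] += 1
    return {start: dict(counts) for start, counts in sorted(windows.items())}


# (30, "click") arrives after (61, "view") but still lands in the first window.
events = [(5, "click"), (61, "view"), (30, "click"), (59, "view"), (120, "click")]
print(tumbling_window_counts(events))
# {0: {'click': 2, 'view': 1}, 60: {'view': 1}, 120: {'click': 1}}
```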
Algorithms & Data Structures
- Algorithmic Toolbox in Russian
- Data Structures in Russian
- Data Structures & Algorithms Specialization on Coursera
- Algorithms Specialization from Stanford on Coursera
SQL
- Comprehensive SQL Tutorial by Mode Analytics
- SQL Practice on Leetcode
- Modern SQL - a website about modern SQL syntax
- Introduction to Window Functions (English, Russian)
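Window functions compute a value for each row over a set of related rows without collapsing them the way `GROUP BY` does. A small sketch using Python's built-in `sqlite3` module, assuming the bundled SQLite is 3.25 or newer (the release that added window functions); the `orders` table and its columns are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (customer TEXT, order_day INTEGER, amount INTEGER);
    INSERT INTO orders VALUES
        ('alice', 1, 10), ('alice', 2, 30), ('alice', 3, 20),
        ('bob',   1, 5),  ('bob',   2, 15);
    """
)

# Running total per customer and rank of each order by amount; every row is kept.
rows = conn.execute(
    """
    SELECT customer,
           order_day,
           amount,
           SUM(amount) OVER (PARTITION BY customer ORDER BY order_day) AS running_total,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS amount_rank
    FROM orders
    ORDER BY customer, order_day
    """
).fetchall()
conn.close()

for row in rows:
    print(row)
# ('alice', 1, 10, 10, 3)
# ('alice', 2, 30, 40, 1)
# ('alice', 3, 20, 60, 2)
# ('bob', 1, 5, 5, 2)
# ('bob', 2, 15, 20, 1)
```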
Programming
- Scala School by Twitter
- Fluent Python - an intermediate-level book about Python
- Intro to Scala in Russian on Stepik by Tinkoff Bank
- The Hitchhiker’s Guide to Python by Kenneth Reitz & Tanya Schlusser
- Learn Python 3 The Hard Way by Zed A. Shaw
Databases
- Intro to Database Systems by Carnegie Mellon University
- Advanced Database Systems by Carnegie Mellon University
- On Disk IO
Distributed Systems
- Distributed systems for fun and profit by Mikito Takada
- Distributed Systems by Maarten van Steen & Andrew S. Tanenbaum
- CSE138: Distributed Systems by Lindsey Kuper
- CS 436: Distributed Computer Systems by the University of Waterloo
- MIT 6.824: Distributed Systems by Robert Morris from MIT
- Distributed consensus reading list maintained by Heidi Howard from University of Cambridge
Books
- Designing Data-Intensive Applications by Martin Kleppmann
- Introduction to Algorithms by Thomas H. Cormen et al.
- The Data Warehouse Toolkit: The Definitive Guide to Dimensional Modeling
- Star Schema: The Complete Reference
- Database Internals: A Deep Dive into How Distributed Data Systems Work
- Streaming Systems: The What, Where, When, and How of Large-Scale Data Processing
- A Philosophy of Software Design
- Grokking Streaming Systems by Josh Fischer & Ning Wang
- Guide to High Performance Distributed Computing by K.G. Srinivasa & Anil Kumar Muppalla
- Data Pipelines with Apache Airflow by Bas P. Harenslak and Julian Rutger de Ruiter
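The Data Warehouse Toolkit and Star Schema: The Complete Reference both center on dimensional modeling: a central fact table of measurements joined to descriptive dimension tables. A minimal star schema sketch using Python's built-in `sqlite3` module; all table names, columns and rows are hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive attributes, one row per member.
    CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);

    -- Fact table: one row per sale, foreign keys to the dimensions plus measures.
    CREATE TABLE fact_sales (
        date_key    INTEGER REFERENCES dim_date (date_key),
        product_key INTEGER REFERENCES dim_product (product_key),
        quantity    INTEGER,
        revenue     REAL
    );

    INSERT INTO dim_date VALUES (20240101, '2024-01'), (20240201, '2024-02');
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'games');
    INSERT INTO fact_sales VALUES
        (20240101, 1, 2, 30.0),
        (20240101, 2, 1, 60.0),
        (20240201, 1, 1, 15.0);
    """
)

# Typical star join: group by dimension attributes, aggregate the fact measures.
report = conn.execute(
    """
    SELECT d.month, p.category, SUM(f.quantity), SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_date AS d ON f.date_key = d.date_key
    JOIN dim_product AS p ON f.product_key = p.product_key
    GROUP BY d.month, p.category
    ORDER BY d.month, p.category
    """
).fetchall()
conn.close()

print(report)
# [('2024-01', 'books', 2, 30.0), ('2024-01', 'games', 1, 60.0), ('2024-02', 'books', 1, 15.0)]
```

Keeping measures in one narrow fact table and attributes in small dimensions is what lets new reports be written as simple joins plus aggregates.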
Courses
- Data Engineering on Google Cloud Platform Specialization by Google
- Data Engineer Nanodegree by Udacity
- Data Engineering with Python by DataCamp
Blogs
- Martin Kleppmann - author of Designing Data-Intensive Applications
- BaseDS by Vaidehi Joshi, on distributed systems
Tools
- Apache Airflow is a platform to programmatically author, schedule and monitor workflows in Python
- Apache Spark is a unified analytics engine for large-scale data processing
- Apache Kafka is a distributed streaming platform
- Luigi is a Python package that helps you build complex pipelines of batch jobs.
- Dagster.io is a system for building modern data applications.
- Prefect includes everything you need to create and run data applications.
- Metaflow helps you build and manage real-life data science projects with ease.
- lakeFS lets you build repeatable, atomic and versioned data lake operations, from complex ETL jobs to data science and analytics.
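Airflow, Luigi, Dagster and Prefect all model a pipeline as a directed acyclic graph (DAG) of tasks, where each task runs only after its upstream dependencies finish. The sketch below is not the API of any of these tools; it only illustrates that core scheduling idea with the standard library's `graphlib` (Python 3.9+), and the task names are hypothetical:

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dag = {
    "extract_orders": set(),
    "extract_customers": set(),
    "transform": {"extract_orders", "extract_customers"},
    "load_warehouse": {"transform"},
    "refresh_dashboard": {"load_warehouse"},
}


def run_task(name: str, log: list[str]) -> None:
    # A real orchestrator would execute an operator, container or function here.
    log.append(name)


def run_dag(graph: dict[str, set[str]]) -> list[str]:
    """Run tasks in dependency order; tasks returned together could run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    log: list[str] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for deterministic output
        for task in ready:
            run_task(task, log)
            sorter.done(task)
    return log


print(run_dag(dag))
# ['extract_customers', 'extract_orders', 'transform', 'load_warehouse', 'refresh_dashboard']
```

Real orchestrators add scheduling, retries, state tracking and distributed workers on top, but the dependency-ordered loop is the same basic shape.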
Cloud Platforms
Communities
- Data Engineering - Telegram chat about data engineering
- Data Engineering Subreddit - subreddit about data engineering
Data Engineering Jobs
Other
Newsletters & Digests
- DataEng Telegram channel - Telegram channel about data engineering (rus/eng)
- Data Engineering Weekly
- SF Data Weekly - A weekly email of useful links for people interested in building data platforms
- Data Elixir - an email newsletter that keeps you on top of the tools and trends in data science.
- Data Governance, Privacy and Security - DbAdmin News is a newsletter on the technology behind data governance, security and privacy