  • Stars: 613
  • Rank: 73,175 (Top 2%)
  • Language: Python
  • License: MIT License
  • Created: over 4 years ago
  • Updated: almost 2 years ago

Repository Details

Data Engineering with Python, published by Packt

Learn Data Engineering with Python

This is the code repository for Data Engineering with Python, published by Packt.

Work with massive datasets to design data models and automate data pipelines using Python

What is this book about?

Data engineering provides the foundation for data science and analytics, and forms an important part of all businesses. This book helps you explore the tools and methods used in the data engineering process with Python.

The book will show you how to tackle challenges commonly faced in different aspects of data engineering. You’ll start with an introduction to the basics of data engineering, along with the technologies and frameworks required to build data pipelines to work with large datasets. You’ll learn how to transform and clean data and perform analytics to get the most out of your data. As you advance, you'll discover how to work with big data of varying complexity and production databases, and build data pipelines. Using real-world examples, you’ll build architectures on which you’ll learn how to deploy data pipelines.

By the end of this Python book, you’ll have gained a clear understanding of data modeling techniques, and will be able to confidently build data engineering pipelines for tracking data, running quality checks, and making necessary changes in production.

This book covers the following exciting features:

  • Understand how data engineering supports data science workflows
  • Discover how to extract data from files and databases and then clean, transform, and enrich it
  • Configure processors for handling different file formats as well as both relational and NoSQL databases
  • Find out how to implement a data pipeline and dashboard to visualize results
  • Use staging and validation to check data before landing in the warehouse
  • Build real-time pipelines with staging areas that perform validation and handle failures
  • Get to grips with deploying pipelines in the production environment
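
The staging and validation features listed above can be illustrated with a minimal sketch. It is not taken from the book: the sample data, the column names (`id`, `name`, `age`), and the helper functions `extract`, `validate`, and `load` are all hypothetical, and an in-memory list stands in for a real warehouse.

```python
import csv
import io

# Hypothetical raw extract; in practice this would come from a file or database.
RAW_CSV = """id,name,age
1,Alice,34
2,Bob,
3,,29
4,Dana,-5
"""


def extract(text):
    """Read CSV text into a list of row dictionaries (the staging area)."""
    return list(csv.DictReader(io.StringIO(text)))


def validate(row):
    """Return a list of problems found in one staged row (empty if clean)."""
    problems = []
    if not row["name"]:
        problems.append("missing name")
    if not row["age"].isdigit():
        problems.append("age is not a non-negative integer")
    return problems


def load(staged_rows):
    """Split staged rows into warehouse-ready rows and rejected rows."""
    warehouse, rejected = [], []
    for row in staged_rows:
        problems = validate(row)
        if problems:
            # Keep the failing row and the reasons so it can be inspected later.
            rejected.append({"row": row, "problems": problems})
        else:
            warehouse.append(
                {"id": int(row["id"]), "name": row["name"], "age": int(row["age"])}
            )
    return warehouse, rejected


staged = extract(RAW_CSV)
warehouse, rejected = load(staged)
print(len(warehouse), "rows loaded;", len(rejected), "rows rejected")
```

Keeping rejected rows together with their reasons, rather than dropping them silently, is one way to handle failures without losing track of the bad data.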

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigation

All of the code is organized into folders.

The code will look like the following:

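# Standard-library date helpers, typically used for start dates and schedules
import datetime as dt
from datetime import timedelta

# Airflow 1.x import paths; Airflow 2.x moved these operators to
# airflow.operators.bash and airflow.operators.python
from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator

import pandas as pd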
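
The `datetime` and `timedelta` imports above are commonly used to describe when a pipeline starts and how often it runs. The sketch below assumes that usage and runs without Airflow installed. The `default_args` values, the five-minute interval, and the `next_runs` helper are hypothetical. The helper only does date arithmetic and does not reproduce Airflow's scheduler behavior.

```python
import datetime as dt
from datetime import timedelta

# Hypothetical settings; the owner name, dates, and intervals are illustrative only.
default_args = {
    "owner": "example_owner",
    "start_date": dt.datetime(2020, 3, 18),
    "retries": 1,
    "retry_delay": timedelta(minutes=5),
}
schedule_interval = timedelta(minutes=5)


def next_runs(start, interval, count):
    """Return the first `count` scheduled times, starting at `start`."""
    return [start + i * interval for i in range(count)]


runs = next_runs(default_args["start_date"], schedule_interval, 3)
for run in runs:
    print(run.isoformat())
```

A dictionary like this is the kind of object usually passed to a DAG as its default arguments, with the interval supplied separately.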

Who this book is for: This book is for data analysts, ETL developers, and anyone looking to get started with or transition to data engineering, or to refresh their knowledge of data engineering using Python. It will also be useful for students planning to build a career in data engineering and for IT professionals preparing for a transition. No previous knowledge of data engineering is required.

With the following software and hardware list, you can run all code files present in the book (Chapters 2-15).

Software and Hardware List

Chapter | Software required | OS required
2 - 15 | Python 3.x, Spark 3.x, NiFi 1.x, PostgreSQL 13.x, Elasticsearch 7.x, Kibana 7.x, Apache Kafka 2.x | Windows, Mac OS X, and Linux (Any)

We also provide a downloadable PDF file with color images of the screenshots and diagrams used in this book.

Get to Know the Author

Paul Crickard is the author of Leaflet.js Essentials and co-author of Mastering Geospatial Analysis with Python. He is also the Chief Information Officer at the Second Judicial District Attorney’s Office in Albuquerque, New Mexico.

With a Master's degree in Political Science and a background in Community and Regional Planning, he applies rigorous social science theory and techniques to technology projects. He has presented at the New Mexico Big Data and Analytics Summit and the ExperienceIT NM Conference, and has given talks on data to the New Mexico Big Data Working Group, Sandia National Labs, and the New Mexico Geographic Information Council.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781839214189
