  • Stars: 537
  • Rank: 79,501 (Top 2 %)
  • Language: Python
  • License: MIT License
  • Created almost 4 years ago
  • Updated over 1 year ago

Repository Details

Data Engineering with Python, published by Packt

Learn Data Engineering with Python

This is the code repository for Data Engineering with Python, published by Packt.

Work with massive datasets to design data models and automate data pipelines using Python

What is this book about?

Data engineering provides the foundation for data science and analytics, and forms an important part of all businesses. This book will help you to explore various tools and methods that are used for understanding the data engineering process using Python.

The book shows you how to tackle challenges commonly faced in different aspects of data engineering. You’ll start with an introduction to the basics of data engineering, along with the technologies and frameworks required to build data pipelines that work with large datasets. You’ll learn how to transform and clean data and perform analytics to get the most out of your data. As you advance, you’ll discover how to work with big data of varying complexity and with production databases, and how to build data pipelines. Using real-world examples, you’ll build architectures and learn how to deploy data pipelines on them.

By the end of this Python book, you’ll have gained a clear understanding of data modeling techniques, and will be able to confidently build data engineering pipelines for tracking data, running quality checks, and making necessary changes in production.

This book covers the following exciting features:

  • Understand how data engineering supports data science workflows
  • Discover how to extract data from files and databases and then clean, transform, and enrich it
  • Configure processors for handling different file formats as well as both relational and NoSQL databases
  • Find out how to implement a data pipeline and dashboard to visualize results
  • Use staging and validation to check data before landing in the warehouse
  • Build real-time pipelines with staging areas that perform validation and handle failures
  • Get to grips with deploying pipelines in the production environment
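
The staging and validation idea in the list above can be sketched in plain Python. This is a minimal illustration only, not code from the book: the names `StagingResult`, `validate_record`, `stage`, and `load_to_warehouse`, the `id` and `age` fields, and the reject-ratio threshold are all hypothetical, and a list stands in for the warehouse.

```python
from dataclasses import dataclass, field


@dataclass
class StagingResult:
    """Records that passed validation and records that were rejected."""
    valid: list = field(default_factory=list)
    rejected: list = field(default_factory=list)


def validate_record(record):
    """Return a list of problems found in one record (empty if it is clean)."""
    problems = []
    if record.get("id") is None:
        problems.append("missing id")
    age = record.get("age")
    if not isinstance(age, int) or not 0 <= age <= 120:
        problems.append("age out of range")
    return problems


def stage(records):
    """Split incoming records into valid and rejected before any load."""
    result = StagingResult()
    for record in records:
        problems = validate_record(record)
        if problems:
            result.rejected.append({"record": record, "problems": problems})
        else:
            result.valid.append(record)
    return result


def load_to_warehouse(result, warehouse, max_reject_ratio=0.5):
    """Load valid records, refusing the whole batch if too many were rejected."""
    total = len(result.valid) + len(result.rejected)
    if total and len(result.rejected) / total > max_reject_ratio:
        raise ValueError("too many rejected records; refusing to load")
    warehouse.extend(result.valid)
    return len(result.valid)
```

Keeping the rejected records together with the reasons they failed lets a pipeline report or retry failures instead of silently dropping them. The threshold guards the warehouse against a batch that is mostly bad.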

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigation

All of the code is organized into folders.

The code will look like the following:

import datetime as dt
from datetime import timedelta
from airflow import DAG
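# Airflow 1.x import paths; in Airflow 2.x these modules are
# airflow.operators.bash and airflow.operators.python.
from airflow.operators.bash_operator import BashOperator
from airflow.operators.python_operator import PythonOperator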
import pandas as pd
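
The imports above point to Airflow DAGs whose tasks run Python callables. As a rough sketch of the kind of function such a task might call, the block below converts CSV text to JSON. The function name `csv_to_json` and the sample data are hypothetical, and the standard library stands in for pandas so that the block runs without Airflow or pandas installed.

```python
import csv
import io
import json


def csv_to_json(csv_text):
    """Convert CSV text with a header row into a JSON array of objects."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [dict(row) for row in reader]
    return json.dumps(rows)


if __name__ == "__main__":
    # Hypothetical sample input; values remain strings after conversion.
    print(csv_to_json("name,city\nAda,London\nGrace,New York\n"))
```

In a DAG, a function like this would typically be passed as the `python_callable` argument of a `PythonOperator`. Reading from and writing to real files is left out here to keep the sketch self-contained.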

Following is what you need for this book: This book is for data analysts, ETL developers, and anyone looking to get started in data engineering, transition into the field, or refresh their knowledge of data engineering using Python. It will also be useful for students planning to build a career in data engineering and for IT professionals preparing for a transition. No previous knowledge of data engineering is required.

With the following software and hardware list, you can run all code files present in the book (Chapters 2-15).

Software and Hardware List

| Chapter | Software required | OS required |
|---------|-------------------|-------------|
| 2 - 15 | Python 3.x, Spark 3.x, NiFi 1.x, PostgreSQL 13.x, Elasticsearch 7.x, Kibana 7.x, Apache Kafka 2.x | Windows, Mac OS X, and Linux (Any) |

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Get to Know the Author

Paul Crickard is the author of Leaflet.js Essentials, co-author of Mastering Geospatial Analysis with Python, and the Chief Information Officer at the Second Judicial District Attorney’s Office in Albuquerque, New Mexico.

With a master's degree in Political Science and a background in Community and Regional Planning, he applies rigorous social science theory and techniques to technology projects. He has presented at the New Mexico Big Data and Analytics Summit and the ExperienceIT NM Conference. He has given talks on data to the New Mexico Big Data Working Group, Sandia National Labs, and the New Mexico Geographic Information Council.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781839214189

More Repositories

1

Deep-Reinforcement-Learning-Hands-On

Hands-on Deep Reinforcement Learning, published by Packt
Python
2,750
star
2

The-Kaggle-Book

Code Repository for The Kaggle Book, Published by Packt Publishing
Jupyter Notebook
2,056
star
3

Advanced-Deep-Learning-with-Keras

Advanced Deep Learning with Keras, published by Packt
Python
1,700
star
4

Hands-On-Machine-Learning-for-Algorithmic-Trading

Hands-On Machine Learning for Algorithmic Trading, published by Packt
Jupyter Notebook
1,280
star
5

Node.js-Design-Patterns-Third-Edition

Node.js Design Patterns Third Edition, published by Packt
JavaScript
1,162
star
6

Machine-Learning-for-Algorithmic-Trading-Second-Edition_Original

Machine Learning for Algorithmic Trading, Second Edition - published by Packt
Jupyter Notebook
1,083
star
7

Deep-Learning-with-Keras

Code repository for Deep Learning with Keras published by Packt
Jupyter Notebook
1,047
star
8

Deep-Reinforcement-Learning-Hands-On-Second-Edition

Deep-Reinforcement-Learning-Hands-On-Second-Edition, published by Packt
Jupyter Notebook
1,028
star
9

Learning-JavaScript-Data-Structures-and-Algorithms-Third-Edition

Learning JavaScript Data Structures and Algorithms (Third Edition), published by Packt
JavaScript
1,007
star
10

40-Algorithms-Every-Programmer-Should-Know

40 Algorithms Every Programmer Should Know, published by Packt
Python
910
star
11

Learn-CUDA-Programming

Learn CUDA Programming, published by Packt
Cuda
849
star
12

3D-Graphics-Rendering-Cookbook

3D Graphics Rendering Cookbook, published by Packt.
C++
847
star
13

Vulkan-Cookbook

Code repository for Vulkan Cookbook by Packt
C++
784
star
14

Linux-Kernel-Programming

Linux Kernel Programming, published by Packt
Makefile
759
star
15

Learn-Algorithmic-Trading

Learn Algorithmic Trading, Published by Packt
Python
730
star
16

Django-4-by-example

Django 4 by example (4th Edition) published by Packt
Python
718
star
17

Django-3-by-Example

Django 3 by Example (3rd Edition) published by Packt
Python
710
star
18

Node.js_Design_Patterns_Second_Edition_Code

Code repository for Node.js Design Patterns Second Edition, published by Packt
JavaScript
706
star
19

Python-for-Finance-Cookbook

Python for Finance Cookbook, published by Packt
Jupyter Notebook
665
star
20

Pandas-Cookbook

Pandas Cookbook, published by Packt
Jupyter Notebook
623
star
21

Hands-on-Exploratory-Data-Analysis-with-Python

Hands-on Exploratory Data Analysis with Python, published by Packt
Jupyter Notebook
619
star
22

Java-Coding-Problems

Java Coding Problems, published by Packt
Java
615
star
23

Hands-On-Domain-Driven-Design-with-.NET-Core

Hands-On Domain-Driven Design with .NET Core, published by Packt
C#
602
star
24

Modern-Computer-Vision-with-PyTorch

Modern Computer Vision with PyTorch, published by Packt
Jupyter Notebook
585
star
25

Hands-On-GPU-Accelerated-Computer-Vision-with-OpenCV-and-CUDA

Hands-On GPU Accelerated Computer Vision with OpenCV and CUDA, published by Packt
C++
584
star
26

Django-2-by-Example

Django 2 by Example (2nd Edition) published by Packt
Python
567
star
27

Learning-OpenCV-4-Computer-Vision-with-Python-Third-Edition

Learning OpenCV 4 Computer Vision with Python 3 – Third Edition, published by Packt
Python
562
star
28

Learn-Data-Structures-and-Algorithms-with-Golang

Learn Data Structures and Algorithms with Golang, published by Packt
Go
557
star
29

Causal-Inference-and-Discovery-in-Python

Causal Inference and Discovery in Python by Packt Publishing
Jupyter Notebook
555
star
30

TensorFlow-Machine-Learning-Cookbook

Code repository for TensorFlow Machine Learning Cookbook by Packt
Python
552
star
31

Transformers-for-Natural-Language-Processing

Transformers for Natural Language Processing, published by Packt
Jupyter Notebook
539
star
32

Mastering-OpenCV-4-Third-Edition

Mastering OpenCV 4, Third Edition, published by Packt publishing
Assembly
520
star
33

Cpp17-STL-Cookbook

Code files by Packt
C++
514
star
34

Clean-Code-in-Python

Clean Code in Python, published by Packt
Python
513
star
35

Hands-On-Graph-Neural-Networks-Using-Python

Hands-On Graph Neural Networks Using Python, published by Packt
Jupyter Notebook
500
star
36

Getting-Started-with-TensorFlow

Getting Started with TensorFlow, published by Packt
Python
491
star
37

Hands-On-Data-Structures-and-Algorithms-with-Rust

Hands-On Data Structures and Algorithms with Rust, published by Packt
Rust
486
star
38

Linux-Device-Drivers-Development

Linux Device Drivers Development, published by Packt
C
482
star
39

Python-Machine-Learning-Second-Edition

Python Machine Learning - Second Edition, published by Packt
Jupyter Notebook
477
star
40

Mastering-Graphics-Programming-with-Vulkan

C++
469
star
41

Learn-LLVM-12

Learn LLVM 12, published by Packt
C++
465
star
42

Mastering-Embedded-Linux-Programming-Third-Edition

Mastering Embedded Linux Programming Third Edition, published by Packt
C
460
star
43

Python-3-Object-Oriented-Programming-Third-Edition

Python 3 Object-Oriented Programming – Third Edition, published by Packt
Python
453
star
44

Hands-On-Microservices-with-Spring-Boot-and-Spring-Cloud

Hands-On Microservices with Spring Boot and Spring Cloud, published by Packt
Java
452
star
45

Software-Architecture-with-Cpp

Software Architecture with C++, published by Packt
C++
447
star
46

Full-Stack-React-Projects-Second-Edition

Full-Stack React Projects - Second Edition, published by Packt
JavaScript
445
star
47

Python-Feature-Engineering-Cookbook

Python Feature Engineering Cookbook, published by Packt
Jupyter Notebook
442
star
48

Deep-Learning-with-PyTorch

Deep Learning with PyTorch, published by Packt
Jupyter Notebook
437
star
49

Interpretable-Machine-Learning-with-Python

Interpretable Machine Learning with Python, published by Packt
Jupyter Notebook
423
star
50

Python-Machine-Learning-Cookbook

Code files for Python-Machine-Learning-Cookbook
Python
416
star
51

Modern-CMake-for-Cpp

Modern CMake for C++, published by Packt
Dockerfile
411
star
52

Artificial-Intelligence-with-Python

Code repository for Artificial Intelligence with Python, published by Packt
Python
408
star
53

Hands-On-Software-Engineering-with-Golang

Hands-On Software Engineering with Golang, published by Packt
Go
406
star
54

Mastering-Python-for-Finance-Second-Edition

Mastering Python for Finance – Second Edition, published by Packt
Jupyter Notebook
394
star
55

Go-Design-Patterns

This is the code repository for the book, Go Design Patterns, published by Packt
Go
394
star
56

Mastering-Python-Design-Patterns-Second-Edition

Mastering-Python-Design-Patterns-Second-Edition, published by Packt
Python
389
star
57

Mastering-Go-Second-Edition

Mastering Go Second Edition, published by Packt
Go
384
star
58

Hands-On-Machine-Learning-with-CPP

Hands-On Machine Learning with C++, published by Packt
C++
377
star
59

Learn-OpenCV-4-By-Building-Projects-Second-Edition

Learn OpenCV 4 By Building Projects, Second Edition, published by Packt
C++
367
star
60

Hands-On-Computer-Vision-with-TensorFlow-2

Hands-On Computer Vision with TensorFlow 2, published by Packt
Jupyter Notebook
366
star
61

Mastering-OpenCV-4-with-Python

Mastering OpenCV 4 with Python, published by Packt
Python
362
star
62

Hands-On-Microservices-with-Rust

Hands-On Microservices with Rust 2018, published by Packt
Rust
354
star
63

Hands-On-Design-Patterns-with-CPP

Hands-On Design Patterns with C++, published by Packt
C
353
star
64

Python-Machine-Learning-Blueprints

Code repository for Python Machine Learning Blueprints, published by Packt
Jupyter Notebook
349
star
65

Practical-Time-Series-Analysis

Practical Time-Series Analysis, published by Packt
Jupyter Notebook
345
star
66

Machine-Learning-for-Algorithmic-Trading-Bots-with-Python

Jupyter Notebook
337
star
67

Machine-Learning-for-Finance

Machine Learning for Finance, published by Packt
Jupyter Notebook
336
star
68

Effective-Python-Penetration-Testing

Effective Python Penetration Testing by Packt Publishing
Python
334
star
69

Python-Algorithmic-Trading-Cookbook

Python Algorithmic Trading Cookbook, published by Packt
Jupyter Notebook
325
star
70

Hands-On-Intelligent-Agents-with-OpenAI-Gym

Code for Hands On Intelligent Agents with OpenAI Gym book to get started and learn to build deep reinforcement learning agents using PyTorch
Python
322
star
71

Python-Artificial-Intelligence-Projects-for-Beginners

Python Artificial Intelligence Projects for Beginners, published by Packt
Jupyter Notebook
321
star
72

Hands-On-Reactive-Programming-in-Spring-5

Hands-On Reactive Programming in Spring 5, published by Packt
Java
320
star
73

Micro-State-Management-with-React-Hooks

Micro State Management with React Hooks, published by Packt
TypeScript
317
star
74

The-Azure-Cloud-Native-Architecture-Mapbook

The Azure Cloud Native Architecture Mapbook, published by Packt
C#
315
star
75

Godot-Game-Engine-Projects

Godot Game Engine Projects, published by Packt
GDScript
315
star
76

Modern-Time-Series-Forecasting-with-Python

Modern Time Series Forecasting with Python, published by Packt
Jupyter Notebook
315
star
77

Python-GUI-Programming-Cookbook-Second-Edition

Python GUI Programming Cookbook, Second Edition, published by Packt
Python
312
star
78

Computer-Vision-with-OpenCV-3-and-Qt5

Computer Vision with OpenCV 3 and Qt5, published by Packt
C++
305
star
79

Learning-PySpark

Code repository for Learning PySpark by Packt
Jupyter Notebook
303
star
80

Deep-Learning-with-TensorFlow-2-and-Keras

Deep Learning with TensorFlow 2 and Keras, published by Packt
Jupyter Notebook
302
star
81

PyTorch-Computer-Vision-Cookbook

PyTorch Computer Vision Cookbook, Published by Packt
Jupyter Notebook
302
star
82

Mastering-Machine-Learning-for-Penetration-Testing

Mastering Machine Learning for Penetration Testing, published by Packt
Python
298
star
83

Learning-Vuejs-2

This is the code repository for Learning Vue.js 2, published by Packt.
JavaScript
296
star
84

Building-Data-Science-Applications-with-FastAPI

Building Data Science Applications with FastAPI, Published by Packt
Python
295
star
85

OpenGL-4-Shading-Language-Cookbook-Third-Edition

OpenGL 4 Shading Language Cookbook - Third Edition, published by Packt
C
295
star
86

Mastering-Transformers

Mastering Transformers, published by Packt
Jupyter Notebook
289
star
87

Neural-Network-Projects-with-Python

Neural Network Projects with Python, Published by Packt
Python
289
star
88

Bioinformatics-with-Python-Cookbook-Second-Edition

Bioinformatics with Python Cookbook Second Edition, published by Packt
OpenEdge ABL
287
star
89

Hands-on-Python-for-Finance

Hands-on Python for Finance published by Packt.
Jupyter Notebook
284
star
90

Full-Stack-React-TypeScript-and-Node

Full-Stack React, TypeScript, and Node, published by Packt
TypeScript
282
star
91

CPP-Data-Structures-and-Algorithms

C++ Data Structures and Algorithms, published by Packt
C++
279
star
92

The-Modern-Cpp-Challenge

The Modern C++ Challenge, published by Packt
C
276
star
93

The-Complete-Coding-Interview-Guide-in-Java

The Complete Coding Interview Guide in Java, published by Packt
Java
272
star
94

Pandas-Cookbook-Second-Edition

Pandas Cookbook Second Edition, published by Packt
Jupyter Notebook
271
star
95

Full-Stack-React-Projects

Full-Stack React Projects, published by Packt
JavaScript
271
star
96

Machine-Learning-for-Cybersecurity-Cookbook

Machine Learning for Cybersecurity Cookbook, published by Packt
Jupyter Notebook
270
star
97

Natural-Language-Processing-with-TensorFlow

Natural Language Processing with TensorFlow, published by Packt
Jupyter Notebook
269
star
98

50-Projects-In-50-Days---HTML-CSS-JavaScript

50 Projects In 50 Days - HTML, CSS & JavaScript, by Packt Publishing
CSS
269
star
99

Hands-On-Image-Processing-with-Python

Jupyter Notebook
264
star
100

Mastering-Distributed-Tracing

"Mastering Distributed Tracing" by Yuri Shkuro, published by Packt
Java
264
star