• Stars
    star
    440
  • Rank 98,359 (Top 2 %)
  • Language
  • License
    MIT License
  • Created about 3 years ago
  • Updated over 1 year ago

Repository Details

!!! Retired !!!

This repo has been replaced by dp-203-azure-data-engineer and will be archived in the coming weeks. Please use the exercises in the new repo for all ILT deliveries.


DP-203T00: Data Engineering on Azure

Welcome to the course DP-203: Data Engineering on Azure. To support this course, we will need to make updates to the course content to keep it current with the Azure services used in the course. We are publishing the lab instructions and lab files on GitHub to allow for open contributions between the course authors and MCTs to keep the content current with changes in the Azure platform.

Lab overview

The following is a summary of the lab objectives for each module:

Day 1

Module 00: Lab environment setup

Complete the lab environment setup for this course.

Module 01: Explore compute and storage options for data engineering workloads

This lab teaches ways to structure the data lake and to optimize files for exploration, streaming, and batch workloads. Students will learn how to organize the data lake into levels of data refinement as they transform files through batch and stream processing. They will also work with Apache Spark in Azure Synapse Analytics, learning how to create indexes on datasets such as CSV, JSON, and Parquet files, and how to use those indexes for potential query and workload acceleration with Spark libraries including Hyperspace and MSSparkUtils.

Module 02: Run interactive queries using Azure Synapse Analytics serverless SQL pools

In this lab, students will learn how to work with files stored in the data lake and external file sources, through T-SQL statements executed by a serverless SQL pool in Azure Synapse Analytics. Students will query Parquet files stored in a data lake, as well as CSV files stored in an external data store. Next, they will create Azure Active Directory security groups and enforce access to files in the data lake through Role-Based Access Control (RBAC) and Access Control Lists (ACLs).

Module 03: Data Exploration and Transformation in Azure Databricks

This lab teaches you how to use Apache Spark DataFrame methods to explore and transform data in Azure Databricks. You will start with standard DataFrame methods, then move on to more advanced tasks such as removing duplicate data, manipulating date/time values, renaming columns, and aggregating data. You will provision the chosen ingestion technology and integrate this with Stream Analytics to create a solution that works with streaming data.
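
The four transformations named in this module can be illustrated without a Spark cluster. The following is a minimal plain-Python sketch of the ideas. It is not the Spark DataFrame API and is not taken from the lab. The sample records, the field names (`id`, `ts`, `amt`), and the helper `transform` are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample records; the field names are assumptions for illustration.
RAW = [
    {"id": 1, "ts": "2021-03-01 10:15:00", "amt": 10.0},
    {"id": 1, "ts": "2021-03-01 10:15:00", "amt": 10.0},  # exact duplicate
    {"id": 2, "ts": "2021-03-01 18:40:00", "amt": 5.5},
    {"id": 3, "ts": "2021-03-02 09:05:00", "amt": 7.0},
]


def transform(rows):
    """Deduplicate, parse timestamps, rename a column, and aggregate by day."""
    # 1. Remove duplicate rows while keeping first-seen order.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)

    # 2. Parse the timestamp string into a date, and
    # 3. rename the "amt" column to "amount".
    cleaned = [
        {
            "id": row["id"],
            "date": datetime.strptime(row["ts"], "%Y-%m-%d %H:%M:%S").date(),
            "amount": row["amt"],
        }
        for row in unique
    ]

    # 4. Aggregate: total amount per calendar day.
    totals = defaultdict(float)
    for row in cleaned:
        totals[row["date"].isoformat()] += row["amount"]
    return dict(totals)


daily_totals = transform(RAW)
print(daily_totals)
```

On a Spark DataFrame, the corresponding steps would likely use methods such as `dropDuplicates`, `withColumnRenamed`, and `groupBy(...).agg(...)`. The lab notebooks remain the reference for the exact calls.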

Day 2

Module 04: Explore, transform, and load data into the Data Warehouse using Apache Spark

This lab teaches you how to explore data stored in a data lake, transform the data, and load data into a relational data store. You will explore Parquet and JSON files and use techniques to query and transform JSON files with hierarchical structures. Then you will use Apache Spark to load data into the data warehouse and join Parquet data in the data lake with data in the dedicated SQL pool.

Module 05: Ingest and load data into the data warehouse

This lab teaches students how to ingest data into the data warehouse through T-SQL scripts and Synapse Analytics integration pipelines. Students will learn how to load data into Synapse dedicated SQL pools with PolyBase and COPY using T-SQL. They will also learn how to use workload management along with a Copy activity in an Azure Synapse pipeline for petabyte-scale data ingestion.

Module 06: Transform data with Azure Data Factory or Azure Synapse Pipelines

This lab teaches students how to build data integration pipelines to ingest from multiple data sources, transform data using mapping data flows and notebooks, and perform data movement into one or more data sinks.

Day 3

Module 07: Integrate data from Notebooks with Azure Data Factory or Azure Synapse Pipelines

In this lab, students will create a notebook to query user activity and the purchases users have made in the past 12 months. They will then add the notebook to a pipeline using the new Notebook activity and execute it after the Mapping Data Flow as part of their orchestration process. While configuring this, students will implement parameters to add dynamic content in the control flow and validate how the parameters can be used.

Module 08: End-to-end security with Azure Synapse Analytics

In this lab, students will learn how to secure a Synapse Analytics workspace and its supporting infrastructure. The student will observe the SQL Active Directory Admin, manage IP firewall rules, manage secrets with Azure Key Vault and access those secrets through a Key Vault linked service and pipeline activities. The student will understand how to implement column-level security, row-level security, and dynamic data masking when using dedicated SQL pools.

Module 09: Support Hybrid Transactional Analytical Processing (HTAP) with Azure Synapse Link

This lab teaches you how Azure Synapse Link enables seamless connectivity of an Azure Cosmos DB account to a Synapse workspace. You will learn how to enable and configure Synapse Link, then how to query the Azure Cosmos DB analytical store using Apache Spark and serverless SQL.

Day 4

Module 10: Real-time Stream Processing with Stream Analytics

This lab teaches you how to process streaming data with Azure Stream Analytics. You will ingest vehicle telemetry data into Event Hubs, then process that data in real time, using various windowing functions in Azure Stream Analytics. You will output the data to Azure Synapse Analytics. Finally, you will learn how to scale the Stream Analytics job to increase throughput.
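
A tumbling window is one common kind of windowing function: time is cut into fixed, non-overlapping intervals, and each event is counted in exactly one of them. The sketch below shows that idea in plain Python. It is not Stream Analytics query syntax. The telemetry tuples, the 10-second window size, and the function name `tumbling_average` are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical telemetry events: (event time in seconds, vehicle id, speed).
EVENTS = [
    (1, "car-a", 50.0),
    (4, "car-a", 54.0),
    (7, "car-b", 60.0),
    (12, "car-a", 58.0),
    (14, "car-b", 62.0),
]


def tumbling_average(events, window_seconds):
    """Average speed per vehicle in fixed, non-overlapping windows."""
    buckets = defaultdict(list)
    for event_time, vehicle, speed in events:
        # Each event belongs to exactly one window, identified by its start time.
        window_start = (event_time // window_seconds) * window_seconds
        buckets[(window_start, vehicle)].append(speed)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}


averages = tumbling_average(EVENTS, window_seconds=10)
for (window_start, vehicle), avg in averages.items():
    print(f"[{window_start}, {window_start + 10}) {vehicle}: {avg:.1f}")
```

Because the windows never overlap, each reading contributes to exactly one average, which keeps per-window totals easy to reason about.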

Module 11: Create a Stream Processing Solution with Event Hubs and Azure Databricks

This lab teaches you how to ingest and process streaming data at scale with Event Hubs and Spark Structured Streaming in Azure Databricks. You will learn the key features and uses of Structured Streaming. You will implement sliding windows to aggregate over overlapping chunks of data and apply watermarking to discard stale data. Finally, you will connect to Event Hubs to read and write streams.
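
Sliding windows and watermarking can be pictured with a small plain-Python sketch. It is a deliberate simplification, not Spark Structured Streaming code: Spark advances its watermark per micro-batch, whereas this sketch checks every event as it arrives. The window length, slide interval, lateness threshold, sample events, and function names are all assumptions for illustration.

```python
from collections import defaultdict

WINDOW = 10     # window length in seconds
SLIDE = 5       # a new window starts every 5 seconds
WATERMARK = 10  # tolerated lateness in seconds

# Hypothetical (event_time, value) pairs in arrival order; the last one is late.
ARRIVALS = [(3, 1), (8, 1), (12, 1), (27, 1), (4, 1)]


def windows_for(event_time):
    """Return the start times of every sliding window containing event_time."""
    last_start = (event_time // SLIDE) * SLIDE
    starts = range(last_start - WINDOW + SLIDE, last_start + 1, SLIDE)
    return [s for s in starts if s >= 0]


def aggregate(arrivals):
    """Sum values per sliding window, discarding events behind the watermark."""
    counts = defaultdict(int)
    dropped = []
    max_event_time = 0
    for event_time, value in arrivals:
        max_event_time = max(max_event_time, event_time)
        # Events older than (latest event time - WATERMARK) are treated as stale.
        if event_time < max_event_time - WATERMARK:
            dropped.append(event_time)
            continue
        # Overlapping windows: one event can contribute to several windows.
        for start in windows_for(event_time):
            counts[start] += value
    return dict(counts), dropped


counts, dropped = aggregate(ARRIVALS)
print(counts)
print(dropped)
```

With these assumed settings, the event at time 8 is counted in both the window starting at 0 and the window starting at 5, while the late event at time 4 arrives after the watermark has passed and is dropped.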

How should I use these files relative to the released MOC files?

  • The instructor handbook and PowerPoints are still going to be your primary source for teaching the course content.

  • These files on GitHub are designed to be used in conjunction with the student handbook, but are in GitHub as a central repository so MCTs and course authors can have a shared source for the latest lab files.

  • The lab instructions for each module are found in the /Instructions/Labs folder. Each subfolder within this location corresponds to one module. For example, Lab01 relates to module01, and so on. Each folder contains a README.md file with the lab instructions that students will follow.

  • We recommend that, for every delivery, trainers check GitHub for any changes that may have been made to support the latest Azure services, and get the latest files for their delivery.

  • Please note that some of the images in these lab instructions will not necessarily reflect the state of the lab environment that you will be using in this course. For example, while browsing for files in a data lake, you may see additional folders in the images that do not exist in your environment. This is by design, and your lab instructions will still work.

What about changes to the student handbook?

  • We will review the student handbook on a quarterly basis and update through the normal MOC release channels as needed.

How do I contribute?

  • Any MCT can submit an issue against the code or content in the GitHub repo. Microsoft and the course author will triage it and include content and lab code changes as needed.

Classroom Materials

It is strongly recommended that MCTs and Partners access these materials and in turn, provide them separately to students. Pointing students directly to GitHub to access Lab steps as part of an ongoing class will require them to access yet another UI as part of the course, contributing to a confusing experience for the student. An explanation to the student regarding why they are receiving separate Lab instructions can highlight the nature of an always-changing cloud-based interface and platform. Microsoft Learning support for accessing files on GitHub and support for navigation of the GitHub site is limited to MCTs teaching this course only.

What are we doing?

  • To support this course, we will need to make frequent updates to the course content to keep it current with the Azure services used in the course. We are publishing the lab instructions and lab files on GitHub to allow for open contributions between the course authors and MCTs to keep the content current with changes in the Azure platform.

  • We hope that this brings a sense of collaboration to the labs like we've never had before. When Azure changes and you find it first during a live delivery, go ahead and make an enhancement right in the lab source. Help your fellow MCTs.

How do I contribute?

  • Any MCT can submit a pull request to the code or content in the GitHub repo. Microsoft and the course author will triage it and include content and lab code changes as needed.

  • You can submit bugs, changes, improvements, and ideas. Find a new Azure feature before we have? Submit a new demo!
