Azure Synapse Analytics in a Day Lab

Wide World Importers

Wide World Importers (WWI) is a wholesale novelty goods importer and distributor operating from the San Francisco Bay Area.

As a wholesaler, WWI sells mostly to companies that resell to individuals. WWI sells to retail customers across the United States, including specialty stores, supermarkets, computing stores, tourist attraction shops, and some individuals. WWI sells to other wholesalers via a network of agents who promote the products on WWI's behalf. While all of WWI's customers are currently based in the United States, the company intends to expand into other countries.

WWI buys goods from suppliers, including novelty and toy manufacturers, and other novelty wholesalers. They stock the goods in their WWI warehouse and reorder from suppliers as needed to fulfill customer orders. They also purchase large volumes of packaging materials and sell these in smaller quantities as a convenience for the customers.

Recently WWI started to sell a variety of edible novelties such as chili chocolates. The company previously did not have to handle chilled items. To meet food handling requirements, they must monitor the temperature in their chiller room and any of their trucks that have chiller sections.

Lab context

Wide World Importers is designing and implementing a Proof of Concept (PoC) for a unified data analytics platform. A softer, secondary goal is to get siloed teams working together on a single platform.

In this lab, you will play the role of various personas: a data engineer, a business analyst, and a data scientist. The workspace is already set up, so you can focus on some of the core development capabilities of Azure Synapse Analytics.

By the end of this lab, you will have performed a representative (but not exhaustive) set of operations that combine the strengths of big data and SQL analytics in a single platform.

How to get started with a provided lab environment

If you are using a hosted lab environment, please follow the steps below to get started:

  1. Select the Lab Environment tab above the lab guide to copy the Azure credentials used for the lab. Make note of the UniqueId value. This value may be referenced at different points during the lab.

    The lab environment details are displayed.

  2. Select Lab Resources under Lab Environment to start the Virtual Machine (VM) provided for this lab. You do not need the VM to complete the lab; it is provided for convenience, making it easier to sign into Azure if you have an existing account and do not want to log out of it.

    The Virtual Machines are displayed and the Play button is highlighted.

Solution architecture

The diagram below provides a unified view of the exercises in the lab and their estimated times for completion.

Azure Synapse Analytics Lab Exercises

Exercise 1 - Explore the data lake with Azure Synapse Serverless SQL Pool and Azure Synapse Spark

In this exercise, you will explore data using the engine of your choice (SQL or Spark).

Understanding data through exploration is one of the core challenges that data engineers and data scientists alike face today. Depending on the data's underlying structure and the specific requirements of the exploration process, different data processing engines offer varying degrees of performance, complexity, and flexibility.

In Azure Synapse Analytics, you can use the serverless SQL engine, the big-data Spark engine, or both.
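As a minimal sketch of the SQL side of this choice, the helper below builds the kind of ad hoc T-SQL query a serverless SQL pool runs directly over files in the data lake with `OPENROWSET`. The storage account, container, and path are hypothetical placeholders, not the lab's actual resource names, and the Python code only produces the query text; running it still requires a Synapse workspace.

```python
def build_openrowset_query(storage_account: str, container: str, path: str,
                           file_format: str = "PARQUET", top: int = 100) -> str:
    """Return a T-SQL query that reads lake files through a serverless SQL pool.

    storage_account, container and path are caller-supplied placeholders
    (hypothetical here), e.g. a folder of Parquet files with a wildcard.
    """
    fmt = file_format.upper()
    if fmt not in {"PARQUET", "CSV"}:
        raise ValueError(f"unsupported format: {file_format}")
    # Data Lake Storage Gen2 endpoint for the files to query.
    url = f"https://{storage_account}.dfs.core.windows.net/{container}/{path.lstrip('/')}"
    return (
        f"SELECT TOP {top} *\n"
        f"FROM OPENROWSET(\n"
        f"    BULK '{url}',\n"
        f"    FORMAT = '{fmt}'\n"
        f") AS [result]"
    )


print(build_openrowset_query("mylake", "wwi", "sales/*.parquet"))
```

The same files could be explored from a Spark notebook instead; picking the engine per task is the point of this exercise.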

Exercise 2 - Working with Azure Synapse Pipelines

In this exercise, you will use a pipeline with parallel activities to bring data into the Data Lake, transform it, and load it into the Azure Synapse SQL Pool. You will also monitor the progress of the associated tasks.

Once data is properly understood and interpreted, moving it to the various destinations where processing steps occur is the next big task. Any modern data platform must provide a seamless experience for all the typical data wrangling actions like extractions, parsing, joining, standardizing, augmenting, cleansing, consolidating, and filtering.

Azure Synapse Analytics provides two main categories of features for this: data flows and data orchestration (implemented as pipelines). Together they cover the full range of needs, from design and development to triggering, execution, and monitoring.
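The shape of such a pipeline (independent activities running in parallel, followed by a transform and a load that depend on them) can be sketched in plain Python. This is only an illustration of the control flow under stated assumptions, not the Synapse pipeline API: the in-memory sources, the sample rows, and the list used as a sink are all hypothetical stand-ins for the data lake and the SQL pool table.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory sources standing in for files landed in the data lake.
SOURCES = {
    "customers": [{"id": 1, "name": " Tailspin Toys "}, {"id": 2, "name": "wingtip toys"}],
    "orders": [{"customer_id": 1, "amount": 120.0},
               {"customer_id": 2, "amount": None},
               {"customer_id": 2, "amount": 80.5}],
}


def extract(name):
    """Copy-activity stand-in: return a copy of one source's rows."""
    return name, [dict(row) for row in SOURCES[name]]


def transform(customers, orders):
    """Standardize names, drop incomplete orders, total per customer, then join."""
    names = {c["id"]: c["name"].strip().title() for c in customers}  # standardize
    totals = {}
    for o in orders:
        if o["amount"] is None:  # cleanse: skip orders without an amount
            continue
        totals[o["customer_id"]] = totals.get(o["customer_id"], 0.0) + o["amount"]  # consolidate
    # Join the totals back to the standardized customer names.
    return [{"customer": names[cid], "total": round(t, 2)} for cid, t in sorted(totals.items())]


def run_pipeline(sink):
    # Independent extract activities run in parallel, like parallel pipeline branches.
    with ThreadPoolExecutor() as pool:
        extracted = dict(pool.map(extract, SOURCES))
    rows = transform(extracted["customers"], extracted["orders"])
    sink.extend(rows)  # load step: the list stands in for a dedicated SQL pool table
    return len(rows)


table = []
run_pipeline(table)
print(table)
```

In a real Synapse pipeline the same dependency structure is declared between activities, and the monitoring hub shows each activity's progress.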

Exercise 3 - High Performance Analysis with Azure Synapse Dedicated SQL Pools

In this exercise, you will analyze customer details using queries and chart visualizations. You will also explore the performance of various queries.

SQL data warehouses have long been the center of gravity of data platforms. Modern data warehouses can provide high-performance, distributed, and governed workloads, regardless of the data volumes at hand.

Dedicated SQL pools in Azure Synapse Analytics are the new incarnation of the former Azure SQL Data Warehouse. They provide all the state-of-the-art SQL data warehousing features while benefiting from advanced integration with all the other Synapse services.
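Part of a dedicated SQL pool's performance comes from spreading each table across 60 distributions; for a hash-distributed table, the value of the distribution column decides which distribution a row lands in. The sketch below assumes MD5 as a stand-in for the engine's internal hash function, so the exact placement differs from Synapse, but it shows why a low-cardinality distribution column causes data skew.

```python
import hashlib
from collections import Counter

NUM_DISTRIBUTIONS = 60  # a dedicated SQL pool spreads every table over 60 distributions


def distribution_for(key) -> int:
    """Map a distribution-column value to a distribution id (0-59).

    MD5 is only a stand-in: the engine's actual hash function is internal.
    """
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_DISTRIBUTIONS


def skew_ratio(keys) -> float:
    """Rows in the fullest distribution divided by the average rows per distribution.

    1.0 means perfectly even; 60.0 means every row landed in one distribution.
    """
    keys = list(keys)
    counts = Counter(distribution_for(k) for k in keys)
    return max(counts.values()) / (len(keys) / NUM_DISTRIBUTIONS)


print(skew_ratio(range(60000)))             # unique customer ids: close to 1.0
print(skew_ratio(["CA"] * 300 + ["WA"] * 300))  # two states: 30.0 or 60.0
```

A query over a skewed table waits on its fullest distribution, which is one reason the performance of otherwise similar queries can differ so much.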

Exercise 4 - Lake Databases and Database templates

In this exercise, you will explore the concept of a lake database and you will learn how to use readily available database templates for lake databases.

A lake database in Azure Synapse Analytics brings together database design, metadata about the stored data, and a way to describe how and where that data should be stored. Lake databases address a common challenge of today's data lakes: it is hard to understand how the data is structured.
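To make the idea concrete, here is a hypothetical metadata record that keeps a table's schema together with where and how its files are stored, rendered as a standard Spark SQL `CREATE TABLE ... USING ... LOCATION` statement. The class names, database name, and storage path are illustrative assumptions; in the lab, lake database tables are defined through the database designer and templates rather than this code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Column:
    name: str
    data_type: str  # Spark SQL type name, e.g. "INT" or "STRING"


@dataclass
class LakeTable:
    """Hypothetical metadata record: schema plus where and how the files are stored."""
    database: str
    name: str
    columns: List[Column]
    location: str                # data lake folder that holds the table's files
    file_format: str = "PARQUET"

    def to_spark_ddl(self) -> str:
        """Render the metadata as a Spark SQL CREATE TABLE statement."""
        cols = ", ".join(f"{c.name} {c.data_type}" for c in self.columns)
        return (f"CREATE TABLE IF NOT EXISTS {self.database}.{self.name} ({cols}) "
                f"USING {self.file_format} LOCATION '{self.location}'")


customer = LakeTable("wwi_lake", "customer",
                     [Column("CustomerId", "INT"), Column("CustomerName", "STRING")],
                     "abfss://wwi@mylake.dfs.core.windows.net/customer/")
print(customer.to_spark_ddl())
```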

Exercise 5 - Log and telemetry analytics

In this exercise, you will explore the capabilities of the newly integrated Data Explorer runtime in Synapse Analytics.

Azure Synapse data explorer provides you with a dedicated query engine optimized and built for log and time series data workloads. With this new capability now part of Azure Synapse's unified analytics platform, you can easily access your machine and user data to surface insights that can directly improve business decisions. To complement the existing SQL and Apache Spark analytical runtimes, Azure Synapse data explorer is optimized for efficient log analytics, using powerful indexing technology to automatically index structured, semi-structured, and free-text data commonly found in telemetry data.
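A typical telemetry question for WWI is whether the chiller room stayed cold enough. In Data Explorer this would be written in KQL as a `summarize ... by bin(Timestamp, 5m)` aggregation; the Python sketch below mirrors that logic on a handful of readings. The 5-minute window and the 5 °C alert threshold are hypothetical values chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def bin_time(ts: datetime, size: timedelta) -> datetime:
    """Round a timestamp down to the start of its window (like KQL bin())."""
    epoch = datetime(1970, 1, 1, tzinfo=ts.tzinfo)
    return ts - ((ts - epoch) % size)


def summarize(readings, size=timedelta(minutes=5), limit=5.0):
    """Average and max temperature per window, flagging windows whose max exceeds `limit`."""
    windows = defaultdict(list)
    for ts, temp in readings:
        windows[bin_time(ts, size)].append(temp)
    return [
        {"window": w, "avg": round(sum(v) / len(v), 2), "max": max(v), "alert": max(v) > limit}
        for w, v in sorted(windows.items())
    ]


t0 = datetime(2021, 6, 1, 8, 0)
readings = [(t0 + timedelta(minutes=m), temp)
            for m, temp in [(0, 3.0), (2, 4.0), (6, 4.5), (8, 6.5)]]
for row in summarize(readings):
    print(row)
```

Data Explorer performs this kind of aggregation over indexed data at a much larger scale; the sketch only shows what the query computes.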

Exercise 6 - Data governance with Azure Purview

In this exercise, you will use several of the capabilities provided by the integration between Azure Synapse Analytics and Azure Purview workspaces.

Azure Purview is a unified data governance solution that helps you manage and govern your on-premises, multicloud, and software-as-a-service (SaaS) data. Purview enables you to easily create a holistic, up-to-date map of your data landscape with automated data discovery, sensitive data classification, and end-to-end data lineage. It also enables data consumers to find valuable, trustworthy data.

Azure Synapse Analytics and Azure Purview workspaces are tightly integrated, enabling seamless data discovery and lineage.
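End-to-end lineage is essentially a graph of assets and the assets they were derived from. The sketch below is not the Purview API: it assumes a hypothetical lineage map (report, SQL pool table, lake files) and walks it breadth-first to answer "which sources does this report depend on?".

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets it was derived from.
LINEAGE = {
    "powerbi.SalesReport": ["sqlpool.dbo.Sales"],
    "sqlpool.dbo.Sales": ["lake.sales_clean.parquet"],
    "lake.sales_clean.parquet": ["lake.sales_raw.csv", "lake.customers.csv"],
}


def upstream(asset, edges=LINEAGE):
    """Return every asset the given asset depends on, nearest first (breadth-first)."""
    seen, order, queue = set(), [], deque(edges.get(asset, []))
    while queue:
        node = queue.popleft()
        if node in seen:  # an asset can feed several others; visit it once
            continue
        seen.add(node)
        order.append(node)
        queue.extend(edges.get(node, []))
    return order


print(upstream("powerbi.SalesReport"))
```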

Exercise 7 - Power BI integration

In this exercise, you will build a Power BI report in Azure Synapse Analytics.

The visual approach in data exploration, analysis, and interpretation is one of the essential tools for both technical users (data engineers, data scientists) and business users. Having a highly flexible and performant data presentation layer is a must for any modern data platform.

Azure Synapse Analytics integrates natively with Power BI, a proven and highly successful data presentation and exploration platform. The Power BI experience is available inside Synapse Studio.

Extension module

The exercise is accompanied by a Power BI extension module with four additional (optional) exercises.

Exercise 8 - Data Science with Spark (optional)

In this exercise, you will use a model trained by Azure Machine Learning AutoML on Spark compute to make predictions with the T-SQL PREDICT statement in an Azure Synapse Analytics dedicated SQL pool.

Azure Synapse Analytics supports using trained models (in ONNX format) directly from dedicated SQL pools. In practice, this means that your data engineers can write T-SQL queries that use those models to make predictions against tabular data stored in a dedicated SQL pool table.

The model is trained and registered by Azure Machine Learning automated ML (AutoML) using the compute resources provided by a Synapse Analytics Spark pool (the main requirement is that the model can be converted to the ONNX format). Through the integration of the Azure Machine Learning experience into Synapse Studio, the trained model is deployed to the dedicated SQL pool, where it is used for inference via the T-SQL PREDICT statement.
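For reference, the scoring query follows the documented `PREDICT(MODEL = ..., DATA = ... AS d, RUNTIME = ONNX) WITH (...)` pattern for dedicated SQL pools. The helper below is a minimal sketch that only assembles that T-SQL text. The model table, its `[Model]` and `[Id]` columns, the data table, and the output column names are hypothetical; they must match how the model was actually stored and what it outputs.

```python
def build_predict_query(model_table: str, model_id: int, data_table: str,
                        outputs: dict) -> str:
    """Build a T-SQL batch that scores `data_table` with an ONNX model from `model_table`.

    Table names, the [Model]/[Id] columns and the output columns are hypothetical
    placeholders; `outputs` maps each model output name to its SQL type.
    """
    with_clause = ", ".join(f"[{name}] {sql_type}" for name, sql_type in outputs.items())
    return (
        f"DECLARE @model VARBINARY(MAX) = (SELECT [Model] FROM {model_table} WHERE [Id] = {int(model_id)});\n"
        f"SELECT d.*, p.*\n"
        f"FROM PREDICT(MODEL = @model,\n"
        f"             DATA = {data_table} AS d,\n"
        f"             RUNTIME = ONNX)\n"
        f"WITH ({with_clause}) AS p;"
    )


print(build_predict_query("dbo.Models", 1, "dbo.CustomerFeatures", {"variable_out1": "REAL"}))
```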
