• Stars
    star
    220
  • Rank 180,422 (Top 4 %)
  • Language
    Dockerfile
  • License
    MIT License
  • Created almost 6 years ago
  • Updated 4 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

Docker containers for running training scripts on AzureML

Azure Machine Learning base images

This repository contains Dockerfiles for the base images used in Azure Machine Learning.

Table of Contents

Introduction

These Docker images serve as base images for training and inference in Azure ML. While submitting a training job on AmlCompute or any other target with Docker enabled, Azure ML runs your job in a conda environment within a Docker container.

You can also use these Docker images as base images for your custom Azure ML Environments. If you specify any conda dependencies in your Environment, the extra dependencies are installed on top of the dependencies in the Docker image.

Note that these base images do not come with Python packages, notably the Azure ML Python SDK, installed. If you require the Azure ML SDK package for your job, make sure you also install the appropriate package.

Please note that images supporting Ubuntu 16.04 are now deprecated. We recommend using images supporting Ubuntu 18.04 for the timebeing as we transition towards providing 20.04 images.

Base image dependencies

Currently Azure ML supports cuda9, cuda10 and cuda11 base images. The major dependencies installed in the base images are Miniconda, OpenMPI, CUDA, cuDNN, NCCL, and git. For more detailed information, please view the dockerfiles.

The CPU images are built from ubuntu18.04 and ubuntu20.04.

The GPU images for cuda9 are built from nvidia/cuda:9.0-cudnn7-devel-ubuntu16.04.

The GPU images for cuda10 are built from:

  • nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
  • nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04
  • nvidia/cuda:10.2-cudnn8-devel-ubuntu18.04

The GPU images for cuda11 are built from:

  • nvidia/cuda:11.0.3-cudnn8-devel-ubuntu18.04
  • nvidia/cuda:11.1-cudnn8-devel-ubuntu18.04
  • nvidia/cuda:11.1.1-cudnn8-devel-ubuntu20.04

How to get the Azure ML images

All images in this repository are published to Microsoft Container Registry (MCR).

You can pull these images from MCR using the following command:

  • CPU image example: docker pull mcr.microsoft.com/azureml/openmpi3.1.2-ubuntu18.04
  • GPU image example: docker pull mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04

If you observe the naming convention, image tag maps to the folder name that contains the corresponding Dockerfile.

GPU images pulled from MCR can only be used with Azure Services. Take a look at LICENSE.txt file inside the docker container for more information. GPU images are built from nvidia images. For NVIDIA CUDA and cuDNN take a look at the ThirdPartyNotices.txt file inside the docker container for more information about NVIDIA’s license terms

Featured tags

Below is the list of tags:

Using Azure ML base images for training

In some cases, the Azure ML base images will be used by default:

  • By default, if no base image is explicitly set by the user for a training run, Azure ML will use the image corresponding to azureml.core.environment.DEFAULT_CPU_IMAGE.

  • If you are using an Azure ML curated environment, those are already configured with one of the Azure ML base images. To see which base image is used by a specific curated environment, you can run the following:

    from azureml.core import Environment
    
    curated_env_name = 'AzureML-pytorch-1.7-ubuntu18.04-py37-cuda11-gpu'
    pytorch_env = Environment.get(workspace=ws, name=curated_env_name)
    print(pytorch_env.docker.base_image)

If you want to instead explicitly use one of the Azure ML base images for your job, you can follow the steps below.

Prerequisites

Create an Azure ML Environment

If your training script requires additional dependencies, create a YAML file that defines the conda dependencies. In the below example, the file is named conda_dependencies.yml:

channels:
- conda-forge
dependencies:
- python=3.6.2
- pip:
  - azureml-defaults
  - tensorflow-gpu==2.2.0

Then, create an Azure ML environment from this conda environment specification.

from azureml.core import Environment

env = Environment.from_conda_specification(name='my-env', file_path='./conda_dependencies.yml')

If your script does not require any additional dependencies and you would just like to use the base image directly, just instantiate an Environment object with the following:

from azureml.core import Environment

env = Environment(name='my-env')

Then, for both of the above cases, set the base image you would like to use. For example, here we will specify the openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04 image:

env.docker.enabled = True
env.docker.base_image = 'mcr.microsoft.com/azureml/openmpi3.1.2-cuda10.1-cudnn7-ubuntu18.04'

Configure and submit the training job

Create a ScriptRunConfig object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on.

from azureml.core import Workspace, Experiment
from azureml.core import ScriptRunConfig

ws = Workspace.from_config()
compute_target = ws.compute_targets['my-cluster-name']

src = ScriptRunConfig(source_directory='.',
                      script='train.py',
                      compute_target=compute_target,
                      environment=env)
                      
run = Experiment(workspace=ws, name='my-experiment').submit(src)
run.wait_for_completion(show_output=True)

What happens during job execution

As the job is executed, it goes through the following stages:

  • Preparing: A docker image is created according to the environment defined. The image is uploaded to the workspace's Azure Container Registry and cached for later runs. A new Docker image is built if this is the first time a combination of dependencies are used in a workspace. If not, a cached Docker image is used. Logs are also streamed to the run history and can be viewed to monitor progress. If a curated environment is specified instead, the cached image backing that curated environment will be used.

  • Scaling: The cluster attempts to scale up if the cluster requires more nodes to execute the run than are currently available.

  • Running: All scripts in the script folder are uploaded to the compute target, any datasets specified are mounted or downloaded, and the script is executed. Outputs from stdout and the ./logs folder are streamed to the run history and can be used to monitor the run.

  • Post-Processing: The ./outputs folder of the run is copied over to the run history.

Using your own custom Docker image or Dockerfile for training

If you instead want to use your own custom Docker image or Dockerfile for your training job instead of the Azure ML base images, you can refer to the documentation Train using a custom image.

Resources

For additional documentation and tutorials, see the following:

Projects using Azure Machine Learning

Visit following repositories to see the projects contributed by Azure ML users:

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

More Repositories

1

azure-quickstart-templates

Azure Quickstart Templates
Bicep
13,949
star
2

azure-sdk-for-net

This repository is for active development of the Azure SDK for .NET. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/dotnet/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-net.
C#
5,256
star
3

autorest

OpenAPI (f.k.a Swagger) Specification code generator. Supports C#, PowerShell, Go, Java, Node.js, TypeScript, Python
TypeScript
4,594
star
4

Azure-Sentinel

Cloud-native SIEM for intelligent security analytics for your entire enterprise.
Jupyter Notebook
4,559
star
5

azure-sdk-for-python

This repository is for active development of the Azure SDK for Python. For consumers of the SDK we recommend visiting our public developer docs at https://learn.microsoft.com/python/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-python.
Python
4,540
star
6

azure-powershell

Microsoft Azure PowerShell
C#
4,178
star
7

MachineLearningNotebooks

Python notebooks with ML and deep learning examples with Azure Machine Learning Python SDK | Microsoft
Jupyter Notebook
4,090
star
8

DotNetty

DotNetty project – a port of netty, event-driven asynchronous network application framework
C#
4,087
star
9

azure-cli

Azure Command-Line Interface
Python
3,930
star
10

draft-classic

A tool for developers to create cloud-native applications on Kubernetes.
Go
3,925
star
11

bicep

Bicep is a declarative language for describing and deploying Azure resources
Bicep
3,240
star
12

azure-rest-api-specs

The source for REST API specifications for Microsoft Azure.
HCL
2,634
star
13

azure-sdk-for-java

This repository is for active development of the Azure SDK for Java. For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/java/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-java.
Java
2,305
star
14

azure-sdk-for-js

This repository is for active development of the Azure SDK for JavaScript (NodeJS & Browser). For consumers of the SDK we recommend visiting our public developer docs at https://docs.microsoft.com/javascript/azure/ or our versioned developer docs at https://azure.github.io/azure-sdk-for-js.
TypeScript
2,051
star
15

AKS

Azure Kubernetes Service
HTML
1,965
star
16

azure-functions-host

The host/runtime that powers Azure Functions
C#
1,871
star
17

golua

A Lua 5.3 engine implemented in Go
Go
1,850
star
18

Azurite

A lightweight server clone of Azure Storage that simulates most of the commands supported by it with minimal dependencies
TypeScript
1,823
star
19

azureml-examples

Official community-driven Azure Machine Learning examples, tested with GitHub Actions.
Jupyter Notebook
1,733
star
20

Microsoft-Defender-for-Cloud

Welcome to the Microsoft Defender for Cloud community repository
PowerShell
1,677
star
21

Enterprise-Scale

The Azure Landing Zones (Enterprise-Scale) architecture provides prescriptive guidance coupled with Azure best practices, and it follows design principles across the critical design areas for organizations to define their Azure architecture
PowerShell
1,603
star
22

azure-sdk-for-go

This repository is for active development of the Azure SDK for Go. For consumers of the SDK we recommend visiting our public developer docs at:
Go
1,594
star
23

Stormspotter

Azure Red Team tool for graphing Azure and Azure Active Directory objects
Python
1,525
star
24

durabletask

Durable Task Framework allows users to write long running persistent workflows in C# using the async/await capabilities.
C#
1,516
star
25

azure-policy

Repository for Azure Resource Policy built-in definitions and samples
Open Policy Agent
1,485
star
26

iotedge

The IoT Edge OSS project
C#
1,421
star
27

aztfexport

A tool to bring existing Azure resources under Terraform's management
Go
1,335
star
28

review-checklists

This repo contains code and examples to operationalize Azure review checklists.
Python
1,187
star
29

azure-sdk-for-node

Azure SDK for Node.js - Documentation
JavaScript
1,187
star
30

azure-functions-core-tools

Command line tools for Azure Functions
C#
1,128
star
31

Azure-Functions

PowerShell
1,108
star
32

acs-engine

WE HAVE MOVED: Please join us at Azure/aks-engine!
Go
1,031
star
33

aks-engine

AKS Engine: legacy tool for Kubernetes on Azure (see status)
Go
1,027
star
34

data-api-builder

Data API builder provides modern REST and GraphQL endpoints to your Azure Databases and on-prem stores.
C#
912
star
35

terraform-azurerm-caf-enterprise-scale

Azure landing zones Terraform module
HCL
856
star
36

coco-framework

The Confidential Consortium Blockchain Framework is an open-source system that enables high-scale, confidential blockchain networks that meet all key enterprise requirementsβ€”providing a means to accelerate production enterprise adoption of blockchain technology.
834
star
37

azure-iot-sdks

SDKs for a variety of languages and platforms that help connect devices to Microsoft Azure IoT services
829
star
38

GPT-RAG

Sharing the learning along the way we been gathering to enable Azure OpenAI at enterprise scale in a secure manner. GPT-RAG core is a Retrieval-Augmented Generation pattern running in Azure, using Azure Cognitive Search for retrieval and Azure OpenAI large language models to power ChatGPT-style and Q&A experiences.
Bicep
822
star
39

AzurePublicDataset

Microsoft Azure Traces
Jupyter Notebook
790
star
40

azure-sql-database-samples

Azure SQL Database Samples and Reference Implementation Repository
Python
781
star
41

Azure-Network-Security

Resources for improving Customer Experience with Azure Network Security
Python
768
star
42

caf-terraform-landingzones

This solution, offered by the Open-Source community, will no longer receive contributions from Microsoft. Customers are encouraged to transition to Microsoft Azure Verified Modules for continued support and updates from Microsoft. Please note, this repository is scheduled for decommissioning and will be removed on July 1, 2025.
HCL
763
star
43

azure-search-vector-samples

A repository of code samples for Vector search capabilities in Azure AI Search.
Jupyter Notebook
750
star
44

azure-service-operator

Azure Service Operator allows you to create Azure resources using kubectl
Go
741
star
45

azure-functions-durable-extension

Durable Task Framework extension for Azure Functions
C#
714
star
46

azure-webjobs-sdk

Azure WebJobs SDK
C#
712
star
47

ResourceModules

This repository includes a CI platform for and collection of mature and curated Bicep modules. The platform supports both ARM and Bicep and can be leveraged using GitHub actions as well as Azure DevOps pipelines.
Bicep
710
star
48

ALZ-Bicep

This repository contains the Azure Landing Zones (ALZ) Bicep modules that help deliver and deploy the Azure Landing Zone conceptual architecture in a modular approach. https://aka.ms/alz/docs
Bicep
694
star
49

azure-sdk-for-rust

This repository is for active development of the *unofficial* Azure SDK for Rust. This repository is *not* supported by the Azure SDK team.
Rust
692
star
50

terraform

Source code for the Azure Marketplace Terraform development VM package.
HCL
690
star
51

azure-api-management-devops-resource-kit

Azure API Management DevOps Resource Kit
C#
684
star
52

application-gateway-kubernetes-ingress

This is an ingress controller that can be run on Azure Kubernetes Service (AKS) to allow an Azure Application Gateway to act as the ingress for an AKS cluster.
Go
668
star
53

CCOInsights

Welcome to the Continuous Cloud Optimization Power BI Dashboard GitHub Project. In this repository you will find all the guidance and files needed to deploy the Dashboard in your environment to take benefit of a single pane of glass to get insights about your Azure resources and services.
Mathematica
668
star
54

SimuLand

Understand adversary tradecraft and improve detection strategies
PowerShell
664
star
55

DeepLearningForTimeSeriesForecasting

A tutorial demonstrating how to implement deep learning models for time series forecasting
Jupyter Notebook
655
star
56

azure-xplat-cli

For ARM-based service please go to CLI 2.0.
JavaScript
651
star
57

azure-cosmos-dotnet-v3

.NET SDK for Azure Cosmos DB for the core SQL API
C#
637
star
58

node-sqlserver

C++
625
star
59

azure-devops-cli-extension

Azure DevOps Extension for Azure CLI
Python
621
star
60

azure-storage-fuse

A virtual file system adapter for Azure Blob storage
Go
613
star
61

Community-Policy

This repo is for Microsoft Azure customers and Microsoft teams to collaborate in making custom policies.
Open Policy Agent
612
star
62

azure-resource-manager-schemas

Schemas used to author and validate Resource Manager Templates. These schemas power the intellisense and syntax completion in our ARM Tools VSCode extension, as well as the Export Template API
TypeScript
610
star
63

azure-storage-azcopy

The new Azure Storage data transfer utility - AzCopy v10
Go
607
star
64

counterfit

a CLI that provides a generic automation layer for assessing the security of ML models
Python
599
star
65

azure-iot-sdk-c

A C99 SDK for connecting devices to Microsoft Azure IoT services
C
587
star
66

azure-cosmos-dotnet-v2

Contains samples and utilities relating to the Azure Cosmos DB .NET SDK
577
star
67

azure-service-bus

☁️ Azure Service Bus service issue tracking and samples
571
star
68

aad-pod-identity

[DEPRECATED] Assign Azure Active Directory Identities to Kubernetes applications.
Go
569
star
69

Azure-Sentinel-Notebooks

Interactive Azure Sentinel Notebooks provides security insights and actions to investigate anomalies and hunt for malicious behaviors.
Jupyter Notebook
541
star
70

draft

A day 0 tool for getting your app on k8s fast
Go
541
star
71

AzureStack-QuickStart-Templates

Quick start ARM templates that deploy on Microsoft Azure Stack
PowerShell
540
star
72

azure-openai-samples

Azure OpenAI Samples is a collection of code samples illustrating how to use Azure Open AI in creating AI solution for various use cases across industries. This repository is mained by a community of volunters. We welcomed your contributions.
Jupyter Notebook
540
star
73

WALinuxAgent

Microsoft Azure Linux Guest Agent
Python
538
star
74

AI-in-a-Box

AI-in-a-Box leverages the expertise of Microsoft across the globe to develop and provide AI and ML solutions to the technical community. Our intent is to present a curated collection of solution accelerators that can help engineers establish their AI/ML environments and solutions rapidly and with minimal friction.
Jupyter Notebook
529
star
75

iot-edge-v1

Azure IoT Edge
C
524
star
76

Industrial-IoT

Azure Industrial IoT Platform
C#
521
star
77

MS-AMP

Microsoft Automatic Mixed Precision Library
Python
516
star
78

Mission-Critical

This repository provides a design methodology and approach to building highly-reliable applications on Microsoft Azure for mission-critical workloads.
510
star
79

static-web-apps-cli

Azure Static Web Apps CLI ✨
TypeScript
509
star
80

mlops-v2

Azure MLOps (v2) solution accelerators. Enterprise ready templates to deploy your machine learning models on the Azure Platform.
Shell
505
star
81

azure-docs-powershell-samples

Azure Powershell code samples, often used in docs.microsoft.com/Azure developer documentation
PowerShell
503
star
82

azqr

Azure Quick Review
Go
503
star
83

bicep-registry-modules

Bicep registry modules
Bicep
501
star
84

azure-storage-node

Microsoft Azure Storage SDK for Node.js
JavaScript
495
star
85

api-management-developer-portal

Developer portal provided by the Azure API Management service.
TypeScript
487
star
86

kubelogin

A Kubernetes credential (exec) plugin implementing azure authentication
Go
486
star
87

Azure-TDSP-ProjectTemplate

TDSP: Data science project template repository with standardized directory structure and document templates to support efficient project execution and collaboration.
R
483
star
88

azure-sdk

This is the Azure SDK parent repository and mostly contains documentation around guidelines and policies as well as the releases for the various languages supported by the Azure SDK.
PowerShell
478
star
89

azure-mobile-services

Mobile Services is deprecated - Use Mobile Apps instead
HTML
472
star
90

azure-devtestlab

Azure DevTestLab artifacts, scripts and samples
PowerShell
458
star
91

azure-iot-sdk-csharp

A C# SDK for connecting devices to Microsoft Azure IoT services
C#
455
star
92

actions-workflow-samples

Help developers to easily get started with GitHub Action workflows to deploy to Azure
Pug
450
star
93

azure-devops-utils

Azure DevOps Utilities
Shell
450
star
94

AzureDatabricksBestPractices

Version 1 of Technical Best Practices of Azure Databricks based on real world Customer and Technical SME inputs
446
star
95

app-service-announcements

Subscribe to this repo to be notified about major changes in App Service
446
star
96

azure-storage-net

Microsoft Azure Storage Libraries for .NET
C#
445
star
97

Azure-DataFactory

C#
444
star
98

arm-ttk

Azure Resource Manager Template Toolkit
PowerShell
442
star
99

RDS-Templates

ARM Templates for Remote Desktop Services deployments
PowerShell
427
star
100

Copilot-For-Security

Microsoft Copilot for Security is a generative AI-powered security solution that helps increase the efficiency and capabilities of defenders to improve security outcomes at machine speed and scale, while remaining compliant to responsible AI principles
PowerShell
426
star