• Stars
    star
    106
  • Rank 325,871 (Top 7 %)
  • Language Bicep
  • License
    MIT License
  • Created over 3 years ago
  • Updated 11 months ago

Reviews

There are no reviews yet. Be the first to send feedback to the community and the maintainers!

Repository Details

This is the Azure Kubernetes Service (AKS) baseline cluster for regulated workloads reference implementation as produced by the Microsoft Azure Architecture Center.

Azure Kubernetes Service (AKS) Baseline Cluster for Regulated Workloads

This reference implementation demonstrates the recommended starting (baseline) infrastructure architecture for an AKS cluster that is under regulatory compliance requirements (such as PCI). This implementation builds directly upon the AKS Baseline Cluster reference implementation and adds to it additional implementation points that are more commonly seen in regulated environments vs typical "public cloud" consumption patterns.

🎓 Foundational Understanding
If you haven't familiarized yourself with the general-purpose AKS baseline cluster architecture, you should start there before continuing here. This architecture is constructed from the AKS baseline, which is the foundation for this body of work. This reference implementation avoids rearticulating points that are already addressed in the AKS baseline cluster.

Compliance

⚠️ These artifacts have not been certified in any official capacity; regulatory compliance is a shared responsibility between you and your hosting provider. This implementation is designed to aide you on your journey to achieving your compliance, but by itself does not ensure any level of compliance. To understand Azure compliance and shared responsibility models, visit the Microsoft Trust Center.

Azure and AKS are well positioned to give you the tools and allow you to build processes necessary to help you achieve a compliant hosting infrastructure. The implementation details can be complex, as is the overall process of compliance. We walk through the deployment here in a rather verbose method to help you understand each component of this architecture, teaching you about each layer and providing you with the knowledge necessary to apply it to your unique compliance scoped workload.

Even if you are not in a regulated environment, this infrastructure demonstrates an AKS cluster with a more heightened security posture over the general-purpose cluster presented in the AKS baseline. You might find it useful to take select concepts from here and apply it to your non-regulated workloads (at the tradeoff of added complexity and hosting costs).

Azure Architecture Center guidance

This project has a companion set of articles that describe challenges, design patterns, and best practices for a AKS cluster designed to host workloads that fall in PCI-DSS 3.2.1 scope. You can find this article on the Azure Architecture Center at Azure Kubernetes Service (AKS) regulated cluster for PCI-DSS 3.2.1. If you haven't reviewed it, we suggest you read it; as it will give added context to the considerations applied in this implementation. This repo primarly focuses on deployment concerns, while compliance concerns are mostly addressed in the linked article series above.

Architecture

This reference implementation is infrastructure focused, more so than workload. It concentrates on dealing with the AKS cluster itself. This implementation will touch on workload concerns, but does not contain end-to-end guidance on in-scope workload architecture, container security, or isolation. There are some good practices demonstrated and others talked about, but it is not exhaustive.

The implementation presented here is the minimum starting point for most AKS clusters falling into a compliance scope. This implementation integrates with Azure services that will deliver observability, provide a network topology that will support public traffic isolation, and keep the in-cluster traffic secure as well. This architecture should be considered your architectural starting point for preproduction and production stages of clusters hosting regulated workloads.

The material here is relatively dense. We strongly encourage you to start by reading the Azure Architecture Center guidance linked above and then dedicate at least four hours to walk through these instructions, with a mind to learning. You will not find any "one click" deployment here. However, once you've understood the components involved and identified the shared responsibilities between your team and your greater IT organization, it is encouraged that you build auditable deployment processes around your final infrastructure.

Finally, this implementation uses a small, custom application as an example workload. This workload is minimally interesting, as it is here exclusively to help you experience the infrastructure and illustrate network and security controls in place. The workload, and its deployment, does not represent any sort of "best practices" for regulated workloads.

Core architecture components

Azure platform

  • AKS v1.27
  • Azure Virtual Networks (hub-spoke)
    • Azure Firewall managed egress
    • Hub-proxied DNS
    • BYO Private DNS Zone for AKS with no public DNS representation
  • Azure Application Gateway (WAF - OWASP 3.2)
  • AKS-managed Internal Load Balancers
  • Azure Bastion for maintenance access
  • Private Link enabled Key Vault and Azure Container Registry
  • Private Azure Container Registry Task Runners

In-cluster open-source software components

Network topology

Network diagram depicting a hub-spoke network with two peered VNets. The cluster spoke contains subnets for a jump box, the cluster, and other related services.

Workload HTTPS request flow

Network flow showing Internet traffic passing through Azure Application Gateway then into the Ingress Controller and then through the workload pods. All connections are TLS.

Deploy the reference implementation

A deployment of AKS-hosted workloads typically experiences a separation of duties and lifecycle management in the area of identity & security group management, the host network, the cluster infrastructure, and finally the workload itself. This reference implementation will have you be working across these various roles. Regulated environments require strong, documented separation of concerns; but ultimately you'll decide where each boundary should be.

Also, remember the primary purpose of this body of work is to illustrate the topology and decisions made in this cluster. A guided, "step-by-step" flow will help you learn the pieces of the solution and give you insight into the relationship between them. A bedrock understanding of your infrastructure, its supply chain, and its "Day-2" workflows are critical for compliance concerns. If you cannot explain each decision point and rationalization, audit conversations can quickly turn uncomfortable.

Ultimately, lifecycle/SDLC management of your cluster, its dependencies, and your workloads will depend on your specific situation. You'll need to account for team roles, centralized and decentralized IT roles, organizational standards, industry expectations, and specific mandates by your compliance auditor.

Start this learning journey in the Prepare the subscription section. If you follow this through the end, you'll have our recommended baseline cluster for regulated industries installed, with a sample workload running for you to reference in your own Azure subscription.

1. 🚀 Prepare the subscription

There are considerations that must be addressed before you start deploying your cluster. Do I have enough permissions in my subscription and Microsoft Entra tenant to do a deployment of this size? How much of this will be handled by my team directly vs having another team be responsible?

2. Build regional networking hub

This reference implementation is built on a traditional hub-spoke model, typical found in your organization's Connectivity subscription. The hub will contain Azure Firewall, DNS forwarder, and Azure Bastion services.

3. Plan Kubernetes API server access

Because the AKS server is a "private cluster" the control plane is not exposed to the internet. Management now can only be performed with network line of sight to the private endpoint exposed by AKS. In this case, you'll build an Azure Bastion-fronted jump box.

4. Deploy the cluster

This is the heart of the guidance in this reference implementation; paired with prior network topology guidance. Here you will deploy the Azure resources for your cluster and the adjacent services such as Azure Application Gateway WAF, Azure Monitor, Azure Container Registry, and Azure Key Vault. This is also where you will validate the cluster is bootstrapped.

5. Deploy your workload

A simple workload made up of four interconnected services is manually deployed across two namespaces to illustrate concepts such as nodepool placement, zero-trust network policies, and external infrastructure protections offered by the applied NSGs and Azure Firewall rules.

6. 🏁 Validation

Now that the cluster and the sample workload is deployed; now it's time to look at how the cluster is functioning.

7. 🧹 Clean up resources

Most of the Azure resources deployed in the prior steps will have ongoing billing impact unless removed.

Separation of duties

All workloads that find themselves in compliance scope usually require a documented separation of duties/concern implementation plan. Kubernetes poses an interesting challenge in that it involves a significant number of roles typically found across an IT organization. Networking, identity, SecOps, governance, workload teams, cluster operations, deployment pipelines, any many more. If you're looking for a starting point on how you might consider breaking up the roles that are adjacent to the AKS cluster, consider reviewing our Microsoft Entra role guide shipped as part of this reference implementation.

📓 For more information, see Azure Architecture Center guidance for PCI-DSS 3.2.1 Requirement 7, 8, and 9 in AKS.

Is that all, what about ... !?

Yes, there are concerns that do extend beyond what this implementation could reasonably demonstrate for a general audience. This reference implementation strived to be accessible for most people without putting undo burdens on the subscription brought to this walkthrough. This means SKU choices with relatively large default quotas, not using features that have very limited regional availability, not asking for learners to be overwhelmed with "Bring your own encryption key" options for services, and similar. All in hopes that more people can complete this walkthrough without disruption or excessive coordination with subscription or management group owners.

For your implementation, take this starting point and add on additional security measures talked about throughout the walkthrough and the Azure Architecture Center guidance that were not directly implemented. For example, enable JIT and Conditional Access Policies, use Encryption-at-Host features if applicable to your workload, and so on.

For a list of additional considerations for your architecture, please see our Additional Considerations document.

Cost

This reference implementation runs idle around $95 (US Dollars) per day within the first 30 days; and you can expect it to increase over time as some Microsoft Defender for Cloud tooling has free-trial period and logs will continue to accrue. The largest contributors to the starting cost are Azure Firewall, the AKS nodepools (virtual machine scale sets), and Log Analytics. While some costs are usually cluster operator costs, such as nodepool VMSS, Log Analytics, incremental Microsoft Defender for Cloud costs; others will likely be amortized across multiple business units or applications, such as Azure Firewall.

Some customers will opt to amortize cluster costs across workloads by hosting a multitenant cluster within their organization, maximizing density with workload diversity. Doing so with regulated workloads is not advised. Regulated environments will generally prioritize compliance and security (isolation) over cost (diverse density).

Final thoughts

Kubernetes is a very flexible platform, giving infrastructure and application operators many choices to achieve their business and technology objectives. At points along your journey, you will need to consider when to take dependencies on Azure platform features, CNCF OSS solutions, ISV solutions, support channels, and what operational processes need to be in place. We encourage this reference implementation to be the place you start architectural conversations within your own team; adapting to your specific requirements, and ultimately delivering a solution that delights your customers and your auditors.

Related documentation

Contributions

Please see our contributor guide.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

With ❤️ from Microsoft Patterns & Practices, Azure Architecture Center.

More Repositories

1

microservices-reference-implementation

A reference implementation demonstrating microservices architecture and best practices for Microsoft Azure
Shell
822
star
2

cloud-design-patterns

Prescriptive Architecture Guidance for Cloud Applications
C#
726
star
3

performance-optimization

Guidance on how to observe, measure, and correct common issues in a cloud-based system.
C#
688
star
4

reference-architectures

templates and scripts for deploying Azure Reference Architectures
C#
640
star
5

aks-baseline

This is the Azure Kubernetes Service (AKS) Baseline Cluster reference implementation as produced by the Microsoft Azure Architecture Center.
Bicep
615
star
6

template-building-blocks

A tool for deploying Azure infrastructure based on proven practices. Azure building blocks take advantage of the Azure CLI and Azure Resource Manager templates to provision collections of resources as logical units with production-ready settings.
JavaScript
328
star
7

spark-monitoring

Monitoring Azure Databricks jobs
Scala
212
star
8

AzureNamingTool

The Azure Naming Tool is a .NET 8 Blazor application, with a RESTful API. The UI consists of several pages to allow the configuration and generation of Azure Resource names. The API provides a programmatic interface for the functionality.
HTML
183
star
9

serverless-reference-implementation

Serverless reference implementation guidance
C#
167
star
10

aks-fabrikam-dronedelivery

AKS Fabrikam Drone Delivery ❤️ AKS baseline
Mustache
121
star
11

samples

Bicep
120
star
12

azure-databricks-streaming-analytics

Stream processing with Azure Databricks
Scala
105
star
13

transactional-outbox-pattern

An implementation of the Transactional Outbox Pattern with Cosmos DB
C#
58
star
14

aks-baseline-multi-region

This is the Azure Kubernetes Service (AKS) baseline for multi-region reference implementation as produced by the Microsoft Azure Architecture Center.
Shell
51
star
15

identity-reference-architectures

Reference architectures for extending your Active Directory environment to Azure
PowerShell
48
star
16

solution-architectures

This content is referenced by Azure Architecture Center articles.
Shell
45
star
17

iot-guidance

Code samples that show best practices for building IoT solutions.
C#
32
star
18

cloud-services-to-service-fabric

Migrate a Cloud Services application to Service Fabric
C#
29
star
19

container-apps-fabrikam-dronedelivery

Bicep
27
star
20

microservices-reference-implementation-servicefabric

Microservices reference implementation deployed to Azure Service Fabric
C#
20
star
21

vnet-integrated-serverless-microservices

TypeScript
20
star
22

azure-stream-analytics-data-pipeline

C#
16
star
23

gridwich

Gridwich - Media Processing System
C#
14
star
24

go-batcher

Batching and rate-limiting for go without any opinion of the datastore.
Go
12
star
25

interruptible-workload-on-spot

Interruptible workloads on Azure Spot VM instances reference implementation as produced by the Microsoft Azure Architecture Center.
Bicep
11
star
26

serverless-automation

Scenarios around automating tasks using Azure serverless technologies
PowerShell
11
star
27

fabrikam-dronedelivery-workload

This repository contains source files for services that are shared by the microservices and fabrikam-drone delivery reference implementations.
C#
11
star
28

template-examples

Extend Azure Resource Manager template functionality.
10
star
29

app-service-environments-ILB-deployments

Bicep
9
star
30

aks-jumpbox-imagebuilder

An example of using Azure Image Builder to generate a jump box image to be used for ops access on network-restricted AKS clusters.
Bicep
9
star
31

cognitive-services-reference-implementation

This reference implementation builds the first phase of a call center analytics pipeline using Azure Cognitive Speech API Service, Azure Function, Blob storage and an app service.
C#
8
star
32

letsencrypt-pip-cert-generation

A method one can use to generate a Let’s Encrypt® certificate for a Azure Public IP domain prefix.
Shell
6
star
33

geode-pattern-accelerator

The accelerator is designed to help developers with Azure Functions based APIs that utilize Cosmos DB as a data store to implement the geode pattern by deploying their API to geodes in distributed Azure regions.
HCL
6
star
34

iaas-landing-zone-baseline

This is the IaaS baseline for Azure landing zones reference implementation as produced by the Azure Architecture Center.
Bicep
5
star
35

iaas-baseline

Infrastructure as a Service (IaaS) baseline reference implementation
Bicep
4
star
36

multi-stage-azure-pipeline-automation-app

The project demonstrates how to automate azure pipelines to deploy a dotnet-angular project to azure app service
TypeScript
4
star
37

multi-stage-azure-pipeline-automation

The project uses Azure Logic App to Automate Azure DevOps Multistage Pipelines
PowerShell
3
star
38

aci-auto-healing

Using serverless automation to update backend pools on Azure Application Gateway in response to changes in Azure Container Instances.
Bicep
1
star
39

intern-js-pipeline

Nightly Build Testing with Playwright - automated build testing and monitoring for technical documentation
JavaScript
1
star
40

hilojs

JavaScript
1
star