Cloud Infrastructure Project
Objective
Design and deploy a complete cloud-native infrastructure on Azure that integrates all cloud skills from the program:
- an Azure Kubernetes Service (AKS) cluster running a containerized web application
- a CI/CD pipeline using GitHub Actions that builds, tests, and deploys to AKS on every push to main
- Azure Container Registry (ACR) for image storage
- Azure SQL Database as the managed backend
- Key Vault for secrets management
- Azure Front Door for global load balancing and WAF
- comprehensive monitoring with Azure Monitor and Container Insights
- Microsoft Defender for Cloud for security posture management

All infrastructure is deployed via Bicep templates.
Tools & Technologies
- Azure Kubernetes Service (AKS) — managed Kubernetes cluster
- Azure Container Registry (ACR) — private container image registry
- Azure SQL Database — managed relational database backend
- Azure Key Vault — secrets, certificates, and key management
- Azure Front Door + WAF — global CDN and web application firewall
- GitHub Actions — CI/CD pipeline automation
- Bicep — Azure-native IaC DSL (ARM template abstraction)
- kubectl — Kubernetes cluster management
- Helm — Kubernetes package manager for NGINX Ingress
- Microsoft Defender for Cloud — cloud security posture management
Architecture Overview
Step-by-Step Process
Wrote a Bicep template to deploy the AKS cluster with system and user node pools, attach ACR with the acrPull role, enable Workload Identity for pod-level AAD authentication, and enable Container Insights for monitoring.
// aks.bicep
param clusterName string = 'aks-capstone'
param location string = resourceGroup().location
param nodeVmSize string = 'Standard_D2s_v3'
param acrName string = 'acrcapstone${uniqueString(resourceGroup().id)}'

// Log Analytics workspace backing Container Insights (referenced below)
resource logWorkspace 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: 'log-capstone'
  location: location
  properties: {
    sku: { name: 'PerGB2018' }
    retentionInDays: 30
  }
}

resource acr 'Microsoft.ContainerRegistry/registries@2023-01-01-preview' = {
  name: acrName
  location: location
  sku: { name: 'Standard' }
  properties: { adminUserEnabled: false }
}

resource aks 'Microsoft.ContainerService/managedClusters@2023-07-01' = {
  name: clusterName
  location: location
  identity: { type: 'SystemAssigned' }
  properties: {
    kubernetesVersion: '1.28'
    enableRBAC: true
    dnsPrefix: clusterName
    agentPoolProfiles: [
      {
        name: 'system'
        count: 3
        vmSize: nodeVmSize
        mode: 'System'
        osDiskSizeGB: 50
      }
      {
        name: 'user'
        count: 2
        vmSize: nodeVmSize
        mode: 'User'
      }
    ]
    addonProfiles: {
      omsagent: {
        enabled: true
        config: { logAnalyticsWorkspaceResourceID: logWorkspace.id }
      }
    }
    oidcIssuerProfile: { enabled: true }
    securityProfile: { workloadIdentity: { enabled: true } }
  }
}
// Grant the AKS kubelet identity acrPull on ACR so nodes can pull images
resource acrRoleAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(acr.id, aks.id, 'acrPull')
  scope: acr
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '7f951dda-4ed3-4680-a7ca-43fe172d538d')
    // Image pulls are performed by the kubelet identity, not the control-plane identity
    principalId: aks.properties.identityProfile.kubeletidentity.objectId
    principalType: 'ServicePrincipal'
  }
}
# Deploy Bicep template
az deployment group create \
--resource-group rg-capstone-cloud \
--template-file aks.bicep \
--parameters nodeVmSize=Standard_D2s_v3
# Get credentials
az aks get-credentials \
--resource-group rg-capstone-cloud \
--name aks-capstone
kubectl get nodes
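The Azure SQL Database backend and the Key Vault secret it feeds (consumed later by the CSI driver) can be provisioned in the same Bicep style. The sketch below uses illustrative names (`sql-capstone-*`, `capstonedb`) and assumes the `kv-capstone-lab` vault referenced elsewhere in this write-up already exists:

```bicep
// sql.bicep — sketch with illustrative names; adjust SKU and firewall rules for your environment
param location string = resourceGroup().location
param sqlAdminLogin string
@secure()
param sqlAdminPassword string

resource sqlServer 'Microsoft.Sql/servers@2022-05-01-preview' = {
  name: 'sql-capstone-${uniqueString(resourceGroup().id)}'
  location: location
  properties: {
    administratorLogin: sqlAdminLogin
    administratorLoginPassword: sqlAdminPassword
  }
}

resource sqlDb 'Microsoft.Sql/servers/databases@2022-05-01-preview' = {
  parent: sqlServer
  name: 'capstonedb'
  location: location
  sku: {
    name: 'S0'
    tier: 'Standard'
  }
}

// Existing vault used by the SecretProviderClass later in this project
resource keyVault 'Microsoft.KeyVault/vaults@2023-02-01' existing = {
  name: 'kv-capstone-lab'
}

// Store the connection string where the CSI driver expects it
resource dbSecret 'Microsoft.KeyVault/vaults/secrets@2023-02-01' = {
  parent: keyVault
  name: 'db-connection-string'
  properties: {
    value: 'Server=tcp:${sqlServer.properties.fullyQualifiedDomainName},1433;Initial Catalog=capstonedb;User ID=${sqlAdminLogin};Password=${sqlAdminPassword};Encrypt=True;'
  }
}
```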
Installed the Secrets Store CSI Driver with Azure Key Vault provider, created a managed identity for the pods, and configured a SecretProviderClass to mount database credentials as files inside application pods.
# Install Secrets Store CSI Driver via Helm
helm repo add secrets-store-csi-driver https://kubernetes-sigs.github.io/secrets-store-csi-driver/charts
helm install csi-secrets-store secrets-store-csi-driver/secrets-store-csi-driver \
--namespace kube-system \
--set syncSecret.enabled=true
# Install Azure Key Vault provider
helm repo add csi-secrets-store-provider-azure \
https://azure.github.io/secrets-store-csi-driver-provider-azure/charts
helm install akv-provider csi-secrets-store-provider-azure/csi-secrets-store-provider-azure \
--namespace kube-system
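The managed identity and its federated credential mentioned above can be wired up with the CLI. This is a sketch with illustrative names (`id-capstone-app`, `capstone-app-sa`); it assumes the vault uses access policies rather than Azure RBAC:

```shell
# Create a user-assigned managed identity for the app pods (name illustrative)
az identity create --resource-group rg-capstone-cloud --name id-capstone-app

# Federate it with the cluster's OIDC issuer for the app's service account.
# The subject's namespace must match where the pods run (see Challenges below).
OIDC_ISSUER=$(az aks show --resource-group rg-capstone-cloud --name aks-capstone \
  --query oidcIssuerProfile.issuerUrl -o tsv)
az identity federated-credential create \
  --name fed-capstone-app \
  --identity-name id-capstone-app \
  --resource-group rg-capstone-cloud \
  --issuer "$OIDC_ISSUER" \
  --subject system:serviceaccount:app:capstone-app-sa

# Let the identity read secrets from the vault (access-policy model assumed)
CLIENT_ID=$(az identity show -g rg-capstone-cloud -n id-capstone-app \
  --query clientId -o tsv)
az keyvault set-policy --name kv-capstone-lab \
  --secret-permissions get list --spn "$CLIENT_ID"

# Annotate the Kubernetes service account with the identity's client ID
kubectl create namespace app
kubectl create serviceaccount capstone-app-sa -n app
kubectl annotate serviceaccount capstone-app-sa -n app \
  azure.workload.identity/client-id=$CLIENT_ID
```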
## SecretProviderClass manifest
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-kvs
  namespace: app
spec:
  provider: azure
  parameters:
    usePodIdentity: "false"
    useVMManagedIdentity: "false"
    clientID: "${WORKLOAD_IDENTITY_CLIENT_ID}"
    keyvaultName: kv-capstone-lab
    tenantId: "${AZURE_TENANT_ID}"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
          objectVersion: ""
        - |
          objectName: app-secret-key
          objectType: secret
  secretObjects:
    - secretName: app-secrets
      type: Opaque
      data:
        - objectName: db-connection-string
          key: DB_CONNECTION_STRING
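A matching Deployment (illustrative sketch) shows how pods opt in to Workload Identity and mount the SecretProviderClass volume; the `app-secrets` Secret synced by `secretObjects` is injected as environment variables:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: capstone-app
  namespace: app
spec:
  replicas: 2
  selector:
    matchLabels: { app: capstone-app }
  template:
    metadata:
      labels:
        app: capstone-app
        azure.workload.identity/use: "true"   # opt the pod in to Workload Identity
    spec:
      serviceAccountName: capstone-app-sa      # annotated with the identity's client ID
      containers:
        - name: app
          image: acrcapstonelab.azurecr.io/capstone-app:latest
          envFrom:
            - secretRef:
                name: app-secrets              # synced by the SecretProviderClass
          volumeMounts:
            - name: secrets-store
              mountPath: /mnt/secrets
              readOnly: true
      volumes:
        - name: secrets-store
          csi:
            driver: secrets-store.csi.k8s.io
            readOnly: true
            volumeAttributes:
              secretProviderClass: azure-kvs
```

Note that the secret sync only happens once a pod actually mounts the CSI volume, so the `envFrom` reference depends on the `volumeMounts` entry being present.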
Created a GitHub Actions workflow that triggers on push to main: builds and pushes the Docker image to ACR, runs security scanning with Trivy, then deploys to AKS using a rolling update strategy.
# .github/workflows/deploy.yml
name: Build and Deploy to AKS
on:
  push:
    branches: [main]
env:
  REGISTRY: acrcapstonelab.azurecr.io   # ACR login servers are always lowercase
  IMAGE_NAME: capstone-app
  AKS_CLUSTER: aks-capstone
  RESOURCE_GROUP: rg-capstone-cloud
jobs:
  build-and-push:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Azure Login
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
      - name: Login to ACR
        run: az acr login --name acrcapstonelab
      - name: Build and push Docker image
        run: |
          docker build -t $REGISTRY/$IMAGE_NAME:${{ github.sha }} .
          docker push $REGISTRY/$IMAGE_NAME:${{ github.sha }}
      - name: Run Trivy security scan
        uses: aquasecurity/trivy-action@master
        with:
          image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          format: sarif
          output: trivy-results.sarif
          severity: CRITICAL,HIGH
      - name: Upload Trivy scan to GitHub Security tab
        uses: github/codeql-action/upload-sarif@v2
        with:
          sarif_file: trivy-results.sarif
  deploy:
    needs: build-and-push
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: azure/login@v1
        with: { creds: '${{ secrets.AZURE_CREDENTIALS }}' }
      - uses: azure/aks-set-context@v3
        with:
          resource-group: ${{ env.RESOURCE_GROUP }}
          cluster-name: ${{ env.AKS_CLUSTER }}
      - name: Deploy to AKS
        run: |
          kubectl set image deployment/capstone-app \
            app=$REGISTRY/$IMAGE_NAME:${{ github.sha }} \
            --namespace app
          kubectl rollout status deployment/capstone-app \
            --namespace app --timeout=5m
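The `AZURE_CREDENTIALS` secret consumed by `azure/login@v1` is a service principal JSON blob created once, out of band. A sketch of that one-time setup (the principal name is illustrative; `--sdk-auth` is deprecated but still emits the JSON format this login action version expects):

```shell
# One-time setup: create a service principal scoped to the resource group
# and paste its JSON output into the AZURE_CREDENTIALS repository secret
az ad sp create-for-rbac \
  --name sp-capstone-cicd \
  --role Contributor \
  --scopes $(az group show --name rg-capstone-cloud --query id -o tsv) \
  --sdk-auth

# The principal also needs AcrPush to push images (see Challenges below)
```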
Deployed NGINX Ingress Controller via Helm, created Ingress resources for the application, and configured Azure Front Door as the global entry point with a WAF policy in Prevention mode.
# Deploy NGINX Ingress Controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx --create-namespace \
--set controller.service.type=LoadBalancer \
--set controller.service.annotations."service\.beta\.kubernetes\.io/azure-load-balancer-health-probe-request-path"=/healthz
## Kubernetes Ingress manifest
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: capstone-ingress
namespace: app
annotations:
nginx.ingress.kubernetes.io/rewrite-target: /
nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
ingressClassName: nginx
rules:
- host: app.capstone.lab
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: capstone-app-svc
port: { number: 80 }
# Azure Front Door WAF Policy (Prevention mode)
az network front-door waf-policy create \
--resource-group rg-capstone-cloud \
--name wafCapstoneLab \
--mode Prevention \
--redirect-url https://app.capstone.lab/blocked
# Add the Azure-managed default rule set (DefaultRuleSet 1.0, based on OWASP CRS)
az network front-door waf-policy managed-rules add \
--policy-name wafCapstoneLab \
--resource-group rg-capstone-cloud \
--type DefaultRuleSet \
--version 1.0
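The WAF policy above has to be attached to a Front Door profile, which this write-up creates separately. A minimal sketch for classic Front Door (the profile name `fd-capstone-lab` is illustrative), pointing at the ingress controller's public IP:

```shell
# Resolve the NGINX ingress public IP to use as the Front Door backend
INGRESS_IP=$(kubectl get svc -n ingress-nginx ingress-nginx-controller \
  -o jsonpath='{.status.loadBalancer.ingress[0].ip}')

# Create a classic Front Door profile with that backend (name illustrative)
az network front-door create \
  --resource-group rg-capstone-cloud \
  --name fd-capstone-lab \
  --backend-address $INGRESS_IP

# Attach the WAF policy to the default frontend endpoint
WAF_ID=$(az network front-door waf-policy show \
  --resource-group rg-capstone-cloud --name wafCapstoneLab --query id -o tsv)
az network front-door frontend-endpoint update \
  --resource-group rg-capstone-cloud \
  --front-door-name fd-capstone-lab \
  --name DefaultFrontendEndpoint \
  --waf-policy $WAF_ID
```

In production you would front the ingress with a stable DNS name rather than a raw IP, since the LoadBalancer IP can change if the service is recreated.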
Enabled Container Insights for pod-level metrics, created alert rules for pod restart count and CPU/memory limits, and reviewed Defender for Cloud recommendations to achieve a security score above 80%.
# Verify Container Insights is collecting data
kubectl get pods -n kube-system | grep omsagent
# KQL query — container restarts in last 24h
# (Run in Log Analytics workspace)
# KubePodInventory
# | where TimeGenerated > ago(24h)
# | where ContainerLastStatus == "Error"
# | summarize RestartCount=sum(ContainerRestartCount) by Name, Namespace
# | order by RestartCount desc
# Create alert for pod OOMKill events
az monitor metrics alert create \
  --resource-group rg-capstone-cloud \
  --name alert-aks-oom \
  --description "Pod OOM killed" \
  --scopes $(az aks show --resource-group rg-capstone-cloud \
    --name aks-capstone --query id -o tsv) \
  --condition "count kube_pod_container_status_last_terminated_reason >= 1" \
  --window-size 5m \
  --evaluation-frequency 1m \
  --severity 1
# Defender for Cloud — review recommendations
az security assessment list \
  --query "[?properties.status.code=='Unhealthy'].{Name:properties.displayName, Severity:properties.metadata.severity}" \
  --output table
# Final deployment test
kubectl get pods -n app
kubectl get ingress -n app
INGRESS_IP=$(kubectl get svc -n ingress-nginx ingress-nginx-controller \
-o jsonpath='{.status.loadBalancer.ingress[0].ip}')
curl -H "Host: app.capstone.lab" http://$INGRESS_IP/health
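Once the Front Door endpoint has propagated, an end-to-end check (hostname illustrative, matching the sketch profile name) can confirm both routing and WAF enforcement:

```shell
# Healthy request through Front Door — expect 200
curl -s -o /dev/null -w "%{http_code}\n" \
  https://fd-capstone-lab.azurefd.net/health

# Crude SQL-injection probe — expected to be blocked (403) with the
# WAF policy in Prevention mode and the default rule set attached
curl -s -o /dev/null -w "%{http_code}\n" \
  "https://fd-capstone-lab.azurefd.net/?id=1%20OR%201=1"
```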
Complete Workflow
Challenges & Solutions
- GitHub Actions failing to push to ACR — The service principal credentials in GitHub Secrets were missing the acrPush role. Added the role assignment: az role assignment create --role AcrPush --assignee $SP_ID --scope $ACR_ID.
- Key Vault CSI Driver pods failing to mount secrets — The Workload Identity federated credential was configured for the wrong namespace. The service account and its federation must be in the same namespace as the pods consuming the secrets.
- AKS pods not pulling images from ACR — The cluster's kubelet identity needed the acrPull role on ACR. This is typically done at cluster creation time in Bicep; missing it causes ImagePullBackOff errors on all pods.
- Azure Front Door WAF blocking legitimate API requests — The managed OWASP ruleset flagged API payloads containing JSON with certain patterns. Tuned the WAF by creating exclusions for the specific fields and request types that triggered false positives, then switched to Prevention mode.
Key Takeaways
- Workload Identity in AKS provides pod-level AAD identity without storing credentials — it is the modern, secure way to authenticate pods to Azure services, replacing pod-managed identities and service principal credential files.
- CI/CD pipelines should include container image security scanning (Trivy/Snyk) as a blocking step before deployment — pushing vulnerable images to production is worse than a failed deployment.
- Azure Front Door WAF requires careful tuning in Detection mode before switching to Prevention — managed rulesets generate false positives on legitimate API traffic that must be exclusion-listed first.
- The combination of Bicep + GitHub Actions + Azure represents a complete cloud-native DevOps pipeline — infrastructure and application deployments are version-controlled, automated, and reproducible from a single repository.