Milestone 2: Full Environment
Objective
Expand the Milestone 1 foundation into a production-ready environment by adding hardened NSG rule sets, custom route tables for forced-tunneling traffic inspection, Azure Monitor alerts with action groups, cost management budgets, Azure Policy assignments for compliance enforcement, and a complete VNet peering setup for a secondary disaster recovery VNet. All configuration changes are tracked and deployable via Azure CLI scripts committed to version control.
Tools & Technologies
- Azure NSG Flow Logs: traffic visibility for all subnet flows
- Azure Route Tables: custom routing for traffic inspection
- Azure Monitor Alerts: metric and log-based alerting
- Azure Action Groups: alert notification channels
- Azure Cost Management + Budgets: spend visibility and alerts
- Azure Policy: governance and compliance enforcement
- VNet Peering: cross-VNet private routing
- Azure Network Watcher: flow logs and topology visualization
- Log Analytics KQL: query language for log analysis
Architecture Overview
Step-by-Step Process
Enabled NSG Flow Logs version 2 for all subnets to capture all allowed and denied traffic flows, storing to a storage account with retention and forwarding to Log Analytics for query.
# Enable Network Watcher in region
az network watcher configure \
--resource-group NetworkWatcherRG \
--locations canadacentral \
--enabled true
# Create storage account for flow logs (keep the generated name for later lookups)
STORAGE_NAME="stflowlogs$(openssl rand -hex 4)"
az storage account create \
--name "$STORAGE_NAME" \
--resource-group $RG \
--location $LOCATION \
--sku Standard_LRS \
--kind StorageV2
STORAGE_ID=$(az storage account show --name "$STORAGE_NAME" \
--resource-group $RG --query id -o tsv)
LAW_ID=$(az monitor log-analytics workspace show \
--resource-group $RG --workspace-name law-lab \
--query id -o tsv)
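# Enable flow logs for web NSG (version 2 adds flow state and byte/packet counts;
# Traffic Analytics is switched on separately with --traffic-analytics)
az network watcher flow-log create \
--location canadacentral \
--resource-group NetworkWatcherRG \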
--name flowlog-nsg-web \
--nsg /subscriptions/$SUB_ID/resourceGroups/$RG/providers/Microsoft.Network/networkSecurityGroups/nsg-web \
--storage-account $STORAGE_ID \
--workspace $LAW_ID \
--traffic-analytics true \
--interval 10 \
--retention 30 \
--format JSON \
--log-version 2
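The step above describes flow logs for all subnets, while the command covers only nsg-web. A minimal sketch of how the same command could be repeated per NSG is shown below. It assumes SUB_ID, RG, STORAGE_ID, and LAW_ID are already set, that the Network Watcher region is canadacentral, and that the other NSG names (nsg-app, nsg-db) are hypothetical placeholders rather than names taken from the lab.

```shell
# Sketch only: helper names and the extra NSG names are illustrative assumptions.
nsg_id() {
  # Build the full resource ID for an NSG in $RG.
  printf '/subscriptions/%s/resourceGroups/%s/providers/Microsoft.Network/networkSecurityGroups/%s\n' \
    "$SUB_ID" "$RG" "$1"
}

enable_flow_logs() {
  # Create one version 2 flow log per NSG name passed as an argument.
  for nsg in "$@"; do
    az network watcher flow-log create \
      --location canadacentral \
      --name "flowlog-$nsg" \
      --nsg "$(nsg_id "$nsg")" \
      --storage-account "$STORAGE_ID" \
      --workspace "$LAW_ID" \
      --traffic-analytics true \
      --interval 10 \
      --retention 30 \
      --log-version 2 || return 1
  done
}

# Example call (nsg-app and nsg-db are placeholder names):
# enable_flow_logs nsg-web nsg-app nsg-db
```

Stopping at the first failure keeps a partially applied configuration easy to spot before moving on to the next NSG.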
Created a custom route table that routes all outbound internet traffic from web and app subnets through a network virtual appliance (firewall VM) for inspection, overriding the default internet route.
# Create route table
az network route-table create \
--resource-group $RG \
--name rt-webapp-egress \
--location $LOCATION \
--disable-bgp-route-propagation false
# Route all internet traffic to NVA firewall VM
az network route-table route create \
--resource-group $RG \
--route-table-name rt-webapp-egress \
--name default-to-nva \
--address-prefix 0.0.0.0/0 \
--next-hop-type VirtualAppliance \
--next-hop-ip-address 10.10.4.4
# Also send traffic destined for the app subnet (10.10.2.0/24) through the NVA for inspection
az network route-table route create \
--resource-group $RG \
--route-table-name rt-webapp-egress \
--name app-subnet-via-nva \
--address-prefix 10.10.2.0/24 \
--next-hop-type VirtualAppliance \
--next-hop-ip-address 10.10.4.4
# Associate with web and app subnets
az network vnet subnet update \
--resource-group $RG --vnet-name vnet-lab \
--name snet-web \
--route-table rt-webapp-egress
az network vnet subnet update \
--resource-group $RG --vnet-name vnet-lab \
--name snet-app \
--route-table rt-webapp-egress
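Routes with a VirtualAppliance next hop only pass traffic if the appliance itself forwards packets, which is the issue described under Challenges & Solutions. The sketch below shows one way the forwarding setting and the resulting routes might be checked from the CLI. The NIC names nic-nva and nic-web are hypothetical, and RG is assumed to be set.

```shell
# Sketch only: NIC names are placeholders, not taken from the lab.
enable_nva_forwarding() {
  # Azure side: let the NIC accept traffic that is not addressed to it.
  az network nic update \
    --resource-group "$RG" \
    --name "$1" \
    --ip-forwarding true
}

show_effective_routes() {
  # List the routes Azure actually applies to a NIC, including user-defined ones.
  az network nic show-effective-route-table \
    --resource-group "$RG" \
    --name "$1" \
    --output table
}

# On the NVA guest OS (Linux), forwarding must also be enabled:
#   sudo sysctl -w net.ipv4.ip_forward=1
# Example calls:
# enable_nva_forwarding nic-nva
# show_effective_routes nic-web
```

Checking the effective route table on a web VM NIC is a quick way to confirm that 0.0.0.0/0 points at 10.10.4.4 before testing connectivity.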
Created an Action Group for notifications and metric alert rules for CPU, available memory, and disk usage thresholds on the web VMs.
# Create Action Group
az monitor action-group create \
--resource-group $RG \
--name ag-lab-alerts \
--short-name labAlerts \
--action email lab-admin [email protected]
# CPU alert — trigger if > 85% for 5 minutes
VM_ID=$(az vm show --resource-group $RG --name web-vm --query id -o tsv)
AG_ID=$(az monitor action-group show --resource-group $RG \
--name ag-lab-alerts --query id -o tsv)
az monitor metrics alert create \
--resource-group $RG \
--name alert-web-vm-cpu \
--description "Web VM CPU above 85%" \
--scopes $VM_ID \
--condition "avg Percentage CPU > 85" \
--window-size 5m \
--evaluation-frequency 1m \
--severity 2 \
--action $AG_ID
# Available memory alert (< 512 MB)
az monitor metrics alert create \
--resource-group $RG \
--name alert-web-vm-memory \
--description "Web VM available memory below 512MB" \
--scopes $VM_ID \
--condition "avg Available Memory Bytes < 536870912" \
--window-size 5m \
--evaluation-frequency 1m \
--severity 2 \
--action $AG_ID
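The memory alert condition is expressed in bytes, so the 512 MB threshold has to be converted. A small helper like the following (the function name is an illustrative assumption) makes the arithmetic explicit and reusable for other thresholds.

```shell
# Convert a size in MB (binary, 1 MB = 1024 * 1024 bytes) to bytes.
mb_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mb_to_bytes 512   # prints 536870912
```

The result matches the literal used in the alert condition, and the helper could be substituted inline, for example "avg Available Memory Bytes < $(mb_to_bytes 512)".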
Set up a monthly cost budget with email alerts at 80% and 100% thresholds, and applied Azure Policy to restrict VM sizes to approved SKUs and enforce required resource tags.
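# Create monthly cost budget
# Note: flag support differs across CLI versions; confirm --notifications with
# `az consumption budget create --help` (alerts can otherwise be added in the portal)
az consumption budget create \
--resource-group $RG \
--budget-name budget-lab-monthly \
--amount 50 \
--category cost \
--time-grain monthly \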
--start-date "$(date +%Y-%m-01)" \
--end-date "2026-12-31" \
--notifications '[
{"enabled":true,"operator":"GreaterThan","threshold":80,
"contactEmails":["[email protected]"],"contactRoles":["Owner"]},
{"enabled":true,"operator":"GreaterThan","threshold":100,
"contactEmails":["[email protected]"],"contactRoles":["Owner"]}
]'
# Assign "Allowed virtual machine SKUs" policy
az policy assignment create \
--name "allowed-vm-skus" \
--display-name "Allowed VM SKUs" \
--policy "/providers/Microsoft.Authorization/policyDefinitions/cccc23c7-8427-4f53-ad12-b6a63eb452b3" \
--scope "/subscriptions/$SUB_ID/resourceGroups/$RG" \
--params '{"listOfAllowedSKUs": {"value": ["Standard_B1s","Standard_B2s","Standard_B4ms"]}}'
# Enforce required tags policy
az policy assignment create \
--name "require-environment-tag" \
--display-name "Require Environment Tag" \
--policy "/providers/Microsoft.Authorization/policyDefinitions/871b6d14-10aa-478d-b590-94f262ecfa99" \
--scope "/subscriptions/$SUB_ID/resourceGroups/$RG" \
--params '{"tagName":{"value":"Environment"}}'
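The write-up later notes that enforcing the SKU policy immediately blocked test VMs. One possible way to observe first and enforce later is the assignment's enforcement mode, sketched below. This is an assumption about tooling, not necessarily what was done in the lab: it presumes SUB_ID and RG are set and that the installed Azure CLI supports `az policy assignment update --enforcement-mode`. The helper names are illustrative.

```shell
# Sketch only: helper names are hypothetical.
assignment_scope() {
  printf '/subscriptions/%s/resourceGroups/%s\n' "$SUB_ID" "$RG"
}

set_enforcement() {
  # $1 = assignment name
  # $2 = DoNotEnforce (evaluate and report only) or Default (enforce the effect)
  az policy assignment update \
    --name "$1" \
    --scope "$(assignment_scope)" \
    --enforcement-mode "$2"
}

list_noncompliant() {
  # Inventory resources the assigned policies currently flag as non-compliant.
  az policy state list \
    --resource-group "$RG" \
    --filter "complianceState eq 'NonCompliant'" \
    --query "[].{resource:resourceId, assignment:policyAssignmentName}" \
    --output table
}

# Example sequence:
# set_enforcement allowed-vm-skus DoNotEnforce
# list_noncompliant
# set_enforcement allowed-vm-skus Default
```

Running the inventory between the two mode changes gives a list of resources to fix or exempt before the Deny effect starts blocking deployments.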
Created a secondary DR VNet in East US 2 and established bidirectional peering with the primary VNet. Ran KQL queries in Log Analytics to verify flow logs were populating correctly.
# Create DR VNet in East US 2
az network vnet create \
--resource-group $RG \
--name vnet-dr \
--address-prefix 10.20.0.0/16 \
--location eastus2
# Establish bidirectional peering
PRIMARY_ID=$(az network vnet show --resource-group $RG \
--name vnet-lab --query id -o tsv)
DR_ID=$(az network vnet show --resource-group $RG \
--name vnet-dr --query id -o tsv)
az network vnet peering create \
--name peer-primary-to-dr \
--resource-group $RG \
--vnet-name vnet-lab \
--remote-vnet $DR_ID \
--allow-vnet-access
az network vnet peering create \
--name peer-dr-to-primary \
--resource-group $RG \
--vnet-name vnet-dr \
--remote-vnet $PRIMARY_ID \
--allow-vnet-access
# KQL query — check flow logs in Log Analytics
# Run in Log Analytics workspace query editor:
# AzureNetworkAnalytics_CL
# | where SubType_s == "FlowLog"
# | where TimeGenerated > ago(1h)
# | summarize FlowCount = count() by NSGList_s, FlowDirection_s
# | order by FlowCount desc
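The peering commands above enable VNet access only, and the Challenges section records that forwarded traffic also had to be allowed. A minimal sketch of updating and checking both peerings follows. It assumes RG is set and uses the peering names from the commands above; the helper names are illustrative.

```shell
# Sketch only: helper names are hypothetical.
allow_forwarded() {
  # $1 = VNet name, $2 = peering name
  az network vnet peering update \
    --resource-group "$RG" \
    --vnet-name "$1" \
    --name "$2" \
    --set allowForwardedTraffic=true
}

peering_state() {
  # Prints the peering state; "Connected" is expected once both sides exist.
  az network vnet peering show \
    --resource-group "$RG" \
    --vnet-name "$1" \
    --name "$2" \
    --query peeringState \
    --output tsv
}

# Example calls:
# allow_forwarded vnet-lab peer-primary-to-dr
# allow_forwarded vnet-dr peer-dr-to-primary
# peering_state vnet-lab peer-primary-to-dr
```

Both directions need the same setting, because each peering is a separate resource on its own VNet.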
Complete Workflow
Challenges & Solutions
- Route table causing internet connectivity loss on web VMs: Routing all traffic to the NVA before IP forwarding was enabled on it created a black hole. The fix was to turn on the "Enable IP forwarding" setting on the NVA NIC and configure iptables FORWARD rules on the VM.
- NSG Flow Logs not appearing in Log Analytics: Traffic Analytics data can take up to 60 minutes to appear. Also confirmed that the Log Analytics workspace was in the same region as the Network Watcher to avoid cross-region data transfer costs.
- Azure Policy blocking test VMs with non-approved SKUs: The policy assignment was set to Deny mode immediately. Changed it to Audit mode first to inventory non-compliant resources before enforcing.
- VNet peering not allowing traffic across peered VNets: The peerings were created but AllowForwardedTraffic was not enabled. Updated both peering connections to allow forwarded traffic so that the route table functions correctly across peered VNets.
Key Takeaways
- Starting Azure Policy in Audit mode is the safer approach for governance adoption; enforcing Deny mode before an inventory assessment causes operational disruption.
- NSG Flow Logs provide full traffic visibility, but Traffic Analytics aggregations can lag by up to 60 minutes. They are not suitable for real-time alerting but are excellent for forensic analysis.
- Custom route tables with NVA next hops require IP forwarding to be enabled both on the Azure NIC (Azure portal setting) and at the OS level (sysctl net.ipv4.ip_forward=1).
- Budget alerts should include a warning threshold at 70-80% rather than only 100%; by the time spend reaches 100%, there is no opportunity to intervene before overspend occurs.