AKS + Defender for Containers: Complete Security Guide...

The Attack That Happened Because Nobody Checked the Pod Spec

In early 2025, a penetration test against a mid-size financial services firm found a CI/CD pipeline that deployed workloads to AKS with securityContext.privileged: true and hostPID: true set on the build agent pod. The cluster was running Kubernetes 1.26; PodSecurityPolicy had been removed in 1.25, and no replacement policy had been put in place. Defender for Containers was not enabled. The registry had no image signing.

Within 15 minutes of landing on the build agent pod, the tester had reached the host filesystem via /proc/1/root, requested a token for the node's managed identity from the Azure Instance Metadata Service (IMDS) at http://169.254.169.254/metadata/identity/oauth2/token, and used that token to enumerate the Azure subscription. The managed identity had Contributor on the resource group. Game over.

This is not an edge case. The combination of no Pod Security Admission, permissive pod specs, and no runtime detection is the default state for AKS clusters set up in the 2020-2022 window and never revisited. Most of the security effort went into network-level controls: private cluster, NSGs, Azure Firewall. None of those controls stop a privileged container from escaping to the host and abusing the node's identity.

This guide covers the full stack of controls that actually prevent and detect that scenario.

Defender for Containers: What It Covers and What It Does Not

Defender for Containers is the Microsoft Defender plan that targets AKS and Arc-enabled Kubernetes clusters. It is not a firewall or a pod policy engine. Understanding what it actually does prevents the common mistake of treating it as a complete solution.

What Is Included

Image vulnerability assessment: Defender scans images in Azure Container Registry (ACR) and produces a per-image CVE list correlated against the OS package manifest and language runtime packages. Scanning is powered by Microsoft Defender Vulnerability Management (MDVM), which replaced the earlier Qualys-based scanner during 2024 and covers both OS-level and application-level vulnerabilities. Scans trigger on push and run weekly for images already in the registry.

Admission-time scanning: With the Defender profile deployed as a DaemonSet on the AKS node pool, images are checked at admission time against the vulnerability database. Pods referencing images with critical CVEs can be blocked via a deny policy. This is separate from registry scanning and covers images pulled from non-ACR registries.

Kubernetes audit log analysis: Defender ingests the AKS audit log stream and applies detection rules for anomalous API calls: creation of privileged pods, modification of cluster-admin bindings, use of exec into running containers, creation of pods in the kube-system namespace by non-system accounts.

Node-level threat detection: The Defender sensor running as a DaemonSet monitors process trees, network connections, and filesystem events at the node level. It detects crypto miners, reverse shells, and container escape techniques such as mounting /proc or accessing the Docker socket.

Kubernetes control plane hardening assessment: Defender for Cloud surfaces CIS Kubernetes Benchmark recommendations, AKS-specific misconfigurations (anonymous authentication on the API server, overly permissive RBAC bindings), and network policy gaps.

What It Does NOT Cover

Defender for Containers does not enforce pod security policy. It alerts on privileged pods but does not block them unless you separately configure an Azure Policy deny effect. It does not replace Network Policy: you can have Defender fully deployed with no network segmentation between pods and it will not prevent lateral movement via the pod network. It does not sign images or enforce image provenance checks. That requires Notation with Azure Key Vault or a third-party admission webhook like Kyverno with Cosign.

The plan costs $7 per vCore per month (as of January 2026). For a 10-node cluster with 4 vCores per node, that is $280/month. Not trivial, but significantly less than the average cost of a container compromise incident.
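The same arithmetic applies to any cluster size. A minimal sketch, assuming the flat $7 per vCore per month quoted above; the function name and default price are illustrative and not part of any Azure pricing API:

```python
def defender_monthly_cost(nodes: int, vcores_per_node: int,
                          usd_per_vcore: float = 7.0) -> float:
    """Estimate the monthly Defender for Containers charge for one cluster."""
    if nodes < 0 or vcores_per_node < 0:
        raise ValueError("node and vCore counts must be non-negative")
    # Billing is per vCore, so node count only matters through total vCores.
    return nodes * vcores_per_node * usd_per_vcore


# The 10-node, 4-vCore example from the text.
print(defender_monthly_cost(10, 4))
```

Check the current price for your region before budgeting; the default here only mirrors the figure used in this guide.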

See the [CSPM comparison guide](/blog/best-cspm-tools-2026-defender-for-cloud-vs-wiz-vs-orca-vs-prisma-cloud) for how Defender for Containers stacks up against Wiz and Orca on container security coverage.

Enabling Defender for Containers via Azure CLI

# Enable Defender for Containers on the subscription
az security pricing create \
  --name Containers \
  --tier Standard

# Verify the Defender profile DaemonSet is running on the cluster
kubectl get daemonset microsoft-defender-collector-ds \
  -n kube-system

# Check the sensor is reporting to Defender
# (addressing the DaemonSet by name avoids depending on pod labels)
kubectl logs -n kube-system \
  daemonset/microsoft-defender-collector-ds \
  --tail=20

The Defender sensor deploys automatically when the plan is enabled and the cluster was created or updated with the --enable-defender flag (az aks create or az aks update), or when the Azure Policy initiative "Enable Microsoft Defender for Cloud on your subscription" is assigned.

Image Scanning and Supply Chain Security

Registry Scanning vs. Admission-Time Scanning

These two mechanisms cover different attack vectors and are not interchangeable.

Mechanism	When It Runs	What It Catches	What It Misses
ACR registry scanning	On push + weekly	CVEs in images stored in ACR	Images from Docker Hub, GHCR, non-ACR registries
Defender admission scanning	At pod creation	CVEs in any image at deployment time	Images that are not yet deployed
OPA/Kyverno policy	At pod creation	Policy violations (e.g., no digest pinning)	Vulnerability content inside image
Notation signing check	At pod creation	Unsigned or tampered images	Signed images with vulnerabilities

For a production cluster, you need all four layers. Registry scanning catches drift between your last scan and today's CVE database. Admission scanning catches images from external registries. Policy enforcement catches configuration mistakes such as latest tags or images that are not pinned to a digest. Signing verification catches supply chain substitution.
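To make the policy layer concrete, here is a minimal sketch of the checks an OPA or Kyverno rule would typically express, written as plain Python. The function name, the trusted-registry list, and the finding strings are assumptions for illustration; a real policy engine evaluates the full pod spec at admission.

```python
import re

# A digest-pinned reference ends in @sha256:<64 lowercase hex characters>.
DIGEST_RE = re.compile(r"@sha256:[0-9a-f]{64}$")


def image_findings(ref: str, trusted_suffixes=(".azurecr.io",)) -> list[str]:
    """Return policy findings for one image reference (empty list = clean)."""
    findings = []
    pinned = bool(DIGEST_RE.search(ref))
    if not pinned:
        findings.append("not pinned to a sha256 digest")

    name = ref.split("@", 1)[0]
    last_segment = name.rsplit("/", 1)[-1]
    # An image with no tag resolves to "latest" at pull time.
    tag = last_segment.split(":", 1)[1] if ":" in last_segment else "latest"
    if tag == "latest" and not pinned:
        findings.append("uses the mutable 'latest' tag")

    # The first path segment is a registry host only if it looks like one;
    # otherwise the reference resolves to Docker Hub.
    first = name.split("/", 1)[0]
    looks_like_host = "." in first or ":" in first or first == "localhost"
    registry = first if "/" in name and looks_like_host else "docker.io"
    if not registry.endswith(trusted_suffixes):
        findings.append(f"registry '{registry}' is not on the trusted list")
    return findings


print(image_findings("nginx"))
print(image_findings("myacr.azurecr.io/myapp:v1.2.3@sha256:" + "a" * 64))
```

The vulnerability and signature layers cannot be reproduced this way: they need the scanner's database and the signing certificate, which is why the policy check is only one of the four.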

Enforcing Image Scanning Results with Azure Policy

# Assign the built-in policy to block containers with critical CVEs
az policy assignment create \
  --name "block-critical-cve-images" \
  --display-name "Block AKS pods with critical vulnerabilities" \
  --policy "/providers/Microsoft.Authorization/policyDefinitions/13cd7ae3-5bc0-4ac4-a62d-4f7c120b9759" \
  --scope "/subscriptions/<subscription-id>/resourceGroups/<rg-name>" \
  --enforcement-mode Default \
  --params '{"effect": {"value": "Deny"}}'

# Check compliance state for the cluster
az policy state list \
  --resource "/subscriptions/<subscription-id>/resourceGroups/<rg-name>/providers/Microsoft.ContainerService/managedClusters/<cluster-name>" \
  --query "[?complianceState=='NonCompliant'].{policy:policyDefinitionName,resource:resourceId}" \
  --output table

Notation Image Signing with Azure Key Vault

Notation (the CNCF image signing standard, now at v1.1.0) integrates with Azure Key Vault for key storage and ACR for signature storage. The workflow: the CI pipeline signs the image digest after build, and an admission webhook (Ratify, maintained by Azure) verifies the signature at pod creation time.

# Install Notation CLI
curl -Lo notation.tar.gz https://github.com/notaryproject/notation/releases/download/v1.1.0/notation_1.1.0_linux_amd64.tar.gz
tar -xzf notation.tar.gz
sudo mv notation /usr/local/bin/

# Add the Azure Key Vault plugin (the checksum is published with each plugin release)
notation plugin install \
  --url https://github.com/Azure/notation-azure-kv/releases/download/v1.2.0/notation-azure-kv_1.2.0_linux_amd64.tar.gz \
  --sha256sum <plugin-archive-sha256>

# Sign an image using the AKV-backed signing key
# (add --plugin-config self_signed=true when the certificate is self-signed)
notation sign \
  --id "https://<keyvault-name>.vault.azure.net/keys/<key-name>/<version>" \
  --plugin azure-kv \
  <acr-name>.azurecr.io/<image-name>@sha256:<digest>

# Verify signature (requires a trust policy and trust store configured for the signing certificate)
notation verify \
  <acr-name>.azurecr.io/<image-name>@sha256:<digest>

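Deploy Ratify alongside OPA Gatekeeper to enforce signature verification at the Kubernetes layer: Gatekeeper's admission webhook calls Ratify as an external data provider for each image in the pod spec. Unsigned images are then rejected at admission in every namespace the constraint covers, regardless of where they originate.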

Pod Security Admission: Replacing Deprecated PSP

PodSecurityPolicy was deprecated in Kubernetes 1.21 and removed in 1.25. AKS clusters running 1.25+ have no pod security enforcement unless you explicitly configure Pod Security Admission (PSA) or a third-party admission controller like Kyverno or OPA Gatekeeper.

The Three PSA Modes

Pod Security Admission defines three levels (privileged, baseline, and restricted), each of which can be applied in three modes:

enforce: Policy violations reject the pod at admission. Use this in production namespaces.
audit: Violations are logged to the Kubernetes audit log but the pod is allowed. Use this during migration.
warn: Violations produce a warning in the API response (visible in kubectl output) but the pod is allowed. Use this for developer feedback.

The two relevant security profiles:

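baseline: Prevents known privilege escalations. Blocks: privileged containers, host namespaces (hostPID, hostIPC, hostNetwork), host path mounts, host ports, and adding Linux capabilities beyond a small allowed set (SYS_ADMIN and NET_ADMIN, for example, cannot be added).

restricted: Enforces the full hardened posture. Requires: runAsNonRoot: true, seccompProfile: RuntimeDefault or Localhost, dropping all capabilities (drop: ["ALL"], with only NET_BIND_SERVICE allowed back), and allowPrivilegeEscalation: false. It also limits volumes to a short list of safe types. It does not require a read-only root filesystem: readOnlyRootFilesystem: true is a worthwhile hardening step, but PSA will not enforce it, so set it in the pod spec and check it with Azure Policy, Kyverno, or Gatekeeper.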
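A rough way to see how the two profiles differ is to run a pod spec through the checks by hand. The sketch below covers only a subset of each profile (it ignores initContainers, ephemeral containers, host ports, and volume type limits), and the function names are hypothetical; the authoritative logic lives in the Kubernetes admission controller.

```python
def baseline_violations(pod_spec: dict) -> list[str]:
    """Check a pod spec dict against a subset of the baseline controls."""
    found = []
    for field in ("hostPID", "hostIPC", "hostNetwork"):
        if pod_spec.get(field):
            found.append(f"{field}: true")
    for volume in pod_spec.get("volumes", []):
        if "hostPath" in volume:
            found.append(f"hostPath volume '{volume.get('name')}'")
    for container in pod_spec.get("containers", []):
        if container.get("securityContext", {}).get("privileged"):
            found.append(f"container '{container.get('name')}' is privileged")
    return found


def restricted_violations(pod_spec: dict) -> list[str]:
    """Baseline checks plus a subset of the restricted requirements."""
    found = baseline_violations(pod_spec)
    pod_ctx = pod_spec.get("securityContext", {})
    for container in pod_spec.get("containers", []):
        name = container.get("name")
        ctx = container.get("securityContext", {})
        if ctx.get("allowPrivilegeEscalation") is not False:
            found.append(f"'{name}': allowPrivilegeEscalation must be false")
        # Container-level settings override pod-level ones.
        if not ctx.get("runAsNonRoot", pod_ctx.get("runAsNonRoot")):
            found.append(f"'{name}': runAsNonRoot must be true")
        if "ALL" not in ctx.get("capabilities", {}).get("drop", []):
            found.append(f"'{name}': must drop ALL capabilities")
        seccomp = ctx.get("seccompProfile", pod_ctx.get("seccompProfile", {}))
        if seccomp.get("type") not in ("RuntimeDefault", "Localhost"):
            found.append(f"'{name}': seccompProfile must be RuntimeDefault or Localhost")
    return found


# The build agent pod from the opening scenario.
build_agent = {
    "hostPID": True,
    "containers": [{"name": "agent", "securityContext": {"privileged": True}}],
}
print(baseline_violations(build_agent))
```

Running the same spec through restricted_violations shows how much further the restricted profile goes: the two baseline findings plus four more for the missing hardening fields.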

Practical Namespace Labeling Strategy

Do not apply restricted to every namespace by default. Most legacy workloads will break. Use a tiered approach:

# Production application namespaces: enforce restricted
kubectl label namespace production \
  pod-security.kubernetes.io/enforce=restricted \
  pod-security.kubernetes.io/enforce-version=v1.29 \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/audit-version=v1.29 \
  pod-security.kubernetes.io/warn=restricted \
  pod-security.kubernetes.io/warn-version=v1.29

# Staging/migration namespaces: audit restricted, enforce baseline
kubectl label namespace staging \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/enforce-version=v1.29 \
  pod-security.kubernetes.io/audit=restricted \
  pod-security.kubernetes.io/audit-version=v1.29

# Infrastructure namespaces (monitoring, ingress): baseline enforce only
kubectl label namespace monitoring \
  pod-security.kubernetes.io/enforce=baseline \
  pod-security.kubernetes.io/enforce-version=v1.29

# Dry-run to preview what violations exist before enforcing restricted
kubectl label --dry-run=server --overwrite \
  namespace <namespace> \
  pod-security.kubernetes.io/enforce=restricted

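The pod-security.kubernetes.io/enforce-version label (and its audit-version and warn-version counterparts) pins the policy to the definition that shipped with a specific Kubernetes version, preventing policy drift when the cluster is upgraded. Always pin to a specific version rather than relying on the default of latest.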
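If namespaces are labeled from a script or pipeline, generating the label set in one place keeps the level and the version pin together. A minimal sketch, assuming the v1.29 pin used above; the helper name is hypothetical:

```python
from typing import Optional

PSA_PREFIX = "pod-security.kubernetes.io"
PSA_LEVELS = {"privileged", "baseline", "restricted"}


def psa_labels(enforce: str, audit: Optional[str] = None,
               warn: Optional[str] = None, version: str = "v1.29") -> dict:
    """Build the PSA label set for a namespace, pinning every mode's version."""
    labels = {}
    for mode, level in (("enforce", enforce), ("audit", audit), ("warn", warn)):
        if level is None:
            continue
        if level not in PSA_LEVELS:
            raise ValueError(f"unknown PSA level: {level}")
        labels[f"{PSA_PREFIX}/{mode}"] = level
        # Each mode carries its own version pin.
        labels[f"{PSA_PREFIX}/{mode}-version"] = version
    return labels


# The staging tier from the strategy above: enforce baseline, audit restricted.
for key, value in psa_labels("baseline", audit="restricted").items():
    print(f"{key}={value}")
```

The printed key=value pairs can be passed straight to kubectl label namespace.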

What Baseline Blocks That Matters Most

Baseline blocks the IMDS token theft scenario from the opening: hostPID: true is not permitted under baseline. A pod spec with hostPID or hostNetwork will be rejected at admission in an enforce-baseline namespace. It also blocks securityContext.privileged: true, which is the other primary container escape vector.

Restricted additionally requires seccompProfile: RuntimeDefault, which constrains the syscall surface available to the container. Combined with capability dropping, this significantly raises the cost of exploiting a container-level vulnerability to achieve host compromise.

See the [Kubernetes security best practices guide](/blog/kubernetes-security-best-practices-2026) for the full set of pod spec hardening recommendations beyond what PSA enforces.

AKS RBAC and Workload Identity

Why AAD Pod Identity Is a Lateral Movement Risk

AAD Pod Identity (the v1 solution, also called aad-pod-identity) worked by running a privileged DaemonSet that intercepted IMDS calls from pods and returned tokens scoped to an assigned managed identity. The interception mechanism used a hostNetwork: true pod with iptables rules.

Two problems: first, any pod that could manipulate iptables on the node could intercept IMDS calls from other pods. Second, the pod-to-identity binding used Kubernetes labels, which any user with pod edit permissions could apply to their own pod to assume a more privileged identity.

AKS Workload Identity (GA since AKS 1.27) eliminates both problems by using the OIDC issuer built into AKS and Kubernetes service account token projection. No privileged DaemonSet, no iptables manipulation, no label-based identity assignment.

Migrating from AAD Pod Identity to AKS Workload Identity

# Enable OIDC issuer and Workload Identity on an existing cluster
az aks update \
  --name <cluster-name> \
  --resource-group <rg-name> \
  --enable-oidc-issuer \
  --enable-workload-identity

# Get the OIDC issuer URL
OIDC_ISSUER=$(az aks show \
  --name <cluster-name> \
  --resource-group <rg-name> \
  --query "oidcIssuerProfile.issuerUrl" \
  --output tsv)

# Create a user-assigned managed identity for the workload
az identity create \
  --name "workload-identity-myapp" \
  --resource-group <rg-name>

CLIENT_ID=$(az identity show \
  --name "workload-identity-myapp" \
  --resource-group <rg-name> \
  --query "clientId" \
  --output tsv)

# Create federated credential linking the managed identity to a Kubernetes service account
az identity federated-credential create \
  --name "myapp-federated-cred" \
  --identity-name "workload-identity-myapp" \
  --resource-group <rg-name> \
  --issuer "$OIDC_ISSUER" \
  --subject "system:serviceaccount:production:myapp-sa" \
  --audiences "api://AzureADTokenExchange"

Then annotate the Kubernetes service account and configure the pod spec:

apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp-sa
  namespace: production
  annotations:
    azure.workload.identity/client-id: "<client-id-from-above>"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  namespace: production
spec:
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
        azure.workload.identity/use: "true"
    spec:
      serviceAccountName: myapp-sa
      containers:
        - name: myapp
          image: <acr-name>.azurecr.io/myapp:v1.2.3@sha256:<digest>
          securityContext:
            runAsNonRoot: true
            runAsUser: 1000
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
            seccompProfile:
              type: RuntimeDefault

The pod receives a projected service account token volume automatically. The Azure SDK reads the token from the well-known path and exchanges it with Entra ID for an access token scoped to the managed identity. No IMDS call leaves the pod. No privileged DaemonSet intercepts the request.
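The exchange the SDK performs is an ordinary OAuth 2.0 client credentials request in which the projected token stands in for a client secret. The sketch below builds that request without sending it. The function name and the sample values are hypothetical; in a pod, the inputs would come from the AZURE_TENANT_ID, AZURE_CLIENT_ID, and AZURE_FEDERATED_TOKEN_FILE environment variables that the workload identity webhook injects. In practice, let the Azure SDK do this rather than hand-rolling it.

```python
from urllib.parse import urlencode
from urllib.request import Request

JWT_BEARER = "urn:ietf:params:oauth:client-assertion-type:jwt-bearer"


def build_token_request(assertion: str, tenant_id: str, client_id: str,
                        scope: str,
                        authority: str = "https://login.microsoftonline.com") -> Request:
    """Build (without sending) the request that swaps a projected service
    account token for an Entra ID access token."""
    form = {
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_assertion_type": JWT_BEARER,
        "client_assertion": assertion,  # the Kubernetes-issued token
        "scope": scope,
    }
    url = f"{authority.rstrip('/')}/{tenant_id}/oauth2/v2.0/token"
    return Request(url, data=urlencode(form).encode("ascii"), method="POST")


# Placeholder values for illustration only.
req = build_token_request("<projected-sa-token>", "tenant-123", "client-456",
                          "https://vault.azure.net/.default")
print(req.full_url)
```

Note what is absent: no secret, no certificate, and no call to 169.254.169.254. The only credential is a short-lived token that Kubernetes issued to this specific service account.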

The subject format system:serviceaccount:<namespace>:<sa-name> is exact and namespace-scoped. A pod in a different namespace cannot claim the same identity. This is the fundamental security improvement over label-based pod identity.
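The namespace scoping comes down to an exact string comparison between the subject on the federated credential and the sub claim in the service account token. A small sketch of that rule, with hypothetical helper names:

```python
def federated_subject(namespace: str, service_account: str) -> str:
    """Build the subject string a federated credential must carry."""
    return f"system:serviceaccount:{namespace}:{service_account}"


def subject_matches(credential_subject: str, token_sub: str) -> bool:
    """Exact comparison: no wildcards, no prefix matching."""
    return credential_subject == token_sub


credential = federated_subject("production", "myapp-sa")
# A service account with the same name in another namespace gets a
# different sub claim, so it cannot use the production identity.
print(subject_matches(credential, federated_subject("production", "myapp-sa")))
print(subject_matches(credential, federated_subject("staging", "myapp-sa")))
```

Compare this with label-based binding, where the equivalent check was "does the pod carry this label", something any user with pod edit rights could satisfy.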

See the [federated credentials guide](/blog/flexible-federated-identity-credentials-entra-github-terraform) for Bicep templates that provision the full federated credential chain, and the [non-human identity guide](/blog/non-human-identities-nhi-security-guide) for governance of workload identities at scale.

Provisioning Workload Identity with Bicep

param appName string
param location string
param oidcIssuerUrl string
param aksNamespace string
param serviceAccountName string

resource keyVault 'Microsoft.KeyVault/vaults@2023-07-01' existing = {
  name: 'kv-${appName}'
}

// Managed identity for the workload
resource workloadIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'workload-identity-${appName}'
  location: location
}

// Federated credential linking to Kubernetes service account
resource federatedCredential 'Microsoft.ManagedIdentity/userAssignedIdentities/federatedIdentityCredentials@2023-01-31' = {
  parent: workloadIdentity
  name: '${appName}-federated-cred'
  properties: {
    issuer: oidcIssuerUrl
    subject: 'system:serviceaccount:${aksNamespace}:${serviceAccountName}'
    audiences: ['api://AzureADTokenExchange']
  }
}

// Scoped role assignment: least-privilege to Key Vault secrets only
resource kvSecretUserRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(workloadIdentity.id, keyVault.id, 'Key Vault Secrets User')
  scope: keyVault
  properties: {
    roleDefinitionId: subscriptionResourceId(
      'Microsoft.Authorization/roleDefinitions',
      '4633458b-17de-408a-b874-0445c86b69e6'  // Key Vault Secrets User
    )
    principalId: workloadIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}

output clientId string = workloadIdentity.properties.clientId
output principalId string = workloadIdentity.properties.principalId

Network Policy: Default Deny and Egress Restrictions

Calico vs Azure Network Policy

AKS supports two network policy engines: Azure Network Policy (managed by Microsoft, L4 only, limited to 250 nodes) and Calico (open source, full NetworkPolicy spec support, scales beyond 250 nodes, supports FQDN-based egress filtering in the enterprise version).

Feature	Azure Network Policy	Calico OSS	Calico Enterprise
Max nodes	250	Unlimited	Unlimited
Egress DNS filtering	No	No	Yes (FQDN policy)
NetworkPolicy spec support	Full L4	Full L4	Full L4 + L7
Pod-to-pod encryption	No	WireGuard (v3.14+)	WireGuard
Global network policies	No	Yes (CRDs)	Yes
Cost	Included	Included	Commercial license

For clusters over 250 nodes or clusters requiring encrypted pod-to-pod communication, Calico is the correct choice. Specify the network plugin at cluster creation: --network-plugin azure --network-policy calico.

Implementing Default Deny

The Kubernetes default allows all pods to communicate with all other pods. If a pod is compromised, it has full network access to every other pod in the cluster. Default deny reverses this: all traffic is blocked unless explicitly permitted.

# Default deny all ingress and egress for a namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
---
# Allow egress to kube-dns only (required for DNS resolution)
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
---
# Allow ingress from ingress controller to application pods
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-ingress-controller
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Ingress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx
      ports:
        - port: 8080
          protocol: TCP
---
# Allow application egress to Azure SQL private endpoint
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-sql-egress
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: myapp
  policyTypes:
    - Egress
  egress:
    - to:
        - ipBlock:
            cidr: 10.0.4.5/32  # Private endpoint IP for Azure SQL
      ports:
        - port: 1433
          protocol: TCP

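On a namespace with running workloads, apply the allow rules in the same kubectl apply as the default deny, or just before it. If the default deny lands on its own, pods are unreachable until the allow rules follow. On a new, empty namespace the order does not matter, so put default deny in place before anything is deployed. With Calico, use GlobalNetworkPolicy CRDs to apply default deny across the entire cluster, with namespace-scoped policies adding the exceptions.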
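NetworkPolicy rules are additive: a pod that no policy selects is wide open, a pod selected by any policy is limited to the union of what those policies allow, and there are no deny rules to order. The sketch below models that for egress only, using the policies from this section. It is a deliberately simplified model (matchLabels selectors and CIDR destinations only), and the class and function names are illustrative rather than part of any Kubernetes client library.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address, ip_network


@dataclass
class EgressRule:
    cidr: str            # destination CIDR
    port: int
    protocol: str = "TCP"


@dataclass
class Policy:
    name: str
    match_labels: dict   # empty dict selects every pod, like podSelector: {}
    egress: list = field(default_factory=list)  # empty list allows nothing


def egress_allowed(pod_labels, policies, dest_ip, port, protocol="TCP"):
    """Evaluate egress for one pod under the additive NetworkPolicy model."""
    selecting = [p for p in policies
                 if p.match_labels.items() <= pod_labels.items()]
    if not selecting:
        return True  # no policy selects the pod: Kubernetes default allow
    # Isolated pod: traffic passes only if some rule in some policy matches.
    return any(
        ip_address(dest_ip) in ip_network(rule.cidr)
        and rule.port == port and rule.protocol == protocol
        for policy in selecting for rule in policy.egress
    )


default_deny = Policy("default-deny-all", {})
allow_sql = Policy("allow-sql-egress", {"app": "myapp"},
                   [EgressRule("10.0.4.5/32", 1433)])
myapp = {"app": "myapp"}

print(egress_allowed(myapp, [], "10.0.4.5", 1433))
print(egress_allowed(myapp, [default_deny], "10.0.4.5", 1433))
print(egress_allowed(myapp, [default_deny, allow_sql], "10.0.4.5", 1433))
```

The middle case is the state a namespace is in when default deny has been applied and the allow rules have not: every flow is blocked, including the ones the application needs.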

For a comprehensive [zero trust](/blog/what-is-zero-trust-security-complete-guide) network posture in AKS, default deny at the network layer is the foundation. Every allowed flow should be documented and justified.

Runtime Threat Detection: Defender Alerts and KQL

Key Defender for Containers Alerts

Defender for Containers generates alerts in the SecurityAlert table in Log Analytics. The most actionable alerts for AKS:

Alert Name	Severity	What It Detects
`K8S.NODE_CryptominerDetected`	High	Crypto miner process running in container
`K8S.NODE_PrivilegedContainerArtifacts`	High	Privileged container accessing host paths
`K8S.NODE_ContainerEscape`	Critical	Container escape technique detected at node level
`K8S.NODE_ReverseShell`	High	Outbound connection matching reverse shell pattern
`K8S.NODE_NewPrivilegedContainer`	Medium	New container started with privileged flag
`K8S_AUDIT.ClusterAdminBindingCreated`	High	New ClusterRoleBinding to cluster-admin created
`K8S_AUDIT.ExposedServiceAccountToken`	Medium	Service account token accessed from exec session
`K8S_AUDIT.AnonymousAccessToAPIServer`	Medium	Unauthenticated request to API server

KQL Queries for AKS Threat Hunting

// Detect privileged pod creation in the last 7 days
SecurityAlert
| where TimeGenerated > ago(7d)
| where AlertType in (
    "K8S.NODE_NewPrivilegedContainer",
    "K8S.NODE_PrivilegedContainerArtifacts",
    "K8S.NODE_ContainerEscape"
  )
| extend Details = parse_json(ExtendedProperties)
| project
    TimeGenerated,
    AlertType,
    AlertSeverity,
    CompromisedEntity,
    Details.ContainerName,
    Details.PodName,
    Details.Namespace,
    Details.NodeName,
    RemediationSteps
| order by TimeGenerated desc

// Detect cluster-admin binding creation (privilege escalation indicator)
AzureDiagnostics
| where TimeGenerated > ago(24h)
| where Category == "kube-audit"
| extend AuditLog = parse_json(log_s)
| where AuditLog.verb == "create"
  and AuditLog.objectRef.resource == "clusterrolebindings"
| extend
    User = AuditLog.user.username,
    RoleRef = AuditLog.requestObject.roleRef.name,
    Subjects = AuditLog.requestObject.subjects
| where RoleRef == "cluster-admin"
| project TimeGenerated, User, RoleRef, Subjects

// Hunt for container exec sessions (potential lateral movement indicator)
AzureDiagnostics
| where TimeGenerated > ago(24h)
| where Category == "kube-audit"
| extend AuditLog = parse_json(log_s)
| where AuditLog.verb == "create"
  and AuditLog.objectRef.subresource == "exec"
| extend
    User = AuditLog.user.username,
    Namespace = AuditLog.objectRef.namespace,
    PodName = AuditLog.objectRef.name,
    Command = AuditLog.requestObject.command
| project TimeGenerated, User, Namespace, PodName, Command
| order by TimeGenerated desc

// Images pulled from non-ACR registries (supply chain risk signal)
ContainerImageInventory
| where TimeGenerated > ago(24h)
| where Repository !contains ".azurecr.io"
  and Repository !startswith "mcr.microsoft.com"
| project TimeGenerated, Computer, Repository, Image, ImageTag, Running
| order by TimeGenerated desc

Enable AKS audit log collection by sending the kube-audit and kube-audit-admin diagnostic categories to the Log Analytics workspace attached to Defender for Cloud. Without these categories, the audit log KQL queries return no results.

# Enable AKS diagnostic settings for audit logging
az monitor diagnostic-settings create \
  --name "aks-audit-logs" \
  --resource "/subscriptions/<subscription-id>/resourceGroups/<rg-name>/providers/Microsoft.ContainerService/managedClusters/<cluster-name>" \
  --workspace "<log-analytics-workspace-id>" \
  --logs '[
    {"category": "kube-audit", "enabled": true},
    {"category": "kube-audit-admin", "enabled": true},
    {"category": "kube-controller-manager", "enabled": true},
    {"category": "kube-scheduler", "enabled": true},
    {"category": "cluster-autoscaler", "enabled": true}
  ]'
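When audit events are exported somewhere other than Log Analytics (a storage account or an event hub, for example), the same filter as the cluster-admin KQL query can be applied to the raw JSON. A minimal sketch; the function name and the sample event are hypothetical, and the field names follow the Kubernetes audit event schema that the KQL queries above rely on:

```python
import json


def is_cluster_admin_binding(raw_event: str) -> bool:
    """True if the audit event creates a ClusterRoleBinding to cluster-admin."""
    event = json.loads(raw_event)
    ref = event.get("objectRef") or {}
    # requestObject is only present when the audit level records the body.
    request = event.get("requestObject") or {}
    role = (request.get("roleRef") or {}).get("name")
    return (
        event.get("verb") == "create"
        and ref.get("resource") == "clusterrolebindings"
        and role == "cluster-admin"
    )


# Hand-written sample event, not captured from a real cluster.
sample = json.dumps({
    "verb": "create",
    "user": {"username": "dev@example.com"},
    "objectRef": {"resource": "clusterrolebindings", "name": "temp-admin"},
    "requestObject": {
        "roleRef": {"kind": "ClusterRole", "name": "cluster-admin"},
        "subjects": [{"kind": "User", "name": "dev@example.com"}],
    },
})
print(is_cluster_admin_binding(sample))
```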

What Falco Catches That Defender Misses

Defender for Containers provides solid coverage for known-bad patterns but has latency in adding detections for novel techniques. Falco (CNCF graduated, v0.39.0 as of early 2026) runs as a DaemonSet with direct kernel access via eBPF and can be tuned with custom rules.

Specific gaps Falco fills in a Defender deployment:

Sensitive file reads inside containers: Falco can alert on reads to /etc/shadow, /root/.ssh, or credential files within a running container. Defender detects escape attempts but not reconnaissance inside the container before the escape.
Unexpected outbound connections by process name: Falco rules can fire when python3 or sh initiates a network connection that an nginx container would never normally make.
Syscall-level container escape detection: Falco with the eBPF probe detects ptrace calls, namespace manipulation via unshare, and nsenter commands that indicate an attempted container escape before it succeeds.
Modification of /etc/passwd or /etc/sudoers: Persistence mechanisms that write to these paths inside a container are caught by Falco rules but are not in Defender's default detection set.

The operational cost is rule maintenance. Falco generates significant noise in default configuration. Tune it against your specific workload before enabling alerting in production. Combine Falco alerts with Sentinel via the Falco Sidekick integration to correlate with Defender alerts in a single incident queue.

AKS Hardening Checklist

[ ] Enable Defender for Containers on all subscriptions running AKS clusters (az security pricing create --name Containers --tier Standard)
[ ] Deploy Defender sensor DaemonSet to all node pools and verify it is reporting (kubectl get daemonset microsoft-defender-collector-ds -n kube-system)
[ ] Enable OIDC issuer and Workload Identity on all clusters; remove all AAD Pod Identity components (aad-pod-identity DaemonSet and CRDs)
[ ] Migrate all workloads from AAD Pod Identity to AKS Workload Identity with namespace-scoped federated credentials
[ ] Apply Pod Security Admission labels to all namespaces: enforce=restricted for production, enforce=baseline minimum for all others
[ ] Audit all existing pods for privileged: true, hostPID: true, hostNetwork: true, and hostIPC: true in pod specs; remediate before enforcing PSA
[ ] Deploy default-deny NetworkPolicy to all application namespaces; document and justify every allow rule
[ ] Enable Notation image signing in CI/CD pipeline for all images pushed to ACR; deploy Ratify admission webhook to enforce signature verification at pod creation
[ ] Enable ACR image vulnerability scanning and assign the Azure Policy deny effect for images with critical CVEs
[ ] Enable AKS diagnostic settings for kube-audit and kube-audit-admin log categories sent to the Defender-linked Log Analytics workspace
[ ] Create KQL alert rules for: cluster-admin binding creation, privileged pod creation, container exec into running pods, and images pulled from non-ACR registries
[ ] Disable the Kubernetes dashboard if enabled; audit all ClusterRoleBinding objects bound to cluster-admin and remove any non-system service accounts
[ ] Pin all pod images to digest (image@sha256:<digest>) rather than mutable tags in production deployments
[ ] Configure a private cluster or API server VNet integration where possible, and restrict authorized IP ranges on every cluster that keeps a public API endpoint
[ ] Review Defender for Cloud AKS recommendations weekly; remediate all High severity findings within 14 days per your patch SLA

Frequently Asked Questions

What is Pod Security Admission and why is it replacing Pod Security Policies in AKS?

Pod Security Admission (PSA) is a built-in Kubernetes admission controller that enforces security profiles (privileged, baseline, or restricted) at the namespace level using labels. Pod Security Policies (PSP) were a cluster-scoped resource that required complex RBAC bindings to work correctly and were removed from Kubernetes in version 1.25. PSA is simpler to operate because security requirements are declared on the namespace itself, making it immediately visible what enforcement level applies to each workload. AKS automatically supports PSA from Kubernetes 1.25 onward and Microsoft recommends the restricted profile for all production application namespaces.

How does AKS Workload Identity differ from the legacy AAD Pod Identity approach?

AAD Pod Identity used a DaemonSet that intercepted IMDS requests from pods and mapped them to Azure AD identities via CRDs. This architecture had known security issues including privilege escalation risk through the NMI (Node Managed Identity) component and race conditions in the identity binding. AKS Workload Identity is the replacement: it uses the Kubernetes service account token as an OIDC credential that is exchanged for a Microsoft Entra ID (formerly Azure AD) access token using federated identity credentials. There is no daemon intercepting IMDS traffic, and the identity binding is scoped to a specific Kubernetes service account in a specific namespace with a specific OIDC issuer claim, giving it a much smaller blast radius.

What does Defender for Containers detect that a standard Kubernetes audit log review would miss?

Defender for Containers runs a threat intelligence engine against cluster activity that a manual audit log review would not catch at scale. Specific detections include cryptocurrency mining activity identified by process behavior rather than known hashes, container escape attempts using techniques like namespace manipulation and privileged container creation, exposed Kubernetes dashboards probed from external IPs, and lateral movement via service account token exfiltration. The sensor runs as a DaemonSet with direct node access and can detect syscall-level activity that the Kubernetes API server audit log does not record.

Why should container images be pinned to digest rather than using mutable tags in production?

A mutable image tag such as nginx:latest or app:v2 can be overwritten in the registry, meaning the image a pod pulls today may be different from what it pulled yesterday. An attacker who can push to your registry or who compromises an upstream registry can silently replace a tagged image with a malicious one. Pinning to digest (image@sha256:abc123...) guarantees that the exact byte-for-byte image that was built and scanned in CI is the one that runs in production. Notation image signing combined with the Ratify admission webhook provides a stronger guarantee by verifying a cryptographic signature at pod creation time.
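The guarantee follows from how a digest is formed: it is a SHA-256 hash of the image manifest, so the reference itself commits to the content. A small illustration with stand-in bytes; real manifests are JSON documents served by the registry, and the helper name is hypothetical:

```python
import hashlib


def content_digest(manifest_bytes: bytes) -> str:
    """Compute a registry-style digest for the given manifest bytes."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()


# Stand-in manifests that differ by a single layer reference.
original = b'{"layers": ["layer-a", "layer-b"]}'
tampered = b'{"layers": ["layer-a", "layer-x"]}'

print(content_digest(original))
print(content_digest(original) == content_digest(tampered))
```

A tag is only a pointer that the registry can move to a different manifest; a digest cannot be repointed without the hash changing, at which point the pull no longer matches the reference in the pod spec.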

What is the most commonly missed AKS security configuration in enterprise deployments?

Based on Defender for Cloud recommendations and real-world assessments, the most commonly missed configuration is leaving the Kubernetes API server accessible from the public internet without IP range restrictions. Many AKS clusters are deployed with the default public API server endpoint and never have authorized IP ranges configured, which means the API server accepts connections from any IP address. The second most common gap is missing Defender for Containers sensor coverage on clusters deployed before the sensor was available or in subscriptions where the Defender plan was not enabled at deployment time.