L18. Audit Logging and Incident Response in Kubernetes
When an incident happens in Kubernetes, audit logs are your forensic trail. Learn how to configure comprehensive audit logging, investigate security incidents, and build a Kubernetes IR playbook.
Kubernetes Audit Logs
Audit logs record every request to the API server. They answer the critical forensic questions: who did what, when, and from where.
Each audit event includes:
- User: The authenticated identity (user, service account, or anonymous)
- Verb: The operation (get, create, delete, patch)
- Resource: What was affected (pod, secret, role)
- Namespace: Where it happened
- Source IP: Where the request came from
- Response code: Whether it succeeded
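As a rough illustration of where these fields live, the sketch below parses one audit event and pulls out the who/what/when/where values. The sample event is hypothetical (the IP, secret name, and timestamp are made up); the key names follow the `audit.k8s.io/v1` Event schema.

```python
import json

# Hypothetical sample event, shaped like an audit.k8s.io/v1 Event.
SAMPLE_EVENT = """
{
  "kind": "Event",
  "apiVersion": "audit.k8s.io/v1",
  "level": "Metadata",
  "stage": "ResponseComplete",
  "verb": "get",
  "user": {"username": "system:serviceaccount:production:web-app"},
  "sourceIPs": ["10.0.4.17"],
  "objectRef": {"resource": "secrets", "namespace": "production", "name": "db-credentials"},
  "responseStatus": {"code": 200},
  "requestReceivedTimestamp": "2024-01-01T12:00:00.000000Z"
}
"""


def summarize(event: dict) -> dict:
    """Pull the who/what/when/where fields out of one audit event."""
    ref = event.get("objectRef") or {}
    status = event.get("responseStatus") or {}
    return {
        "user": (event.get("user") or {}).get("username"),
        "verb": event.get("verb"),
        "resource": ref.get("resource"),
        "namespace": ref.get("namespace"),
        "source_ips": event.get("sourceIPs", []),
        "response_code": status.get("code"),
        "when": event.get("requestReceivedTimestamp"),
    }


summary = summarize(json.loads(SAMPLE_EVENT))
print(summary)
```

Missing keys come back as `None` rather than raising, since some events (for example, non-resource requests) carry no `objectRef`.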
Configuring Audit Policies
A good audit policy balances visibility with log volume:
apiVersion: audit.k8s.io/v1
kind: Policy
rules:
  # Log all secret operations with full request body
  - level: Request
    resources:
      - group: ""
        resources: ["secrets"]
  # Log RBAC changes with full details
  - level: RequestResponse
    resources:
      - group: "rbac.authorization.k8s.io"
        resources: ["clusterroles", "clusterrolebindings", "roles", "rolebindings"]
  # Log pod exec and port-forward
  - level: Request
    resources:
      - group: ""
        resources: ["pods/exec", "pods/portforward", "pods/attach"]
  # Log node and namespace operations
  - level: Metadata
    resources:
      - group: ""
        resources: ["nodes", "namespaces"]
  # Log everything else at Metadata level
  - level: Metadata
    omitStages: ["RequestReceived"]
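Rule order matters in a policy like this: the API server evaluates rules top to bottom, and the first matching rule sets the audit level. The sketch below imitates that first-match behavior for the policy above. It is a simplification, not the API server's actual matcher: it only compares API group and resource, and ignores users, verbs, namespaces, wildcards, and `omitStages`. `RULES` and `audit_level` are illustrative names.

```python
# Simplified mirror of the policy above: (level, {api_group: [resources]}).
# An empty mapping means the rule matches every request (the catch-all).
RULES = [
    ("Request", {"": ["secrets"]}),
    ("RequestResponse", {
        "rbac.authorization.k8s.io": [
            "clusterroles", "clusterrolebindings", "roles", "rolebindings",
        ],
    }),
    ("Request", {"": ["pods/exec", "pods/portforward", "pods/attach"]}),
    ("Metadata", {"": ["nodes", "namespaces"]}),
    ("Metadata", {}),
]


def audit_level(group: str, resource: str) -> str:
    """Return the level of the first rule that matches, scanning in order."""
    for level, resources in RULES:
        if not resources or resource in resources.get(group, []):
            return level
    return "None"  # no rule matched: the request would not be logged


for group, resource in [("", "secrets"), ("", "pods/exec"), ("apps", "deployments")]:
    print(f"{group or 'core'}/{resource}: {audit_level(group, resource)}")
```

Because of first-match evaluation, the catch-all `Metadata` rule has to stay last; moving it to the top would shadow every more specific rule.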
Log Shipping
Send audit logs to a centralized SIEM for analysis and alerting:
- Managed clusters: EKS sends audit logs to CloudWatch (enable in cluster logging settings). AKS sends to Azure Monitor. GKE sends to Cloud Logging.
- Self-managed: Ship logs from the audit log file to Elasticsearch, Splunk, or your SIEM using Fluentd or Vector.
Incident Response Playbook
When a security incident is detected in Kubernetes:
1. Contain
# Isolate the compromised pod with a deny-all network policy
kubectl label pod compromised-pod quarantine=true -n production
# Apply isolation policy
kubectl apply -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: quarantine
  namespace: production
spec:
  podSelector:
    matchLabels:
      quarantine: "true"
  policyTypes: ["Ingress", "Egress"]
EOF
2. Preserve evidence
# Capture pod details before deletion
kubectl get pod compromised-pod -n production -o yaml > pod-evidence.yaml
kubectl logs compromised-pod -n production --all-containers > pod-logs.txt
kubectl describe pod compromised-pod -n production > pod-describe.txt
# Copy files out of the container (here /tmp; kubectl cp requires tar in the container image)
kubectl cp production/compromised-pod:/tmp ./evidence/tmp/
3. Investigate
Query audit logs for the compromised identity:
# KQL-style pseudo-query; adapt field names and syntax to Sentinel or Elasticsearch
user.username == "system:serviceaccount:production:web-app"
AND verb in ("create", "patch", "delete")
AND objectRef.resource in ("secrets", "roles", "clusterroles")
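If the SIEM is unavailable during an incident, the same filter can be run directly against the raw audit log file, which the log backend writes as one JSON event per line. The following is a minimal sketch of the query above; the sample lines are hypothetical, and `matches` and `search` are illustrative helper names.

```python
import json
from typing import Iterable, Iterator

SUSPECT = "system:serviceaccount:production:web-app"
WRITE_VERBS = {"create", "patch", "delete"}
SENSITIVE = {"secrets", "roles", "clusterroles"}


def matches(event: dict) -> bool:
    """Same three conditions as the query: identity, verb, and resource."""
    ref = event.get("objectRef") or {}
    return (
        (event.get("user") or {}).get("username") == SUSPECT
        and event.get("verb") in WRITE_VERBS
        and ref.get("resource") in SENSITIVE
    )


def search(lines: Iterable[str]) -> Iterator[dict]:
    """Yield matching events from JSON-lines audit log text."""
    for line in lines:
        line = line.strip()
        if line:
            event = json.loads(line)
            if matches(event):
                yield event


# Hypothetical log lines; in practice pass an open file handle to search().
SAMPLE_LOG = [
    '{"verb": "delete", "user": {"username": "system:serviceaccount:production:web-app"},'
    ' "objectRef": {"resource": "secrets", "namespace": "production"}}',
    '{"verb": "get", "user": {"username": "system:serviceaccount:production:web-app"},'
    ' "objectRef": {"resource": "secrets", "namespace": "production"}}',
    '{"verb": "create", "user": {"username": "alice"},'
    ' "objectRef": {"resource": "roles", "namespace": "production"}}',
]

hits = list(search(SAMPLE_LOG))
for hit in hits:
    print(hit["verb"], hit["objectRef"]["resource"], hit["objectRef"]["namespace"])
```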
4. Eradicate
- Rotate all secrets the compromised pod had access to
- Revoke the service account's RBAC permissions
- Delete and recreate the compromised pod from a known-good image
- Scan the image for vulnerabilities and verify its signature
- Tighten RBAC to remove over-permissions found during investigation
- Add Network Policies to prevent the lateral movement path
- Add Falco rules to detect the attack pattern
- Update Pod Security Standards if the attack exploited weak security contexts
Key Detection Queries
| What to Detect | Audit Log Signal |
|---|---|
| Secret enumeration | Multiple get secrets across namespaces |
| RBAC escalation | create or patch on clusterrolebindings |
| Pod exec abuse | create pods/exec on production pods |
| Token theft | get secrets for service account tokens |
| Namespace escape | create pods in kube-system by non-admin |
- ✓ Audit logs record every API server request: who (user/SA), what (verb/resource), when, and from where (source IP)
- ✓ Log secret operations at Request level and RBAC changes at RequestResponse level for forensic detail
- ✓ Contain compromised pods immediately with a deny-all NetworkPolicy using a quarantine label
- ✓ Preserve evidence (pod YAML, logs, filesystem snapshot) before deleting compromised pods
- ✓ Key detection signals: secret enumeration across namespaces, RBAC escalation, pod exec in production, and pods created in kube-system by non-admins
1. What is the first step when responding to a compromised pod in Kubernetes?
2. Which audit log signal indicates possible RBAC privilege escalation?
3. Why should you preserve pod evidence before deletion during incident response?