Infrastructure Drift: How to Detect It and What to Do About It
Infrastructure drift causes outages and security issues. Learn how to detect when your actual infrastructure differs from your code, and how to fix it.
The Drift Problem
You've got beautiful Terraform code, well-organized modules, everything documented. Then someone makes a "quick fix" in the AWS console, and suddenly your code doesn't match reality.
That's drift. It starts small and grows until you have no idea what's actually running.
Why Drift Happens
The Usual Suspects
- Emergency fixes: Production is down, someone fixes it manually
- Console convenience: It's faster to click than to write code
- Automated processes: Auto Scaling changes capacity values that are also set in code
- Service integrations: AWS services create resources on your behalf
- Lack of access control: Too many people with console access
The Cost of Drift
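- Security gaps: Manual changes, such as a security group rule opened to 0.0.0.0/0, bypass IaC review
- Outages: The next terraform apply reverts a manual fix, or replaces a resource to undo a change to an attribute that can't be updated in place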
- Compliance failures: Auditors find undocumented changes
- Lost time: Engineers debugging why environments differ
Detecting Drift
Terraform Plan
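Run terraform plan -detailed-exitcode regularly. Exit code 0 means no changes, 1 means an error, and 2 means the plan contains changes. Those changes are either drift or code that hasn't been applied yet; terraform plan -refresh-only limits the output to changes made outside Terraform.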
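A minimal sketch of such a check in Python follows. It assumes the `terraform` CLI is on the PATH and the working directory has already been initialized with `terraform init`. The function names are hypothetical. Only the exit-code meanings come from Terraform's documented behavior.

```python
import subprocess

# Documented meanings of `terraform plan -detailed-exitcode`:
# 0 = no changes, 1 = error, 2 = plan contains changes.
EXIT_CODE_STATUS = {0: "clean", 1: "error", 2: "changes"}


def classify_plan_exit_code(code: int) -> str:
    """Translate a plan exit code into a status label (hypothetical helper)."""
    # Treat anything outside the documented codes as an error.
    return EXIT_CODE_STATUS.get(code, "error")


def check_for_drift(workdir: str) -> str:
    """Run `terraform plan -detailed-exitcode` in workdir and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    status = classify_plan_exit_code(result.returncode)
    if status == "error":
        # Surface Terraform's own error output instead of reporting "no drift".
        raise RuntimeError(result.stderr.strip() or "terraform plan failed")
    return status
```

A scheduler, such as cron or a CI pipeline, could call `check_for_drift` and alert whenever it returns `"changes"`. Raising on errors prevents a broken run, for example one with expired credentials, from looking like a clean one.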
Automated Drift Detection
Set up a scheduled pipeline that runs terraform plan every few hours and alerts on drift.
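The alerting step can list exactly which resources drifted instead of only reporting that something changed. This sketch assumes the pipeline saves the plan with `terraform plan -out=plan.tfplan` and converts it with `terraform show -json plan.tfplan`. The `resource_drift` field comes from Terraform's JSON plan format. The function names and alert wording are illustrative. Delivering the alert to chat, email, or a pager is left out.

```python
import json
from typing import List, Optional


def summarize_drift(plan_json: str) -> List[str]:
    """Return one "address: actions" line per drifted resource."""
    plan = json.loads(plan_json)
    lines = []
    # `resource_drift` holds changes Terraform detected outside of Terraform.
    # Default to an empty list in case the key is absent.
    for entry in plan.get("resource_drift", []):
        actions = ",".join(entry.get("change", {}).get("actions", []))
        lines.append(f"{entry['address']}: {actions}")
    return lines


def build_alert(lines: List[str]) -> Optional[str]:
    """Build a plain-text alert body, or return None when nothing drifted."""
    if not lines:
        return None
    return f"Drift detected in {len(lines)} resource(s):\n" + "\n".join(lines)


sample = json.dumps({
    "resource_drift": [
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
    ]
})
print(build_alert(summarize_drift(sample)))
```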
Third-Party Solutions
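Tools like driftctl, Firefly, env0, and Spacelift go beyond a plain plan. Their features include scheduled scans and drift dashboards, and some of them can also discover resources that no IaC code manages.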
Remediation Strategies
Option 1: Update Infrastructure to Match Code
Run terraform apply to revert manual changes. Review the plan first. This can cause downtime, and reverting an emergency fix may reintroduce the problem it solved.
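Before an apply reverts drift, a pipeline could inspect the saved plan for deletions and stop for human review. This is a minimal sketch. It assumes the plan was exported with `terraform show -json`. The `resource_changes` field and its `actions` lists come from Terraform's JSON plan format. The function name is hypothetical.

```python
import json
from typing import List


def destructive_changes(plan_json: str) -> List[str]:
    """Return addresses of resources the plan would delete or replace."""
    plan = json.loads(plan_json)
    risky = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement appears as ["delete", "create"] or ["create", "delete"],
        # so checking for "delete" catches both plain destroys and replacements.
        if "delete" in actions:
            risky.append(rc["address"])
    return risky


sample = json.dumps({
    "resource_changes": [
        {"address": "aws_instance.api", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
        {"address": "aws_s3_bucket.logs", "change": {"actions": ["no-op"]}},
    ]
})
print(destructive_changes(sample))  # ['aws_instance.api']
```

An empty list does not guarantee zero downtime; in-place updates can still restart some resources. A non-empty list, however, is a clear signal to pause for review.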
Option 2: Update Code to Match Infrastructure
Update your Terraform code to include the change, then confirm that terraform plan shows no changes.
Option 3: Import Unmanaged Resources
Write the resource block, run terraform import to attach the existing resource to it in state, then adjust the block until terraform plan shows no changes.
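Terraform 1.5 and later also accept declarative `import` blocks. These go through the normal plan/apply review instead of modifying state directly. The Python sketch below renders such blocks from a mapping of unmanaged resources, such as the output of a drift scan. The function names and the example bucket are hypothetical. Running `terraform plan -generate-config-out=generated.tf` against these blocks can draft the matching resource blocks, which you would then clean up by hand.

```python
import json
from typing import Dict


def import_block(address: str, resource_id: str) -> str:
    """Render one Terraform import block (Terraform 1.5+ syntax)."""
    # json.dumps produces a double-quoted, escaped string. This is adequate
    # for typical cloud IDs but does not escape HCL template sequences like "${".
    return (
        "import {\n"
        f"  to = {address}\n"
        f"  id = {json.dumps(resource_id)}\n"
        "}\n"
    )


def import_blocks(resources: Dict[str, str]) -> str:
    """Render import blocks for a mapping of resource address -> cloud ID."""
    return "\n".join(import_block(addr, rid) for addr, rid in resources.items())


print(import_blocks({"aws_s3_bucket.logs": "my-log-bucket"}))
```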
Option 4: Remove from State
Use terraform state rm to stop managing resources that should be managed elsewhere. This only removes them from Terraform state; the real resources keep running.
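On Terraform 1.7 and later, a `removed` block expresses the same intent in code. The change then goes through PR review rather than a one-off CLI command. The corresponding resource block is deleted from the configuration at the same time. Below is a small sketch that renders such a block; the helper name and the resource address are hypothetical.

```python
def removed_block(address: str) -> str:
    """Render a Terraform (1.7+) removed block that forgets a resource without destroying it."""
    # destroy = false tells Terraform to drop the resource from state only;
    # the real infrastructure is left running.
    return (
        "removed {\n"
        f"  from = {address}\n"
        "  lifecycle {\n"
        "    destroy = false\n"
        "  }\n"
        "}\n"
    )


print(removed_block("aws_instance.legacy"))
```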
Preventing Future Drift
Technical Controls
- Restrict console access using IAM policies
- Enforce tags that identify IaC-managed resources
- Use Service Control Policies (SCPs) to deny changes from anyone other than your deployment role
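The tagging and SCP controls can work together. An SCP can deny changes to resources tagged as IaC-managed unless the caller is the pipeline's role. The sketch below builds such a policy document in Python. It assumes a deployment role named `terraform-deploy` and a `ManagedBy = terraform` tag; both names are hypothetical. Not every AWS action supports the `aws:ResourceTag` condition key, so check each service's authorization reference before relying on it. You may also want to exempt the role used in your break-glass procedure.

```python
import json
from typing import Dict, List

# Hypothetical names: adjust to your own deployment role and tagging scheme.
DEPLOY_ROLE_ARN_PATTERN = "arn:aws:iam::*:role/terraform-deploy"
MANAGED_TAG_KEY = "ManagedBy"
MANAGED_TAG_VALUE = "terraform"


def build_scp(actions: List[str]) -> Dict:
    """Build an SCP denying `actions` on Terraform-tagged resources to everyone but the deploy role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyManualChangesToTerraformResources",
                "Effect": "Deny",
                "Action": actions,
                "Resource": "*",
                # Both conditions must match for the deny to apply: the resource
                # carries the IaC tag AND the caller is not the deployment role.
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{MANAGED_TAG_KEY}": MANAGED_TAG_VALUE},
                    "ArnNotLike": {"aws:PrincipalARN": DEPLOY_ROLE_ARN_PATTERN},
                },
            }
        ],
    }


policy = build_scp(["ec2:AuthorizeSecurityGroupIngress", "ec2:RevokeSecurityGroupIngress"])
print(json.dumps(policy, indent=2))
```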
Process Controls
- Document break-glass procedures
- Require PR review for all changes
- Conduct regular drift audits
- Train the team on why IaC matters
Drift Response Playbook
- Assess: Is this expected?
- Document: Who, what, when, why
- Decide: Update code or infrastructure?
- Remediate: Make the fix
- Verify: Confirm drift resolved
- Prevent: How do we stop this recurring?
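Recording each incident in a consistent structure makes the Document and Prevent steps easier, and it also helps with later drift audits. Below is a minimal sketch of such a record. The field names, and the `Remediation` enum mirroring the four remediation options above, are illustrative rather than a standard schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Remediation(Enum):
    # Mirrors the four remediation options described in this article.
    REVERT_INFRASTRUCTURE = "update infrastructure to match code"
    UPDATE_CODE = "update code to match infrastructure"
    IMPORT = "import unmanaged resource"
    REMOVE_FROM_STATE = "remove from state"


@dataclass
class DriftRecord:
    """One drift incident, with a field per playbook step (hypothetical schema)."""
    resource_address: str
    expected: bool        # Assess
    changed_by: str       # Document: who
    what_changed: str     # Document: what
    why: str              # Document: why
    # Document: when (time of detection, defaulting to now in UTC)
    detected_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    decision: Optional[Remediation] = None  # Decide
    verified: bool = False                  # Verify
    prevention: str = ""                    # Prevent

    def is_closed(self) -> bool:
        # Close the incident only after every playbook step has an answer.
        return self.decision is not None and self.verified and bool(self.prevention)


record = DriftRecord(
    resource_address="aws_security_group.web",
    expected=False,
    changed_by="on-call engineer",
    what_changed="opened port 8080 during an incident",
    why="emergency fix",
)
record.decision = Remediation.UPDATE_CODE
record.verified = True
record.prevention = "add port 8080 to the module and document break-glass use"
print(record.is_closed())  # True
```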
Key Takeaways
- Drift is inevitable; detecting it quickly is what matters
- Automated scanning should run at least daily
- Prevention through access control is better than detection
- Document everything
Zero drift is unrealistic. Quick detection and consistent remediation? That's achievable.