Infrastructure Drift: How to Detect It and What to Do About It
Infrastructure drift causes outages and security issues. Learn how to detect when your actual infrastructure differs from your code, and how to fix it.

Video transcript
Your Infrastructure as Code looks perfect in Git. But your actual cloud resources have drifted. How did that happen? And more importantly, how do you catch it before it causes an outage?
Infrastructure drift is the silent killer in DevOps teams. When your live resources stop matching your declared code, you lose compliance visibility, create security gaps, and invite manual configuration errors that nobody can trace. The cost is steep: unplanned downtime, audit failures, and late-night incident calls.
Think of drift like a ship's course correction. You set your heading in Terraform, but manual clicks in the AWS console nudge the wheel. Soon you're miles off course. Drift detection tools continuously compare your declared state against reality, catching those nudges before they matter.
Automated remediation works like a co-pilot with authority. When drift is detected, your system can automatically roll back to the golden configuration. This eliminates the human lag between discovery and fix. Teams using automated remediation close gaps in minutes, not days.
Compliance scanning during drift detection ties your IaC directly to regulatory requirements. Every divergence from code becomes a compliance event, not just a technical hiccup. This transforms drift detection from a DevOps chore into a continuous compliance checkpoint.
Start today: pick one critical resource group and enable drift detection in your IaC tool. Watch what surfaces. You'll be amazed. Read the complete guide at protego dot me.
The Drift Problem
You've got beautiful Terraform code, well-organized modules, everything documented. Then someone makes a "quick fix" in the AWS console, and suddenly your code doesn't match reality.
That's drift. It starts small and grows until you have no idea what's actually running.
Why Drift Happens
The Usual Suspects
- Emergency fixes: Production is down, someone fixes it manually
- Console convenience: It's faster to click than to write code
- Automated processes: Auto-scaling modifies resources
- Service integrations: AWS services create resources on your behalf
- Lack of access control: Too many people with console access
The Cost of Drift
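- Security gaps: Manually added rules (for example, a firewall or security group change) bypass IaC review
- Outages: A later terraform apply reverts or removes manual changes that production has come to depend on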
- Compliance failures: Auditors find undocumented changes
- Lost time: Engineers debugging why environments differ
Detecting Drift
Terraform Plan
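Run terraform plan -detailed-exitcode regularly. Exit code 0 means the plan is empty, 1 means the plan failed, and 2 means the plan succeeded with pending changes. If your code has not changed since the last apply, exit code 2 means drift was detected.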
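As a minimal sketch of how a script could act on those exit codes, the Python below wraps the plan command and maps the result to a status label. It assumes terraform is on the PATH and the working directory has already been initialized. The names classify_plan_exit_code and check_drift, and the status labels, are hypothetical.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_STATUS = {
    0: "clean",    # plan succeeded, no changes pending
    1: "error",    # plan failed
    2: "changes",  # plan succeeded, changes pending (possible drift)
}


def classify_plan_exit_code(code: int) -> str:
    """Map a plan exit code to a status label; unknown codes count as errors."""
    return PLAN_STATUS.get(code, "error")


def check_drift(workdir: str) -> str:
    """Run the plan in `workdir` (a hypothetical path) and return its status."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)
```

Treating unknown exit codes as errors is a deliberate choice here: a scheduled check should fail loudly rather than report a clean state it could not verify.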
Automated Drift Detection
Set up a scheduled pipeline that runs terraform plan every few hours and alerts on drift. For teams on Azure DevOps, our [Azure DevOps Pipelines guide](/blog/azure-devops-pipelines-beginners-guide) covers how to configure scheduled pipeline runs and alert integrations.
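One way the alerting step could be built, sketched under assumptions: the pipeline saves the plan with terraform plan -out=tfplan, renders it with terraform show -json tfplan, and passes that JSON to a small summarizer. The sketch reads the resource_changes list from the plan's JSON representation. The function names, the workspace label, and the message format are hypothetical, and sending the message to Slack, Teams, or email is left out.

```python
import json
from typing import List, Optional

# Actions that do not represent a pending change to real infrastructure.
IGNORED_ACTIONS = (["no-op"], ["read"])


def summarize_drift(plan_json: str) -> List[str]:
    """Return one line per resource whose planned actions would change it."""
    plan = json.loads(plan_json)
    lines = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        if actions and actions not in IGNORED_ACTIONS:
            lines.append(f"{rc['address']}: {'/'.join(actions)}")
    return lines


def build_alert(plan_json: str, workspace: str) -> Optional[str]:
    """Build an alert message, or return None when nothing would change."""
    lines = summarize_drift(plan_json)
    if not lines:
        return None
    header = f"Drift detected in {workspace}: {len(lines)} resource(s)"
    return "\n".join([header] + lines)
```

Returning None for a clean plan lets the pipeline skip the notification entirely, so the alert channel stays quiet until something actually differs.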
Third-Party Solutions
Tools like Driftctl, Firefly, env0, and Spacelift provide drift detection with reporting and remediation workflows.
Remediation Strategies
Option 1: Update Infrastructure to Match Code
Run terraform apply to revert manual changes. Warning: Review the plan output first, because reverting a change that production now depends on might cause downtime.
Option 2: Update Code to Match Infrastructure
Update your Terraform code to include the change, then verify with terraform plan.
Option 3: Import Unmanaged Resources
Write the resource block, run terraform import, adjust until plan shows no changes.
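The import-then-verify loop could be scripted roughly as follows. This is a sketch that assumes the matching resource block already exists in the configuration and that terraform is on the PATH. The function names are hypothetical, and the format of the resource ID depends on the resource type, so check the provider documentation for the one you are importing.

```python
import subprocess
from typing import List


def import_command(address: str, resource_id: str) -> List[str]:
    """Build the `terraform import ADDRESS ID` command line."""
    return ["terraform", "import", address, resource_id]


def plan_is_clean(workdir: str) -> bool:
    """Return True when `terraform plan -detailed-exitcode` exits with 0."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false"],
        cwd=workdir,
    )
    return result.returncode == 0


def import_and_verify(workdir: str, address: str, resource_id: str) -> bool:
    """Import one resource, then report whether the plan shows no changes."""
    subprocess.run(import_command(address, resource_id), cwd=workdir, check=True)
    return plan_is_clean(workdir)
```

If import_and_verify returns False, the resource block still differs from the real resource: adjust the block and run the plan check again until it comes back clean.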
Option 4: Remove from State
Use terraform state rm to stop managing resources that should be managed elsewhere.
Preventing Future Drift
Technical Controls
- Restrict console access using IAM policies
- Enforce tags that identify IaC-managed resources
- Use Service Control Policies (SCPs)
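To illustrate the tagging control, here is a minimal sketch of an audit that flags resources without an IaC ownership tag. The tag key ManagedBy, the value terraform, and the shape of the resource records are all assumptions made for the example. In practice the records would come from your cloud provider's inventory or tagging API.

```python
from typing import Dict, List

MANAGED_BY_TAG = "ManagedBy"    # hypothetical tag key
MANAGED_BY_VALUE = "terraform"  # hypothetical tag value


def find_unmanaged(resources: List[Dict]) -> List[str]:
    """Return the IDs of resources that lack the IaC ownership tag."""
    unmanaged = []
    for resource in resources:
        tags = resource.get("tags") or {}
        # Compare case-insensitively so "Terraform" and "terraform" both pass.
        if tags.get(MANAGED_BY_TAG, "").lower() != MANAGED_BY_VALUE:
            unmanaged.append(resource["id"])
    return unmanaged
```

A resource that shows up in this list was either created outside Terraform or is missing its tag, and both cases are worth a look during a drift audit.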
Process Controls
- Document break-glass procedures
- Require PR review for all changes
- Conduct regular drift audits
- Train the team on why IaC matters
Drift Response Playbook
- Assess: Is this expected?
- Document: Who, what, when, why
- Decide: Update code or infrastructure?
- Remediate: Make the fix
- Verify: Confirm drift resolved
- Prevent: How do we stop this recurring?
Key Takeaways
- Drift is inevitable; detecting it quickly is what matters
- Automated scanning should run at least daily
- Prevention through access control is better than detection
- Document everything
Zero drift is unrealistic. Quick detection and consistent remediation? That's achievable.
Frequently Asked Questions
What is infrastructure drift in Terraform?
Infrastructure drift occurs when the actual state of cloud resources diverges from the desired state defined in your Terraform code. It happens when engineers make manual changes in the cloud console, when automated processes (auto-scaling, service integrations) modify resources, or when Terraform applies partial changes due to errors. Drift is detected by running terraform plan and observing that Terraform plans to make changes even though you haven't modified your code.
How do I detect Terraform infrastructure drift automatically?
Run terraform plan -detailed-exitcode on a scheduled basis in your CI/CD pipeline. Exit code 2 indicates drift was detected (a non-empty plan). Configure the pipeline to send an alert (Slack, Teams, or email) when drift is detected. Third-party tools like Driftctl, Firefly, Spacelift, and env0 provide more sophisticated drift detection with reporting and remediation workflows.
How do I fix infrastructure drift in Terraform?
You have four options depending on the situation. First, run terraform apply to revert the infrastructure back to match your code (this may cause downtime if production now depends on the manual change). Second, update your Terraform code to match the current infrastructure state, then verify with terraform plan that the result is a clean no-op. Third, use terraform import to bring unmanaged resources under Terraform management. Fourth, use terraform state rm to remove resources from Terraform management if they should be managed by a different configuration.
How do I prevent infrastructure drift in the first place?
The primary control is restricting direct console access using IAM policies and Service Control Policies (SCPs) so engineers cannot make changes outside of Terraform. Enforce a PR review process for all Terraform changes. Use break-glass procedures for emergency changes that require immediate manual intervention, with a mandatory follow-up to codify the change in Terraform within 24 hours. Regular automated drift detection ensures any drift that does occur is caught and remediated quickly.
What causes infrastructure drift in AWS and Azure environments?
The most common causes are emergency production fixes applied directly in the console or via CLI, developers using the cloud console for convenience rather than updating IaC, auto-scaling groups modifying resource counts, managed service integrations that create resources automatically (AWS creating ENIs, security groups, or IAM roles), and incomplete Terraform runs that partially apply a change before failing.
Microsoft Cloud Solution Architect
Cloud Solution Architect with deep expertise in Microsoft Azure and a strong background in systems and IT infrastructure. Passionate about cloud technologies, security best practices, and helping organizations modernize their infrastructure.