Cloud Security | 11 min read

Infrastructure Drift: How to Detect It and What to Do About It

Infrastructure drift causes outages and security issues. Learn how to detect when your actual infrastructure differs from your code, and how to fix it.

Idan Ohayon
Microsoft Cloud Solution Architect
December 18, 2024
Tags: Infrastructure as Code, Drift Detection, Terraform, Compliance, DevOps

The Drift Problem

You've got beautiful Terraform code, well-organized modules, everything documented. Then someone makes a "quick fix" in the AWS console, and suddenly your code doesn't match reality.

That's drift. It starts small and grows until you have no idea what's actually running.

Why Drift Happens

The Usual Suspects

  1. Emergency fixes: Production is down, someone fixes it manually
  2. Console convenience: It's faster to click than to write code
  3. Automated processes: Auto-scaling and similar automation change values that Terraform also tracks
  4. Service integrations: AWS services create resources on your behalf
  5. Lack of access control: Too many people with console access

The Cost of Drift

  • Security gaps: Manually added rules (for example, security group rules) bypass IaC review
  • Outages: The next terraform apply reverts or replaces manually changed resources that something now depends on
  • Compliance failures: Auditors find undocumented changes
  • Lost time: Engineers debugging why environments differ

Detecting Drift

Terraform Plan

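Run terraform plan -detailed-exitcode regularly. Exit code 0 means the plan succeeded with no changes, 1 means the plan failed, and 2 means the plan succeeded and found changes. If the code has not changed since the last apply, exit code 2 points to drift. To see only changes made outside Terraform, run terraform plan -refresh-only.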
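As a rough sketch of how a script might act on those exit codes, the Python below wraps the plan command and maps the result to a short status. The function names and the status strings are illustrative assumptions, not part of Terraform; only the command, its flags, and the exit code meanings come from the CLI.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_EXIT_MEANINGS = {
    0: "clean",    # plan succeeded, no changes
    1: "error",    # plan failed
    2: "changes",  # plan succeeded, changes present (possible drift)
}


def classify_plan_exit(code: int) -> str:
    """Map a plan exit code to a status string; unknown codes count as errors."""
    return PLAN_EXIT_MEANINGS.get(code, "error")


def run_plan(workdir: str) -> str:
    """Run the plan in `workdir` (a hypothetical path) and classify the result."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

A caller could run run_plan("./infra/prod") (the path is made up) and treat "changes" as a signal to investigate, keeping in mind that the plan also reports code changes that have not been applied yet.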

Automated Drift Detection

Set up a scheduled pipeline that runs terraform plan -detailed-exitcode every few hours and sends an alert whenever the plan reports changes.
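One way the alerting step could work, sketched in Python: the pipeline saves the plan with terraform plan -out=tfplan, converts it with terraform show -json tfplan, and passes that JSON to a small summarizer. The resource_changes list and its change.actions field are part of Terraform's JSON plan format; the function names and the alert wording are assumptions made for this example.

```python
import json


def changed_resources(plan_json: str) -> list[tuple[str, str]]:
    """Return (address, actions) pairs for every resource the plan would change."""
    plan = json.loads(plan_json)
    changed = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # "no-op" and "read" entries do not modify infrastructure.
        if any(a not in ("no-op", "read") for a in actions):
            changed.append((rc["address"], "+".join(actions)))
    return changed


def format_alert(changes: list[tuple[str, str]]) -> str:
    """Build a plain-text alert body; where it is sent is up to the pipeline."""
    if not changes:
        return "No changes in plan."
    lines = [f"Plan shows changes to {len(changes)} resource(s):"]
    lines += [f"  {address} ({actions})" for address, actions in changes]
    return "\n".join(lines)
```

The resulting text can go to whatever channel the team already watches. Listing the affected addresses in the alert makes the first playbook question ("Is this expected?") quicker to answer.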

Third-Party Solutions

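Tools such as driftctl, Firefly, env0, and Spacelift offer drift detection beyond a scheduled plan. Some of them can also surface resources that exist in the cloud account but are not in Terraform state at all, which terraform plan cannot see.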

Remediation Strategies

Option 1: Update Infrastructure to Match Code

Run terraform apply to revert manual changes. Warning: Review the plan first. Some attribute changes force Terraform to replace the resource, which can cause downtime.

Option 2: Update Code to Match Infrastructure

Update your Terraform code to include the change, then verify with terraform plan.

Option 3: Import Unmanaged Resources

Write the resource block, run terraform import to bring the existing resource into state, then adjust the block until terraform plan shows no changes.

Option 4: Remove from State

Use terraform state rm to stop managing resources that should be managed elsewhere. This only removes the resource from Terraform state; the resource itself keeps running.
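One way to read the four options is as the outcome of three questions: is the resource in Terraform state, should this configuration own it, and do you want to keep the change? The Python sketch below encodes that reading. It is an interpretation for illustration, not a rule from Terraform, and every name in it is hypothetical.

```python
from enum import Enum
from typing import Optional


class Remediation(Enum):
    APPLY_CODE = "Option 1: terraform apply to revert the change"
    UPDATE_CODE = "Option 2: update the code to match the infrastructure"
    IMPORT = "Option 3: write the resource block and terraform import"
    STATE_RM = "Option 4: terraform state rm"


def choose_remediation(
    in_state: bool, owned_here: bool, keep_change: bool
) -> Optional[Remediation]:
    """Suggest an option; None means the four options do not cover the case."""
    if not owned_here:
        # Managed elsewhere: drop it from this state if it is tracked here.
        return Remediation.STATE_RM if in_state else None
    if not in_state:
        # Created by hand: import it if it should stay.
        return Remediation.IMPORT if keep_change else None
    # Tracked and owned here: keep the manual change or revert it.
    return Remediation.UPDATE_CODE if keep_change else Remediation.APPLY_CODE
```

For example, a security group rule added by hand to a managed group that the team wants to keep gives choose_remediation(True, True, True), which suggests updating the code. A None result, such as a hand-made resource nobody wants, needs a decision outside these four options.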

Preventing Future Drift

Technical Controls

  • Restrict console access using IAM policies
  • Enforce tags that identify IaC-managed resources
  • Use Service Control Policies (SCPs) in AWS Organizations to deny manual changes across accounts
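To make the SCP idea concrete, here is a minimal sketch in Python that builds a policy document denying security group rule changes unless the caller matches a pipeline role. The role ARN pattern, the Sid, and the choice of actions are assumptions for illustration; a real policy would need the team's own role names, an exception for the break-glass role, and testing in a non-production organizational unit first.

```python
import json


def build_deny_manual_sg_changes_scp(allowed_role_arn_patterns: list[str]) -> dict:
    """Build an SCP that denies security group rule edits to everyone
    except principals matching the given ARN patterns."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenySecurityGroupChangesOutsidePipeline",
                "Effect": "Deny",
                "Action": [
                    "ec2:AuthorizeSecurityGroupIngress",
                    "ec2:AuthorizeSecurityGroupEgress",
                    "ec2:RevokeSecurityGroupIngress",
                    "ec2:RevokeSecurityGroupEgress",
                ],
                "Resource": "*",
                "Condition": {
                    "ArnNotLike": {"aws:PrincipalArn": allowed_role_arn_patterns}
                },
            }
        ],
    }


def render_policy(policy: dict) -> str:
    """Serialize the policy as JSON for attaching through the usual tooling."""
    return json.dumps(policy, indent=2)
```

Calling build_deny_manual_sg_changes_scp(["arn:aws:iam::*:role/terraform-pipeline"]) (a hypothetical role name) and passing the result to render_policy gives JSON that can be reviewed in a pull request like any other change.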

Process Controls

  • Document break-glass procedures
  • Require PR review for all changes
  • Conduct regular drift audits
  • Train the team on why IaC matters

Drift Response Playbook

  1. Assess: Is this expected?
  2. Document: Who, what, when, why
  3. Decide: Update code or infrastructure?
  4. Remediate: Make the fix
  5. Verify: Confirm drift resolved
  6. Prevent: How do we stop this recurring?

Key Takeaways

  • Drift is inevitable; detecting it quickly is what matters
  • Automated scanning should run at least daily
  • Prevention through access control is better than detection
  • Document everything

Zero drift is unrealistic. Quick detection and consistent remediation? That's achievable.

Idan Ohayon

Microsoft Cloud Solution Architect

Cloud Solution Architect with deep expertise in Microsoft Azure and a strong background in systems and IT infrastructure. Passionate about cloud technologies, security best practices, and helping organizations modernize their infrastructure.
