Cloud Cost Optimization
May 8, 2026
By Ravi Kanani

Terraform Drift Is Silently Adding $8K-40K/Year to Your Cloud Bill (Here's How to Find It)

Key Takeaway

Terraform drift causes $8K-40K/year in hidden cloud waste for the average 50-100 resource AWS account. The top offenders: manually created instances never added to state ($2K-8K/year), orphaned EBS volumes left behind by destroys ($500-3K/year), and unattached Elastic IPs quietly billing every month ($200-800/year). A weekly 10-minute drift audit catches 90% of cost leaks before they compound.

The $40K Your Terraform State File Doesn't Know About

Here is a scenario we encounter in nearly every cloud cost assessment at LeanOps: a team manages their infrastructure with Terraform, has proper CI/CD for deployments, reviews pull requests on HCL changes, and still overspends by 15-30% on resources that Terraform has no idea exist.

The reason is drift. Not the kind that shows up in terraform plan (that is state drift, and most teams catch it). The expensive kind: resources provisioned outside Terraform entirely, never added to state, never tracked, and never cleaned up.

A developer spins up an RDS instance via the console to debug a production issue. An SRE creates a load balancer manually during an incident. A data engineer launches a large EC2 instance for a one-time migration and forgets to terminate it. An auto-scaling group creates instances that Terraform does not manage. Each of these is invisible to your IaC pipeline. They do not show up in terraform plan. They do not appear in your module outputs. They simply run, bill, and accumulate.

Across 30 AWS accounts we have audited in the past year, the average drift-caused waste is $8,000 to $40,000 per year. For larger environments (500+ managed resources), we have found drift waste exceeding $150,000 annually. The worst part: these resources often run for months or years before anyone notices, because they exist in a governance blind spot between "infrastructure the team manages" and "infrastructure that exists."

This post covers the 7 most expensive drift patterns, how to detect each one in under 10 minutes, and the exact playbook to prevent drift from becoming a recurring cost leak.


The 7 Most Expensive Terraform Drift Patterns

Not all drift is expensive. A drifted tag or a modified description costs nothing. These seven patterns are the ones that consistently show up as significant line items on cloud bills.

Pattern 1: Manually Provisioned Compute (Cost: $2,000-8,000/year)

What happens: An engineer creates an EC2 instance, ECS service, or Lambda function via the AWS Console or CLI for debugging, testing, or a quick fix. The intent is temporary. The resource becomes permanent because no one tracks it, no terraform destroy removes it, and billing alerts are set at the account level (not the resource level).

Why it persists: The resource has no Terraform state entry, so terraform plan never shows it. It is not tagged with an owning team or expiration date. The engineer who created it moves to a different project. Monthly cost reviews look at aggregate spending, not individual resource inventories.

Real example: We found a client running 3x m5.xlarge instances ($0.192/hour each) that were created 14 months earlier for a load test. Total waste: roughly $5,900 over those 14 months (3 x $0.192/hour x 730 hours/month x 14). Nobody noticed because the instances represented less than 5% of the account's total EC2 spend.

Detection:

# Find running EC2 instances not tagged as Terraform-managed
# (uses the same ManagedBy=terraform convention as the audit script below)
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[?!Tags[?Key==`ManagedBy` && Value==`terraform`]].[InstanceId,InstanceType,LaunchTime]' \
  --output table

Pattern 2: Orphaned EBS Volumes (Cost: $500-3,000/year)

What happens: Terraform destroys an EC2 instance but the EBS volume has delete_on_termination = false (which is the default for additional volumes in many modules). The instance disappears from state. The volume remains, unattached, billing monthly.

Why it persists: Unattached EBS volumes generate no CloudWatch metrics, no alarms, and no alerts. They sit in "available" state indefinitely. At $0.10/GB-month for gp2 ($0.08 for gp3), a 500GB gp2 volume costs $50/month ($600/year) doing absolutely nothing.

Typical accumulation: Teams running ephemeral workloads (CI runners, batch processing, dev environments) commonly accumulate 20-50 orphaned volumes over a year. At an average of 100GB each: 30 volumes x 100GB x $0.10 = $300/month = $3,600/year.
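The accumulation math above is easy to sanity-check. A minimal sketch, using the $0.10/GB-month rate from the example (gp3 would be cheaper):

```python
GB_MONTH_RATE = 0.10  # gp2 rate used in the example; gp3 is roughly $0.08

def orphaned_volume_cost(sizes_gb, rate=GB_MONTH_RATE):
    """Monthly and annual cost of a pile of unattached EBS volumes."""
    monthly = sum(sizes_gb) * rate
    return monthly, monthly * 12

# 30 forgotten 100GB volumes, as in the scenario above
monthly, annual = orphaned_volume_cost([100] * 30)
print(f"${monthly:.0f}/month, ${annual:.0f}/year")  # $300/month, $3600/year
```

Feed it the Size column from the detection command below and you get a dollar figure instead of a volume list.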

Detection:

# List unattached EBS volumes (sum the Size column x per-GB rate for total cost)
aws ec2 describe-volumes \
  --filters "Name=status,Values=available" \
  --query 'Volumes[].[VolumeId,Size,CreateTime]' \
  --output table

Pattern 3: Forgotten Load Balancers (Cost: $2,000-5,000/year)

What happens: An ALB or NLB is created for a service that later gets decommissioned. The Terraform module for the service is destroyed, but the load balancer was in a separate module or was created manually. It continues running with no targets, processing no traffic, but billing the fixed hourly rate.

Why it costs so much: An idle ALB costs $16.20/month in fixed charges ($0.0225/hour) plus $0.008/LCU-hour even at minimum. An NLB costs $16.20/month minimum. Over a year, a single forgotten ALB costs $194. But most environments have 3-8 forgotten load balancers: that is $600-1,550/year in fixed charges alone, plus the associated Elastic IPs, target groups, and WAF rules that often attach to them.
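The fixed-charge arithmetic checks out; a sketch using the post's 720-hour month and the $0.0225/hour ALB rate (LCU charges and attached EIPs/WAF excluded):

```python
ALB_HOURLY = 0.0225    # fixed charge per ALB-hour; LCU charges excluded
HOURS_PER_MONTH = 720

def idle_alb_cost(count, months=12):
    """Fixed-charge cost of idle ALBs over a period."""
    return count * ALB_HOURLY * HOURS_PER_MONTH * months

print(f"1 ALB: ${idle_alb_cost(1):.2f}/year")  # $194.40/year
print(f"3-8 ALBs: ${idle_alb_cost(3):.0f}-${idle_alb_cost(8):.0f}/year")  # $583-$1555/year
```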

Detection:

# Find target groups with no healthy targets
# (describe-target-health requires a target group ARN, so loop over target groups)
for tg in $(aws elbv2 describe-target-groups --query 'TargetGroups[].TargetGroupArn' --output text); do
  healthy=$(aws elbv2 describe-target-health --target-group-arn "$tg" \
    --query 'length(TargetHealthDescriptions[?TargetHealth.State==`healthy`])' --output text)
  [ "$healthy" -eq 0 ] && echo "No healthy targets: $tg"
done

Pattern 4: Stale NAT Gateways (Cost: $1,000-4,000/year)

What happens: A VPC is created with NAT Gateways for private subnet internet access. The workloads in those private subnets are later moved or decommissioned. The NAT Gateway remains because it is in a shared networking module that nobody wants to touch.

Cost structure: NAT Gateways cost $0.045/hour ($32.40/month) per gateway in fixed charges, plus $0.045/GB processed. Even with zero traffic, two NAT Gateways (one per AZ for HA) cost $64.80/month = $778/year. With even modest residual traffic (DNS lookups, health checks from resources in the subnet), the bill climbs to $100-150/month = $1,200-1,800/year.

Detection:

# Find NAT Gateways with < 1GB processed in last 7 days
# (date -v-7d below is BSD/macOS syntax; on Linux use $(date -d '7 days ago' +%Y-%m-%dT%H:%M:%S))
aws cloudwatch get-metric-statistics \
  --namespace AWS/NATGateway \
  --metric-name BytesOutToDestination \
  --period 604800 --statistics Sum \
  --start-time $(date -v-7d +%Y-%m-%dT%H:%M:%S) \
  --end-time $(date +%Y-%m-%dT%H:%M:%S) \
  --dimensions Name=NatGatewayId,Value=nat-XXXXX

Pattern 5: Elastic IPs Without Attachments (Cost: $200-800/year)

What happens: Elastic IPs are allocated for instances or load balancers that are later terminated. The EIP remains allocated but unattached. AWS charges $0.005/hour ($3.60/month) for unattached EIPs.

Why it accumulates: Individual EIP cost ($3.60/month) is small enough to never trigger a billing alarm. But teams commonly accumulate 5-20 unused EIPs over time: that is $18-72/month = $216-864/year. Additionally, AWS limits EIPs to 5 per region by default, so orphaned EIPs can block new allocations, forcing limit increase requests.
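The same per-hour arithmetic, applied to unattached EIPs (a sketch; counts are illustrative):

```python
EIP_HOURLY = 0.005     # charge per unattached EIP-hour
HOURS_PER_MONTH = 720

def unattached_eip_cost(count, months=12):
    """Annual cost of EIPs allocated but attached to nothing."""
    return count * EIP_HOURLY * HOURS_PER_MONTH * months

# 5-20 unused EIPs, as above
print(f"${unattached_eip_cost(5):.0f}-${unattached_eip_cost(20):.0f}/year")  # $216-$864/year
```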

Detection:

# Find all unassociated Elastic IPs
aws ec2 describe-addresses \
  --query 'Addresses[?!AssociationId].[PublicIp,AllocationId]' \
  --output table

Pattern 6: Oversized RDS Instances from Manual Scaling (Cost: $3,000-15,000/year)

What happens: During a traffic spike or performance issue, someone manually scales an RDS instance from db.r6g.large to db.r6g.xlarge via the console. The crisis passes. The Terraform state still shows the old instance class. Nobody scales it back down because terraform plan shows a "change" that would cause downtime, and nobody wants to schedule the maintenance window.

Cost impact: The difference between db.r6g.large ($0.26/hr) and db.r6g.xlarge ($0.52/hr) is $0.26/hour = $189.80/month = $2,278/year. For Multi-AZ (which doubles the cost), the drift waste is $4,556/year on a single instance. We have found clients with 3-5 manually scaled RDS instances that were never scaled back: $7,000-23,000/year in avoidable spend.
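Those figures use a 730-hour month; the arithmetic behind them, with the on-demand rates quoted above:

```python
HOURS_PER_MONTH = 730  # the convention behind the $189.80/month figure

def rds_drift_waste(old_hourly, new_hourly, multi_az=False, months=12):
    """Annual overspend from an instance left at a larger class."""
    delta = (new_hourly - old_hourly) * HOURS_PER_MONTH * months
    return delta * 2 if multi_az else delta

print(f"Single-AZ: ${rds_drift_waste(0.26, 0.52):,.0f}/year")               # ~$2,278
print(f"Multi-AZ:  ${rds_drift_waste(0.26, 0.52, multi_az=True):,.0f}/year")  # ~$4,555
```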

Detection:

# Compare Terraform state to actual RDS instance classes
terraform state show aws_db_instance.main | grep instance_class
aws rds describe-db-instances \
  --query 'DBInstances[].[DBInstanceIdentifier,DBInstanceClass]' \
  --output table

Pattern 7: Abandoned CloudWatch Log Groups (Cost: $500-5,000/year)

What happens: Terraform creates a service with CloudWatch Logs. The service is decommissioned, but the log group persists (log groups are not automatically deleted). Old logs accumulate in storage. With no retention policy set, logs grow indefinitely at $0.03/GB/month for storage.

Why it compounds: A single application generating 1GB/day of logs, running for a year with no retention policy, accumulates 365GB of stored logs = $10.95/month in perpetual storage. Multiply by 10-20 abandoned services, and storage costs reach $100-200/month = $1,200-2,400/year. The insidious part: the log ingestion stopped (no new charges), but the storage bill grows every month from historical data that nobody will ever query.
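The "bill that never stops" effect is worth seeing as a calculation. A sketch using the assumptions above (1GB/day ingested, $0.03/GB-month storage, no retention policy):

```python
LOG_STORAGE_RATE = 0.03  # $/GB-month for CloudWatch Logs storage
GB_PER_DAY = 1.0         # assumed ingestion rate while the service ran

def stored_log_cost(days_running):
    """Perpetual monthly storage bill left behind by a decommissioned service."""
    stored_gb = GB_PER_DAY * days_running
    return stored_gb * LOG_STORAGE_RATE

# One service, one year of logs, then decommissioned: this charge recurs forever
print(f"${stored_log_cost(365):.2f}/month, indefinitely")  # $10.95/month, indefinitely
print(f"10-20 such services: ${stored_log_cost(365)*10:.0f}-${stored_log_cost(365)*20:.0f}/month")
```

Setting retentionInDays on every log group caps days_running and makes this number stop growing.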

Detection:

# Find log groups with no new events in 30+ days but significant stored data
aws logs describe-log-groups \
  --query 'logGroups[?storedBytes > `1000000000`].[logGroupName,storedBytes,retentionInDays]' \
  --output table

The 10-Minute Weekly Drift Audit

You do not need expensive tooling to catch 90% of cost drift. This workflow takes 10 minutes every Monday morning and catches the patterns above before they compound.

Step 1: Run Terraform Plan in CI (2 minutes)

Add a scheduled terraform plan to your CI pipeline that runs every Monday at 9 AM. This catches state drift (resources that exist in Terraform but have changed).

# .github/workflows/drift-detection.yml
name: Weekly Drift Detection
on:
  schedule:
    - cron: "0 9 * * 1"
jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init
      - id: plan
        run: terraform plan -detailed-exitcode
        continue-on-error: true
      - name: Notify on drift
        if: steps.plan.outcome == 'failure'
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}
        run: |
          curl -X POST "$SLACK_WEBHOOK" \
            -H 'Content-Type: application/json' \
            -d '{"text":"Terraform drift detected. Run terraform plan to review."}'

Step 2: Scan for Unmanaged Resources (3 minutes)

Use a script that compares resources in your AWS account against resources in your Terraform state.

#!/bin/bash
# quick-drift-scan.sh

echo "=== Unattached EBS Volumes ==="
aws ec2 describe-volumes --filters "Name=status,Values=available" \
  --query 'Volumes[].[VolumeId,Size,CreateTime]' --output table

echo "=== Unassociated Elastic IPs ==="
aws ec2 describe-addresses --query 'Addresses[?!AssociationId].[PublicIp,AllocationId]' --output table

echo "=== Running instances without terraform tag ==="
aws ec2 describe-instances \
  --filters "Name=instance-state-name,Values=running" \
  --query 'Reservations[].Instances[?!Tags[?Key==`ManagedBy` && Value==`terraform`]].[InstanceId,InstanceType,LaunchTime]' \
  --output table

echo "=== Target groups with no healthy targets ==="
for tg in $(aws elbv2 describe-target-groups --query 'TargetGroups[].TargetGroupArn' --output text); do
  healthy=$(aws elbv2 describe-target-health --target-group-arn "$tg" \
    --query 'length(TargetHealthDescriptions[?TargetHealth.State==`healthy`])' \
    --output text 2>/dev/null)
  if [ "${healthy:-0}" -eq 0 ]; then
    echo "No healthy targets: $tg"
  fi
done

Step 3: Review and Action (5 minutes)

For each finding:

  1. If the resource is needed: Import it into Terraform state (terraform import) and add it to your HCL.
  2. If the resource is not needed: Terminate/delete it immediately. Do not "plan to clean it up later." Later never comes.
  3. If you are not sure: Tag it with drift-review: 2026-05-15 (one week from now). If nobody claims it by then, delete it.
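Step 3's "tag it and wait a week" rule is easy to script. A sketch (the flag_for_review helper, its region default, and the tag name are illustrative, not part of the audit script above):

```python
from datetime import date, timedelta

def review_tag(days_from_now=7, today=None):
    """Build the drift-review tag: a claim-or-delete deadline N days out."""
    today = today or date.today()
    deadline = today + timedelta(days=days_from_now)
    return {'Key': 'drift-review', 'Value': deadline.isoformat()}

def flag_for_review(instance_id, region='us-east-1'):
    """Apply the drift-review tag to a resource nobody has claimed yet."""
    import boto3  # imported here so review_tag stays usable without AWS access
    ec2 = boto3.client('ec2', region_name=region)
    ec2.create_tags(Resources=[instance_id], Tags=[review_tag()])

# From a May 8 audit, the deadline lands one week out, matching the example above
print(review_tag(7, today=date(2026, 5, 8)))  # {'Key': 'drift-review', 'Value': '2026-05-15'}
```

A companion cleanup job (or the weekly audit) then deletes anything whose drift-review date has passed unclaimed.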

The Drift Prevention Framework

Detection is reactive. Prevention is cheaper. These five practices stop drift from accumulating in the first place.

Practice 1: Enforce Console Read-Only in Production

Use AWS Organizations SCPs to make production accounts read-only for every principal except the Terraform execution role. Engineers can still view resources in the console but cannot create, modify, or delete them without going through Terraform.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyConsoleWritesInProd",
      "Effect": "Deny",
      "Action": [
        "ec2:RunInstances",
        "ec2:CreateVolume",
        "rds:CreateDBInstance",
        "elasticloadbalancing:CreateLoadBalancer"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotLike": {
          "aws:PrincipalArn": "arn:aws:iam::*:role/TerraformRole"
        }
      }
    }
  ]
}

This eliminates Pattern 1 (manual compute), Pattern 3 (manual LBs), and Pattern 6 (manual RDS scaling) entirely. The policy exempts only the Terraform execution role; every other principal is blocked, whether the request comes through the console, the CLI, or an SDK.

Practice 2: Tag Everything at Creation with Source

Add a mandatory ManagedBy tag to all resources. Use AWS Config rules to flag any resource created without this tag.

| Tag Key | Values | Purpose |
|---|---|---|
| ManagedBy | terraform, manual, auto-scaling | Identifies governance path |
| TerraformModule | module path | Links resource to code |
| ExpiresAt | ISO date | Auto-cleanup trigger |
| Owner | team email | Accountability |

Resources tagged ManagedBy: manual get flagged in weekly audits. Resources tagged with ExpiresAt get auto-terminated by a Lambda function when the date passes.
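The check a Config rule performs here is small enough to prototype directly. A hypothetical helper (REQUIRED_TAGS is an assumption; adjust it to your own policy):

```python
REQUIRED_TAGS = {'ManagedBy', 'Owner'}  # assumed minimum policy; extend as needed

def missing_tags(tags):
    """Return required tag keys absent from a resource's tag list."""
    present = {t['Key'] for t in tags}
    return sorted(REQUIRED_TAGS - present)

# A manually created resource missing its Owner tag gets flagged
print(missing_tags([{'Key': 'ManagedBy', 'Value': 'manual'}]))  # ['Owner']
```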

Practice 3: Auto-Delete Temporary Resources

Deploy a simple Lambda function that runs daily and terminates resources past their expiration:

import boto3
from datetime import datetime, timezone

def handler(event, context):
    ec2 = boto3.client('ec2')
    today = datetime.now(timezone.utc).date()

    # Paginate: accounts with many instances exceed one describe_instances page
    paginator = ec2.get_paginator('describe_instances')
    pages = paginator.paginate(Filters=[
        {'Name': 'tag-key', 'Values': ['ExpiresAt']},
        {'Name': 'instance-state-name', 'Values': ['running']}
    ])

    for page in pages:
        for reservation in page['Reservations']:
            for instance in reservation['Instances']:
                tags = {t['Key']: t['Value'] for t in instance.get('Tags', [])}
                try:
                    expires = datetime.strptime(tags['ExpiresAt'], '%Y-%m-%d').date()
                except ValueError:
                    continue  # skip malformed dates rather than guessing
                # Catch past-due instances too, not just exact-date matches,
                # so a missed daily run does not leave them alive forever
                if expires <= today:
                    ec2.terminate_instances(InstanceIds=[instance['InstanceId']])
                    print(f"Terminated expired instance: {instance['InstanceId']}")

This prevents temporary resources from becoming permanent cost leaks. Engineers must set an ExpiresAt tag when creating anything outside Terraform. If they forget, the weekly audit catches it.

Practice 4: Terraform State Reconciliation in CI

Run a state reconciliation step in your deploy pipeline that compares the expected resource count against the actual resource count:

# Count resources in state
STATE_COUNT=$(terraform state list | wc -l)

# Count resources in AWS (for the tagged resources)
AWS_COUNT=$(aws resourcegroupstaggingapi get-resources \
  --tag-filters Key=ManagedBy,Values=terraform \
  --query 'ResourceTagMappingList | length(@)')

DRIFT=$((AWS_COUNT - STATE_COUNT))
if [ $DRIFT -gt 5 ]; then
  echo "WARNING: $DRIFT resources in AWS not in Terraform state"
fi

Practice 5: Monthly Cost Attribution Review

Once per month, compare your Terraform-managed costs against your total account costs. The gap is your drift waste.

# Total account cost (last month; the End date is exclusive, so this covers all of April)
TOTAL=$(aws ce get-cost-and-usage \
  --time-period Start=2026-04-01,End=2026-05-01 \
  --granularity MONTHLY --metrics BlendedCost \
  --query 'ResultsByTime[0].Total.BlendedCost.Amount' --output text)

# Cost of tagged (terraform-managed) resources
MANAGED=$(aws ce get-cost-and-usage \
  --time-period Start=2026-04-01,End=2026-05-01 \
  --granularity MONTHLY --metrics BlendedCost \
  --filter '{"Tags":{"Key":"ManagedBy","Values":["terraform"]}}' \
  --query 'ResultsByTime[0].Total.BlendedCost.Amount' --output text)

# Amounts are decimal strings, so use bc rather than shell integer arithmetic
echo "Total: $TOTAL | Managed: $MANAGED | Gap (drift): $(echo "$TOTAL - $MANAGED" | bc)"

If the gap exceeds 10% of total spend, you have a drift problem worth investigating.
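The 10% rule, as a tiny helper (the dollar figures in the example are illustrative):

```python
def drift_gap_percent(total_cost, managed_cost):
    """Share of spend that no Terraform-managed resource accounts for."""
    return 100.0 * (total_cost - managed_cost) / total_cost

gap = drift_gap_percent(42_000, 35_700)
print(f"{gap:.0f}% of spend is unmanaged")  # 15% of spend is unmanaged
if gap > 10:
    print("Drift problem worth investigating")
```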


Drift Cost Calculator: What Is Your Environment Leaking?

Use this table to estimate your drift waste based on environment size:

| Environment Size | Typical Drift Resources | Estimated Annual Waste | Common Offenders |
|---|---|---|---|
| Small (10-50 resources) | 2-5 orphaned resources | $1,000-5,000/year | EBS volumes, EIPs, small instances |
| Medium (50-200 resources) | 5-15 orphaned resources | $5,000-20,000/year | + Load balancers, NAT gateways, log groups |
| Large (200-500 resources) | 15-40 orphaned resources | $20,000-80,000/year | + RDS instances, ECS services, S3 buckets |
| Enterprise (500+ resources) | 40-100+ orphaned resources | $50,000-200,000/year | + Cross-account drift, multi-region duplication |

The multiplier effect: Drift compounds over time. A single untracked resource costs X/month today. Without detection, similar resources accumulate at a rate of 1-3 per month. After 12 months, you are paying 12-36x that original resource cost.
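That multiplier is just an arithmetic series. A sketch with assumed numbers (2 new drift resources per month, $50/month each):

```python
def drift_run_rate(new_per_month, cost_each, month):
    """Monthly drift bill in a given month if nothing is ever cleaned up."""
    return new_per_month * cost_each * month

def cumulative_drift(new_per_month, cost_each, months):
    """Total spent on drift over the whole period (sum of an arithmetic series)."""
    return sum(drift_run_rate(new_per_month, cost_each, m) for m in range(1, months + 1))

print(f"Month-12 run-rate: ${drift_run_rate(2, 50, 12):,.0f}/month")  # $1,200/month
print(f"Year-one total:    ${cumulative_drift(2, 50, 12):,.0f}")      # $7,800
```

Note that the run-rate in month 12 is twelve times the cost of the first orphaned resource pair, which is the 12-36x effect described above.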


Tools for Drift Detection (Free and Paid)

| Tool | Cost | What It Detects | Best For |
|---|---|---|---|
| terraform plan (scheduled) | Free | State drift only | Teams already using Terraform |
| AWS Config Rules | ~$2/rule/month | Compliance drift, untagged resources | AWS-native governance |
| driftctl (open source) | Free | Unmanaged resources outside Terraform | Finding resources Terraform doesn't know about |
| Spacelift | $40/user/month | Drift + policy + cost | Teams needing full IaC governance |
| env0 | Custom pricing | Drift + cost estimation + policy | Enterprise IaC platforms |
| Firefly | Custom pricing | Full cloud-to-code mapping | Multi-cloud IaC governance |
| Snyk IaC | Free tier available | Drift + security misconfigs | Security-focused teams |
Our recommendation: Start with scheduled terraform plan + the bash script above (free, 10 minutes/week). If drift exceeds $10K/year or you have 200+ resources, invest in Driftctl or Spacelift for automated detection.


The ROI of Drift Detection

| Investment | Time/Cost | Expected Annual Savings | ROI |
|---|---|---|---|
| Weekly manual script (10 min/week) | 8.7 hours/year (~$1,300 eng time) | $8,000-40,000/year | 6-30x |
| driftctl (automated, open source) | 4 hours setup + 1 hr/month | $15,000-60,000/year | 10-40x |
| Spacelift ($40/user x 5 users) | $2,400/year | $30,000-100,000/year | 12-42x |

Even the simplest approach (a 10-minute weekly script) delivers 6-30x return on time invested. There is no scenario where drift detection does not pay for itself within the first month.


The Bottom Line

Terraform drift is the cloud cost equivalent of a slow water leak. It does not cause a flood. It does not trigger alarms. It just runs, bills, and compounds month after month until someone finally looks at the pipes.

The fix is not complex. A 10-minute weekly audit catches 90% of drift waste. An SCP policy preventing console writes prevents 60% of drift from occurring in the first place. Together, they save $8K-40K/year for a typical environment with zero risk and minimal effort.

If your cloud bill has been growing faster than your workload, drift is likely a contributing factor. Our cloud cost optimization team includes drift detection as part of every assessment, and we typically find $10K-50K in drift waste within the first week.

Start with the 10-minute audit script above. Run it next Monday. The results will surprise you.



Stop Overpaying for Cloud Infrastructure

Our clients save 30-60% on their cloud bill within 90 days. Get a free Cloud Waste Assessment and see exactly where your money is going.