You Are Paying for Servers That Nobody Uses. Right Now. Today.
Go look at your cloud bill. I will wait.
Now find the line items for EC2, RDS, and EBS. See those numbers? Somewhere between 20% and 40% of that total is paying for resources that are running, accumulating charges, and serving absolutely nobody. No traffic. No queries. No users. Just burning money hour after hour, day after day.
These are ghost servers. And every cloud environment has them.
The Flexera 2025 State of the Cloud Report puts the average cloud waste at 28% of total spend. And that is only the average; plenty of organizations waste far more. In our experience working with engineering teams, companies that have never done a proper ghost infrastructure audit waste 35% to 45%.
On a $100,000/month cloud bill, that is $35,000 to $45,000 every single month going to resources nobody uses, nobody needs, and in many cases, nobody even knows exist.
This post is going to show you exactly where ghost servers hide, how to find every single one, and 12 strategies to eliminate them permanently. By the time you finish reading, you will have a checklist you can execute this week that will cut your cloud bill by thousands of dollars.
The 8 Types of Ghost Infrastructure (And Where They Hide)
Ghost infrastructure is not just "unused servers." It comes in 8 distinct forms, and most teams only look for 2 or 3 of them. Here is the complete list:
1. Idle EC2 and VM Instances
The most obvious type. Instances that are running but doing nothing useful. Common origins:
- Dev environments spun up for a feature branch that was merged (or abandoned) weeks ago
- Load test infrastructure that was never torn down
- Staging servers for a product that was sunset
- Instances launched manually "to test something real quick"
How to find them: Pull CloudWatch CPU utilization metrics for all instances. Anything averaging below 5% CPU over 14 days with fewer than 10 network connections per day is almost certainly a ghost.
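If you want to script that check, here is a minimal sketch of the filtering logic. It assumes you have already pulled the CPU and connection numbers from CloudWatch; the instance IDs and values below are invented for illustration.

```python
# Apply the idle-instance heuristic: under 5% average CPU over the window
# AND under 10 network connections per day. Data below is made up.

IDLE_CPU_PCT = 5.0
IDLE_CONNECTIONS_PER_DAY = 10

def is_ghost(avg_cpu_pct: float, connections_per_day: float) -> bool:
    """Flag an instance as a likely ghost under the 14-day heuristic."""
    return (avg_cpu_pct < IDLE_CPU_PCT
            and connections_per_day < IDLE_CONNECTIONS_PER_DAY)

instances = [
    {"id": "i-0aaa", "avg_cpu": 1.2, "conns_per_day": 0},     # idle dev box
    {"id": "i-0bbb", "avg_cpu": 42.0, "conns_per_day": 900},  # real workload
    {"id": "i-0ccc", "avg_cpu": 3.8, "conns_per_day": 4},     # load-test leftover
]

ghosts = [i["id"] for i in instances
          if is_ghost(i["avg_cpu"], i["conns_per_day"])]
print(ghosts)  # ['i-0aaa', 'i-0ccc']
```

Both thresholds are heuristics, not laws; tune them to your workload before deleting anything.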
2. Orphaned EBS Volumes and Managed Disks
When you terminate an EC2 instance, its root EBS volume usually gets deleted. But additional attached volumes? They detach and persist by default. Nobody notices because they do not show up on the EC2 dashboard. They sit quietly in the EBS volume list, charging you roughly $0.08/GB/month for gp3 or $0.10/GB/month for gp2 (us-east-1 pricing).
A single orphaned 500GB gp2 volume costs $50/month. Multiply that by dozens of volumes across a team that launches and terminates instances regularly, and you are looking at $500 to $2,000/month in pure waste.
How to find them: In the AWS Console, go to EC2 > Volumes. Filter by state: "available." Every volume in "available" state is unattached to any instance. If it has been available for more than 7 days, it is almost certainly orphaned.
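If you prefer to script it, here is a sketch of the filter applied to volumes you have already listed (for example, parsed from the describe-volumes output). The volume IDs, sizes, and the fixed "today" are all invented, and the $0.10/GB gp2 rate is an assumption for the waste estimate.

```python
# Keep unattached ("available") volumes older than 7 days. CreateTime is a
# proxy for how long the volume has been orphaned; example data is invented.
from datetime import datetime, timedelta, timezone

NOW = datetime(2025, 6, 15, tzinfo=timezone.utc)  # fixed "today" for the example

volumes = [
    {"VolumeId": "vol-111", "State": "available",
     "CreateTime": datetime(2025, 3, 1, tzinfo=timezone.utc), "Size": 500},
    {"VolumeId": "vol-222", "State": "in-use",
     "CreateTime": datetime(2025, 1, 10, tzinfo=timezone.utc), "Size": 100},
    {"VolumeId": "vol-333", "State": "available",
     "CreateTime": NOW - timedelta(days=2), "Size": 50},  # too new to flag
]

orphans = [v for v in volumes
           if v["State"] == "available"
           and NOW - v["CreateTime"] > timedelta(days=7)]

orphan_ids = [v["VolumeId"] for v in orphans]
monthly_waste = sum(v["Size"] for v in orphans) * 0.10  # gp2 rate assumption
print(orphan_ids, monthly_waste)  # ['vol-111'] 50.0
```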
3. Forgotten Snapshots
Every time someone creates an EBS snapshot "just in case" before a deployment, that snapshot persists forever. Snapshots are stored in S3 behind the scenes at $0.05/GB/month. After a year of weekly deployments across 20 servers, you can easily accumulate 1,000+ snapshots totaling terabytes of storage.
How to find them: Use the AWS CLI: aws ec2 describe-snapshots --owner-ids self --query 'sort_by(Snapshots, &StartTime)' --output table. Look at the oldest snapshots. If they are from instances that no longer exist, they are ghosts.
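The "instance no longer exists" check is easy to automate once you have both lists. A sketch, with invented IDs: cross-reference each snapshot's source volume against the volumes that still exist, and flag the orphans.

```python
# A snapshot whose source volume is gone is a strong ghost candidate.
# IDs below are invented for illustration.

existing_volume_ids = {"vol-aaa", "vol-bbb"}  # from describe-volumes

snapshots = [
    {"SnapshotId": "snap-1", "VolumeId": "vol-aaa"},    # source still exists
    {"SnapshotId": "snap-2", "VolumeId": "vol-gone1"},  # source deleted
    {"SnapshotId": "snap-3", "VolumeId": "vol-gone2"},  # source deleted
]

ghost_snapshots = [s["SnapshotId"] for s in snapshots
                   if s["VolumeId"] not in existing_volume_ids]
print(ghost_snapshots)  # ['snap-2', 'snap-3']
```

Note that some orphaned snapshots are legitimate backups of retired systems, so review before deleting.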
4. Unused Elastic IPs and Static IPs
AWS now charges $0.005/hour (about $3.65/month) for every public IPv4 address, attached or not, which makes an unassociated Elastic IP pure waste. Azure charges similarly for unassociated public IPs. It sounds trivial, but 20 unused IPs cost about $73/month, and they accumulate without anyone noticing because they are invisible unless you specifically look for them.
How to find them: AWS: aws ec2 describe-addresses --query 'Addresses[?AssociationId==null]'. This returns every Elastic IP not currently associated with a running instance.
5. Empty and Idle Load Balancers
Load balancers charge a flat hourly rate whether they route traffic or not. An idle Application Load Balancer on AWS costs about $16/month plus LCU charges. An idle Network Load Balancer costs about $16/month. Classic Load Balancers cost about $18/month.
Teams create load balancers for services, then decommission the services without deleting the load balancer. The LB sits there with zero healthy targets, charging you every hour.
How to find them: Check every load balancer for healthy target count. Any LB with zero healthy targets for more than 7 days is a ghost.
6. Idle RDS and Database Instances
This is one of the most expensive types of ghost infrastructure because database instances are typically larger and more costly than application servers.
The classic pattern: someone creates an RDS instance for a dev project, chooses Multi-AZ "for safety," picks db.r6g.xlarge "because that is what production uses," and then the project gets deprioritized. The database runs for months with zero connections, costing roughly $380/month on-demand (about $760/month with Multi-AZ) at us-east-1 PostgreSQL rates, before storage.
How to find them: Check CloudWatch DatabaseConnections metric. Any RDS instance averaging zero connections over 7 days is a ghost. Also check for databases with zero read or write IOPS.
7. Stale Container Registries and Artifacts
ECR (Elastic Container Registry) images accumulate over time. Every CI/CD pipeline push creates a new image tag. Without lifecycle policies, old images persist forever. A registry with 500 images at 2GB each stores 1TB and costs $100/month.
How to find them: List all ECR repositories and check image count and total size. Any repository with more than 30 image tags is likely accumulating stale artifacts.
8. Unused NAT Gateways
NAT Gateways charge $0.045/hour ($32/month) just for existing, plus data processing charges. If you deleted the private subnet resources that used the NAT Gateway but forgot the gateway itself, you are paying $32/month for nothing. If there are multiple unused NAT Gateways across multiple AZs, that adds up fast.
How to find them: Check CloudWatch BytesOutToDestination metric. Any NAT Gateway processing near-zero bytes over 7 days is a ghost.
For an in-depth look at NAT Gateway costs specifically, read our guide on the hidden AWS bill from NAT gateways.
Strategy 1: Run a Full Ghost Infrastructure Audit This Week
Stop reading and schedule this. Block 2 hours. Pull up every cloud account your organization uses. Run through the 8 resource types listed above for each account.
Here is the exact checklist:
AWS Audit:
- EC2 instances with average CPU below 5% for 14+ days
- EBS volumes in "available" (unattached) state
- Snapshots from terminated instances (older than 90 days)
- Elastic IPs not associated with running instances
- Load balancers with zero healthy targets
- RDS instances with zero database connections for 7+ days
- ECR repositories with more than 30 untagged images
- NAT Gateways with near-zero data processing
Azure Audit:
- VMs with average CPU below 5% for 14+ days
- Unattached managed disks
- Unassociated public IP addresses
- App Service plans with zero deployed apps
- Azure SQL databases with zero DTU consumption
- Unused network security groups
GCP Audit:
- VM instances with average CPU below 5% for 14+ days
- Persistent disks not attached to any instance
- Unused static external IP addresses
- Idle Cloud SQL instances
- Unused forwarding rules and target pools
We guarantee you will find at minimum 10% of your total spend going to ghosts. Most teams find 20% or more on their first audit.
Strategy 2: Automate Detection With Native Cloud Tools
A manual audit is a one-time fix. Automated detection is a permanent solution. Here is how to set it up on each cloud:
AWS: Trusted Advisor + Config Rules + Custom Lambda
AWS Trusted Advisor checks for idle EC2 instances, underutilized EBS volumes, and unused Elastic IPs automatically. If you have Business or Enterprise support, enable all cost optimization checks.
For deeper detection, create AWS Config rules that flag:
- EC2 instances with CPU below 10% for 14 days
- EBS volumes in "available" state for more than 7 days
- Security groups with no associated instances
Azure: Advisor + Azure Policy
Azure Advisor provides cost recommendations including idle VM detection, unused disks, and right-sizing suggestions. Enable all cost recommendations and set up weekly email digests.
Use Azure Policy to enforce automatic tagging and resource lifecycle rules.
GCP: Recommender + Asset Inventory
GCP Recommender identifies idle VMs, unattached persistent disks, and overprovisioned instances. Enable recommendations for all projects.
GCP Cloud Asset Inventory provides a complete view of every resource across all projects, making it easy to spot resources that exist in projects nobody actively manages.
Strategy 3: Enforce Mandatory Tagging on Every Resource
You cannot manage what you cannot identify. And you cannot identify what is not tagged.
Here is the uncomfortable truth about untagged resources: they are almost always ghosts. When someone creates a resource carefully and intentionally, they tag it. When someone creates a resource quickly "just to test something," they skip tagging. That untagged resource becomes invisible, and invisible resources never get cleaned up.
Make tagging mandatory at the infrastructure level:
On AWS: Use Service Control Policies to deny resource creation without required tags. Tag keys: team, environment, service, cost-center, and expiry-date.
On Azure: Use Azure Policy with "deny" effect to block resource creation without required tags.
On GCP: Use Organization Policy constraints to require labels on resource creation.
The expiry-date tag is the most important one that most teams skip. Every non-production resource should have an expiry date. When that date passes, an automated process reviews or deletes the resource. This single tag prevents the majority of ghost infrastructure from ever accumulating.
Strategy 4: Schedule Non-Production Environments
Your dev, staging, QA, and sandbox environments do not need to run 24 hours a day, 7 days a week. Your team works about 50 hours per week. Those environments are idle for 118 hours per week. That is 70% of every week where you are paying for compute that nobody is using.
The math: If non-production is 40% of your total spend, and you shut it down for 70% of the week, you save 28% of your entire cloud bill. On a $100,000/month bill, that is $28,000/month.
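That math, as a quick sketch you can rerun with your own numbers. The $100,000 bill and 40% non-production share come from the example above; the 50-hour active week is an assumption to adjust.

```python
# Scheduling savings estimate: non-prod share of spend times the fraction
# of the week those environments sit idle.

monthly_bill = 100_000.0
nonprod_share = 0.40      # non-production is 40% of total spend
hours_per_week = 168
active_hours = 50         # assumed team working hours per week

idle_fraction = (hours_per_week - active_hours) / hours_per_week  # ~0.70
monthly_savings = monthly_bill * nonprod_share * idle_fraction

print(round(idle_fraction, 2), round(monthly_savings))  # 0.7 28095
```

Real savings land a bit lower because schedulers cannot stop everything (some shared services must stay up), but the order of magnitude holds.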
Use AWS Instance Scheduler for EC2 and RDS. Use Azure Automation runbooks for VMs. Use GCP instance schedules for Compute Engine.
Do not forget to schedule the resources around the instances too. NAT Gateways, load balancers, and RDS databases in non-production accounts should all follow the same schedule. A dev NAT Gateway running 24/7 costs $32/month. Running only during business hours costs $9.50/month.
For the full implementation guide, read our post on automating cloud cost optimization.
Strategy 5: Implement Automated Cleanup Pipelines
Schedule a weekly Lambda function (AWS), Azure Function, or Cloud Function that automatically:
- Identifies orphaned EBS volumes in "available" state for more than 14 days
- Snapshots the volume as a safety net (snapshots are much cheaper than volumes)
- Deletes the original volume
- Logs the action to Slack and a CloudWatch log group
Do the same for:
- Unused Elastic IPs: Release any EIP not associated with a running instance for 7+ days
- Stale snapshots: Delete snapshots older than 90 days that reference terminated instances
- Empty load balancers: Delete any ALB/NLB with zero healthy targets for 14+ days
- Old container images: Apply ECR lifecycle policies to keep only the last 10 tagged images per repository
The key is the safety net. Do not just delete resources aggressively. Snapshot volumes before deleting them. Move container images to a cheap archive before removing them from the registry. This gives teams a recovery path if something was miscategorized, which reduces resistance to automated cleanup.
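Here is a sketch of the decision logic at the heart of that weekly pass, kept separate from the cloud API calls (the actual snapshot and delete operations would go through boto3 or your cloud SDK, omitted here). The 14-day threshold matches the text; the function name and example dates are made up.

```python
# Decide what the cleanup pipeline should do with each EBS volume.
# "snapshot_then_delete" encodes the safety net: never delete without
# a snapshot first.
from datetime import datetime, timedelta, timezone

NOW = datetime(2025, 6, 15, tzinfo=timezone.utc)

def volume_action(state: str, last_attached: datetime) -> str:
    """Return the pipeline's action for one volume."""
    if state != "available":
        return "keep"                  # still attached to something
    if NOW - last_attached <= timedelta(days=14):
        return "keep"                  # not orphaned long enough yet
    return "snapshot_then_delete"      # snapshot first, then delete

actions = [
    volume_action("in-use", NOW - timedelta(days=400)),
    volume_action("available", NOW - timedelta(days=3)),
    volume_action("available", NOW - timedelta(days=30)),
]
print(actions)  # ['keep', 'keep', 'snapshot_then_delete']
```

Keeping the decision logic pure like this makes it trivial to unit test before you let it loose on real resources.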
Strategy 6: Add Expiry Dates to Every Temporary Resource
This one simple practice prevents more ghost infrastructure than any other tactic on this list.
Every time someone creates a resource that is not meant to be permanent (dev instances, test databases, feature branch environments, experiment clusters, one-off data processing jobs), they must set an expiry-date tag with a specific date.
An automated process runs daily, checks for resources past their expiry date, sends a notification to the owner, and terminates the resource 48 hours later if the owner does not extend it.
This works because it shifts the default from "resources persist forever" to "resources expire unless actively maintained." The behavioral shift is enormous. Instead of engineers needing to remember to clean up (which they forget), they need to actively extend the life of resources they still need (which they remember, because they are using them).
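The daily sweep's core decision is a few lines of code. A sketch, assuming an expiry-date tag in ISO format and the 48-hour grace period from the text; the fixed "today" and the tag values are invented.

```python
# Decide what the daily expiry sweep does with one resource's tags.
from datetime import date, timedelta

GRACE = timedelta(days=2)     # the 48-hour notice window
TODAY = date(2025, 6, 15)     # fixed "today" for the example

def expiry_action(tags: dict) -> str:
    expiry = tags.get("expiry-date")
    if expiry is None:
        return "flag-untagged"        # mandatory tagging should prevent this
    expiry_day = date.fromisoformat(expiry)
    if TODAY < expiry_day:
        return "ok"
    if TODAY - expiry_day < GRACE:
        return "notify-owner"         # give the owner a chance to extend
    return "terminate"

decisions = [
    expiry_action({"expiry-date": "2025-12-31"}),  # still in date
    expiry_action({"expiry-date": "2025-06-14"}),  # expired yesterday
    expiry_action({"expiry-date": "2025-06-01"}),  # long past grace
    expiry_action({}),                             # no expiry tag at all
]
print(decisions)  # ['ok', 'notify-owner', 'terminate', 'flag-untagged']
```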
Strategy 7: Integrate Cleanup Into CI/CD Pipelines
Every pull request that creates infrastructure should also destroy it when the PR is merged or closed.
If your CI/CD pipeline creates a preview environment for each PR, the pipeline should also tear down that environment when the PR lifecycle ends. If it creates test databases, it should drop them after tests complete. If it spins up load test infrastructure, it should terminate it when the load test finishes.
The rule is simple: every pipeline that creates should also destroy. If creation and destruction are not in the same pipeline definition, destruction will eventually be forgotten.
For teams using Terraform, this means every terraform apply in CI should have a corresponding terraform destroy trigger. For Kubernetes, use TTL (time-to-live) labels on namespaces and a controller that garbage collects expired namespaces automatically.
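For the Kubernetes side, the garbage collector's core check is simple. A sketch, assuming a ttl-days label convention (this is a convention you define, not a built-in Kubernetes feature); namespace names and ages are invented.

```python
# A namespace carrying a ttl-days label is expired once its age exceeds
# the TTL; namespaces without the label are never garbage-collected.
from datetime import datetime, timedelta, timezone

NOW = datetime(2025, 6, 15, tzinfo=timezone.utc)

def is_expired(labels: dict, created: datetime) -> bool:
    ttl = labels.get("ttl-days")
    if ttl is None:
        return False  # no TTL label: leave it alone
    return NOW - created > timedelta(days=int(ttl))

namespaces = [
    ("pr-1423-preview", {"ttl-days": "7"}, NOW - timedelta(days=30)),
    ("pr-1501-preview", {"ttl-days": "7"}, NOW - timedelta(days=2)),
    ("production",      {},               NOW - timedelta(days=900)),
]

expired = [name for name, labels, created in namespaces
           if is_expired(labels, created)]
print(expired)  # ['pr-1423-preview']
```

In a real controller this check would run on a timer against the API server, deleting (or first cordoning) each expired namespace.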
Strategy 8: Run Monthly FinOps Reviews Focused on Ghost Detection
Add a ghost infrastructure check to your monthly FinOps review. Here is the specific agenda item:
- Pull the list of all resources created in the last 30 days
- Cross-reference with the list of all resources terminated in the last 30 days
- Calculate the "creation to termination ratio." If you created 200 resources and terminated 150, you have a net accumulation of 50 resources. Is that expected growth or ghost accumulation?
- Review the top 20 oldest non-production resources. Are they still needed?
- Check untagged resource spend. What percentage of your bill is untagged? If it is above 10%, you have a tagging enforcement problem.
This 15-minute agenda item catches drift that automated tools sometimes miss, especially for resources that are technically "in use" but serving no real business purpose.
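The two numeric checks in that agenda reduce to trivial arithmetic. A sketch with invented example numbers:

```python
# Monthly FinOps review metrics: net resource accumulation and
# untagged spend share. All figures are illustrative.

created, terminated = 200, 150
net_accumulation = created - terminated   # 50 resources to explain

total_spend = 100_000.0
untagged_spend = 14_000.0
untagged_pct = 100 * untagged_spend / total_spend

tagging_problem = untagged_pct > 10  # the 10% threshold from the checklist
print(net_accumulation, untagged_pct, tagging_problem)  # 50 14.0 True
```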
For expert guidance on building your FinOps practice, explore our Cloud Cost Optimization and FinOps service.
Strategy 9: Set Up Anomaly Alerts for Resource Accumulation
Most cost anomaly alerts trigger on spend increases. But you should also alert on resource count increases.
If your EC2 instance count jumps from 85 to 120 in a single day, that could be legitimate autoscaling or it could be a runaway process creating instances. Either way, you want to know about it immediately.
Set up CloudWatch alarms on:
- Total EC2 instance count exceeding 120% of your 7-day average
- Total EBS volume count increasing by more than 20 in a single day
- Total snapshot count increasing by more than 50 in a single day
- Total RDS instance count increasing by more than 2 in a single day
These alerts catch ghost infrastructure at the moment of creation, not weeks later when it shows up on the bill.
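The threshold logic behind these alarms is the same for every resource type. A sketch, with an invented week of instance counts:

```python
# Fire when today's count exceeds 120% of the trailing 7-day average.

def count_alarm(history_7d: list, today: int, threshold: float = 1.20) -> bool:
    avg = sum(history_7d) / len(history_7d)
    return today > avg * threshold

ec2_counts = [84, 85, 86, 85, 84, 85, 86]  # trailing week, average 85

print(count_alarm(ec2_counts, 120))  # True: 120 > 85 * 1.2 = 102
print(count_alarm(ec2_counts, 90))   # False: 90 <= 102
```

In CloudWatch you would express this as a metric math alarm rather than code, but the same ratio applies.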
Our guide on detecting and preventing AWS cost spikes covers the full anomaly detection setup.
Strategy 10: Create a "Cloud Hygiene Score" for Each Team
Gamification works. Create a simple score for each engineering team based on:
- Percentage of resources with complete tags (target: 95%+)
- Number of orphaned resources owned by the team (target: 0)
- Non-production scheduling compliance (target: 100%)
- Average resource age vs expected lifecycle
- Cost per engineer on the team (trending down is good)
Publish this score monthly. Celebrate the teams with the best scores. Work with the teams that score low to understand their barriers. This creates positive social pressure for cloud hygiene without making it punitive.
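One possible scoring formula, sketched below. The weights and the per-orphan penalty are arbitrary assumptions to tune, not a standard; the point is that the score is cheap to compute from data you already collect.

```python
# A 0-100 hygiene score: tagging and scheduling compliance, with a
# penalty per orphaned resource. Weights are assumptions.

def hygiene_score(tag_pct: float, orphans: int, sched_pct: float) -> float:
    score = 0.5 * tag_pct + 0.5 * sched_pct - 5 * orphans
    return max(0.0, min(100.0, score))

print(hygiene_score(tag_pct=98, orphans=0, sched_pct=100))  # 99.0
print(hygiene_score(tag_pct=60, orphans=8, sched_pct=40))   # 10.0
```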
Strategy 11: Audit Your Kubernetes Namespaces and Pods
If you run Kubernetes, ghost infrastructure takes a different form. Instead of idle VMs, you have:
- Abandoned namespaces from old feature branches or decommissioned services
- Pods in CrashLoopBackOff that nobody has investigated, consuming node resources
- Persistent Volume Claims bound to volumes that no pod references
- Completed Jobs that are never cleaned up, leaving finished pods and their logs cluttering the cluster
- DaemonSets and ConfigMaps from services that have been removed
Use kubectl get namespaces and check the creation date and last activity for each. Any namespace older than 90 days with no recent deployments is a cleanup candidate.
For Kubernetes cost optimization specifically, our Kubernetes cost optimization guide covers every lever available.
Strategy 12: Build a "Right to Exist" Review for Long-Running Resources
Once per quarter, review every resource that has been running for more than 6 months. For each one, ask a simple question: does this resource have a documented reason to exist?
If the answer is no, it gets flagged. The owner has 2 weeks to either document its purpose or shut it down.
This sounds bureaucratic, but it catches the most insidious type of ghost: resources that were once necessary but are not anymore. The migration server that finished migrating 4 months ago. The monitoring instance for an application that was decommissioned. The database replica that was needed for a reporting project that ended last quarter.
These are not technically "idle." They might even have some traffic. But they are no longer serving any business purpose, and they will never be cleaned up unless someone explicitly asks "should this still exist?"
The Ghost Infrastructure Quick-Find Commands
Bookmark this section. These commands find the most common ghosts across all three clouds:
AWS
```shell
# Unattached EBS volumes
aws ec2 describe-volumes --filters Name=status,Values=available --query 'Volumes[*].[VolumeId,Size,CreateTime]' --output table

# Unused Elastic IPs
aws ec2 describe-addresses --query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' --output table

# Old snapshots (sort by date, review oldest)
aws ec2 describe-snapshots --owner-ids self --query 'sort_by(Snapshots, &StartTime)[0:20].[SnapshotId,VolumeSize,StartTime,Description]' --output table
```
Azure
```shell
# Unattached managed disks
az disk list --query "[?managedBy==null].[name,diskSizeGb,timeCreated]" --output table

# Unassociated public IPs
az network public-ip list --query "[?ipConfiguration==null].[name,ipAddress]" --output table
```
GCP
```shell
# Unattached persistent disks
gcloud compute disks list --filter="-users:*" --format="table(name,sizeGb,zone,creationTimestamp)"

# Unused static IPs
gcloud compute addresses list --filter="status=RESERVED" --format="table(name,address,region)"
```
Run these commands right now. The results will surprise you.
Ghost Servers Are a Symptom. Build the Cure.
Here is what I want you to take away from this post. Finding and killing ghost servers is important, but it is a one-time fix. The real win is building systems that prevent ghosts from accumulating in the first place.
Mandatory tagging prevents unidentifiable resources. Expiry dates prevent forgotten resources. Automated scheduling prevents idle non-production environments. CI/CD integration prevents orphaned infrastructure. Monthly reviews prevent drift.
When you layer all of these together, ghost infrastructure drops from 30% of your spend to under 5%. And it stays there, month after month, because the prevention is automated and continuous.
The companies that build this discipline do not just save money. They move faster, because their environments are clean, understandable, and well-organized. They deploy with more confidence, because they know exactly what is running and why. And they modernize more easily, because there is no graveyard of legacy resources blocking their path.
Start with the audit. Find the ghosts. Kill them. Then build the systems that keep them from coming back.
Want to find out exactly how much ghost infrastructure is hiding in your cloud accounts? Take our free Cloud Waste and Risk Scorecard for a personalized assessment in under 5 minutes.
Related reading:
- The Hidden Zombie Infrastructure Draining 30% of Your Cloud Budget
- Stop the Bleed: 7 Tactics to Detect and Prevent AWS Cost Spikes
- How to Automate Cloud Cost Optimization So You Never Manually Right-Size Again
- The Hidden AWS Bill: NAT Gateways and AI Workloads
- Kubernetes Cost Optimization: The 2026 Guide to Cutting Your K8s Bill
- Real-Time Cloud Cost Optimization: 7 Strategies to Prevent Spend Spikes
External resources: