Cloud Cost Optimization
May 21, 2026
By Ravi Kanani

10 Ways Teams Overpay On AWS Fargate in 2026 (And How To Fix Each One This Week)

Key Takeaway

Most Fargate bills are inflated by oversized vCPU/memory allocations (28% of waste), missed ARM/Graviton migration (20% savings), no Fargate Spot usage on tolerable workloads (50-70% savings), missing Compute Savings Plans on steady tasks (15-30%), wrong vCPU/memory ratio (15%), and 5 other patterns. Fixing the top 4 issues typically cuts Fargate costs 45-65% within a week. Fixing all 10 cuts costs 60-80%. None require application rewrites.

We Audited 64 Fargate Accounts. Average Bill Was 50% Higher Than Needed.

A growth-stage SaaS we worked with in early 2026 was running 180 Fargate tasks across their production AWS accounts. Their monthly Fargate bill: $54,000. Their CTO had been told by their AWS rep that Fargate was "automatically optimized" because it was serverless. They had never run a cost audit on Fargate.

We ran a 5-day audit. The findings:

  • 142 task definitions had vCPU set 2-4x higher than actual peak usage
  • 89 tasks were on x86 when they could have been on ARM/Graviton (20% savings sitting unclaimed)
  • 63 tasks were Fargate on-demand when Fargate Spot would have been safe (70% savings)
  • Zero Compute Savings Plans purchased despite $35,000/month of steady-state Fargate usage
  • 38 task definitions had wrong vCPU/memory ratios (forced into expensive combinations)
  • Significant NAT Gateway charges from Fargate tasks pulling ECR images through public internet
  • Verbose CloudWatch Logs ingestion at $0.50/GB across all tasks

After 9 weeks of changes (zero application code rewrites, just configuration), their bill dropped to $19,000/month. Annual savings: $420,000. Task performance was unchanged or improved.

This pattern is consistent across 64 Fargate audits we ran in 2025-2026: the average Fargate bill is 50% higher than necessary due to a small set of recurring waste patterns. Like Lambda, Fargate's per-second billing creates the illusion of automatic optimization. In reality, Fargate is one of the most over-provisioned AWS compute services because the configuration burden is hidden in task definitions that nobody revisits.

This post is the actual fix list: 10 specific waste patterns, each with the trap, why teams fall into it, real cost math, and a concrete fix you can apply this week.


The 10 Waste Patterns (Ranked by Frequency)

Across 64 audits, these are the patterns we find. Numbers show how often each pattern occurred and the typical savings when fixed.

| # | Pattern | Found in | Typical savings |
|---|---|---|---|
| 1 | Oversized vCPU/memory in task definitions | 89% of accounts | 25-40% |
| 2 | Missed ARM/Graviton migration | 73% of accounts | 20% |
| 3 | No Fargate Spot for tolerable workloads | 67% of accounts | 50-70% (on Spot-eligible) |
| 4 | Missing Compute Savings Plans | 61% of accounts | 15-30% (on steady baseline) |
| 5 | Wrong vCPU/memory ratio (forced upgrades) | 47% of accounts | 10-20% |
| 6 | NAT Gateway egress for ECR pulls | 44% of accounts | 15-30% (network) |
| 7 | Excessive ephemeral storage allocation | 39% of accounts | 5-15% |
| 8 | Verbose CloudWatch Logs ingestion | 36% of accounts | 5-12% |
| 9 | No capacity provider strategy mix | 31% of accounts | 10-20% |
| 10 | Idle dev/test tasks running 24/7 | 28% of accounts | 5-15% |

The numbers don't add to 100% because they overlap. Fixing the top 4 alone typically cuts Fargate costs 45-65%.


Pattern 1: Oversized vCPU/Memory In Task Definitions

The trap: Fargate bills per-second for both vCPU and memory you allocated, regardless of what your container actually used. A task set to 1 vCPU and 4GB memory costs the same whether your container uses 200MB and 0.1 vCPU or hits the limit.

Why teams overpay: Defaults from old documentation (1 vCPU / 2GB was once standard). Engineers copy-paste task definitions across services without measuring actual usage. "Increase memory to fix performance" pattern repeated until tasks are 4x oversized.

The fix: Pull 7+ days of actual CPU and memory usage from CloudWatch Container Insights. Right-size to:

  • vCPU: 2x average usage (allows for spikes)
  • Memory: 1.5x peak usage (memory leaks bad, OOM bad)

Real cost math:

  • 100 tasks at 1 vCPU / 4GB running 24/7
  • Fargate cost: 100 × ($0.04048 × 1 + $0.004445 × 4) × 720 = $4,194/month
  • After right-sizing to 0.5 vCPU / 2GB (a 1.5GB memory target has to round up to 2GB, since 0.5 vCPU only allows 1, 2, 3, or 4GB):
  • Fargate cost: 100 × ($0.04048 × 0.5 + $0.004445 × 2) × 720 = $2,097/month
  • 50% savings, $2,097/month
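The cost math above is easy to script. A minimal sketch, assuming the us-east-1 Linux/x86 on-demand rates used in this post and a 720-hour month (it ignores Fargate's 1-minute minimum and any ephemeral storage charges):

```python
# Rough Fargate on-demand cost model.
# Assumptions: us-east-1 Linux/x86 rates quoted in this post, 720-hour month,
# no 1-minute minimum, no ephemeral storage beyond the free 20GB.
VCPU_HOUR = 0.04048  # $ per vCPU-hour
GB_HOUR = 0.004445   # $ per GB-hour of memory


def fargate_monthly_cost(tasks: int, vcpu: float, memory_gb: float, hours: float = 720) -> float:
    """Monthly cost of `tasks` identical tasks, each running `hours` hours."""
    hourly_per_task = vcpu * VCPU_HOUR + memory_gb * GB_HOUR
    return tasks * hourly_per_task * hours


if __name__ == "__main__":
    before = fargate_monthly_cost(100, 1, 4)   # 1 vCPU / 4GB
    after = fargate_monthly_cost(100, 0.5, 2)  # 0.5 vCPU / 2GB, a valid Fargate pair
    print(f"before ${before:,.2f}/mo, after ${after:,.2f}/mo, saved {1 - after / before:.0%}")
```

Swap in your own region's rates before trusting the output; Fargate prices differ by region and architecture.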

For a 200-task production deployment, this single fix often saves $5K-$25K/month.


Pattern 2: Missed ARM/Graviton Migration

The trap: AWS Fargate has supported ARM/Graviton since 2021 with a 20% pricing discount over x86. But task definitions default to x86, and most teams never updated.

Why teams overpay: Inertia. The architecture parameter is one line in task definition ("runtimePlatform": { "cpuArchitecture": "ARM64" }) but nobody touches working task definitions.

The fix: Check container compatibility. For Node.js, Python, Go, Ruby, and most modern runtimes: works out of the box. For Java: confirm dependencies have ARM builds (post-2024 is generally fine). For .NET: .NET 6+ supports ARM. Set cpuArchitecture to ARM64 and rebuild your container image with docker buildx build --platform linux/arm64.

Real cost math:

  • An x86 Fargate workload costing $5,000/month
  • Same workload on ARM: $4,000/month
  • $1,000/month savings, immediate

For a $20K/month Fargate bill, that's $48,000/year just from this one change.
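The same arithmetic as a small sketch, assuming the flat 20% ARM discount applies to the whole Fargate compute bill (in practice it applies to the vCPU and memory charges, not to related costs such as logging or networking):

```python
# Sketch: x86 -> ARM64 (Graviton) savings on Fargate.
# Assumption: a flat 20% discount on the Fargate compute bill.
ARM_DISCOUNT = 0.20


def arm_savings(monthly_x86_cost: float, discount: float = ARM_DISCOUNT) -> tuple[float, float]:
    """Return (monthly cost on ARM, annual savings) for a workload billed on x86 today."""
    arm_cost = monthly_x86_cost * (1 - discount)
    annual_savings = (monthly_x86_cost - arm_cost) * 12
    return arm_cost, annual_savings


if __name__ == "__main__":
    for bill in (5_000, 20_000):
        arm_cost, yearly = arm_savings(bill)
        print(f"x86 ${bill:,}/mo -> ARM ${arm_cost:,.0f}/mo, ${yearly:,.0f}/yr saved")
```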


Pattern 3: No Fargate Spot For Tolerable Workloads

The trap: Fargate Spot offers up to a 70% discount over Fargate on-demand. Most teams either don't know it exists or have never configured it. Default ECS launch type is on-demand.

Why teams overpay: Cluster capacity providers must be explicitly configured. Spot interruption fear ("we'll get pages at 3am") prevents adoption. Most teams never test interruption handling.

The fix: Configure capacity provider strategy at the cluster or service level:

capacityProviderStrategy:
  - capacityProvider: FARGATE
    weight: 1
    base: 2 # Always 2 on-demand for stability
  - capacityProvider: FARGATE_SPOT
    weight: 4 # 80% Spot beyond base

Use Fargate Spot for: stateless web/API tier, async workers (SQS), CI/CD runners, batch jobs. Avoid Fargate Spot for: databases (you shouldn't run on Fargate anyway), leader-elected services, single-replica services, sticky-session services.

Real cost math:

  • 50 stateless tasks at 0.5 vCPU / 1GB on-demand: $889/month
  • Same tasks at 80% Fargate Spot (assuming the full 70% Spot discount): $391/month
  • $498/month savings, a 56% reduction

For a $30K/month stateless Fargate workload, an 80/20 Spot mix saves about $17K/month, and running it entirely on Spot saves up to $21K/month. (See our Spot Instances Decision Framework for which workloads tolerate Spot.)
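To see what a base-plus-weights strategy does to cost, here is a rough estimator. It assumes the base tasks land on FARGATE first and the rest split by weight, allows fractional tasks (ECS places whole tasks, so treat the output as an estimate, not ECS's exact placement), and applies a flat 70% Spot discount to the us-east-1 x86 rates used in this post.

```python
# Sketch: estimate the on-demand/Spot split and blended cost of a capacity provider strategy.
# Assumptions: base goes to FARGATE, remainder split by weight, 70% Spot discount.
VCPU_HOUR = 0.04048
GB_HOUR = 0.004445
SPOT_DISCOUNT = 0.70


def split_tasks(total: int, base: int, on_demand_weight: int, spot_weight: int) -> tuple[float, float]:
    """Return (on-demand tasks, Spot tasks) for a FARGATE base plus weighted split."""
    weight_sum = on_demand_weight + spot_weight
    if weight_sum <= 0:
        raise ValueError("at least one weight must be positive")
    base = min(base, total)
    remaining = total - base
    on_demand = base + remaining * on_demand_weight / weight_sum
    return on_demand, total - on_demand


def blended_monthly_cost(total: int, base: int, od_weight: int, spot_weight: int,
                         vcpu: float, memory_gb: float, hours: float = 720) -> float:
    """Monthly cost when tasks are spread across FARGATE and FARGATE_SPOT."""
    per_task = (vcpu * VCPU_HOUR + memory_gb * GB_HOUR) * hours
    on_demand, spot = split_tasks(total, base, od_weight, spot_weight)
    return on_demand * per_task + spot * per_task * (1 - SPOT_DISCOUNT)


if __name__ == "__main__":
    od, spot = split_tasks(50, base=2, on_demand_weight=1, spot_weight=4)
    print(f"{od:.1f} on-demand, {spot:.1f} Spot")
    print(f"${blended_monthly_cost(50, 2, 1, 4, vcpu=0.5, memory_gb=1):,.0f}/month")
```

With base 2 and weights 1:4, 50 tasks split roughly 11.6 on-demand to 38.4 Spot, slightly more on-demand than a flat 80/20 because the base is carved out first.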


Pattern 4: Missing Compute Savings Plans

The trap: AWS Compute Savings Plans (SP) apply to Fargate (and Lambda and EC2). For steady-state Fargate usage, SP saves 15-30% with 1- or 3-year commitments.

Why teams overpay: SP is associated with EC2 Reserved Instances mentally. Many teams don't realize Compute SP applies to Fargate. Procurement teams don't see Fargate as commitment-eligible.

The fix: Calculate your steady-state Fargate baseline (75th percentile of last 90 days). Buy a Compute SP for that level:

  • 1-year SP: 17% discount (no upfront)
  • 1-year SP partial upfront: 19% discount
  • 3-year SP no upfront: 28% discount
  • 3-year SP all upfront: 32% discount

Real cost math:

  • $10,000/month steady-state Fargate spend, no SP
  • After 1-year Compute SP: $10,000 × 0.83 = $8,300/month
  • $1,700/month savings, $20,400/year, no operational changes

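Note: Compute Savings Plans don't apply to Fargate Spot usage; they cover on-demand Fargate. The two still combine well across a service: run the interruption-tolerant share on Spot (up to 70% off) and size the SP to the on-demand baseline that remains, such as the base tasks in a capacity provider strategy, for roughly another 17% on that portion.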
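A sketch of sizing the plan from hourly data. It assumes `hourly_spend` holds on-demand-equivalent Fargate $/hour with Spot usage excluded, uses the 75th-percentile baseline suggested above, and defaults to the 1-year no-upfront discount from the list. The mechanic it models: you pay the discounted commitment every hour whether or not you use it, and anything above the baseline is billed on-demand.

```python
# Sketch: evaluate a Compute Savings Plan baseline against hourly on-demand Fargate spend.
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile, pct in 0..100."""
    if not values:
        raise ValueError("need at least one sample")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def savings_plan_outcome(hourly_spend: list[float], baseline: float,
                         discount: float = 0.17) -> tuple[float, float]:
    """Return (cost without SP, cost with SP) over the sampled hours.

    `baseline` is in on-demand dollars per hour; the commitment you pay is
    baseline * (1 - discount) every hour, used or not. Spend above the
    baseline is billed at on-demand rates.
    """
    commitment = baseline * (1 - discount)
    without_sp = sum(hourly_spend)
    with_sp = sum(commitment + max(0.0, spend - baseline) for spend in hourly_spend)
    return without_sp, with_sp


if __name__ == "__main__":
    steady = [10_000 / 720] * 720  # $10K/month of perfectly flat usage
    print(savings_plan_outcome(steady, percentile(steady, 75)))
```

Feed it your own hourly history before buying. A spiky series such as `[5, 15]` shows how unused commitment can erase the discount, so check how often spend drops below the chosen baseline.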


Pattern 5: Wrong vCPU/Memory Ratio (Forced Upgrades)

The trap: AWS allows specific vCPU/memory combinations for Fargate. If your task needs 0.5 vCPU and 5GB memory, AWS forces you to upgrade to 1 vCPU (because 0.5 vCPU max is 4GB). You pay double for vCPU you don't need just to get the memory.

Why teams overpay: Engineers don't know the constraint table. Memory needs grow over time, forcing surprise vCPU increases.

The fix: Reference the allowed combinations:

| vCPU | Memory options |
|---|---|
| 0.25 | 0.5, 1, 2 GB |
| 0.5 | 1, 2, 3, 4 GB |
| 1 | 2, 3, 4, 5, 6, 7, 8 GB |
| 2 | 4 to 16 GB in 1GB steps |
| 4 | 8 to 30 GB in 1GB steps |
| 8 | 16 to 60 GB in 4GB steps |
| 16 | 32 to 120 GB in 8GB steps |

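If your task definition is 1 vCPU / 4GB but actual usage is 0.3 vCPU and 3GB, nothing forces you into 1 vCPU: 0.5 vCPU supports up to 4GB. Test whether 0.5 vCPU / 4GB works: same memory, half the vCPU cost. The forced upgrade only applies once memory needs exceed 4GB, which is why tasks often stay at 1 vCPU long after memory usage has come back down.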
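Combining this table with Pattern 1's rule of thumb gives a simple sizing helper. A minimal sketch: it lists the combinations from the table above and picks the cheapest pair that meets both requirements, priced at the us-east-1 x86 rates used in this post (an assumption for other regions).

```python
# Sketch: cheapest valid Fargate (vCPU, memory GB) pair that covers a requirement.
VCPU_HOUR = 0.04048
GB_HOUR = 0.004445

VALID = {
    0.25: [0.5, 1, 2],
    0.5: [1, 2, 3, 4],
    1: list(range(2, 9)),         # 2-8 GB
    2: list(range(4, 17)),        # 4-16 GB
    4: list(range(8, 31)),        # 8-30 GB
    8: list(range(16, 61, 4)),    # 16-60 GB in 4GB steps
    16: list(range(32, 121, 8)),  # 32-120 GB in 8GB steps
}


def cheapest_valid(vcpu_needed: float, memory_needed_gb: float) -> tuple[float, float]:
    """Return the lowest-cost (vcpu, memory_gb) pair meeting both requirements."""
    candidates = [
        (vcpu, mem)
        for vcpu, mems in VALID.items()
        for mem in mems
        if vcpu >= vcpu_needed and mem >= memory_needed_gb
    ]
    if not candidates:
        raise ValueError("requirement exceeds Fargate's largest size (16 vCPU / 120GB)")
    return min(candidates, key=lambda c: c[0] * VCPU_HOUR + c[1] * GB_HOUR)


if __name__ == "__main__":
    print(cheapest_valid(0.5, 5))              # memory forces the 1 vCPU tier
    print(cheapest_valid(2 * 0.2, 1.5 * 1.2))  # Pattern 1 rule: 2x avg vCPU, 1.5x peak memory
```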

Real cost math:

  • 30 tasks at 1 vCPU / 4GB (bumped to 1 vCPU back when they needed 5GB; memory was later trimmed to 4GB but vCPU never came down): $1,258/month
  • After reducing to 0.5 vCPU at the same 4GB: $821/month
  • 35% savings, $437/month

This is a hidden trap: tasks were correctly sized for memory but vCPU was forced up.


Pattern 6: NAT Gateway Egress For ECR Pulls

The trap: Fargate tasks in private subnets pull container images from ECR through NAT Gateway. NAT Gateway charges $0.045/GB. For tasks pulling 1GB images, this adds ~$0.045 per task launch on top of Fargate cost.

Why teams overpay: Default VPC patterns route everything through NAT Gateway. ECR Interface Endpoint isn't enabled.

The fix: Add ECR Interface VPC Endpoint:

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.us-east-1.ecr.api \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-xxx subnet-yyy \
  --security-group-ids sg-xxx

Plus ECR DKR endpoint:

aws ec2 create-vpc-endpoint \
  --vpc-id vpc-xxx \
  --service-name com.amazonaws.us-east-1.ecr.dkr \
  --vpc-endpoint-type Interface \
  --subnet-ids subnet-xxx subnet-yyy \
  --security-group-ids sg-xxx

Plus S3 Gateway Endpoint (free) for ECR storage backend.

Real cost math:

  • 200 task launches/day × 1.5GB image × $0.045 = $13.50/day = $405/month NAT egress
  • After ECR interface endpoints (2 endpoints × 3 AZs × ~$7.30/month each, plus $0.01/GB processed) and the free S3 gateway endpoint that serves image layers: ~$45/month
  • ~$360/month savings

For high-launch-frequency workloads (autoscaling, CI/CD), this saves $1K-$10K/month easily. (Full network cost coverage in AWS Network Cost Decisions.)
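A sketch of the comparison, assuming us-east-1 list prices: NAT Gateway data processing at $0.045/GB, interface endpoints at $0.01 per endpoint per AZ-hour plus $0.01/GB, and a free S3 gateway endpoint. Because ECR serves image layers from S3, most pull bytes go through the gateway endpoint, so the interface-endpoint data volume defaults to zero here; raise `endpoint_gb` for a more conservative number. NAT Gateway's own hourly charge is left out because other traffic usually still needs it.

```python
# Sketch: image-pull network cost via NAT Gateway vs. ECR/S3 VPC endpoints.
NAT_PER_GB = 0.045       # NAT Gateway data processing, $/GB
ENDPOINT_AZ_HOUR = 0.01  # interface endpoint, $ per endpoint per AZ-hour
ENDPOINT_PER_GB = 0.01   # interface endpoint data processing, $/GB
HOURS_PER_MONTH = 730


def nat_pull_cost(launches_per_day: float, image_gb: float, days: int = 30) -> float:
    """Monthly NAT data-processing cost for image pulls."""
    return launches_per_day * image_gb * days * NAT_PER_GB


def endpoint_cost(endpoints: int = 2, azs: int = 3, endpoint_gb: float = 0.0) -> float:
    """Monthly cost of ecr.api + ecr.dkr interface endpoints (S3 gateway endpoint is free)."""
    return endpoints * azs * ENDPOINT_AZ_HOUR * HOURS_PER_MONTH + endpoint_gb * ENDPOINT_PER_GB


if __name__ == "__main__":
    nat = nat_pull_cost(200, 1.5)
    vpce = endpoint_cost()
    print(f"NAT ${nat:,.0f}/mo vs endpoints ${vpce:,.0f}/mo, saves ${nat - vpce:,.0f}/mo")
```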


Pattern 7: Excessive Ephemeral Storage Allocation

The trap: Fargate tasks default to 20GB ephemeral storage (free). You can request up to 200GB, paying $0.000111/GB-hour for everything above the free 20GB. The waste comes from requesting far more than you need, often the full 200GB "for headroom", without measuring actual disk usage.

Why teams overpay: Defensive over-allocation. Storage seems cheap until it adds up across many tasks.

The fix: Monitor actual disk usage via Container Insights. If average usage is under 10GB, stay at default 20GB free. If you've allocated 200GB but actually use 15GB, drop to 30GB.

Real cost math:

  • 100 tasks at 200GB ephemeral storage running 24/7
  • Cost: 100 × 180GB extra × $0.000111 × 720 = $1,438/month
  • After right-sizing to 30GB total (10GB above the free 20GB):
  • Cost: 100 × 10GB × $0.000111 × 720 = $80/month
  • 94% savings on storage line item

Most workloads need under 50GB total. The 200GB max is rare.
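The storage math as a sketch, assuming the $0.000111/GB-hour rate quoted above, a 720-hour month, and that only capacity above the free 20GB is billed:

```python
# Sketch: monthly Fargate ephemeral storage cost beyond the free 20GB.
FREE_GB = 20
MAX_GB = 200
GB_HOUR = 0.000111  # $ per GB-hour above the free tier


def ephemeral_storage_cost(tasks: int, allocated_gb: float, hours: float = 720) -> float:
    """Monthly cost of the billable (above-20GB) part of each task's ephemeral storage."""
    if not FREE_GB <= allocated_gb <= MAX_GB:
        raise ValueError("Fargate ephemeral storage ranges from 20GB (default) to 200GB")
    return tasks * (allocated_gb - FREE_GB) * GB_HOUR * hours


if __name__ == "__main__":
    for size in (200, 30, 20):
        print(f"{size}GB x 100 tasks: ${ephemeral_storage_cost(100, size):,.2f}/month")
```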


Pattern 8: Verbose CloudWatch Logs Ingestion

The trap: Default awslogs log driver in Fargate ships every container log line to CloudWatch Logs at $0.50/GB ingested. Verbose application logging accumulates fast.

Why teams overpay: DEBUG-level logging in production. Console.log statements left in code. JSON-formatted logs that are 5x larger than necessary.

The fix:

  1. Set log level to INFO in production environments
  2. Use structured logging libraries that compress fields
  3. Use FireLens with Fluent Bit to filter logs before CloudWatch ingestion
  4. Ship verbose logs to S3 + Athena instead of CloudWatch (10x cheaper for long retention)

Real cost math:

  • 100 tasks logging 100MB/day each = 300GB/month
  • CloudWatch ingestion: 300 × $0.50 = $150/month
  • After filtering to 30GB ingestion + S3 archive: ~$15/month CloudWatch ingestion (30GB × $0.50) plus a few dollars of S3 storage
  • Roughly 85% savings on the logs line item

For high-traffic services with verbose logging, this can save $1K-$5K/month easily.
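A quick way to size this line item before touching any logging config. The sketch assumes the $0.50/GB ingestion rate above, decimal gigabytes, a 30-day month, and a `keep_fraction` for the share of log volume still sent to CloudWatch after FireLens filtering; it does not model CloudWatch storage or the S3 archive.

```python
# Sketch: CloudWatch Logs ingestion cost before and after filtering.
INGEST_PER_GB = 0.50  # $ per GB ingested (standard log class)


def monthly_ingestion_cost(tasks: int, mb_per_task_per_day: float,
                           keep_fraction: float = 1.0, days: int = 30) -> float:
    """Ingestion cost for the share of logs that still reaches CloudWatch."""
    if not 0 <= keep_fraction <= 1:
        raise ValueError("keep_fraction must be between 0 and 1")
    gb_per_month = tasks * mb_per_task_per_day * days / 1000
    return gb_per_month * keep_fraction * INGEST_PER_GB


if __name__ == "__main__":
    print(monthly_ingestion_cost(100, 100))                     # everything to CloudWatch
    print(monthly_ingestion_cost(100, 100, keep_fraction=0.1))  # 90% filtered or sent to S3
```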


Pattern 9: No Capacity Provider Strategy Mix

The trap: Most teams set ECS service to use only FARGATE capacity provider. This means 100% on-demand, even for workloads that could be 80% Spot.

Why teams overpay: Capacity provider strategy isn't part of basic Fargate setup tutorials. Teams think Spot is "EC2 only."

The fix: Configure capacity provider strategy at service or cluster level (see Pattern 3 example). For services tolerant of interruption: 80% Spot, 20% on-demand. For critical services: keep 100% on-demand but apply Compute SP.

Real cost math:

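  • 80 tasks at 0.5 vCPU, pure on-demand (vCPU charges only; memory omitted for simplicity): 80 × $0.04048 × 0.5 × 720 = $1,166/month
  • Same tasks at 80% Spot / 20% on-demand (assuming a 70% Spot discount):
    • Spot tasks: 64 × $0.04048 × 0.5 × 720 × 0.30 = $280
    • On-demand tasks: 16 × $0.04048 × 0.5 × 720 = $233
    • Total: $513/month
  • 56% savings, $653/month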

This compounds with right-sizing (Pattern 1) and ARM (Pattern 2).
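Why the fixes compound: each one shrinks the bill left over by the previous fix, so savings multiply rather than add. A sketch, using this post's example factors as assumptions (50% from right-sizing in Pattern 1, and the 80% Spot mix above, which saves 0.8 × 70% = 56%):

```python
# Sketch: combined effect of independent cost reductions applied to the same tasks.
def remaining_fraction(*savings: float) -> float:
    """Fraction of the original bill left after applying each saving in turn."""
    fraction = 1.0
    for saving in savings:
        if not 0 <= saving <= 1:
            raise ValueError("each saving must be between 0 and 1")
        fraction *= 1 - saving
    return fraction


RIGHT_SIZING = 0.50     # Pattern 1 example
SPOT_MIX = 0.80 * 0.70  # 80% of tasks on Spot at a 70% discount = 56%

if __name__ == "__main__":
    left = remaining_fraction(RIGHT_SIZING, SPOT_MIX)
    print(f"{left:.0%} of the bill remains, {1 - left:.0%} saved")
```

Naively adding 50% and 56% would suggest more than 100% savings; multiplying gives 78%. Add further factors, such as the 20% ARM discount, only for the tasks they actually apply to.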


Pattern 10: Idle Dev/Test Tasks Running 24/7

The trap: Engineers spin up dev/test ECS services for testing and forget to scale them down. Services keep running tasks at 24/7 cost.

Why teams overpay: No automated lifecycle for dev/test. Cost ownership unclear. Engineers move on without cleanup.

The fix: Implement scheduled scaling for dev/test environments:

  • Scale to 0 tasks at 6pm
  • Scale up at 8am weekdays
  • Stay at 0 on weekends

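Use Application Auto Scaling scheduled actions on the ECS service. This fragment is the ScheduledActions property of a CloudFormation AWS::ApplicationAutoScaling::ScalableTarget whose ResourceId is service/dev-cluster/dev-api; cron times are UTC unless you set Timezone:

ScheduledActions:
  - ScheduledActionName: dev-api-scale-down  # 6pm weekdays
    Schedule: cron(0 18 ? * MON-FRI *)
    ScalableTargetAction:
      MinCapacity: 0
      MaxCapacity: 0
  - ScheduledActionName: dev-api-scale-up    # 8am weekdays; capacities are example values
    Schedule: cron(0 8 ? * MON-FRI *)
    ScalableTargetAction:
      MinCapacity: 1
      MaxCapacity: 2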

Real cost math:

  • 20 dev/test tasks at 0.5 vCPU / 1GB running 24/7 (720 hours): $355/month
  • Same tasks running 50 hours/week (5 days × 10 hours, about 214 hours/month): $106/month
  • 70% savings, about $250/month

Multiplied across 5+ dev environments = $1,250+/month.
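The schedule math as a sketch, assuming a 720-hour month, the us-east-1 x86 rates used in this post, and a weekday-only window (8am to 6pm is 10 hours × 5 days):

```python
# Sketch: always-on vs. scheduled monthly cost for a dev/test ECS service.
VCPU_HOUR = 0.04048
GB_HOUR = 0.004445
HOURS_PER_MONTH = 720


def scheduled_monthly_cost(tasks: int, vcpu: float, memory_gb: float,
                           hours_per_day: float = 10,
                           days_per_week: int = 5) -> tuple[float, float]:
    """Return (always-on cost, scheduled cost) per month."""
    always_on = tasks * (vcpu * VCPU_HOUR + memory_gb * GB_HOUR) * HOURS_PER_MONTH
    running_share = hours_per_day * days_per_week / (24 * 7)  # share of the week tasks run
    return always_on, always_on * running_share


if __name__ == "__main__":
    always_on, scheduled = scheduled_monthly_cost(20, 0.5, 1)
    print(f"${always_on:,.0f} always-on vs ${scheduled:,.0f} scheduled")
```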


The Decision Framework: 5 Questions Before Deploying Fargate

When defining a new Fargate task, ask:

Question 1: What vCPU and memory does this task actually need?

Test in staging with realistic load. Right-size from day one rather than copying defaults.

Question 2: Should this task be on ARM?

Default to ARM unless you have a specific x86 dependency. Test in staging first.

Question 3: Can this task tolerate interruption?

If yes → Fargate Spot capacity provider. If no → on-demand with Compute SP.

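Question 4: Does this task need a private subnet?

Every Fargate task runs in a VPC (awsvpc network mode), so the real question is whether it needs a private subnet behind a NAT Gateway. Use one only if the task needs private resource access or must not have a public IP, because NAT Gateway adds per-GB processing charges. Use VPC endpoints for AWS service traffic.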

Question 5: What is the appropriate logging volume?

Set log level to INFO in production. Use FireLens for filtering. Ship long-retention logs to S3, not CloudWatch.


When Fargate Is The Wrong Choice (Pick EC2 Or Lambda Instead)

Fargate isn't always the right answer. Consider alternatives when:

Switch to EC2 ECS launch type when:

  • You have steady-state workloads with predictable scale (EC2 with Reserved Instances saves 50-70%)
  • You can co-locate multiple services on EC2 instances for better packing
  • You need GPU access (limited Fargate GPU support in 2026)
  • You need access to underlying instance OS (security tooling, custom kernels)

Switch to Lambda when:

  • Your task is event-driven with sporadic execution (Lambda's 1ms billing wins)
  • Execution is under 1 minute and infrequent
  • You don't need persistent network state

Switch to Cloud Run (GCP) when:

  • You're considering moving off AWS anyway
  • Your workload benefits from per-100ms billing and concurrent request handling per container

For workloads that fit Fargate well, the 10 patterns above cut costs 45-80%. For workloads that don't fit Fargate well, see Cloud Run vs Fargate vs Lambda decision framework.


A 5-Day Fargate Cost Audit

If your Fargate bill is over $5,000/month, run this audit. Typical finding: 45-65% savings.

Day 1: Inventory

# Pull all task definitions
aws ecs list-task-definitions --status ACTIVE --query 'taskDefinitionArns' --output text

# For each: extract vCPU and memory
for td in $(aws ecs list-task-definitions --status ACTIVE --query 'taskDefinitionArns' --output text); do
  aws ecs describe-task-definition --task-definition "$td" --query 'taskDefinition.{family:family,cpu:cpu,memory:memory,arch:runtimePlatform.cpuArchitecture}'
done

Sort by cost (Cost Explorer with grouping by ECS service) to identify the top 20 cost drivers.
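Once the inventory is in hand, a rough cost ranking helps decide where to start. A sketch, assuming each record mirrors the describe-task-definition query above (cpu in CPU units, memory in MiB, arch missing for x86) plus a `running_tasks` count you supply yourself (a hypothetical field, e.g. from ecs list-tasks); prices are this post's us-east-1 rates with ARM64 taken as 20% cheaper, and the sample inventory is made up.

```python
# Sketch: rank task definitions by estimated monthly Fargate cost.
VCPU_HOUR = 0.04048
GB_HOUR = 0.004445
ARM_DISCOUNT = 0.20


def monthly_cost(record: dict, hours: float = 720) -> float:
    """Estimate monthly cost from task-level cpu (units) and memory (MiB) strings."""
    vcpu = int(record["cpu"]) / 1024  # 1024 CPU units = 1 vCPU
    memory_gb = int(record["memory"]) / 1024
    hourly = vcpu * VCPU_HOUR + memory_gb * GB_HOUR
    if record.get("arch") == "ARM64":  # absent/None means X86_64
        hourly *= 1 - ARM_DISCOUNT
    return record["running_tasks"] * hourly * hours


def rank_by_cost(records: list[dict]) -> list[dict]:
    """Most expensive task definitions first."""
    return sorted(records, key=monthly_cost, reverse=True)


if __name__ == "__main__":
    inventory = [
        {"family": "api", "cpu": "1024", "memory": "4096", "arch": None, "running_tasks": 100},
        {"family": "worker", "cpu": "512", "memory": "2048", "arch": "ARM64", "running_tasks": 10},
    ]
    for rec in rank_by_cost(inventory):
        print(f"{rec['family']}: ${monthly_cost(rec):,.0f}/month")
```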

Day 2: Right-Size

For each top-20 task:

  1. Pull CPU and memory utilization from Container Insights (last 7 days)
  2. Calculate over-allocation ratio
  3. Apply right-sizing in IaC. Test in staging.
  4. Verify task definitions conform to allowed vCPU/memory combinations
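Step 2 can be as simple as a ratio. A sketch that defines the over-allocation ratio as allocation divided by Pattern 1's target (2x average vCPU, 1.5x peak memory), assuming both usage figures come from the 7-day Container Insights pull; the 1.5 cutoff for flagging a task is a hypothetical threshold, not an AWS recommendation.

```python
# Sketch: over-allocation ratios relative to Pattern 1's sizing targets.
FLAG_THRESHOLD = 1.5  # hypothetical cutoff: allocation 50%+ above target


def over_allocation(alloc_vcpu: float, alloc_mem_gb: float,
                    avg_vcpu: float, peak_mem_gb: float) -> tuple[float, float]:
    """Return (vCPU ratio, memory ratio); values above 1 mean room to shrink."""
    if avg_vcpu <= 0 or peak_mem_gb <= 0:
        raise ValueError("usage figures must be positive")
    return alloc_vcpu / (2 * avg_vcpu), alloc_mem_gb / (1.5 * peak_mem_gb)


def resize_candidates(tasks: dict[str, tuple[float, float, float, float]]) -> list[str]:
    """Names whose vCPU or memory ratio exceeds FLAG_THRESHOLD, worst first."""
    scored = {name: max(over_allocation(*usage)) for name, usage in tasks.items()}
    flagged = [name for name, score in scored.items() if score > FLAG_THRESHOLD]
    return sorted(flagged, key=scored.get, reverse=True)


if __name__ == "__main__":
    fleet = {
        "api": (1, 4, 0.2, 1.2),  # alloc vCPU, alloc GB, avg vCPU, peak GB
        "worker": (0.5, 1, 0.2, 0.6),
    }
    print(resize_candidates(fleet))
```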

Day 3: ARM Migration

For each task using x86:

  1. Verify container image supports ARM (or add multi-arch build)
  2. Update task definition: "runtimePlatform": { "cpuArchitecture": "ARM64" }
  3. Deploy to staging, validate, then production

Day 4: Spot + Capacity Providers

  1. Identify which services tolerate interruption
  2. Configure capacity provider strategy with Fargate Spot
  3. Implement graceful shutdown handlers (handle SIGTERM)
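  4. Set the container stopTimeout to 120 seconds (the Fargate maximum) so tasks can finish draining within the 2-minute Spot interruption warning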

Day 5: Compute Savings Plans + Cleanup

  1. Calculate steady-state baseline after right-sizing
  2. Buy 1-year Compute SP for the baseline
  3. Set up auto-scaling schedules for dev/test environments
  4. Document changes and lock in baseline

After 5 days, monitor for 30 days. The cost reduction shows up immediately on the next bill.


When To NOT Optimize Fargate (And Use Something Else Instead)

If you're hitting Fargate limitations, the right answer is a different compute option, not more Fargate tuning.

Switch to EC2 ECS when:

  • Steady 24/7 workloads with predictable scale
  • Need cost optimization through aggressive RI/SP
  • Multiple containers per host for better packing
  • Workloads over 16 vCPU (Fargate max)

Stay on Fargate when:

  • Variable scale (auto-scaling between 5 and 200 tasks daily)
  • No platform team to manage EC2 capacity
  • Container-native development workflow
  • Bursty traffic patterns

For workloads that fit Fargate well, the 10 patterns above cut costs 45-80%. For workloads that don't fit Fargate well, evaluate ECS on EC2 or Cloud Run alternatives.


The Bottom Line

The average Fargate bill in 2026 is 50% higher than it should be due to a small set of recurring waste patterns. Right-sizing alone typically cuts 25-40%. ARM migration adds another 20%. Fargate Spot saves 50-70% on tolerable workloads. Compute Savings Plans save another 15-30% on the on-demand baseline. None of these require application code changes: they're configuration and capacity-planning fixes.

The discipline most teams skip: treating Fargate configuration as a continuous optimization, not a one-time deployment decision. Task sizes change as code evolves. ARM compatibility improves. Spot tolerance varies by workload. Audit Fargate costs every quarter.

If your Fargate bill is over $10,000/month and you haven't audited task sizing, ARM, or Spot in the last 6 months, you are very likely overpaying by 50%+. Our cloud cost optimization team runs free Fargate audits and typically captures 45-65% savings within 1 week. Run a free Cloud Waste Scorecard to find your biggest serverless cost leaks first.

