Cloud Cost Optimization
May 16, 2026
By Ravi Kanani

ECS vs EKS vs Self-Managed Kubernetes: The $400K Decision Most AWS Teams Get Wrong (2026)

Key Takeaway

ECS wins for AWS-native teams running fewer than 200 containers with simple multi-service architectures (60-75% cheaper than EKS for these). EKS wins for teams needing portability, advanced scheduling, or multi-cluster federation. Self-managed K8s on EC2 wins almost nowhere in 2026 (the $73/mo control plane cost is trivial vs operational burden). Picking by Kubernetes hype instead of fit costs $200K-$800K/year for mid-sized container deployments.

We Saved One Client $432K Per Year By Moving Off EKS (And Their Engineering Productivity Improved)

A growth-stage SaaS company we worked with in early 2026 was running 87 microservices on EKS with three production clusters. Their AWS bill was high but understood. What was not understood was the engineering productivity tax: their platform team had grown from 1 to 4 people in two years just to keep EKS running. The CTO was frustrated. They were shipping less than competitors. The platform team was constantly fighting cluster upgrades, certificate rotations, and operator compatibility issues.

We audited the architecture. The 87 services were almost all stateless 12-factor apps. They used vanilla Kubernetes Deployments, basic ConfigMaps, simple Service objects. Nothing required Kubernetes specifically. The team had picked EKS in 2022 because "Kubernetes was the standard."

After 14 weeks of migrating to ECS Fargate (with 6 services staying on EKS for genuine K8s-specific needs), the results:

  • EKS direct savings: $108K/year (control plane + reduced add-on overhead)
  • Compute savings: $156K/year (better Fargate cost ratios after right-sizing during migration)
  • Engineering reallocation: 3 platform engineers redirected to product work (not measured here, but worth $600K+ in salary cost moved to product)
  • Total measurable savings: $432K/year direct (the two line items above account for $264K; the remainder is not itemized here), plus freed-up team capacity

Their CTO told us afterward: "We thought Kubernetes was the answer. It turned out to be the question we should have asked instead of the answer we should have picked."

This pattern is consistent across 26 AWS container workload audits we ran in 2025-2026: EKS is the cost-optimal answer in only 40% of cases. For the other 60%, picking ECS instead saves $200K-$800K/year. The mistake is structural: teams default to Kubernetes because of community signal rather than actual workload fit.

This post lays out the workload-to-orchestrator decision framework: when ECS wins, when EKS wins, why self-managed is rarely right, and the migration playbook.


The Three Options That Matter on AWS

Option | Control Plane Cost | Setup Complexity | Operational Burden | Ecosystem Maturity
ECS | Free | Low | Low (0.25-0.5 FTE per 50 svcs) | AWS-only, tight
EKS | $73/mo per cluster | Medium-High | Medium-High (1-2 FTE per 50 svcs) | Full K8s ecosystem
Self-managed K8s on EC2 | Free | High | High (2-3 FTE per 50 svcs) | Full K8s + custom

The number that surprises most teams: operational burden differs by roughly 4x between ECS and EKS (0.25-0.5 FTE vs 1-2 FTE per 50 services). The control plane fee is a rounding error compared to the engineering time difference.

Why EKS Operational Burden Is Higher

A typical EKS production cluster requires running and maintaining:

  • AWS Load Balancer Controller (replaces in-tree controllers)
  • Karpenter or Cluster Autoscaler (node provisioning and lifecycle)
  • External DNS (Route53 integration)
  • Cert-manager (TLS certificate management)
  • Ingress controller (NGINX, Traefik, ALB)
  • Metrics Server (HPA basics)
  • EBS CSI Driver (persistent volume support)
  • EFS CSI Driver (shared file storage)
  • Calico or Cilium (network policy)
  • CoreDNS tuning (cluster DNS at scale)
  • Various operators (Prometheus, Loki, ArgoCD, etc.)

Each of these is a Kubernetes workload that needs upgrades, monitoring, and occasional debugging. A fresh EKS cluster ships with almost none of them (CoreDNS, kube-proxy, and the VPC CNI are the default exceptions); your team installs and maintains the rest.

ECS clusters require: nothing. Load balancer integration, auto-scaling, DNS, secrets, and storage are built-in or first-party AWS features.


The Real Pricing Math (May 2026)

ECS Pricing

  • ECS itself: Free. No control plane fee.
  • Compute on EC2: Standard EC2 pricing (with optional Capacity Providers using Spot)
  • Compute on Fargate: $0.04048/vCPU-hr + $0.004445/GB-hr
  • Container Insights: $0.02 per metric per host (optional)
  • Service Connect: Free (replaces App Mesh as of 2024)

EKS Pricing

  • Control plane: $73/month per cluster ($876/year)
  • Compute on EC2 nodes: Standard EC2 pricing
  • Compute on Fargate: $0.04048/vCPU-hr + $0.004445/GB-hr (same as ECS Fargate)
  • EKS Auto Mode (newer): Compute pricing 12% premium on EC2 list price
  • Container Insights: Same as ECS
  • Add-on services: Each runs as pods consuming compute (typically 0.5-2 vCPU and 1-4 GB memory across the cluster)

Self-Managed K8s on EC2

  • Compute: Standard EC2 (control plane needs 3 nodes typically)
  • Control plane self-hosted: ~3 t3.large instances, which at on-demand rates cost well above the $73/month EKS fee (the Workload A model in this post uses $231/month), with no support
  • kOps, Rancher, or kubeadm: Free
  • Operational time: 3-5 days/month per cluster

The "Free" Trap on Self-Managed

Self-managed K8s looks free because there is no control plane fee, but the three control plane instances alone cost at least as much as the $73/month EKS fee, and you get zero AWS support. When etcd corrupts, you debug it. When certificates expire, you rotate them. When the API server crashes, you fix it. The $73/month EKS fee is the best deal in cloud cost optimization. Almost no one should run self-managed K8s in 2026.


Real-World Cost Modeling: Three Production Workloads

We modeled three actual workload profiles. May 2026 pricing.

Workload A: 25-Service Microservice Architecture (Mid-Size SaaS)

Typical mid-size SaaS:

  • 25 microservices, ~100 containers running at any time
  • Average pod size: 0.5 vCPU, 1.5 GB memory
  • Mix of HTTP services, background workers, scheduled jobs
  • Standard observability and CI/CD requirements

ECS on Fargate:

  • Compute: 100 containers x 0.5 vCPU x $0.04048 x 720 hr = $1,457
  • Memory: 100 containers x 1.5 GB x $0.004445 x 720 hr = $480
  • ALB + Route53 + ParameterStore: ~$50
  • Direct AWS cost: ~$1,987/month
  • Engineering: 0.25 FTE platform engineer = $5,000/month fully loaded
  • Total monthly cost: ~$7,000
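
The compute and memory lines above follow directly from the two Fargate rates. A minimal sketch of that arithmetic, assuming the rates and the 720-hour month used in this post; the function name `fargate_monthly` is illustrative, not part of any AWS tooling:

```python
VCPU_HR = 0.04048   # Fargate $/vCPU-hour, as quoted in this post
GB_HR = 0.004445    # Fargate $/GB-hour, as quoted in this post
HOURS = 720         # hours per month used in these models


def fargate_monthly(containers: int, vcpu: float, mem_gb: float) -> tuple[float, float]:
    """Return (compute_cost, memory_cost) in dollars per month."""
    compute = containers * vcpu * VCPU_HR * HOURS
    memory = containers * mem_gb * GB_HR * HOURS
    return compute, memory


# Workload A: 100 containers at 0.5 vCPU and 1.5 GB each
compute, memory = fargate_monthly(containers=100, vcpu=0.5, mem_gb=1.5)
direct = compute + memory + 50  # plus ~$50 for ALB, Route53, Parameter Store
print(round(compute), round(memory), round(direct))
```

Swapping in your own container count and average size gives a first-pass Fargate estimate before any Savings Plan or Spot discount.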

EKS on Fargate:

  • Control plane: $73
  • Compute (same containers): $1,457
  • Memory: $480
  • Add-on services overhead (5-10 system pods avg): ~$200
  • ALB Controller, ExternalDNS, cert-manager add-ons: ~$50 compute + ~$0 license
  • Direct AWS cost: ~$2,260/month
  • Engineering: 1.0 FTE platform engineer = $20,000/month fully loaded
  • Total monthly cost: ~$22,260

Self-managed K8s on EC2:

  • Control plane (3 t3.large): $231/month
  • Compute (workers): roughly equivalent to EKS
  • Engineering: 2.0 FTE = $40,000/month fully loaded
  • Total monthly cost: ~$42,460

Verdict at this scale: including engineering time, EKS Fargate costs about 3.2x as much as ECS Fargate ($22,260 vs ~$7,000/month). The direct AWS cost difference is small ($273/month); the engineering burden is the killer. Self-managed costs about 6x as much as ECS.
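
To make the split between infrastructure and people visible, here is a small sketch of the Workload A totals. It assumes the $20,000/month fully loaded FTE cost used in the models above; `total_monthly` is an illustrative helper name.

```python
FTE_MONTHLY = 20_000  # fully loaded $/month per platform engineer (from the models above)


def total_monthly(direct_aws: float, fte: float) -> float:
    """Direct AWS spend plus engineering time, in dollars per month."""
    return direct_aws + fte * FTE_MONTHLY


ecs = total_monthly(direct_aws=1_987, fte=0.25)
eks = total_monthly(direct_aws=2_260, fte=1.0)
direct_gap = 2_260 - 1_987
engineering_gap = (1.0 - 0.25) * FTE_MONTHLY

print(f"ECS ${ecs:,.0f}  EKS ${eks:,.0f}  ratio {eks / ecs:.1f}x")
print(f"direct gap ${direct_gap}/month, engineering gap ${engineering_gap:,.0f}/month")
```

The engineering gap is roughly 55 times the direct gap in this model, which is why the FTE assumption deserves more scrutiny than the AWS line items.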

Workload B: 200-Service Platform (Growth-Stage SaaS)

Larger platform with more complex needs:

  • 200 microservices, ~500 containers running at any time
  • Mix of stateless services and some stateful (Redis, queues)
  • Need for service mesh (Istio or Linkerd-style), advanced scheduling
  • Multi-cluster for blue/green and dev/staging/prod

ECS Fargate:

  • Compute: 500 containers x avg 0.6 vCPU x $0.04048 x 720 hr = $8,743
  • Memory: 500 x 2 GB x $0.004445 x 720 hr = $3,200
  • Service Connect (replaces service mesh): free
  • ALBs and additional infrastructure: ~$300
  • Direct AWS cost: ~$12,243/month
  • Engineering: 1.0 FTE = $20,000/month
  • Total monthly cost: ~$32,243
  • Limitation: ECS lacks some advanced K8s scheduling features

EKS on EC2 with Karpenter + Spot:

  • Control plane: $73
  • Compute (with Spot at 70%): ~$5,200 (savings from Spot vs Fargate)
  • Add-on services overhead: ~$500
  • Istio or service mesh license/operational: $0 license + 0.25 FTE = $5,000 (engineering time, not part of the direct AWS cost below)
  • ALB Controller, observability stack: ~$200
  • Direct AWS cost: ~$5,973/month
  • Engineering: 2.0 FTE platform = $40,000/month
  • Total monthly cost: ~$45,973

Verdict at this scale: EKS direct AWS cost is lower than ECS due to Spot integration via Karpenter. But the engineering FTE difference (1.0 vs 2.0) means ECS still wins on total cost. However, if you genuinely need K8s ecosystem features (Istio, Argo CD workflow patterns, custom schedulers), EKS is the right answer despite the cost. This is the band where the decision becomes nuanced.
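
One way to read the Workload B numbers is as a break-even question: how much extra platform headcount can the cheaper EKS compute pay for? A rough sketch, assuming the direct costs and the $20,000/month FTE figure from the models above (the function name is illustrative):

```python
def breakeven_extra_fte(direct_a: float, direct_b: float, fte_monthly: float) -> float:
    """Extra FTEs option B can absorb before its lower direct cost is used up."""
    return (direct_a - direct_b) / fte_monthly


# Option A = ECS Fargate, option B = EKS on EC2 with Karpenter + Spot
extra = breakeven_extra_fte(direct_a=12_243, direct_b=5_973, fte_monthly=20_000)
print(f"EKS breaks even at {extra:.2f} extra FTE; the model above assumes 1.0")
```

Under these assumptions the Spot savings cover about a third of one engineer, so EKS wins on cost only if your team can run it with far less extra headcount than the model assumes.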

Workload C: 50-Service Enterprise Deployment (Compliance-Heavy)

Enterprise with regulatory requirements:

  • 50 services, ~150 containers
  • HIPAA + SOC 2 compliance
  • Strict change management, runbooks, auditability
  • Multi-region active-active

ECS on Fargate:

  • Compute: 150 x 0.7 vCPU x $0.04048 x 720 = $3,061
  • Memory: 150 x 2.5 GB x $0.004445 x 720 = $1,200
  • AWS Config + CloudTrail + GuardDuty: $400
  • Multi-region complexity: +20% = ~$1,000
  • Direct AWS cost: ~$5,661/month
  • Engineering: 0.5 FTE platform + 0.25 FTE compliance = $15,000
  • Total monthly cost: ~$20,661

EKS on EC2 with Karpenter:

  • Control plane: $73 x 2 regions = $146
  • Compute: ~$2,000 (Spot benefits)
  • Add-ons including security/compliance (Falco, OPA, etc.): ~$400 compute + 0.5 FTE ($10,000 of engineering time, not part of the direct AWS cost below)
  • Multi-region complexity: +30% = ~$1,500
  • Direct AWS cost: ~$4,046/month
  • Engineering: 1.5 FTE platform + 0.5 FTE compliance = $40,000
  • Total monthly cost: ~$44,046

Verdict: ECS wins by 2.1x on total cost for compliance-heavy enterprise deployments because the simpler primitives are easier to audit. EKS adds compliance complexity (more services to audit, more attack surface, more change-management overhead).


The Decision Framework: 5 Questions

Question 1: How AWS-locked is your stack?

  • AWS-only forever (no migration plans): ECS — no portability tax to pay
  • Multi-cloud or planning to move to GKE/AKS: EKS — Kubernetes portability is real
  • Hybrid cloud / on-prem + AWS: EKS — Kubernetes runs on-prem too
  • Greenfield with no migration plans: ECS — start simple, switch later if you need to

Question 2: What is your scale?

  • Under 50 containers: ECS Fargate. K8s overhead is pure waste at this scale.
  • 50-300 containers: ECS still wins for most workloads; EKS only if K8s features are needed.
  • 300-2,000 containers: EKS often wins because Karpenter + Spot economics overcome the operational burden.
  • 2,000+ containers: EKS dominates; ECS hits some scaling limits and lacks the orchestration sophistication.
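
The scale bands above can be written down as a simple lookup. This is only a sketch of Question 2 in isolation: the band edges overlap in the text (300 and 2,000 appear in two bands each), so assigning them to the upper band is an assumption, as is the `needs_k8s_features` flag name.

```python
def recommend_by_scale(containers: int, needs_k8s_features: bool = False) -> str:
    """First-pass orchestrator suggestion from container count alone."""
    if containers < 50:
        return "ECS Fargate"              # K8s overhead is pure waste here
    if containers < 300:
        return "EKS" if needs_k8s_features else "ECS"
    if containers < 2000:
        return "EKS (often)"              # Karpenter + Spot economics
    return "EKS"


for n in (30, 150, 800, 5000):
    print(n, recommend_by_scale(n))
```

Treat the result as a starting point; the other four questions can override it.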

Question 3: What K8s ecosystem features do you actually need?

  • Helm charts: Not a real reason — you can package ECS task definitions similarly
  • Service mesh: ECS Service Connect handles 80% of use cases without Istio
  • Argo CD GitOps: Real lock-in — if you live in ArgoCD, EKS is sticky
  • Operators (Prometheus, Strimzi, etc.): Real lock-in — these are K8s-specific
  • Custom schedulers / advanced affinity: Real lock-in — ECS scheduling is simpler
  • StatefulSets with complex orchestration: Real lock-in — ECS does not have an equivalent

Question 4: What is your team's expertise?

  • AWS engineers, no K8s background: ECS — onboarding is faster
  • Mixed AWS + K8s experience: Either works; pick on cost.
  • K8s-native team from day one: EKS — you already pay the K8s tax mentally
  • DevOps team transitioning from on-prem: EKS if migrating from existing K8s; ECS if starting fresh

Question 5: What is your change-management posture?

  • Move fast, break things, ship daily: ECS — fewer moving parts means fewer breaking points
  • Mature CI/CD with extensive testing: Either works
  • Strict compliance, slow change windows: ECS — simpler audit surface, fewer add-ons to track
  • Regulated industry (financial, healthcare, government): ECS for new workloads; EKS only if K8s features are mandated

Hidden Costs of EKS Most Comparisons Miss

Hidden Cost 1: Add-On Service Compute

Every EKS cluster runs 5-15 system pods that consume real resources:

  • AWS Load Balancer Controller: ~0.1 vCPU, 200MB
  • Karpenter: ~0.5 vCPU, 1GB
  • ExternalDNS: ~0.1 vCPU, 200MB
  • Cert-manager: ~0.2 vCPU, 500MB
  • CoreDNS (multiple replicas): ~0.5 vCPU, 1GB
  • kube-proxy (per node): ~0.1 vCPU per node
  • Metrics Server: ~0.2 vCPU, 500MB
  • Prometheus stack (if installed): 2-4 vCPU, 8-16GB

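Total: typically 2-5 vCPU and 8-20 GB of cluster capacity dedicated to add-ons. At Fargate list rates that raw capacity works out to roughly $85-210/month; per-pod size rounding, extra replicas, and node headroom push real overhead toward the $200-500/month used in the cost models above, all before you run any application.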
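
A quick check of the raw-capacity side of that overhead, assuming the Fargate rates and 720-hour month quoted earlier in this post. It prices only the vCPU and memory themselves; Fargate rounds each pod up to the next supported size, so billed overhead runs higher than this floor.

```python
VCPU_HR = 0.04048   # Fargate $/vCPU-hour, as quoted in this post
GB_HR = 0.004445    # Fargate $/GB-hour, as quoted in this post
HOURS = 720


def addon_overhead(vcpu: float, mem_gb: float) -> float:
    """Monthly cost of the capacity consumed by cluster add-ons."""
    return (vcpu * VCPU_HR + mem_gb * GB_HR) * HOURS


low = addon_overhead(vcpu=2, mem_gb=8)
high = addon_overhead(vcpu=5, mem_gb=20)
print(f"${low:.0f} to ${high:.0f} per month per cluster")
```

Multiply by the number of clusters to see how quickly per-cluster overhead compounds.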

Hidden Cost 2: Cluster Upgrades

Kubernetes minor versions, and the EKS versions that track them, release roughly every 4 months, and AWS provides standard support for each version for ~14 months. Staying current means upgrading each EKS cluster about 3 times per year, forever. Each upgrade requires:

  • Reading release notes for breaking changes
  • Testing all add-ons against new K8s version
  • Coordinating staged rollout across clusters
  • Validating workload compatibility

Typical upgrade effort: 8-40 engineering hours per cluster per upgrade. For 4 clusters at 3 upgrades a year, that's roughly 100-480 hours/year just for upgrades. ECS has no equivalent: AWS upgrades the orchestration plane invisibly.
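
The upgrade burden is easy to estimate for your own fleet. A minimal sketch using the 8-40 hours-per-upgrade range above; the upgrade frequency is a parameter because it depends on how closely you track new versions.

```python
def upgrade_hours(clusters: int, upgrades_per_year: int,
                  hours_low: int = 8, hours_high: int = 40) -> tuple[int, int]:
    """Return the (low, high) engineering hours per year spent on cluster upgrades."""
    upgrades = clusters * upgrades_per_year
    return upgrades * hours_low, upgrades * hours_high


for per_year in (3, 4):
    low, high = upgrade_hours(clusters=4, upgrades_per_year=per_year)
    print(f"4 clusters, {per_year} upgrades/year: {low}-{high} hours")
```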

Hidden Cost 3: IAM Roles for Service Accounts (IRSA) Complexity

IRSA is essential for proper EKS security but adds complexity: every workload needs an IAM role, OIDC provider configuration, service account annotations, and trust policies. Misconfigurations cause silent failures. ECS task IAM is simpler: one role per task definition, no OIDC provider, fewer moving parts.
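
The difference shows up in the IAM trust policies themselves. The sketch below builds both as plain dictionaries; the account ID, OIDC provider ID, namespace, and service account name are placeholders, not values from any real cluster.

```python
import json


def irsa_trust_policy(account_id: str, oidc_provider: str,
                      namespace: str, service_account: str) -> dict:
    """Trust policy for a role assumed by one Kubernetes service account via IRSA.

    oidc_provider is the cluster's OIDC issuer without the https:// prefix.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{account_id}:oidc-provider/{oidc_provider}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{oidc_provider}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{oidc_provider}:aud": "sts.amazonaws.com",
            }},
        }],
    }


def ecs_task_trust_policy() -> dict:
    """Trust policy for an ECS task role: the same for every task definition."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }


# Placeholder identifiers for illustration only
irsa = irsa_trust_policy(
    account_id="111122223333",
    oidc_provider="oidc.eks.us-east-1.amazonaws.com/id/EXAMPLED539D4633E53DE1B71EXAMPLE",
    namespace="payments",
    service_account="orders-api",
)
print(json.dumps(irsa, indent=2))
print(json.dumps(ecs_task_trust_policy(), indent=2))
```

The IRSA policy has four values that must match the cluster and workload exactly; a typo in the `sub` condition is one common source of the silent failures mentioned above.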

Hidden Cost 4: Networking and CNI Overhead

EKS uses the VPC CNI by default, which assigns every pod its own VPC IP address drawn from the node's ENIs. At scale, this hits subnet IP exhaustion and per-instance ENI limits. Solutions (an alternative CNI such as Cilium, prefix delegation, secondary subnets) all require expertise and operational maintenance. ECS awsvpc mode also gives each task its own ENI, but with fewer edge cases at scale.
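
For capacity planning, two small formulas cover most of the arithmetic. The sketch below uses the pod-limit formula AWS documents for the default VPC CNI mode (without prefix delegation) and the five addresses AWS reserves in every subnet. The m5.large figures (3 ENIs, 10 IPv4 addresses each) are an example instance type, not data from the audits in this post.

```python
def max_pods(enis: int, ipv4_per_eni: int) -> int:
    """Default VPC CNI pod limit per node, without prefix delegation."""
    return enis * (ipv4_per_eni - 1) + 2


def usable_subnet_ips(prefix_length: int) -> int:
    """Usable IPv4 addresses in a VPC subnet (AWS reserves 5 per subnet)."""
    return 2 ** (32 - prefix_length) - 5


print(max_pods(enis=3, ipv4_per_eni=10))   # m5.large
print(usable_subnet_ips(24))               # one /24 subnet
```

Dividing the second number by your pods per node, plus one address per node, gives a rough ceiling on how many nodes a subnet can hold.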

Hidden Cost 5: Observability Stack Cost

EKS clusters typically run an observability stack (Prometheus + Grafana + Loki + Tempo or commercial equivalents). The stack itself consumes 10-30% of cluster capacity at small scale. ECS uses Container Insights (managed) which is simpler and bills per metric, often costing less.

Hidden Cost 6: Multi-Cluster Management

EKS encourages multiple clusters (per-environment, per-region, per-team). Each cluster has $73/month control plane plus all the add-on overhead. Three production clusters: $219/month + 3x add-on overhead. ECS has cluster concepts but they are virtual (no per-cluster fee), making multi-cluster cheap.

Hidden Cost 7: Engineering Onboarding

A new engineer joining an EKS shop needs 3-8 weeks to become productive (Kubernetes concepts, kubectl, Helm, Argo CD, ingress, services, the cluster's specific add-on stack). On ECS, new engineers contribute within 1-2 weeks (task definitions, services, ALB rules — concepts they likely know from generic AWS work).

For teams hiring 10 engineers/year, this is real cost: 400-700 hours of reduced productivity per year.


When EKS Actually Wins

To be clear, EKS is not always wrong. EKS wins for:

  • Multi-cloud strategy where Kubernetes portability matters
  • Heavy use of K8s-specific tooling (Argo CD, Istio, custom operators, complex StatefulSets)
  • Hyperscale deployments where ECS scheduling hits limits
  • Strong existing K8s expertise where retraining to ECS is itself a cost
  • Hybrid cloud with on-prem K8s clusters needing common tooling
  • ML/data platforms that need Kubeflow, KServe, or similar K8s-native tooling
  • Service mesh requirements beyond ECS Service Connect (Istio's traffic shifting, security policies, observability)

For about 40% of AWS container workloads we audit, EKS is the right answer. For the other 60%, ECS delivers 50-75% cost reduction with acceptable feature trade-offs.


Migration Playbook: EKS → ECS

For workloads where ECS would save 40%+, migration takes 8-16 weeks. Here is the playbook from real migrations.

Phase 1: Workload Audit (2 Weeks)

For each EKS service, classify:

  • Stateless 12-factor app: Easy migration target
  • Stateful with simple PVCs: Moderate (move to EFS or EBS)
  • Complex StatefulSet (databases, queues): Stay on EKS or move to managed services
  • Heavy K8s integration (operators, CRDs): Stay on EKS
  • Helm-managed with simple values: Easy translation

Aim to migrate 60-80% of services; keep 20-40% on a smaller residual EKS cluster.

Phase 2: Translation (3-4 Weeks)

Build the translation between K8s and ECS primitives:

Kubernetes | ECS Equivalent
Deployment | ECS Service with TaskDefinition
Service (ClusterIP) | Service Connect or ALB + service discovery
Ingress | ALB Listener Rules
ConfigMap | Parameter Store (SSM)
Secret | Secrets Manager
PVC | EFS volume or EBS volume
StatefulSet | No direct equivalent: ECS Service with attached EBS/EFS volumes for simple cases, otherwise stay on EKS
HPA | ECS Service Auto Scaling
CronJob | EventBridge + ECS RunTask

Build a code generator or templates so each service can be converted consistently.
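
As a starting point for such a generator, here is a hedged sketch that maps a single-container Deployment manifest (already parsed from YAML into a dict) to the fields a Fargate task definition needs. The manifest, image name, and helper names are hypothetical. It assumes resource requests, not limits, drive sizing, and it covers only the common Fargate sizes up to 4 vCPU. The execution role, logging, and environment variables are left out.

```python
# Allowed Fargate memory sizes (MiB) per CPU size (1024 units = 1 vCPU); common sizes only.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4097, 1024)),
    1024: list(range(2048, 8193, 1024)),
    2048: list(range(4096, 16385, 1024)),
    4096: list(range(8192, 30721, 1024)),
}


def parse_cpu(quantity: str) -> int:
    """Kubernetes CPU quantity ('500m', '1') -> ECS CPU units."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) * 1024 // 1000
    return int(float(quantity) * 1024)


def parse_memory_mib(quantity: str) -> int:
    """Kubernetes memory quantity ('1536Mi', '2Gi') -> MiB. Other suffixes are not handled."""
    if quantity.endswith("Gi"):
        return int(float(quantity[:-2]) * 1024)
    if quantity.endswith("Mi"):
        return int(quantity[:-2])
    raise ValueError(f"unsupported memory quantity: {quantity}")


def fit_fargate(cpu_units: int, mem_mib: int) -> tuple[int, int]:
    """Smallest listed Fargate (cpu, memory) pair that covers the request."""
    for cpu, memory_options in FARGATE_SIZES.items():
        if cpu < cpu_units:
            continue
        for memory in memory_options:
            if memory >= mem_mib:
                return cpu, memory
    raise ValueError("request exceeds the Fargate sizes listed here")


def to_task_definition(deployment: dict) -> dict:
    """Translate a single-container Deployment into task definition fields."""
    container = deployment["spec"]["template"]["spec"]["containers"][0]
    requests = container["resources"]["requests"]
    cpu, memory = fit_fargate(parse_cpu(requests["cpu"]),
                              parse_memory_mib(requests["memory"]))
    return {
        "family": deployment["metadata"]["name"],
        "networkMode": "awsvpc",
        "requiresCompatibilities": ["FARGATE"],
        "cpu": str(cpu),
        "memory": str(memory),
        "containerDefinitions": [{
            "name": container["name"],
            "image": container["image"],
            "essential": True,
            "portMappings": [
                {"containerPort": p["containerPort"], "protocol": "tcp"}
                for p in container.get("ports", [])
            ],
        }],
    }


deployment = {  # hypothetical manifest
    "metadata": {"name": "orders-api"},
    "spec": {"template": {"spec": {"containers": [{
        "name": "app",
        "image": "example.com/orders-api:1.0",
        "resources": {"requests": {"cpu": "500m", "memory": "1536Mi"}},
        "ports": [{"containerPort": 8080}],
    }]}}},
}
task_def = to_task_definition(deployment)
print(task_def["cpu"], task_def["memory"])
```

Note the rounding: a 0.5 vCPU / 1.5 GB pod lands on the 0.5 vCPU / 2 GB Fargate size, so the generator is also a natural place to surface right-sizing opportunities.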

Phase 3: Parallel Operation (4-6 Weeks)

  1. Deploy services to ECS alongside existing EKS
  2. Use weighted DNS or ALB rules to split traffic
  3. Move 5% / 25% / 50% / 100% over 4 weeks
  4. Validate latency, error rate, and cost
  5. Iterate on misconfigurations as they surface
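
If you split traffic at the ALB, each stage is a weighted forward action across two target groups. The sketch below only builds the action structure that the Elastic Load Balancing API accepts for a listener or rule; the target group ARNs are placeholders, and applying the action (for example through an SDK or infrastructure-as-code) is left to your tooling.

```python
STAGES = [5, 25, 50, 100]  # percent of traffic on ECS at each step


def weighted_forward(eks_tg_arn: str, ecs_tg_arn: str, ecs_percent: int) -> dict:
    """Forward action sending ecs_percent of requests to the ECS target group."""
    if not 0 <= ecs_percent <= 100:
        raise ValueError("ecs_percent must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": eks_tg_arn, "Weight": 100 - ecs_percent},
                {"TargetGroupArn": ecs_tg_arn, "Weight": ecs_percent},
            ]
        },
    }


# Placeholder ARNs for illustration only
EKS_TG = "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/eks-orders/0123456789abcdef"
ECS_TG = "arn:aws:elasticloadbalancing:us-east-1:111122223333:targetgroup/ecs-orders/fedcba9876543210"

for percent in STAGES:
    action = weighted_forward(EKS_TG, ECS_TG, percent)
    weights = [tg["Weight"] for tg in action["ForwardConfig"]["TargetGroups"]]
    print(percent, weights)
```

Keeping the stages in one list makes rollback a matter of re-applying the previous entry.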

Phase 4: Decommission (2 Weeks)

  1. After 30 days at 100% on ECS, remove EKS-side services
  2. Shrink EKS cluster to only services that genuinely need it
  3. Decommission unused EKS add-ons
  4. Reallocate platform team capacity to product work

Outcome

Typical results from real migrations:

  • Direct AWS cost reduction: 30-50%
  • Engineering FTE reallocation: 1-2 FTEs freed
  • Cluster upgrade burden eliminated for migrated services
  • Onboarding time for new engineers: 50-70% reduction

When To Stay on EKS

Don't migrate if:

  • Annual savings under $200K: Migration cost (engineering hours, risk) often exceeds savings
  • Your team identity is "we're a Kubernetes company": Cultural fit matters; forced migration backfires
  • You depend on K8s-native tools: Re-platforming away from Argo CD, Istio, or operator-based architectures is painful
  • Active feature development on K8s primitives: If half your roadmap is K8s-specific, switching costs the future
  • Compliance environment specifically certified for EKS: Re-certifying ECS may take longer than the savings warrant

For about 30% of EKS users we audit, the answer is "stay on EKS and optimize within it" rather than migrate.


A 30-Day AWS Container Cost Audit

If your AWS container bill is over $20,000/month, run this audit. Typical finding: 30-60% cost reduction opportunity.

Week 1: Inventory

  1. List all ECS, EKS, and self-managed K8s clusters with monthly cost
  2. List all services per cluster with cost allocation
  3. Identify control plane fees (EKS) and platform overhead
  4. Tag every service with team, customer-facing, criticality

Week 2: Categorize Workloads

For each cluster and major service:

  • Stateless 12-factor (ECS-friendly)
  • Stateful complex (EKS or managed services)
  • K8s-native (Argo CD apps, operators, CRDs — keep on EKS)
  • Mixed (need analysis)

Week 3: Cost Model

For workloads in the "ECS-friendly" bucket:

  • Calculate ECS cost (include engineering FTE deltas)
  • Calculate migration cost (engineering hours, risk reserve)
  • Calculate annual savings + break-even timeline

Week 4: Decide and Plan

For workloads where annual savings exceed migration cost by 3x:

  • Scope a 12-week migration project
  • Get exec buy-in with concrete savings numbers
  • Build migration scaffolding (translation templates, deployment patterns)
  • Pilot with 1-2 services before full rollout
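
The Week 3 and Week 4 arithmetic fits in a few lines. This sketch reads "exceed migration cost by 3x" as annual savings of at least three times the one-off migration cost, which is an interpretation. The $120,000 migration cost in the example is hypothetical; the $432K savings figure is the one from the case study at the top of this post.

```python
def migration_case(annual_savings: float, migration_cost: float,
                   required_multiple: float = 3.0) -> dict:
    """Break-even time and go/no-go check for a migration."""
    monthly_savings = annual_savings / 12
    return {
        "breakeven_months": migration_cost / monthly_savings,
        "multiple": annual_savings / migration_cost,
        "proceed": annual_savings >= required_multiple * migration_cost,
    }


case = migration_case(annual_savings=432_000, migration_cost=120_000)
print(f"break-even in {case['breakeven_months']:.1f} months, "
      f"{case['multiple']:.1f}x multiple, proceed={case['proceed']}")
```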

The Bottom Line

The "EKS by default" pattern has cost AWS customers hundreds of millions in unnecessary engineering and infrastructure spend. EKS is the right answer about 40% of the time. For the other 60%, ECS delivers comparable production reliability at 50-75% lower total cost when you include engineering time. Self-managed Kubernetes on EC2 is almost never the right answer in 2026.

The discipline most teams skip: evaluate orchestrator choice as an architecture decision based on workload fit, not as a cultural decision based on community signal. Kubernetes is amazing technology. It is also the most expensive way to run a 25-service stateless SaaS on AWS.

If your AWS container bill is over $25,000/month and you have not run an ECS-vs-EKS audit in the last 18 months, you are very likely overpaying by 40-70%. Our cloud cost optimization team runs free container orchestration audits and typically captures 40-65% savings within 90 days. Run a free Cloud Waste Scorecard to find your biggest container cost leaks first.

