Cloud Cost Optimization
May 16, 2026
By Ravi Kanani

Kubernetes Rightsizing in 2026: Why VPA, HPA, KRR, and Karpenter Each Solve Different Problems

Key Takeaway

VPA rightsizes pod requests but applies them by restarting pods, which can break stateful services. HPA scales replicas horizontally but cannot fix oversized base requests. KRR recommends but does not enforce: it is safe, but it requires human follow-through. Karpenter rightsizes nodes but does nothing about pod-level waste. The right rightsizing strategy uses all four together with a clear decision matrix. Picking one tool and ignoring the others is the most common mistake we see in K8s cost audits.

Your VPA Is Saving 22%. Karpenter Is Saving 18%. Why Are You Still Wasting 60% On K8s?

A growth-stage SaaS client we worked with in early 2026 had done everything the K8s cost optimization blog posts told them to do. They had VPA running in recommendation mode. They had Karpenter provisioning nodes. They had HPA scaling their public APIs. They were proud of their setup. Their EKS bill: $284,000 per month.

We audited the cluster. The VPA recommendations had been ignored for 8 months because applying them required restarting stateful pods nobody wanted to touch. Karpenter was packing nodes efficiently — but with pods that requested 4x what they actually used. HPA was scaling replicas of services where the base request was already 6x oversized. Each tool was working perfectly. The combination was failing.

After 10 weeks of coordinated rightsizing across all four layers (KRR for stateful workloads, VPA Auto for stateless, HPA tuning, Karpenter consolidation), their bill dropped to $108,000/month. Annual savings: $2.1 million. No service degradation. No outages.

This pattern is consistent across 80 production K8s clusters we audited in 2025-2026: teams pick one rightsizing tool, assume it solves cost, and leave 40-65% of waste in place because they do not understand how the four major tools complement each other. This post is the decision framework: which tool solves which problem, when to use them together, and the playbook to actually capture the savings.

If you are running production Kubernetes and your monthly bill is over $20,000, you are almost certainly in this trap.


The Four Tools That Matter (And What They Actually Do)

| Tool | Layer | What It Changes | What It Doesn't Touch |
| --- | --- | --- | --- |
| VPA (Vertical Pod Autoscaler) | Pod | CPU and memory requests/limits on pods | Replica count, node capacity |
| HPA (Horizontal Pod Autoscaler) | Workload | Number of pod replicas | Pod size, node capacity |
| KRR (Krr by Robusta) | Recommendation | Nothing (read-only); recommends new requests | Anything (it's a tool, not an autoscaler) |
| Karpenter | Node | Which EC2/equivalent instances back the cluster | Pod-level requests, replica count |

Each of these tools fixes a different class of waste. Using one and ignoring the others is the most common rightsizing mistake.

Why Each Tool Alone Fails

  • VPA alone: Optimizes pod requests but with the same number of replicas on the same node types. You shrink pods but still pay for oversized nodes. Savings: 15-25%.
  • HPA alone: Scales replicas based on demand but with the same oversized requests. Adding more replicas of an oversized pod multiplies waste. Savings: 5-10%.
  • KRR alone: Recommends but does not enforce. Recommendations gather dust. Savings: 0-5% (only what humans actually apply).
  • Karpenter alone: Packs oversized pods onto smaller-than-naive nodes but cannot fix the pod-level waste. Savings: 15-25%.

The savings compound when the tools are used correctly. VPA + HPA + Karpenter together save 50-70%.


The 11 Cost Leaks Each Tool Catches

Across 80 cluster audits, here are the actual waste sources we find. Knowing which tool catches which leak is the key to designing your rightsizing program.

| Leak | % of Cluster Waste | Tool That Catches It |
| --- | --- | --- |
| Oversized memory requests (set once, never updated) | 22% | VPA or KRR |
| Oversized CPU requests (defensive padding) | 18% | VPA or KRR |
| Wrong instance types (Karpenter not configured) | 14% | Karpenter |
| Idle replicas (HPA misconfigured) | 9% | HPA tuning |
| Stranded persistent volumes | 7% | Manual cleanup (no tool) |
| Idle dev/test clusters running 24/7 | 6% | Manual scheduling (no tool) |
| Over-provisioned daemonsets | 5% | VPA |
| Ignored Spot instance opportunity | 5% | Karpenter |
| Unused namespaces with running workloads | 4% | Manual cleanup |
| Unallocated node capacity (bin-packing failure) | 4% | Karpenter |
| Idle GPU nodes (over-provisioned for ML) | 3% | Karpenter + scheduling |
| Other | 3% | Various |

VPA/KRR catches 45% of waste. Karpenter catches 26%. HPA catches 9%. Manual processes catch ~17%.

The implication: a rightsizing program that does not use VPA/KRR misses almost half the savings. A program that does not use Karpenter misses another quarter. There is no single tool that catches more than 50%.
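
The per-tool totals can be checked directly from the table. The short sketch below groups each leak under the tool family used in the summary line (daemonsets under VPA/KRR, idle GPU nodes under Karpenter); that grouping is our reading of the table, not a separate data source.

```python
from collections import defaultdict

# (leak, percent of cluster waste, tool family) copied from the table above.
LEAKS = [
    ("Oversized memory requests", 22, "VPA/KRR"),
    ("Oversized CPU requests", 18, "VPA/KRR"),
    ("Wrong instance types", 14, "Karpenter"),
    ("Idle replicas", 9, "HPA"),
    ("Stranded persistent volumes", 7, "Manual"),
    ("Idle dev/test clusters", 6, "Manual"),
    ("Over-provisioned daemonsets", 5, "VPA/KRR"),
    ("Ignored Spot opportunity", 5, "Karpenter"),
    ("Unused namespaces", 4, "Manual"),
    ("Unallocated node capacity", 4, "Karpenter"),
    ("Idle GPU nodes", 3, "Karpenter"),
    ("Other", 3, "Other"),
]


def waste_by_tool(leaks):
    """Sum the waste percentages per tool family."""
    totals = defaultdict(int)
    for _name, pct, tool in leaks:
        totals[tool] += pct
    return dict(totals)


totals = waste_by_tool(LEAKS)
for tool, pct in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{tool}: {pct}%")
```

The rows sum to 100%, and no single tool family exceeds 45%.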


VPA Deep Dive: When Auto Mode Saves Money And When It Causes Outages

Vertical Pod Autoscaler is the highest-leverage rightsizing tool because it directly fixes the largest waste category: oversized pod requests. But VPA in Auto mode is also the tool that has caused the most production incidents we have seen.

How VPA Works

VPA monitors actual pod CPU and memory usage over time. It calculates new request and limit values that match observed usage with safety margin. In Auto mode, it evicts pods so they restart with the new values.
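
To make "observed usage with safety margin" concrete, here is a deliberately simplified sketch: take a high percentile of usage samples and add headroom. This is not VPA's actual recommender, which works from weighted usage histograms; the function names, the p95 choice, and the sample data are all illustrative assumptions.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list (pct in 0-100)."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_request(samples, pct=95, margin=0.5):
    """Usage percentile plus a safety margin (simplified illustration)."""
    return percentile(samples, pct) * (1 + margin)


# Hypothetical CPU usage samples for one container, in cores.
cpu_cores = [0.30, 0.35, 0.38, 0.40, 0.40, 0.41, 0.39, 0.36, 0.40, 0.37]
print(f"recommended CPU request: {recommend_request(cpu_cores):.3f} cores")
```

The margin is the knob that trades cost for safety: a larger margin tolerates bursts the sample window never saw, at the price of a bigger request.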

The Three VPA Modes

| Mode | Behavior | Best For |
| --- | --- | --- |
| Off | Calculates recommendations, does nothing | Production stateful services; learning phase |
| Initial | Sets requests at pod creation only | Workloads that should not be evicted mid-run |
| Auto | Evicts pods to apply new requests | Stateless workloads; non-critical paths |

The Real Risk Profile

VPA Auto mode evicts pods. This is fine for stateless web services. It is dangerous for:

  • Stateful services with leader election (etcd, ZooKeeper, Kafka, Cassandra) — eviction triggers leader changes and momentary unavailability
  • Long-running batch jobs — eviction kills the job partway through
  • JVM applications with slow warmup — eviction causes 30-60s of latency spikes during startup
  • Services with sticky sessions — eviction breaks user sessions
  • Services with in-flight long-running connections (WebSockets, streaming) — eviction drops connections

Rule of thumb: Use VPA Auto for stateless HTTP services that already tolerate rolling deploys. Use VPA Off (recommendation only) for everything else and apply manually with deploy windows.
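
The rule of thumb maps onto the VPA object's updateMode field. Below is a minimal sketch that builds the object as a plain dict and prints it as JSON (which kubectl accepts alongside YAML). The workload names and the two boolean inputs are hypothetical; how you classify a service as "stateless" or "tolerates restarts" is up to you.

```python
import json


def vpa_manifest(name, namespace, stateless, tolerates_restarts, kind="Deployment"):
    """Build a VerticalPodAutoscaler object as a plain dict."""
    # Rule of thumb from the text: Auto only for stateless services that
    # already tolerate rolling deploys; recommendation-only (Off) otherwise.
    mode = "Auto" if (stateless and tolerates_restarts) else "Off"
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{name}-vpa", "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": kind, "name": name},
            "updatePolicy": {"updateMode": mode},
        },
    }


# Hypothetical workloads: a stateless API and a Kafka StatefulSet.
api_vpa = vpa_manifest("checkout-api", "prod", stateless=True, tolerates_restarts=True)
kafka_vpa = vpa_manifest("kafka", "prod", stateless=False, tolerates_restarts=False,
                         kind="StatefulSet")
print(json.dumps(api_vpa, indent=2))
```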

Real Cost Math: VPA on a 200-Pod Production Cluster

A typical client cluster before VPA:

  • 200 pods averaging 1.5 vCPU request, 4GB memory request
  • Actual usage: 0.4 vCPU, 1.5GB memory (typical pattern)
  • Cluster compute cost: ~$12,000/month

After VPA (Auto for stateless 70%, Off + manual for stateful 30%):

  • New requests: 0.6 vCPU, 2GB (a 50% safety margin on CPU usage and roughly 33% on memory)
  • Cluster compute cost: ~$5,200/month
  • Savings: $6,800/month, 57% reduction
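
The arithmetic behind this example is simple enough to check in a few lines. The sketch uses only the figures quoted above; note that the 2GB memory request works out to about a 33% margin over 1.5GB of usage rather than a full 50%.

```python
def with_margin(usage, margin):
    """Request size for a given usage and fractional safety margin."""
    return usage * (1 + margin)


def savings(before, after):
    """Return (absolute saving, percentage saving)."""
    saved = before - after
    return saved, saved / before * 100


cpu_request = with_margin(0.4, 0.50)          # vCPU
implied_mem_margin = (2.0 / 1.5 - 1) * 100    # percent, from the 2GB request
saved, pct = savings(12_000, 5_200)

print(f"CPU request: {cpu_request:.1f} vCPU")
print(f"Implied memory margin: {implied_mem_margin:.0f}%")
print(f"Savings: ${saved:,}/month ({pct:.0f}%)")
```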

The compute reduction is what enables the next layer (Karpenter consolidation) to add another 20-25% on top.


KRR: The Safer Alternative That Most Teams Underuse

KRR (Krr by Robusta) is a CLI/dashboard tool that analyzes Prometheus metrics and recommends new pod requests. Unlike VPA, it does not change anything.

Why KRR Is Often Better Than VPA For Initial Rightsizing

  • No eviction risk: KRR is read-only. You see what should change before any restart happens.
  • Clear visibility: KRR's dashboard shows current vs recommended for every workload, sortable by potential savings.
  • Easier rollout: Apply recommendations in batches via PRs to your manifests/Helm charts. Reviewable in code review.
  • Works with GitOps: Integrates naturally with ArgoCD/Flux workflows where manifests are the source of truth.
  • Better for stateful workloads: No surprise restarts during business hours.

The KRR Workflow

  1. Install KRR (CLI: pip install robusta-krr or run as a Kubernetes pod)
  2. Point at your Prometheus instance
  3. Run krr simple --cluster prod
  4. Get a CSV of recommendations sorted by waste
  5. Apply top 20% by savings via PRs over 1-2 sprints
  6. Re-run monthly
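
Steps 4 and 5 can be scripted once you have an export. The sketch below assumes a CSV with a per-workload savings column; the column names and rows are hypothetical and are not KRR's actual output schema, so adapt the keys to whatever your export contains.

```python
import csv
import io
import math

# Hypothetical export; column names are illustrative, not KRR's real schema.
SAMPLE_CSV = """workload,namespace,current_cpu,recommended_cpu,monthly_savings_usd
orders-db,prod,4.0,1.5,310
checkout-api,prod,2.0,0.6,220
search,prod,1.0,0.8,40
auth,prod,0.5,0.4,15
cron-reports,batch,1.0,0.9,5
"""


def top_by_savings(csv_text, fraction=0.2):
    """Return the top `fraction` of rows, ranked by estimated savings."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    rows.sort(key=lambda r: float(r["monthly_savings_usd"]), reverse=True)
    keep = max(1, math.ceil(len(rows) * fraction))
    return rows[:keep]


first_batch = top_by_savings(SAMPLE_CSV)
for row in first_batch:
    print(row["namespace"], row["workload"], row["recommended_cpu"])
```

Each row in the returned batch becomes one reviewable PR against the manifest or Helm values for that workload.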

Most teams capture 60-70% of VPA's savings with KRR alone, with zero eviction risk. The remaining 30-40% requires VPA Auto mode for workloads that change usage patterns frequently.

When To Use KRR vs VPA

| Scenario | Use |
| --- | --- |
| First time rightsizing a cluster | KRR (low risk, build trust) |
| Stateful services (databases, queues) | KRR (avoid eviction) |
| Stateless web services with rolling deploys | VPA Auto |
| Batch/cron jobs | KRR (eviction would kill jobs) |
| ML training/inference pods | VPA Initial mode |
| Services with usage that changes by 10x daily | VPA Auto |
| Compliance environments with strict change control | KRR (manual approval) |
| GitOps-heavy organizations | KRR (PR-driven changes) |
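
The table can be encoded as a small lookup so the choice is applied consistently across teams. The table itself does not say which row wins when several apply, so the precedence here (organizational constraints first, then workload kind) is our assumption, and the Workload fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    kind: str  # one of: "stateless", "stateful", "batch", "ml"
    first_rightsizing: bool = False
    strict_change_control: bool = False
    gitops_managed: bool = False


def pick_tool(w):
    """Map a workload to KRR, VPA Initial, or VPA Auto per the table."""
    # Assumed precedence: constraints that demand human approval win.
    if w.first_rightsizing or w.strict_change_control or w.gitops_managed:
        return "KRR"
    if w.kind in ("stateful", "batch"):
        return "KRR"
    if w.kind == "ml":
        return "VPA Initial"
    # Stateless services with rolling deploys, including bursty ones.
    return "VPA Auto"


for w in (Workload("orders-db", "stateful"),
          Workload("checkout-api", "stateless"),
          Workload("trainer", "ml"),
          Workload("ledger-api", "stateless", strict_change_control=True)):
    print(f"{w.name}: {pick_tool(w)}")
```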

HPA: The Tool That Looks Right But Usually Isn't Tuned

Most teams enable HPA with a 70% CPU target copied from an example and forget about it. This is rarely optimal.

The HPA Targets That Actually Matter

  • CPU utilization target: 70% is the common starting point. For latency-sensitive services, drop to 50%. For batch services, raise to 85%.
  • Memory utilization target: Generally not used for HPA (memory does not free easily under load). Use as fallback only.
  • Custom metrics: Queue depth, request rate, p99 latency. Often work better than CPU for real workloads.
  • External metrics: Datadog, CloudWatch, Prometheus metrics outside the cluster.
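
For reference, this is where the CPU target lives in an autoscaling/v2 HPA object. The sketch builds it as a dict; the profile names and their mapping to 50/70/85 come from the bullet above, while the workload name and replica bounds are placeholders.

```python
import json

# CPU targets suggested above, keyed by a hypothetical service profile name.
CPU_TARGETS = {"latency-sensitive": 50, "default": 70, "batch": 85}


def hpa_manifest(name, namespace, profile="default", min_replicas=2, max_replicas=12):
    """Build an autoscaling/v2 HorizontalPodAutoscaler as a plain dict."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": name},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": CPU_TARGETS[profile],
                    },
                },
            }],
        },
    }


hpa = hpa_manifest("checkout-api", "prod", profile="latency-sensitive")
print(json.dumps(hpa, indent=2))
```

Utilization here is measured against the pod's CPU request, which is why an oversized request makes any target look artificially healthy.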

HPA Mistakes That Cost Money

  1. Min replicas set too high "for safety" — if min replicas is 5 but average load needs 2, you are paying for 3 extra replicas 24/7
  2. Scale-down stabilization window too long — scaling down too slowly burns money during off-peak
  3. CPU target set defensively (e.g., 30%) — you run more than twice the replicas a 70% target would need
  4. HPA on metrics that do not predict load — scaling on average CPU when p99 latency is the real driver
  5. No HPA at all on workloads that should have it — fixed replica counts on services with daily traffic patterns
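
Mistakes 1 and 3 follow directly from the scaling rule in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric / target metric), clamped to the min/max bounds. The sketch below implements just that core rule; it ignores the tolerance band and stabilization behavior the real controller also applies, and the load figures are made up.

```python
import math


def desired_replicas(current_replicas, current_util, target_util,
                     min_replicas=1, max_replicas=100):
    """Core HPA rule: scale by the ratio of observed to target utilization."""
    raw = math.ceil(current_replicas * current_util / target_util)
    return max(min_replicas, min(max_replicas, raw))


# Same load (4 replicas at 60% CPU), two different targets.
at_70 = desired_replicas(4, 60, 70)
at_30 = desired_replicas(4, 60, 30)

# Light load, but a min-replica floor set "for safety".
floored = desired_replicas(4, 20, 70, min_replicas=5)

print(at_70, at_30, floored)
```

With identical load, the 30% target holds twice the replicas of the 70% target, and the min-replica floor keeps five pods running when the rule alone would ask for two.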

Real Cost Impact

A 24-hour traffic pattern API:

  • Without HPA, fixed at 10 replicas (sized for peak): $1,500/month
  • With HPA min=2 max=12, average around 4 replicas: $700/month
  • Savings: 53%

HPA savings are smaller than VPA in absolute terms but compound — every replica you eliminate is a request size that VPA already shrunk, packed onto a node Karpenter sized.


Karpenter: The Node Layer Most Teams Run Wrong

Karpenter (or Cluster Autoscaler with multiple node groups) provisions the actual EC2/equivalent instances backing your cluster. It is the only tool that touches the underlying cloud bill directly.

What Karpenter Does Well

  • Picks the cheapest instance type that fits the pending pods (not just the type you specified in node groups)
  • Right-sizes the cluster by adding nodes only when needed and removing them when underutilized
  • Spot integration with automatic fallback to on-demand
  • Multi-architecture (x86 and ARM/Graviton) where appropriate
  • Consolidation — actively repacks pods onto fewer nodes when possible

Karpenter Configuration Mistakes That Cost Money

  1. Restricting instance families too tightly — telling Karpenter "only m5/m6" prevents it from picking cheaper c5/c6 types when CPU-bound
  2. Not enabling consolidation — pods stay on oversized nodes after VPA shrinks them; consolidation packs them onto smaller nodes
  3. Wrong consolidation policy: WhenEmpty only consolidates fully empty nodes; WhenEmptyOrUnderutilized is usually better
  4. No Spot instance configuration — missing 60-90% savings on interruptible workloads
  5. Missing taints/tolerations strategy — Spot pods running on on-demand nodes due to scheduling constraints
  6. NodePool (formerly Provisioner) per workload type — many teams over-segment with custom NodePools; usually 2-3 is enough
  7. consolidateAfter (ttlSecondsAfterEmpty in the legacy Provisioner API) too high — keeps idle nodes alive longer than needed
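
Several of these mistakes come down to a handful of fields. Here is a sketch of a NodePool, assuming the Karpenter v1 API on AWS, built as a dict: broad instance categories, both capacity types, both architectures, and consolidation turned on. The NodePool name and the "default" EC2NodeClass are placeholders, and older Karpenter releases use different kinds and field names, so check this against the version you run.

```python
import json


def nodepool_manifest(name="general", consolidate_after="30s"):
    """Build a Karpenter NodePool (v1 API, AWS provider) as a plain dict."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    # Placeholder reference to an existing EC2NodeClass.
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": "default",
                    },
                    "requirements": [
                        # Allow Spot with on-demand as the fallback.
                        {"key": "karpenter.sh/capacity-type", "operator": "In",
                         "values": ["spot", "on-demand"]},
                        # Allow Graviton as well as x86.
                        {"key": "kubernetes.io/arch", "operator": "In",
                         "values": ["arm64", "amd64"]},
                        # Do not pin to a single family.
                        {"key": "karpenter.k8s.aws/instance-category", "operator": "In",
                         "values": ["c", "m", "r"]},
                    ],
                }
            },
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": consolidate_after,
            },
        },
    }


nodepool = nodepool_manifest()
print(json.dumps(nodepool, indent=2))
```

Workloads that cannot run on Spot or ARM still need their own node selectors or tolerations; the NodePool only widens what Karpenter is allowed to choose from.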

Real Cost Math: Karpenter on Top of VPA

Continuing the example from above (cluster after VPA at $5,200/month):

After Karpenter optimization:

  • Switched mostly to ARM/Graviton (20% off list)
  • 70% Spot instances on stateless workloads (70% off list)
  • Consolidation enabled, consolidateAfter=30s (ttlSecondsAfterEmpty=30 in the legacy Provisioner API)
  • New compute cost: ~$2,800/month

Total program savings (VPA + Karpenter): $12,000 → $2,800 = 77% reduction.
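
One rough way to sanity-check the ~$2,800 figure is a blended-rate estimate. The assumptions here are ours, not part of the audit data: the whole fleet gets the 20% Graviton discount (the text says "mostly"), stateless workloads are 70% of compute (borrowed from the VPA example's split), and 70% of that stateless share runs on Spot at 70% off.

```python
def blended_cost(base, arch_discount, spot_share, spot_discount):
    """Monthly cost after an architecture discount and a partial Spot mix."""
    after_arch = base * (1 - arch_discount)
    # Spot-priced share pays (1 - discount); the rest pays full price.
    rate = spot_share * (1 - spot_discount) + (1 - spot_share)
    return after_arch * rate


stateless_share = 0.70                 # assumption, from the VPA example split
spot_share = stateless_share * 0.70    # 70% of stateless compute on Spot
estimate = blended_cost(5_200, 0.20, spot_share, 0.70)

total_reduction_pct = (12_000 - 2_800) / 12_000 * 100

print(f"Blended estimate: ${estimate:,.0f}/month")
print(f"Total reduction at $2,800: {total_reduction_pct:.0f}%")
```

The estimate lands slightly below the quoted figure, which is consistent with only part of the fleet moving to Graviton.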


The Decision Framework: 4 Questions To Pick The Right Combination

Question 1: What is your cluster's workload profile?

  • Mostly stateless web services: VPA Auto + HPA + Karpenter (full automation works)
  • Mix of stateful + stateless: KRR for stateful + VPA Auto for stateless + HPA + Karpenter
  • Heavily stateful (databases, queues): KRR + manual application + Karpenter only
  • Batch and cron jobs: KRR + manual + Karpenter (avoid VPA Auto, no HPA)
  • ML training/inference: KRR + VPA Initial mode + Karpenter with GPU support

Question 2: What is your team's GitOps maturity?

  • GitOps-heavy (ArgoCD/Flux managing everything): KRR fits naturally (PR-driven changes), VPA Off, manual updates
  • Hybrid GitOps + imperative: Mix of VPA Auto for stateless services and KRR for stateful
  • Imperative kubectl/Helm: VPA Auto everywhere safe, manual for sensitive workloads

Question 3: What is your change-tolerance window?

  • 24/7 critical service tier: No VPA Auto. KRR + manual deploy windows.
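  • Business-hours critical: VPA Auto with PodDisruptionBudgets capping how many pods are evicted at once. PDBs limit concurrency, not timing, so confining evictions to off-hours needs a separate mechanism (for example, switching the VPA updateMode on a schedule)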
  • Internal-only / non-critical: VPA Auto with default eviction policy

Question 4: What is your cluster scale?

  • Small (under 50 pods): Manual rightsizing or KRR is enough; VPA adds complexity
  • Medium (50-500 pods): KRR for visibility + VPA Auto on stateless services
  • Large (500-5000 pods): Full stack: VPA Auto + HPA + Karpenter + KRR for monitoring
  • Massive (5000+ pods): Add capacity reservations and detailed cost allocation; consider commercial tools (Cast AI, nOps)

Common Anti-Patterns (Avoid These)

Anti-Pattern 1: VPA + HPA on the Same Workload (CPU)

VPA changes pod CPU requests. HPA scales based on CPU utilization. When both target CPU, they fight: VPA shrinks the request → utilization rises → HPA scales replicas → VPA sees less per-pod → shrinks again → infinite loop.

Fix: If you need both, use HPA on a custom metric (RPS, queue depth) and VPA on memory only. Or split: VPA on dev/staging, HPA on production.
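
A quick lint can catch this conflict before it ships. The sketch below compares the resources a VPA will rewrite with the resource metrics an HPA scales on, assuming both objects target the same workload (it does not check targetRef). The metric name http_requests_per_second is a hypothetical custom metric.

```python
def hpa_resource_metrics(hpa):
    """Resource names (cpu, memory) that the HPA scales on."""
    return {
        m["resource"]["name"]
        for m in hpa["spec"].get("metrics", [])
        if m.get("type") == "Resource"
    }


def vpa_controlled_resources(vpa):
    """Resources an active VPA will rewrite; empty if it only recommends."""
    spec = vpa["spec"]
    if spec.get("updatePolicy", {}).get("updateMode", "Auto") == "Off":
        return set()
    policies = spec.get("resourcePolicy", {}).get("containerPolicies", [])
    if not policies:
        return {"cpu", "memory"}  # VPA controls both unless told otherwise
    controlled = set()
    for policy in policies:
        controlled.update(policy.get("controlledResources", ["cpu", "memory"]))
    return controlled


def conflicting_resources(vpa, hpa):
    """Resources that both autoscalers would act on."""
    return vpa_controlled_resources(vpa) & hpa_resource_metrics(hpa)


cpu_hpa = {"spec": {"metrics": [
    {"type": "Resource", "resource": {
        "name": "cpu", "target": {"type": "Utilization", "averageUtilization": 70}}},
]}}
rps_hpa = {"spec": {"metrics": [
    {"type": "Pods", "pods": {
        "metric": {"name": "http_requests_per_second"},
        "target": {"type": "AverageValue", "averageValue": "100"}}},
]}}
default_vpa = {"spec": {"updatePolicy": {"updateMode": "Auto"}}}
memory_only_vpa = {"spec": {
    "updatePolicy": {"updateMode": "Auto"},
    "resourcePolicy": {"containerPolicies": [
        {"containerName": "*", "controlledResources": ["memory"]},
    ]},
}}

print(conflicting_resources(default_vpa, cpu_hpa))
print(conflicting_resources(memory_only_vpa, cpu_hpa))
```

Either fix from the text clears the check: restricting VPA to memory, or moving HPA to a custom metric.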

Anti-Pattern 2: Trusting VPA Recommendations During Low-Traffic Periods

VPA learns from observed usage. If your service is in a 3-day low-traffic window when VPA collects data, it will recommend tiny requests that fail under real load.

Fix: Wait at least 7 days (ideally 14) of representative traffic before applying VPA recommendations. Use multi-week observation windows.

Anti-Pattern 3: Karpenter Without VPA First

Karpenter rightsizes nodes for your current requests. If your requests are 4x oversized, Karpenter happily provisions 4x larger nodes. Karpenter consolidation cannot fix pod-level waste.

Fix: Always rightsize pods (VPA/KRR) before deploying Karpenter consolidation. Otherwise you save 15-25% instead of 50-70%.

Anti-Pattern 4: Setting CPU Limits = CPU Requests

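Matching limits to requests (for memory as well as CPU) gives the pod the Guaranteed QoS class, but a CPU limit equal to the request causes throttling under burst. Pods that briefly need 1.2x their request get throttled and add latency.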

Fix: Set CPU requests at p95 of observed usage. Set limits 2-3x request (or omit limits entirely; let the kernel handle bursting via cgroup CPU shares).

Anti-Pattern 5: HPA Min Replicas = 1

A single replica means any restart, eviction, or node failure causes 100% downtime briefly. Plus, scaling from 1 to many adds cold-start latency.

Fix: Min replicas = 2 for any non-trivial service. The slight overhead is worth the resilience.

Anti-Pattern 6: No Cost Allocation Tags

You cannot rightsize what you cannot measure. Without per-namespace, per-team, per-service cost tags, you cannot prove savings or assign accountability.

Fix: Use Kubecost, OpenCost, or Vantage to allocate cluster cost by namespace/label/service. Cross-charge teams (showback if not chargeback).

Anti-Pattern 7: Optimizing One Cluster Manually Forever

Manual rightsizing degrades. Workloads change. New services arrive. Without automation (VPA Auto + Karpenter consolidation), entropy creeps back within 60-90 days.

Fix: Automate the layers that can be automated. Keep manual review only for genuinely high-risk changes.


A 90-Day Kubernetes Rightsizing Program

For clusters spending over $50,000/month, here is the program we run for clients. Typical outcome: 50-70% cost reduction in 90 days.

Days 1-14: Visibility

  1. Install KRR (or use existing Kubecost)
  2. Tag every namespace and major workload with team and service labels
  3. Deploy Grafana dashboards for cluster cost by namespace
  4. Pull baseline cluster cost for last 90 days
  5. Identify top 20% of workloads driving 80% of cost
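
Step 5 is a Pareto cut over whatever cost-allocation data you pulled in step 4. A minimal sketch, with made-up workload names and monthly costs:

```python
def workloads_covering(costs, share=0.80):
    """Most expensive workloads that together reach `share` of total cost."""
    total = sum(costs.values())
    picked, running = [], 0.0
    for name, cost in sorted(costs.items(), key=lambda kv: kv[1], reverse=True):
        if running >= share * total:
            break
        picked.append(name)
        running += cost
    return picked


# Hypothetical monthly cost per workload, in USD.
monthly_cost = {
    "search": 42_000,
    "orders-db": 30_000,
    "checkout-api": 9_000,
    "batch-etl": 8_000,
    "misc": 6_000,
    "auth": 5_000,
}

focus = workloads_covering(monthly_cost)
print(focus)
```

In this made-up data, three of six workloads account for just over 80% of spend, and those are the ones to run KRR against first.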

Days 15-30: Stateful Workloads (KRR)

  1. Run KRR on top 20% workloads
  2. Generate PRs with recommended request changes
  3. Apply in batches within this window, monitoring for service degradation
  4. Capture: ~25-35% cost reduction on rightsized services

Days 31-50: Stateless Workloads (VPA Auto)

  1. Identify stateless workloads safe for VPA Auto
  2. Deploy VPA in Auto mode on those workloads
  3. Set PodDisruptionBudgets to limit eviction velocity
  4. Monitor for 2 weeks; adjust thresholds
  5. Capture: additional 15-25% cost reduction

Days 51-70: HPA Tuning

  1. Audit existing HPA configurations
  2. Right-size min replicas (most are too high)
  3. Tune CPU/custom metric targets
  4. Add HPA to fixed-replica services that have traffic patterns
  5. Capture: additional 5-10% cost reduction

Days 71-90: Karpenter Migration

  1. If on Cluster Autoscaler, migrate to Karpenter
  2. Configure consolidation: WhenEmptyOrUnderutilized
  3. Enable Spot instances for stateless workloads (70-90%)
  4. Add ARM/Graviton instance types for compatible workloads
  5. Set consolidateAfter=30s (ttlSecondsAfterEmpty=30 in the legacy Provisioner API) for fast scale-down
  6. Capture: additional 20-30% cost reduction

Day 91: Lock In Savings

  • Document the new baseline
  • Set up monthly KRR runs
  • Set up alerts for cost regression
  • Establish quarterly rightsizing reviews
  • Communicate savings to leadership with before/after numbers

When To Add Commercial Tools

VPA, HPA, KRR, and Karpenter are open-source. They cover most use cases. Commercial tools (Cast AI, nOps, Spot.io, Kubecost Enterprise) add value when:

  • Cluster scale exceeds 5,000 pods — operational overhead of managing four tools manually becomes painful
  • Your team lacks K8s rightsizing expertise — managed tools embed best practices
  • You need automated Spot orchestration — Cast AI's bid management and rebalancer handle complexity OSS Karpenter does not
  • Multi-cluster coordination needed — commercial tools span clusters; OSS tools are per-cluster
  • Compliance/audit requirements — commercial tools provide audit trails and approval workflows

For 80% of clusters under $200K/month, the OSS stack is enough. Above that, commercial tools usually pay for themselves within 30 days.


When To Stay Manual

Some workloads should never be auto-rightsized:

  • Critical leader-elected services (etcd, ZooKeeper) where eviction has cascading effects
  • Compliance-bound workloads where any pod change requires change-management approval
  • Highly tuned ML training jobs where custom resource allocation is part of the model
  • Pre-production environments where over-provisioning is the point (testing under capacity)
  • Workloads with documented SLOs requiring fixed capacity

For these, KRR-as-recommendation + scheduled human review is the right pattern.


The Bottom Line

Kubernetes rightsizing in 2026 is not about picking one tool. It is about composing four tools (VPA, HPA, KRR, Karpenter) so they fix complementary classes of waste. Picking one and ignoring the others leaves 40-65% of cost savings on the table.

The most expensive mistake we see: running Karpenter on top of unrightsized pods. The second most expensive: running VPA Auto on stateful workloads and causing outages that scare leadership away from rightsizing for years.

If your Kubernetes bill is over $50,000/month and you have not run a coordinated VPA + HPA + Karpenter program in the last 12 months, you are very likely overpaying by 40-70%. Our cloud cost optimization team runs free K8s rightsizing audits and typically captures 50-70% savings within 90 days. Run a free Cloud Waste Scorecard to find your biggest Kubernetes cost leaks first.

