What is the difference between VPA, HPA, and Karpenter?

VPA (Vertical Pod Autoscaler) adjusts CPU and memory requests on individual pods so they match actual usage. HPA (Horizontal Pod Autoscaler) changes the number of pod replicas based on metrics like CPU or custom signals. Karpenter rightsizes the underlying nodes by provisioning the cheapest, smallest instances that fit your scheduled pods. They operate at three different layers (request size, replica count, node capacity) and the right strategy uses all three together. Using only one leaves 30-60% waste.

Should I use VPA or KRR for Kubernetes rightsizing?

KRR (Krr by Robusta) is a recommendation-only tool that analyzes Prometheus metrics and suggests new requests, but does not change them. VPA can run in recommendation mode (Off) or auto-update mode (Auto). For stateful services where pod restarts are expensive, KRR plus manual updates is safer. For stateless services, VPA in Auto mode is faster. Most production teams use KRR for stateful workloads and VPA Auto for stateless ones. Running both together (VPA in Off mode for recommendations, KRR for the dashboard view) is also common.

Does Karpenter eliminate the need for VPA or HPA?

No. Karpenter rightsizes nodes but does nothing about pod-level waste. If your pods request 4x more CPU than they use, Karpenter will dutifully provision 4x more node capacity than you actually need. Karpenter and VPA solve complementary problems: VPA shrinks requests, then Karpenter packs them efficiently onto smaller nodes. Using Karpenter without rightsizing pod requests first usually saves 15-25%. Combining Karpenter + VPA + HPA saves 50-70%.

How much can Kubernetes rightsizing save in 2026?

For most production clusters we audit, properly applied rightsizing saves 40-65% of cluster cost. The breakdown is typically: VPA on pod requests saves 25-35% by aligning requests to actual usage, HPA optimization saves 5-10% by avoiding over-replication, Karpenter saves 20-30% by packing pods onto smaller cheaper nodes (especially with Spot instances). The savings are not additive in a simple way; some compound, some overlap. A complete program typically takes 6-12 weeks and saves $200K-$2M annually for clusters over $50K/month.
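One rough way to see why the layer-by-layer percentages do not simply add up is to assume each layer's saving applies only to the cost left over by the previous layer. That multiplicative model is an assumption made for illustration, not a method stated in this post; the function name is hypothetical.

```python
def compounded_savings(layer_savings):
    """Combine per-layer savings, assuming each applies to the cost the previous layer left."""
    remaining = 1.0
    for saving in layer_savings:
        remaining *= 1.0 - saving
    return 1.0 - remaining


# Low and high ends of the ranges quoted above: VPA, HPA, Karpenter.
low = compounded_savings([0.25, 0.05, 0.20])
high = compounded_savings([0.35, 0.10, 0.30])

print(f"compounded: {low:.0%} to {high:.0%}")
print(f"naive sum:  {0.25 + 0.05 + 0.20:.0%} to {0.35 + 0.10 + 0.30:.0%}")
```

Under that assumption the combined range comes out at roughly 43% to 59%, which sits inside the 40-65% quoted above, while the naive sum would overstate it.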

When should I not use VPA in auto mode?

Avoid VPA Auto mode for: stateful workloads where pod restarts cause data loss or split-brain conditions, services with leader election where restarts trigger failovers, JVM applications with long warmup times where restarts cause latency spikes, batch jobs that cannot tolerate eviction mid-run, and services running with sticky sessions or in-flight long-running connections. For these workloads, use VPA in Off (recommendation only) mode or KRR with scheduled human-driven updates.


Cloud Cost Optimization

May 16, 2026

By Ravi Kanani

Kubernetes Rightsizing in 2026: Why VPA, HPA, KRR, and Karpenter Each Solve Different Problems

Key Takeaway

VPA rightsizes pod requests but causes pod restarts that break stateful services. HPA scales replicas horizontally but cannot fix oversized base requests. KRR (Krr) recommends but does not enforce; safe but requires human follow-through. Karpenter rightsizes nodes but does nothing about pod-level waste. The right rightsizing strategy uses all four together with a clear decision matrix. Picking one tool and ignoring the others is the most common mistake we see in K8s cost audits.

Your VPA Is Saving 22%. Karpenter Is Saving 18%. Why Are You Still Wasting 60% On K8s?

A growth-stage SaaS client we worked with in early 2026 had done everything the K8s cost optimization blog posts told them to do. They had VPA running in recommendation mode. They had Karpenter provisioning nodes. They had HPA scaling their public APIs. They were proud of their setup. Their EKS bill: $284,000 per month.

We audited the cluster. The VPA recommendations had been ignored for 8 months because applying them required restarting stateful pods nobody wanted to touch. Karpenter was packing nodes efficiently — but with pods that requested 4x what they actually used. HPA was scaling replicas of services where the base request was already 6x oversized. Each tool was working perfectly. The combination was failing.

After 10 weeks of coordinated rightsizing across all four layers (KRR for stateful workloads, VPA Auto for stateless, HPA tuning, Karpenter consolidation), their bill dropped to $108,000/month. Annual savings: $2.1 million. No service degradation. No outages.

This pattern is consistent across 80 production K8s clusters we audited in 2025-2026: teams pick one rightsizing tool, assume it solves cost, and leave 40-65% of waste in place because they do not understand how the four major tools complement each other. This post is the decision framework: which tool solves which problem, when to use them together, and the playbook to actually capture the savings.

If you are running production Kubernetes and your monthly bill is over $20,000, you are almost certainly in this trap.

The Four Tools That Matter (And What They Actually Do)

Tool	Layer	What It Changes	What It Doesn't Touch
VPA (Vertical Pod Autoscaler)	Pod	CPU and memory requests/limits on pods	Replica count, node capacity
HPA (Horizontal Pod Autoscaler)	Workload	Number of pod replicas	Pod size, node capacity
KRR (Krr by Robusta)	Recommendation	Nothing (read-only); recommends new requests	Anything (it's a tool, not an autoscaler)
Karpenter	Node	Which EC2/equivalent instances back the cluster	Pod-level requests, replica count

Each of these tools fixes a different class of waste. Using one and ignoring the others is the most common rightsizing mistake.

Why Each Tool Alone Fails

VPA alone: Optimizes pod requests but with the same number of replicas on the same node types. You shrink pods but still pay for oversized nodes. Savings: 15-25%.
HPA alone: Scales replicas based on demand but with the same oversized requests. Adding more replicas of an oversized pod multiplies waste. Savings: 5-10%.
KRR alone: Recommends but does not enforce. Recommendations gather dust. Savings: 0-5% (only what humans actually apply).
Karpenter alone: Packs oversized pods onto smaller-than-naive nodes but cannot fix the pod-level waste. Savings: 15-25%.

The savings compound when used correctly. VPA + HPA + Karpenter together saves 50-70%.

The 11 Cost Leaks Each Tool Catches

Across 80 cluster audits, here are the actual waste sources we find. Knowing which tool catches which leak is the key to designing your rightsizing program.

Leak	% of Cluster Waste	Tool That Catches It
Oversized memory requests (set once, never updated)	22%	VPA or KRR
Oversized CPU requests (defensive padding)	18%	VPA or KRR
Wrong instance types (Karpenter not configured)	14%	Karpenter
Idle replicas (HPA misconfigured)	9%	HPA tuning
Stranded persistent volumes	7%	Manual cleanup (no tool)
Idle dev/test clusters running 24/7	6%	Manual scheduling (no tool)
Over-provisioned daemonsets	5%	VPA
Ignored Spot instance opportunity	5%	Karpenter
Unused namespaces with running workloads	4%	Manual cleanup
Unallocated node capacity (bin-packing failure)	4%	Karpenter
Idle GPU nodes (over-provisioned for ML)	3%	Karpenter + scheduling
Other	3%	Various

VPA/KRR catches 45% of waste. Karpenter catches 26%. HPA catches 9%. Manual processes catch ~17%.

The implication: a rightsizing program that does not use VPA/KRR misses almost half the savings. A program that does not use Karpenter misses another quarter. There is no single tool that catches more than 50%.
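The per-tool totals can be re-derived from the table with a short tally. The sketch below transcribes the table rows; filing the "Karpenter + scheduling" GPU row under Karpenter is an assumption, chosen because it is the grouping that reproduces the 26% figure.

```python
from collections import defaultdict

# (leak, share of cluster waste in %, tool category), transcribed from the table above.
LEAKS = [
    ("Oversized memory requests", 22, "VPA/KRR"),
    ("Oversized CPU requests", 18, "VPA/KRR"),
    ("Wrong instance types", 14, "Karpenter"),
    ("Idle replicas", 9, "HPA"),
    ("Stranded persistent volumes", 7, "Manual"),
    ("Idle dev/test clusters", 6, "Manual"),
    ("Over-provisioned daemonsets", 5, "VPA/KRR"),
    ("Ignored Spot opportunity", 5, "Karpenter"),
    ("Unused namespaces", 4, "Manual"),
    ("Unallocated node capacity", 4, "Karpenter"),
    ("Idle GPU nodes", 3, "Karpenter"),  # assumption: counted under Karpenter
    ("Other", 3, "Other"),
]


def waste_by_tool(leaks):
    """Sum the waste share attributed to each tool category."""
    totals = defaultdict(int)
    for _name, share, tool in leaks:
        totals[tool] += share
    return dict(totals)


totals = waste_by_tool(LEAKS)
for tool, share in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{tool:10s} {share:3d}%")
```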

VPA Deep Dive: When Auto Mode Saves Money And When It Causes Outages

Vertical Pod Autoscaler is the highest-leverage rightsizing tool because it directly fixes the largest waste category: oversized pod requests. But VPA in Auto mode is also the tool that has caused the most production incidents we have seen.

How VPA Works

VPA monitors actual pod CPU and memory usage over time. It calculates new request and limit values that match observed usage with safety margin. In Auto mode, it evicts pods so they restart with the new values.

The Three VPA Modes

Mode	Behavior	Best For
Off	Calculates recommendations, does nothing	Production stateful services; learning phase
Initial	Sets requests at pod creation only	Workloads that should not be evicted mid-run
Auto	Evicts pods to apply new requests	Stateless workloads; non-critical paths

The Real Risk Profile

VPA Auto mode evicts pods. This is fine for stateless web services. It is dangerous for:

Stateful services with leader election (etcd, ZooKeeper, Kafka, Cassandra) — eviction triggers leader changes and momentary unavailability
Long-running batch jobs — eviction kills the job partway through
JVM applications with slow warmup — eviction causes 30-60s of latency spikes during startup
Services with sticky sessions — eviction breaks user sessions
Services with in-flight long-running connections (WebSockets, streaming) — eviction drops connections

Rule of thumb: Use VPA Auto for stateless HTTP services that already tolerate rolling deploys. Use VPA Off (recommendation only) for everything else and apply manually with deploy windows.

Real Cost Math: VPA on a 200-Pod Production Cluster

A typical client cluster before VPA:

200 pods averaging 1.5 vCPU request, 4GB memory request
Actual usage: 0.4 vCPU, 1.5GB memory (typical pattern)
Cluster compute cost: ~$12,000/month

After VPA (Auto for stateless 70%, Off + manual for stateful 30%):

New requests: 0.6 vCPU, 2GB (a 50% safety margin over actual CPU usage, roughly 33% over actual memory usage)
Cluster compute cost: ~$5,200/month
Savings: $6,800/month, 57% reduction

The compute reduction is what enables the next layer (Karpenter consolidation) to add another 20-25% on top.
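The arithmetic behind this example can be checked with a few lines. This is only a sketch over the figures quoted above; the helper names are made up for illustration.

```python
def headroom(request, usage):
    """Safety margin of a request over observed usage, as a fraction of usage."""
    return request / usage - 1.0


def reduction(before, after):
    """Fractional reduction going from `before` to `after`."""
    return 1.0 - after / before


cpu_before, cpu_usage, cpu_after = 1.5, 0.4, 0.6   # vCPU per pod
mem_before, mem_usage, mem_after = 4.0, 1.5, 2.0   # GB per pod
cost_before, cost_after = 12_000, 5_200            # USD per month

print(f"CPU request cut {reduction(cpu_before, cpu_after):.0%}, "
      f"headroom over usage {headroom(cpu_after, cpu_usage):.0%}")
print(f"Memory request cut {reduction(mem_before, mem_after):.0%}, "
      f"headroom over usage {headroom(mem_after, mem_usage):.0%}")
print(f"Compute cost cut {reduction(cost_before, cost_after):.0%}")
```

Requests shrink by 60% on CPU and 50% on memory, which is consistent with a compute bill that falls by a little under 60%.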

KRR: The Safer Alternative That Most Teams Underuse

KRR (Krr by Robusta) is a CLI/dashboard tool that analyzes Prometheus metrics and recommends new pod requests. Unlike VPA, it does not change anything.

Why KRR Is Often Better Than VPA For Initial Rightsizing

No eviction risk: KRR is read-only. You see what should change before any restart happens.
Clear visibility: KRR's dashboard shows current vs recommended for every workload, sortable by potential savings.
Easier rollout: Apply recommendations in batches via PRs to your manifests/Helm charts. Reviewable in code review.
Works with GitOps: Integrates naturally with ArgoCD/Flux workflows where manifests are the source of truth.
Better for stateful workloads: No surprise restarts during business hours.

The KRR Workflow

Install KRR (CLI: pip install robusta-krr or run as a Kubernetes pod)
Point at your Prometheus instance
Run krr simple --cluster prod
Get a CSV of recommendations sorted by waste
Apply top 20% by savings via PRs over 1-2 sprints
Re-run monthly

Most teams capture 60-70% of VPA's savings with KRR alone, with zero eviction risk. The remaining 30-40% requires VPA Auto mode for workloads that change usage patterns frequently.
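The "apply the top 20% by savings" step in the workflow above can be scripted once the recommendations are exported. The sketch below is a minimal illustration: the column names and sample rows are hypothetical and do not reflect KRR's actual CSV schema, so adjust them to whatever your export contains.

```python
import csv
import io
import math

# Hypothetical export: column names and values are illustrative only.
SAMPLE_CSV = """workload,current_cpu_m,recommended_cpu_m
checkout-api,2000,500
search,1500,900
billing-worker,1000,950
auth,800,200
reports,4000,1000
"""


def top_by_savings(csv_text, fraction=0.2):
    """Return the top `fraction` of workloads ranked by CPU millicores saved."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row["saved_cpu_m"] = int(row["current_cpu_m"]) - int(row["recommended_cpu_m"])
    rows.sort(key=lambda r: r["saved_cpu_m"], reverse=True)
    keep = max(1, math.ceil(len(rows) * fraction))  # always keep at least one row
    return rows[:keep]


first_batch = top_by_savings(SAMPLE_CSV)
for row in first_batch:
    print(row["workload"], row["saved_cpu_m"])
```

Ranking by absolute millicores saved rather than by percentage keeps the first batch of PRs focused on the workloads that move the bill the most.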

When To Use KRR vs VPA

Scenario	Use
First time rightsizing a cluster	KRR (low risk, build trust)
Stateful services (databases, queues)	KRR (avoid eviction)
Stateless web services with rolling deploys	VPA Auto
Batch/cron jobs	KRR (eviction would kill jobs)
ML training/inference pods	VPA Initial mode
Services with usage that changes by 10x daily	VPA Auto
Compliance environments with strict change control	KRR (manual approval)
GitOps-heavy organizations	KRR (PR-driven changes)
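The table above can be read as a small decision function. The sketch below is one possible encoding, not a definitive rule set: the `Workload` fields and the precedence between rows (for example, change control winning over everything else) are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    name: str
    stateful: bool = False
    batch: bool = False
    ml: bool = False
    strict_change_control: bool = False
    first_rightsizing: bool = False


def pick_tool(w: Workload) -> str:
    """Map a workload's traits to the rightsizing approach from the table."""
    # Assumed precedence: risk and process constraints are checked first.
    if w.first_rightsizing or w.strict_change_control:
        return "KRR"
    if w.stateful or w.batch:
        return "KRR"
    if w.ml:
        return "VPA Initial"
    return "VPA Auto"  # stateless services that tolerate rolling deploys


for w in [
    Workload("postgres", stateful=True),
    Workload("nightly-etl", batch=True),
    Workload("inference", ml=True),
    Workload("web-frontend"),
]:
    print(f"{w.name}: {pick_tool(w)}")
```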

HPA: The Tool That Looks Right But Usually Isn't Tuned

Most teams enable HPA with a 70% CPU target (the value most examples use) and forget about it. This is rarely optimal.

The HPA Targets That Actually Matter

CPU utilization target: 70% is the usual starting point. For latency-sensitive services, drop to 50%. For batch services, raise to 85%.
Memory utilization target: Generally not used for HPA (memory does not free easily under load). Use as fallback only.
Custom metrics: Queue depth, request rate, p99 latency. Often work better than CPU for real workloads.
External metrics: Datadog, CloudWatch, Prometheus metrics outside the cluster.

HPA Mistakes That Cost Money

Min replicas set too high "for safety" — if min replicas is 5 but average load needs 2, you are paying for 3 extra replicas 24/7
Scale-down stabilization window too long: scaling down too slowly burns money during off-peak
CPU target set defensively (e.g., 30%): you provision more than 2x the capacity a 70% target would need
HPA on metrics that do not predict load — scaling on average CPU when p99 latency is the real driver
No HPA at all on workloads that should have it — fixed replica counts on services with daily traffic patterns
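To make the cost of a defensive target or a high minimum concrete, the sketch below uses the steady-state form of the HPA scaling rule, desired replicas = ceil(current replicas x current utilization / target utilization). The function name and the load figures are illustrative assumptions; integer millicores and integer percentages are used to avoid floating-point rounding in the ceiling.

```python
def steady_state_replicas(used_m, request_m, target_pct, min_replicas=1):
    """Replicas at which average utilization (usage / request) sits at or below the target.

    used_m:      total CPU the service consumes, in millicores
    request_m:   CPU request per pod, in millicores
    target_pct:  HPA target utilization, as an integer percentage
    """
    numerator = used_m * 100
    denominator = request_m * target_pct
    needed = -(-numerator // denominator)  # integer ceiling division
    return max(min_replicas, needed)


# Same load (2.1 cores) and request (1 core per pod), two different targets.
print(steady_state_replicas(2100, 1000, 70))
print(steady_state_replicas(2100, 1000, 30))

# A load that needs 2 replicas, pinned at 5 by min replicas.
print(steady_state_replicas(1400, 1000, 70, min_replicas=5))
```

With these numbers a 30% target holds 7 replicas where a 70% target holds 3, and a minimum of 5 keeps 3 replicas running that the load never needs.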

Real Cost Impact

An API with a daily (24-hour) traffic cycle:

Without HPA, fixed at 10 replicas (sized for peak): $1,500/month
With HPA min=2 max=12, average around 4 replicas: $700/month
Savings: 53%

HPA savings are smaller than VPA in absolute terms but compound — every replica you eliminate is a request size that VPA already shrunk, packed onto a node Karpenter sized.

Karpenter: The Node Layer Most Teams Run Wrong

Karpenter (or Cluster Autoscaler with multiple node groups) provisions the actual EC2/equivalent instances backing your cluster. It is the only tool that touches the underlying cloud bill directly.

What Karpenter Does Well

Picks the cheapest instance type that fits the pending pods (not just the type you specified in node groups)
Right-sizes the cluster by adding nodes only when needed and removing them when underutilized
Spot integration with automatic fallback to on-demand
Multi-architecture (x86 and ARM/Graviton) where appropriate
Consolidation — actively repacks pods onto fewer nodes when possible

Karpenter Configuration Mistakes That Cost Money

Restricting instance families too tightly — telling Karpenter "only m5/m6" prevents it from picking cheaper c5/c6 types when CPU-bound
Not enabling consolidation — pods stay on oversized nodes after VPA shrinks them; consolidation packs them onto smaller nodes
Wrong consolidation policy — WhenEmpty only consolidates fully empty nodes; WhenEmptyOrUnderutilized is usually better
No Spot instance configuration — missing 60-90% savings on interruptible workloads
Missing taints/tolerations strategy — Spot pods running on on-demand nodes due to scheduling constraints
Provisioner (NodePool in newer Karpenter releases) per workload type: many teams over-segment with custom provisioners; usually 2-3 is enough
TTLSecondsAfterEmpty (consolidateAfter in the newer NodePool API) too high: keeps idle nodes alive longer than needed

Real Cost Math: Karpenter on Top of VPA

Continuing the example from above (cluster after VPA at $5,200/month):

After Karpenter optimization:

Switched mostly to ARM/Graviton (20% off list)
70% Spot instances on stateless workloads (70% off list)
Consolidation enabled, TTLSecondsAfterEmpty=30s (consolidateAfter: 30s in the newer NodePool API)
New compute cost: ~$2,800/month

Total program savings (VPA + Karpenter): $12,000 → $2,800 = 77% reduction.
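One way to sanity-check the post-Karpenter figure is a blended-price estimate. The sketch below is a back-of-the-envelope model under stated assumptions, not the calculation used for the client: it assumes stateless workloads are 70% of compute (borrowed from the VPA example), that 70% of that stateless share runs on Spot, and that the Graviton and Spot discounts stack multiplicatively.

```python
def blended_cost(base_cost, arm_discount, stateless_share, spot_share, spot_discount):
    """Estimate monthly cost after moving to ARM and putting part of the fleet on Spot."""
    after_arm = base_cost * (1.0 - arm_discount)
    spot_fraction = stateless_share * spot_share  # share of all compute on Spot
    price_factor = spot_fraction * (1.0 - spot_discount) + (1.0 - spot_fraction)
    return after_arm * price_factor


estimate = blended_cost(
    base_cost=5_200,       # USD/month after VPA
    arm_discount=0.20,     # Graviton vs x86 list price
    stateless_share=0.70,  # assumption: stateless share of compute
    spot_share=0.70,       # share of stateless capacity on Spot
    spot_discount=0.70,    # Spot vs on-demand
)
total_reduction = 1.0 - estimate / 12_000

print(f"estimated cost: ${estimate:,.0f}/month")
print(f"total reduction vs $12,000: {total_reduction:.0%}")
```

Under those assumptions the estimate lands a little above $2,700 per month, in the same ballpark as the ~$2,800 quoted above.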

The Decision Framework: 4 Questions To Pick The Right Combination

Question 1: What is your cluster's workload profile?

Mostly stateless web services: VPA Auto + HPA + Karpenter (full automation works)
Mix of stateful + stateless: KRR for stateful + VPA Auto for stateless + HPA + Karpenter
Heavily stateful (databases, queues): KRR + manual application + Karpenter only
Batch and cron jobs: KRR + manual + Karpenter (avoid VPA Auto, no HPA)
ML training/inference: KRR + VPA Initial mode + Karpenter with GPU support

Question 2: What is your team's GitOps maturity?

GitOps-heavy (ArgoCD/Flux managing everything): KRR fits naturally (PR-driven changes), VPA Off, manual updates
Hybrid GitOps + imperative: Mix of VPA Auto for stateless services and KRR for stateful
Imperative kubectl/Helm: VPA Auto everywhere safe, manual for sensitive workloads

Question 3: What is your change-tolerance window?

24/7 critical service tier: No VPA Auto. KRR + manual deploy windows.
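Business-hours critical: VPA Auto with PodDisruptionBudgets capping concurrent evictions. PDBs are not time-aware, so restricting evictions to off-hours needs separate scheduling (for example, switching the VPA update mode outside business hours)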
Internal-only / non-critical: VPA Auto with default eviction policy

Question 4: What is your cluster scale?

Small (under 50 pods): Manual rightsizing or KRR is enough; VPA adds complexity
Medium (50-500 pods): KRR for visibility + VPA Auto on stateless services
Large (500-5000 pods): Full stack: VPA Auto + HPA + Karpenter + KRR for monitoring
Massive (5000+ pods): Add capacity reservations and detailed cost allocation; consider commercial tools (Cast AI, nOps)

Common Anti-Patterns (Avoid These)

Anti-Pattern 1: VPA + HPA on the Same Workload (CPU)

VPA changes pod CPU requests. HPA scales based on CPU utilization. When both target CPU, they fight: VPA shrinks the request → utilization rises → HPA scales replicas → VPA sees less per-pod → shrinks again → infinite loop.

Fix: If you need both, use HPA on a custom metric (RPS, queue depth) and VPA on memory only. Or split: VPA on dev/staging, HPA on production.
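The feedback loop is easy to reproduce in a toy model. The sketch below is a deliberate simplification: VPA is modeled as "request = per-pod usage plus a fixed margin" and HPA as the standard ratio rule, ignoring tolerances, stabilization windows, and VPA's usage histograms. All parameter values are illustrative assumptions.

```python
import math


def simulate(total_cpu=4.0, replicas=4, hpa_target=0.70, vpa_margin=1.15,
             rounds=5, max_replicas=50):
    """Alternate a VPA resize and an HPA scaling decision under constant total load."""
    history = []
    for _ in range(rounds):
        per_pod_usage = total_cpu / replicas
        # VPA step: shrink the request to observed usage plus a margin.
        request = per_pod_usage * vpa_margin
        # HPA step: utilization is now 1 / vpa_margin, which is above the target.
        utilization = per_pod_usage / request
        desired = math.ceil(replicas * utilization / hpa_target)
        replicas = min(max_replicas, desired)
        history.append((replicas, round(request, 3)))
    return history


for step, (replicas, request) in enumerate(simulate(), start=1):
    print(f"round {step}: replicas={replicas}, request={request} vCPU")
```

Because the VPA margin pins utilization at about 87% while HPA wants 70%, every round adds replicas and shrinks the request even though total load never changes. In this model the loop only stops at the replica ceiling.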

Anti-Pattern 2: Trusting VPA Recommendations During Low-Traffic Periods

VPA learns from observed usage. If your service is in a 3-day low-traffic window when VPA collects data, it will recommend tiny requests that fail under real load.

Fix: Wait at least 7 days (ideally 14) of representative traffic before applying VPA recommendations. Use multi-week observation windows.

Anti-Pattern 3: Karpenter Without VPA First

Karpenter rightsizes nodes for your current requests. If your requests are 4x oversized, Karpenter happily provisions 4x larger nodes. Karpenter consolidation cannot fix pod-level waste.

Fix: Always rightsize pods (VPA/KRR) before deploying Karpenter consolidation. Otherwise you save 15-25% instead of 50-70%.

Anti-Pattern 4: Setting CPU Limits = CPU Requests

This is part of what gives a pod "Guaranteed" QoS (memory limits must also equal requests), but it causes throttling under burst. Pods that briefly need 1.2x their request get throttled and add latency.

Fix: Set CPU requests at p95 of observed usage. Set limits 2-3x request (or omit limits entirely; let the kernel handle bursting via cgroup CPU shares).
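A minimal sketch of that rule, assuming you already have per-pod CPU usage samples in millicores (for example, exported from Prometheus). The function names and the nearest-rank percentile method are illustrative choices, not something this post prescribes.

```python
def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


def suggest_cpu(samples_m, limit_multiplier=2):
    """Request at p95 of observed usage, limit at a multiple of the request."""
    request = percentile(samples_m, 95)
    return {"request_m": request, "limit_m": request * limit_multiplier}


# Synthetic samples: 10m, 20m, ..., 1000m.
samples = [10 * i for i in range(1, 101)]
print(suggest_cpu(samples))
```

Pass `limit_multiplier=3` for spikier services, or skip the limit entirely if you follow the "omit limits" option.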

Anti-Pattern 5: HPA Min Replicas = 1

A single replica means any restart, eviction, or node failure causes 100% downtime briefly. Plus, scaling from 1 to many adds cold-start latency.

Fix: Min replicas = 2 for any non-trivial service. The slight overhead is worth the resilience.

Anti-Pattern 6: No Cost Allocation Tags

You cannot rightsize what you cannot measure. Without per-namespace, per-team, per-service cost tags, you cannot prove savings or assign accountability.

Fix: Use Kubecost, OpenCost, or Vantage to allocate cluster cost by namespace/label/service. Cross-charge teams (showback if not chargeback).

Anti-Pattern 7: Optimizing One Cluster Manually Forever

Manual rightsizing degrades. Workloads change. New services arrive. Without automation (VPA Auto + Karpenter consolidation), entropy creeps back within 60-90 days.

Fix: Automate the layers that can be automated. Keep manual review only for genuinely high-risk changes.

A 90-Day Kubernetes Rightsizing Program

For clusters spending over $50,000/month, here is the program we run for clients. Typical outcome: 50-70% cost reduction in 90 days.

Days 1-14: Visibility

Install KRR (or use existing Kubecost)
Tag every namespace and major workload with team and service labels
Deploy Grafana dashboards for cluster cost by namespace
Pull baseline cluster cost for last 90 days
Identify top 20% of workloads driving 80% of cost

Days 15-30: Stateful Workloads (KRR)

Run KRR on top 20% workloads
Generate PRs with recommended request changes
Apply in batches within this window, monitoring for service degradation
Capture: ~25-35% cost reduction on rightsized services

Days 31-50: Stateless Workloads (VPA Auto)

Identify stateless workloads safe for VPA Auto
Deploy VPA in Auto mode on those workloads
Set PodDisruptionBudgets to limit eviction velocity
Monitor for 2 weeks; adjust thresholds
Capture: additional 15-25% cost reduction

Days 51-70: HPA Tuning

Audit existing HPA configurations
Right-size min replicas (most are too high)
Tune CPU/custom metric targets
Add HPA to fixed-replica services that have traffic patterns
Capture: additional 5-10% cost reduction

Days 71-90: Karpenter Migration

If on Cluster Autoscaler, migrate to Karpenter
Configure consolidation: WhenEmptyOrUnderutilized
Enable Spot instances for stateless workloads (70-90%)
Add ARM/Graviton instance types for compatible workloads
Set TTLSecondsAfterEmpty=30s (consolidateAfter: 30s in the newer NodePool API) for fast scale-down
Capture: additional 20-30% cost reduction

Day 91: Lock In Savings

Document the new baseline
Set up monthly KRR runs
Set up alerts for cost regression
Establish quarterly rightsizing reviews
Communicate savings to leadership with before/after numbers

When To Add Commercial Tools

VPA, HPA, KRR, and Karpenter are open-source. They cover most use cases. Commercial tools (Cast AI, nOps, Spot.io, Kubecost Enterprise) add value when:

Cluster scale exceeds 5,000 pods — operational overhead of managing four tools manually becomes painful
Your team lacks K8s rightsizing expertise — managed tools embed best practices
You need automated Spot orchestration — Cast AI's bid management and rebalancer handle complexity OSS Karpenter does not
Multi-cluster coordination needed — commercial tools span clusters; OSS tools are per-cluster
Compliance/audit requirements — commercial tools provide audit trails and approval workflows

For 80% of clusters under $200K/month, the OSS stack is enough. Above that, commercial tools usually pay for themselves within 30 days.

When To Stay Manual

Some workloads should never be auto-rightsized:

Critical leader-elected services (etcd, ZooKeeper) where eviction has cascading effects
Compliance-bound workloads where any pod change requires change-management approval
Highly tuned ML training jobs where custom resource allocation is part of the model
Pre-production environments where over-provisioning is the point (capacity and load testing)
Workloads with documented SLOs requiring fixed capacity

For these, KRR-as-recommendation + scheduled human review is the right pattern.

The Bottom Line

Kubernetes rightsizing in 2026 is not about picking one tool. It is about composing four tools (VPA, HPA, KRR, Karpenter) so they fix complementary classes of waste. Picking one and ignoring the others leaves 40-65% of cost savings on the table.

The most expensive mistake we see: running Karpenter on top of unrightsized pods. The second most expensive: running VPA Auto on stateful workloads and causing outages that scare leadership away from rightsizing for years.

If your Kubernetes bill is over $50,000/month and you have not run a coordinated VPA + HPA + Karpenter program in the last 12 months, you are very likely overpaying by 40-70%. Our cloud cost optimization team runs free K8s rightsizing audits and typically captures 50-70% savings within 90 days. Run a free Cloud Waste Scorecard to find your biggest Kubernetes cost leaks first.
