Your VPA Is Saving 22%. Karpenter Is Saving 18%. Why Are You Still Wasting 60% On K8s?
A growth-stage SaaS client we worked with in early 2026 had done everything the K8s cost optimization blog posts told them to do. They had VPA running in recommendation mode. They had Karpenter provisioning nodes. They had HPA scaling their public APIs. They were proud of their setup. Their EKS bill: $284,000 per month.
We audited the cluster. The VPA recommendations had been ignored for 8 months because applying them required restarting stateful pods nobody wanted to touch. Karpenter was packing nodes efficiently — but with pods that requested 4x what they actually used. HPA was scaling replicas of services where the base request was already 6x oversized. Each tool was working perfectly. The combination was failing.
After 10 weeks of coordinated rightsizing across all four layers (KRR for stateful workloads, VPA Auto for stateless, HPA tuning, Karpenter consolidation), their bill dropped to $108,000/month. Annual savings: $2.1 million. No service degradation. No outages.
This pattern is consistent across 80 production K8s clusters we audited in 2025-2026: teams pick one rightsizing tool, assume it solves cost, and leave 40-65% of waste in place because they do not understand how the four major tools complement each other. This post is the decision framework: which tool solves which problem, when to use them together, and the playbook to actually capture the savings.
If you are running production Kubernetes and your monthly bill is over $20,000, you are almost certainly in this trap.
The Four Tools That Matter (And What They Actually Do)
| Tool | Layer | What It Changes | What It Doesn't Touch |
|---|---|---|---|
| VPA (Vertical Pod Autoscaler) | Pod | CPU and memory requests/limits on pods | Replica count, node capacity |
| HPA (Horizontal Pod Autoscaler) | Workload | Number of pod replicas | Pod size, node capacity |
| KRR (Kubernetes Resource Recommender, by Robusta) | Recommendation | Nothing (read-only); recommends new requests | Everything (it is a reporting tool, not an autoscaler) |
| Karpenter | Node | Which EC2/equivalent instances back the cluster | Pod-level requests, replica count |
Each of these tools fixes a different class of waste. Using one and ignoring the others is the most common rightsizing mistake.
Why Each Tool Alone Fails
- VPA alone: Optimizes pod requests but with the same number of replicas on the same node types. You shrink pods but still pay for oversized nodes. Savings: 15-25%.
- HPA alone: Scales replicas based on demand but with the same oversized requests. Adding more replicas of an oversized pod multiplies waste. Savings: 5-10%.
- KRR alone: Recommends but does not enforce. Recommendations gather dust. Savings: 0-5% (only what humans actually apply).
- Karpenter alone: Packs oversized pods onto smaller-than-naive nodes but cannot fix the pod-level waste. Savings: 15-25%.
The savings compound when used correctly. VPA + HPA + Karpenter together saves 50-70%.
The 11 Cost Leaks Each Tool Catches
Across 80 cluster audits, here are the actual waste sources we find. Knowing which tool catches which leak is the key to designing your rightsizing program.
| Leak | % of Cluster Waste | Tool That Catches It |
|---|---|---|
| Oversized memory requests (set once, never updated) | 22% | VPA or KRR |
| Oversized CPU requests (defensive padding) | 18% | VPA or KRR |
| Wrong instance types (Karpenter not configured) | 14% | Karpenter |
| Idle replicas (HPA misconfigured) | 9% | HPA tuning |
| Stranded persistent volumes | 7% | Manual cleanup (no tool) |
| Idle dev/test clusters running 24/7 | 6% | Manual scheduling (no tool) |
| Over-provisioned daemonsets | 5% | VPA |
| Ignored Spot instance opportunity | 5% | Karpenter |
| Unused namespaces with running workloads | 4% | Manual cleanup |
| Unallocated node capacity (bin-packing failure) | 4% | Karpenter |
| Idle GPU nodes (over-provisioned for ML) | 3% | Karpenter + scheduling |
| Other | 3% | Various |
VPA/KRR catches 45% of waste. Karpenter catches 26%. HPA catches 9%. Manual processes catch 17%. The remaining 3% is miscellaneous.
The implication: a rightsizing program that does not use VPA/KRR misses almost half the savings. A program that does not use Karpenter misses another quarter. There is no single tool that catches more than 50%.
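The per-tool totals can be reproduced from the table with a few lines of Python. This is only a sketch: the shares are copied from the table above, and the grouping (daemonsets under VPA/KRR, idle GPU nodes under Karpenter) is our reading of the "Tool That Catches It" column.

```python
# Leak shares copied from the table above (percent of total cluster waste).
# The tool grouping is an interpretation of the table's last column.
LEAKS = [
    ("Oversized memory requests", 22, "VPA/KRR"),
    ("Oversized CPU requests", 18, "VPA/KRR"),
    ("Wrong instance types", 14, "Karpenter"),
    ("Idle replicas", 9, "HPA"),
    ("Stranded persistent volumes", 7, "Manual"),
    ("Idle dev/test clusters", 6, "Manual"),
    ("Over-provisioned daemonsets", 5, "VPA/KRR"),
    ("Ignored Spot opportunity", 5, "Karpenter"),
    ("Unused namespaces", 4, "Manual"),
    ("Unallocated node capacity", 4, "Karpenter"),
    ("Idle GPU nodes", 3, "Karpenter"),
    ("Other", 3, "Other"),
]


def waste_by_tool(leaks):
    """Sum the waste share attributed to each tool group."""
    totals = {}
    for _name, share, tool in leaks:
        totals[tool] = totals.get(tool, 0) + share
    return totals


totals = waste_by_tool(LEAKS)
print(totals)
```

Swapping in the shares from your own audit shows quickly which layer deserves attention first.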
VPA Deep Dive: When Auto Mode Saves Money And When It Causes Outages
Vertical Pod Autoscaler is the highest-leverage rightsizing tool because it directly fixes the largest waste category: oversized pod requests. But VPA in Auto mode is also the tool that has caused the most production incidents we have seen.
How VPA Works
VPA monitors actual pod CPU and memory usage over time. It calculates new request and limit values that match observed usage with safety margin. In Auto mode, it evicts pods so they restart with the new values.
The Three VPA Modes
| Mode | Behavior | Best For |
|---|---|---|
| Off | Calculates recommendations, does nothing | Production stateful services; learning phase |
| Initial | Sets requests at pod creation only | Workloads that should not be evicted mid-run |
| Auto | Evicts pods to apply new requests | Stateless workloads; non-critical paths |
The Real Risk Profile
VPA Auto mode evicts pods. This is fine for stateless web services. It is dangerous for:
- Stateful services with leader election (etcd, ZooKeeper, Kafka, Cassandra) — eviction triggers leader changes and momentary unavailability
- Long-running batch jobs — eviction kills the job partway through
- JVM applications with slow warmup — eviction causes 30-60s of latency spikes during startup
- Services with sticky sessions — eviction breaks user sessions
- Services with in-flight long-running connections (WebSockets, streaming) — eviction drops connections
Rule of thumb: Use VPA Auto for stateless HTTP services that already tolerate rolling deploys. Use VPA Off (recommendation only) for everything else and apply manually with deploy windows.
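As a rough illustration of that rule of thumb, the helper below builds a VerticalPodAutoscaler object and picks the update mode from a single flag. The workload names and the `vpa_manifest` helper are hypothetical; the output is JSON, which `kubectl apply -f` accepts alongside YAML.

```python
import json


def vpa_manifest(name, namespace, stateless, kind="Deployment"):
    """Build a VPA object: "Auto" for stateless workloads, "Off" otherwise."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{name}-vpa", "namespace": namespace},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": kind, "name": name},
            # "Off" still produces recommendations; it just never evicts.
            "updatePolicy": {"updateMode": "Auto" if stateless else "Off"},
        },
    }


# Hypothetical workloads: one stateless API, one stateful database.
web = vpa_manifest("checkout-api", "prod", stateless=True)
db = vpa_manifest("orders-db", "prod", stateless=False, kind="StatefulSet")
print(json.dumps(web, indent=2))
```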
Real Cost Math: VPA on a 200-Pod Production Cluster
A typical client cluster before VPA:
- 200 pods averaging 1.5 vCPU request, 4GB memory request
- Actual usage: 0.4 vCPU, 1.5GB memory (typical pattern)
- Cluster compute cost: ~$12,000/month
After VPA (Auto for stateless 70%, Off + manual for stateful 30%):
- New requests: 0.6 vCPU, 2GB (a 50% safety margin over actual CPU usage, about 33% over memory)
- Cluster compute cost: ~$5,200/month, once nodes are scaled down to match the smaller requests
- Savings: $6,800/month, 57% reduction
The compute reduction is what enables the next layer (Karpenter consolidation) to add another 20-25% on top, measured against the original bill.
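The arithmetic behind this example is easy to check. The sketch below uses only the numbers quoted above; the helper names are ours.

```python
PODS = 200


def total_requests(pods, vcpu_per_pod, gb_per_pod):
    """Total requested vCPU and memory (GB) across identical pods."""
    return pods * vcpu_per_pod, pods * gb_per_pod


def pct_reduction(before, after):
    """Percentage reduction from before to after, rounded to one decimal."""
    return round(100 * (before - after) / before, 1)


cpu_before, mem_before = total_requests(PODS, 1.5, 4)
cpu_after, mem_after = total_requests(PODS, 0.6, 2)

cpu_cut = pct_reduction(cpu_before, cpu_after)
mem_cut = pct_reduction(mem_before, mem_after)
bill_cut = pct_reduction(12_000, 5_200)
print(cpu_cut, mem_cut, bill_cut)
```

The bill reduction lands between the CPU and memory request reductions, which is what you would expect when nodes are priced on both dimensions.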
KRR: The Safer Alternative That Most Teams Underuse
KRR (Kubernetes Resource Recommender, by Robusta) is a CLI/dashboard tool that analyzes Prometheus metrics and recommends new pod requests. Unlike VPA, it does not change anything.
Why KRR Is Often Better Than VPA For Initial Rightsizing
- No eviction risk: KRR is read-only. You see what should change before any restart happens.
- Clear visibility: KRR's dashboard shows current vs recommended for every workload, sortable by potential savings.
- Easier rollout: Apply recommendations in batches via PRs to your manifests/Helm charts. Reviewable in code review.
- Works with GitOps: Integrates naturally with ArgoCD/Flux workflows where manifests are the source of truth.
- Better for stateful workloads: No surprise restarts during business hours.
The KRR Workflow
- Install KRR (CLI: `pip install robusta-krr` or run as a Kubernetes pod)
- Point at your Prometheus instance
- Run `krr simple --cluster prod`
- Get a CSV of recommendations sorted by waste
- Apply top 20% by savings via PRs over 1-2 sprints
- Re-run monthly
Most teams capture 60-70% of VPA's savings with KRR alone, with zero eviction risk. The remaining 30-40% requires VPA Auto mode for workloads that change usage patterns frequently.
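The "apply the top 20% by savings" step can be sketched as a small ranking function. The record fields below (`workload`, `current_cpu`, `recommended_cpu`) are hypothetical and are not KRR's actual export format; map them to whatever columns your report contains.

```python
import math


def top_by_savings(recs, fraction=0.2):
    """Return the top `fraction` of recommendations ranked by CPU over-request."""
    ranked = sorted(
        recs,
        key=lambda r: r["current_cpu"] - r["recommended_cpu"],
        reverse=True,
    )
    keep = max(1, math.ceil(len(ranked) * fraction))
    return ranked[:keep]


# Hypothetical recommendations, in vCPU.
recs = [
    {"workload": "orders-db", "current_cpu": 4.0, "recommended_cpu": 1.0},
    {"workload": "search", "current_cpu": 2.0, "recommended_cpu": 1.5},
    {"workload": "billing", "current_cpu": 8.0, "recommended_cpu": 2.0},
    {"workload": "auth", "current_cpu": 1.0, "recommended_cpu": 0.8},
    {"workload": "reports", "current_cpu": 2.0, "recommended_cpu": 0.5},
]

first_batch = top_by_savings(recs)
print([r["workload"] for r in first_batch])
```

Ranking by absolute over-request rather than percentage keeps the first batch focused on the workloads that actually move the bill.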
When To Use KRR vs VPA
| Scenario | Use |
|---|---|
| First time rightsizing a cluster | KRR (low risk, build trust) |
| Stateful services (databases, queues) | KRR (avoid eviction) |
| Stateless web services with rolling deploys | VPA Auto |
| Batch/cron jobs | KRR (eviction would kill jobs) |
| ML training/inference pods | VPA Initial mode |
| Services with usage that changes by 10x daily | VPA Auto |
| Compliance environments with strict change control | KRR (manual approval) |
| GitOps-heavy organizations | KRR (PR-driven changes) |
HPA: The Tool That Looks Right But Usually Isn't Tuned
Most teams enable HPA with a 70% CPU target and forget about it. This is rarely optimal.
The HPA Targets That Actually Matter
- CPU utilization target: 70% is the usual starting point. For latency-sensitive services, drop to 50%. For batch services, raise to 85%.
- Memory utilization target: Generally not used for HPA (memory does not free easily under load). Use as fallback only.
- Custom metrics: Queue depth, request rate, p99 latency. Often work better than CPU for real workloads.
- External metrics: Datadog, CloudWatch, Prometheus metrics outside the cluster.
HPA Mistakes That Cost Money
- Min replicas set too high "for safety": if min replicas is 5 but average load needs 2, you are paying for 3 extra replicas 24/7
- Scale-down stabilization window too long: scaling down too slowly burns money during off-peak
- CPU target set defensively (e.g., 30%): you provision two to three times the capacity you need
- HPA on metrics that do not predict load: scaling on average CPU when p99 latency is the real driver
- No HPA at all on workloads that should have it: fixed replica counts on services with daily traffic patterns
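To see why a defensive CPU target is expensive, it helps to look at the core scaling rule from the Kubernetes documentation: desired replicas = ceil(current replicas x current metric / target metric). The sketch below implements only that rule; the real controller also applies a tolerance band and stabilization windows, which are left out here.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization):
    """Core HPA rule: ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)


# Same observed load (4 replicas at 60% CPU), two different targets.
at_70 = desired_replicas(4, 60, 70)
at_30 = desired_replicas(4, 60, 30)
print(at_70, at_30)
```

For the same load, the 30% target asks for twice as many replicas as the 70% target.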
Real Cost Impact
A 24-hour traffic pattern API:
- Without HPA, fixed at 10 replicas (sized for peak): $1,500/month
- With HPA min=2 max=12, average around 4 replicas: $700/month
- Savings: 53%
HPA savings are smaller than VPA in absolute terms but compound — every replica you eliminate is a request size that VPA already shrunk, packed onto a node Karpenter sized.
Karpenter: The Node Layer Most Teams Run Wrong
Karpenter (or Cluster Autoscaler with multiple node groups) provisions the actual EC2/equivalent instances backing your cluster. It is the only tool that touches the underlying cloud bill directly.
What Karpenter Does Well
- Picks the cheapest instance type that fits the pending pods (not just the type you specified in node groups)
- Right-sizes the cluster by adding nodes only when needed and removing them when underutilized
- Spot integration with automatic fallback to on-demand
- Multi-architecture (x86 and ARM/Graviton) where appropriate
- Consolidation — actively repacks pods onto fewer nodes when possible
Karpenter Configuration Mistakes That Cost Money
- Restricting instance families too tightly: telling Karpenter "only m5/m6" prevents it from picking cheaper c5/c6 types when CPU-bound
- Not enabling consolidation: pods stay on oversized nodes after VPA shrinks them; consolidation packs them onto smaller nodes
- Wrong consolidation policy: `WhenEmpty` only consolidates fully empty nodes; `WhenEmptyOrUnderutilized` is usually better
- No Spot instance configuration: missing 60-90% savings on interruptible workloads
- Missing taints/tolerations strategy: Spot-tolerant pods running on on-demand nodes due to scheduling constraints
- NodePool (formerly Provisioner) per workload type: many teams over-segment with custom NodePools; usually 2-3 are enough
- Empty-node timeout too high: a large `TTLSecondsAfterEmpty` (`consolidateAfter` on current NodePools) keeps idle nodes alive longer than needed
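A NodePool that avoids most of these mistakes might look like the sketch below. It assumes Karpenter's v1 `NodePool` API on AWS with an existing `EC2NodeClass`; the names `general` and `default` are placeholders, and field names should be checked against the Karpenter version you run.

```python
import json


def nodepool_manifest(name, node_class, consolidate_after="30s"):
    """Build a Karpenter NodePool allowing Spot + on-demand and x86 + ARM."""
    return {
        "apiVersion": "karpenter.sh/v1",
        "kind": "NodePool",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "nodeClassRef": {
                        "group": "karpenter.k8s.aws",
                        "kind": "EC2NodeClass",
                        "name": node_class,
                    },
                    # No instance-family restriction: let Karpenter pick the cheapest fit.
                    "requirements": [
                        {"key": "karpenter.sh/capacity-type", "operator": "In",
                         "values": ["spot", "on-demand"]},
                        {"key": "kubernetes.io/arch", "operator": "In",
                         "values": ["amd64", "arm64"]},
                    ],
                }
            },
            "disruption": {
                "consolidationPolicy": "WhenEmptyOrUnderutilized",
                "consolidateAfter": consolidate_after,
            },
        },
    }


pool = nodepool_manifest("general", "default")
print(json.dumps(pool, indent=2))
```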
Real Cost Math: Karpenter on Top of VPA
Continuing the example from above (cluster after VPA at $5,200/month):
After Karpenter optimization:
- Switched mostly to ARM/Graviton (20% off list)
- 70% Spot instances on stateless workloads (70% off list)
- Consolidation enabled, with a 30s empty-node delay (TTLSecondsAfterEmpty=30s on legacy Provisioners, consolidateAfter: 30s on NodePools)
- New compute cost: ~$2,800/month
Total program savings (VPA + Karpenter): $12,000 → $2,800 = 77% reduction.
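Layered savings multiply rather than add, because each layer only cuts what the previous one left behind. A short sketch using the example's figures (the helper name is ours):

```python
def combined_reduction(*layer_reductions):
    """Combine per-layer fractional reductions applied one after another."""
    remaining = 1.0
    for reduction in layer_reductions:
        remaining *= 1.0 - reduction
    return 1.0 - remaining


vpa = 1 - 5_200 / 12_000        # VPA layer: $12,000 -> $5,200
karpenter = 1 - 2_800 / 5_200   # Karpenter layer: $5,200 -> $2,800
total = combined_reduction(vpa, karpenter)
print(round(vpa * 100, 1), round(karpenter * 100, 1), round(total * 100, 1))
```

Karpenter's cut looks large against the post-VPA bill but is about 20% of the original $12,000, which is why the order of the layers matters when you report savings.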
The Decision Framework: 4 Questions To Pick The Right Combination
Question 1: What is your cluster's workload profile?
- Mostly stateless web services: VPA Auto + HPA + Karpenter (full automation works)
- Mix of stateful + stateless: KRR for stateful + VPA Auto for stateless + HPA + Karpenter
- Heavily stateful (databases, queues): KRR + manual application + Karpenter only
- Batch and cron jobs: KRR + manual + Karpenter (avoid VPA Auto, no HPA)
- ML training/inference: KRR + VPA Initial mode + Karpenter with GPU support
Question 2: What is your team's GitOps maturity?
- GitOps-heavy (ArgoCD/Flux managing everything): KRR fits naturally (PR-driven changes), VPA Off, manual updates
- Hybrid GitOps + imperative: Mix of VPA Auto for stateless services and KRR for stateful
- Imperative kubectl/Helm: VPA Auto everywhere safe, manual for sensitive workloads
Question 3: What is your change-tolerance window?
- 24/7 critical service tier: No VPA Auto. KRR + manual deploy windows.
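- Business-hours critical: VPA Auto with PodDisruptionBudgets capping how many pods can be evicted at once. PDBs have no time-of-day setting, so confining evictions to off-hours needs a separate mechanism, for example a scheduled job that switches the VPA updateMode between Off and Auto.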
- Internal-only / non-critical: VPA Auto with default eviction policy
Question 4: What is your cluster scale?
- Small (under 50 pods): Manual rightsizing or KRR is enough; VPA adds complexity
- Medium (50-500 pods): KRR for visibility + VPA Auto on stateless services
- Large (500-5000 pods): Full stack: VPA Auto + HPA + Karpenter + KRR for monitoring
- Massive (5000+ pods): Add capacity reservations and detailed cost allocation; consider commercial tools (Cast AI, nOps)
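Questions 1 and 3 can be condensed into a small lookup, shown here as an illustrative sketch. The profile keys and the `recommend_stack` helper are our own naming, and the mapping simply restates the bullets above; Questions 2 and 4 would be further adjustments on top.

```python
# Question 1: workload profile -> tool combination.
PROFILE_STACK = {
    "stateless": ["VPA Auto", "HPA", "Karpenter"],
    "mixed": ["KRR for stateful", "VPA Auto for stateless", "HPA", "Karpenter"],
    "stateful": ["KRR + manual application", "Karpenter"],
    "batch": ["KRR + manual application", "Karpenter"],
    "ml": ["KRR", "VPA Initial", "Karpenter with GPU support"],
}


def recommend_stack(profile, critical_24x7=False):
    """Return the suggested tools for a profile, applying the Question 3 override."""
    stack = list(PROFILE_STACK[profile])
    if critical_24x7:
        # Question 3: no VPA Auto on 24/7 critical tiers.
        stack = [
            "KRR + manual deploy windows" if tool.startswith("VPA Auto") else tool
            for tool in stack
        ]
    return stack


print(recommend_stack("mixed"))
print(recommend_stack("stateless", critical_24x7=True))
```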
Common Anti-Patterns (Avoid These)
Anti-Pattern 1: VPA + HPA on the Same Workload (CPU)
VPA changes pod CPU requests. HPA scales based on CPU utilization. When both target CPU, they fight: VPA shrinks the request → utilization rises → HPA scales replicas → VPA sees less per-pod → shrinks again → infinite loop.
Fix: If you need both, use HPA on a custom metric (RPS, queue depth) and VPA on memory only. Or split: VPA on dev/staging, HPA on production.
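One way to express this split is a VPA restricted to memory plus an HPA driven by a per-pod request-rate metric. The sketch below is illustrative: the metric name `http_requests_per_second` and the workload name are hypothetical, and a Pods-type metric only works if a custom metrics adapter is serving it in your cluster.

```python
def memory_only_vpa(deployment):
    """VPA that manages memory requests only, leaving CPU to the HPA."""
    return {
        "apiVersion": "autoscaling.k8s.io/v1",
        "kind": "VerticalPodAutoscaler",
        "metadata": {"name": f"{deployment}-vpa"},
        "spec": {
            "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "updatePolicy": {"updateMode": "Auto"},
            "resourcePolicy": {
                "containerPolicies": [
                    {"containerName": "*", "controlledResources": ["memory"]}
                ]
            },
        },
    }


def rps_hpa(deployment, min_replicas, max_replicas, target_rps):
    """HPA that scales on a per-pod requests-per-second metric instead of CPU."""
    return {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": f"{deployment}-hpa"},
        "spec": {
            "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": deployment},
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            "metrics": [{
                "type": "Pods",
                "pods": {
                    "metric": {"name": "http_requests_per_second"},  # hypothetical metric
                    "target": {"type": "AverageValue", "averageValue": str(target_rps)},
                },
            }],
        },
    }


vpa = memory_only_vpa("checkout-api")
hpa = rps_hpa("checkout-api", min_replicas=2, max_replicas=12, target_rps=100)
```

Because the two controllers no longer share a signal, shrinking a request cannot trigger a scale-out.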
Anti-Pattern 2: Trusting VPA Recommendations During Low-Traffic Periods
VPA learns from observed usage. If your service is in a 3-day low-traffic window when VPA collects data, it will recommend tiny requests that fail under real load.
Fix: Wait at least 7 days (ideally 14) of representative traffic before applying VPA recommendations. Use multi-week observation windows.
Anti-Pattern 3: Karpenter Without VPA First
Karpenter rightsizes nodes for your current requests. If your requests are 4x oversized, Karpenter happily provisions 4x larger nodes. Karpenter consolidation cannot fix pod-level waste.
Fix: Always rightsize pods (VPA/KRR) before deploying Karpenter consolidation. Otherwise you save 15-25% instead of 50-70%.
Anti-Pattern 4: Setting CPU Limits = CPU Requests
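Matching limits to requests (for both CPU and memory, on every container) is what gives a pod Guaranteed QoS, but a CPU limit equal to the request causes throttling under burst. Pods that briefly need 1.2x their request get throttled and add latency.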
Fix: Set CPU requests at p95 of observed usage. Set limits 2-3x request (or omit limits entirely; let the kernel handle bursting via cgroup CPU shares).
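A minimal sketch of that sizing rule, assuming you already have CPU usage samples in millicores (for example, exported from Prometheus). It uses a simple nearest-rank percentile, not the decaying histogram VPA uses internally, and the sample data is invented.

```python
def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer 1-100."""
    ordered = sorted(samples)
    # Integer ceiling division avoids floating-point rounding in the rank.
    rank = max(1, -(-pct * len(ordered) // 100))
    return ordered[rank - 1]


def cpu_request_and_limit(usage_millicores, limit_multiplier=2):
    """Request at p95 of observed usage; limit at a multiple of the request."""
    request = percentile(usage_millicores, 95)
    return request, request * limit_multiplier


# Invented samples: steady usage with one 900m spike.
usage = [180, 200, 210, 220, 230, 240, 250, 260, 300, 320,
         330, 340, 350, 360, 380, 390, 400, 410, 450, 900]
request, limit = cpu_request_and_limit(usage)
print(request, limit)
```

The single spike does not set the request, but the limit still leaves room for it.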
Anti-Pattern 5: HPA Min Replicas = 1
A single replica means any restart, eviction, or node failure causes 100% downtime briefly. Plus, scaling from 1 to many adds cold-start latency.
Fix: Min replicas = 2 for any non-trivial service. The slight overhead is worth the resilience.
Anti-Pattern 6: No Cost Allocation Tags
You cannot rightsize what you cannot measure. Without per-namespace, per-team, per-service cost tags, you cannot prove savings or assign accountability.
Fix: Use Kubecost, OpenCost, or Vantage to allocate cluster cost by namespace/label/service. Cross-charge teams (showback if not chargeback).
Anti-Pattern 7: Optimizing One Cluster Manually Forever
Manual rightsizing degrades. Workloads change. New services arrive. Without automation (VPA Auto + Karpenter consolidation), entropy creeps back within 60-90 days.
Fix: Automate the layers that can be automated. Keep manual review only for genuinely high-risk changes.
A 90-Day Kubernetes Rightsizing Program
For clusters spending over $50,000/month, here is the program we run for clients. Typical outcome: 50-70% cost reduction in 90 days.
Days 1-14: Visibility
- Install KRR (or use existing Kubecost)
- Tag every namespace and major workload with team and service labels
- Deploy Grafana dashboards for cluster cost by namespace
- Pull baseline cluster cost for last 90 days
- Identify top 20% of workloads driving 80% of cost
Days 15-30: Stateful Workloads (KRR)
- Run KRR on top 20% workloads
- Generate PRs with recommended request changes
- Apply in batches within the window, monitoring for service degradation
- Capture: ~25-35% cost reduction on rightsized services
Days 31-50: Stateless Workloads (VPA Auto)
- Identify stateless workloads safe for VPA Auto
- Deploy VPA in Auto mode on those workloads
- Set PodDisruptionBudgets to limit eviction velocity
- Monitor for 2 weeks; adjust thresholds
- Capture: additional 15-25% cost reduction
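The PodDisruptionBudget step might be sketched as follows. The `app` label and workload name are assumptions about how your pods are labeled; the VPA updater evicts through the eviction API, so a budget like this caps how many replicas it can take down at once.

```python
def pdb_manifest(app_label, max_unavailable=1):
    """Build a PodDisruptionBudget that limits concurrent evictions for one app."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": f"{app_label}-pdb"},
        "spec": {
            "maxUnavailable": max_unavailable,
            # Assumes pods carry an "app" label matching the workload name.
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


pdb = pdb_manifest("checkout-api")
print(pdb)
```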
Days 51-70: HPA Tuning
- Audit existing HPA configurations
- Right-size min replicas (most are too high)
- Tune CPU/custom metric targets
- Add HPA to fixed-replica services that have traffic patterns
- Capture: additional 5-10% cost reduction
Days 71-90: Karpenter Migration
- If on Cluster Autoscaler, migrate to Karpenter
- Configure consolidation: `WhenEmptyOrUnderutilized`
- Enable Spot instances for stateless workloads (70-90%)
- Add ARM/Graviton instance types for compatible workloads
- Set TTLSecondsAfterEmpty=30s (consolidateAfter: 30s on current NodePools) for fast scale-down
- Capture: additional 20-30% cost reduction
Day 91: Lock In Savings
- Document the new baseline
- Set up monthly KRR runs
- Set up alerts for cost regression
- Establish quarterly rightsizing reviews
- Communicate savings to leadership with before/after numbers
When To Add Commercial Tools
VPA, HPA, KRR, and Karpenter are open-source. They cover most use cases. Commercial tools (Cast AI, nOps, Spot.io, Kubecost Enterprise) add value when:
- Cluster scale exceeds 5,000 pods — operational overhead of managing four tools manually becomes painful
- Your team lacks K8s rightsizing expertise — managed tools embed best practices
- You need automated Spot orchestration — Cast AI's bid management and rebalancer handle complexity OSS Karpenter does not
- Multi-cluster coordination needed — commercial tools span clusters; OSS tools are per-cluster
- Compliance/audit requirements — commercial tools provide audit trails and approval workflows
For 80% of clusters under $200K/month, the OSS stack is enough. Above that, commercial tools usually pay for themselves within 30 days.
When To Stay Manual
Some workloads should never be auto-rightsized:
- Critical leader-elected services (etcd, ZooKeeper) where eviction has cascading effects
- Compliance-bound workloads where any pod change requires change-management approval
- Highly tuned ML training jobs where custom resource allocation is part of the model
- Pre-production environments where over-provisioning is the point (for example, load and capacity testing)
- Workloads with documented SLOs requiring fixed capacity
For these, KRR-as-recommendation + scheduled human review is the right pattern.
The Bottom Line
Kubernetes rightsizing in 2026 is not about picking one tool. It is about composing four tools (VPA, HPA, KRR, Karpenter) so they fix complementary classes of waste. Picking one and ignoring the others leaves 40-65% of cost savings on the table.
The most expensive mistake we see: running Karpenter on top of unrightsized pods. The second most expensive: running VPA Auto on stateful workloads and causing outages that scare leadership away from rightsizing for years.
If your Kubernetes bill is over $50,000/month and you have not run a coordinated VPA + HPA + Karpenter program in the last 12 months, you are very likely overpaying by 40-70%. Our cloud cost optimization team runs free K8s rightsizing audits and typically captures 50-70% savings within 90 days. Run a free Cloud Waste Scorecard to find your biggest Kubernetes cost leaks first.