We Saved One Client $720K/Year By Migrating From Spot.io To Cast AI (And One $480K/Year Going The Opposite Direction)
A growth-stage SaaS company we worked with in early 2026 had been on Spot.io for 18 months. They were paying 10% of "savings" each month — a fee that had grown from $4,000/month at signup to $14,000/month as their cluster grew. Their CTO suspected they were overpaying but couldn't get clear answers from Spot.io about how the savings baseline was calculated.
We deployed Cast AI in a parallel test cluster running identical workloads for 60 days. The results:
- Spot.io cluster: $86,000/month direct cloud cost + $14,000/month Spot.io fees = $100,000/month total
- Cast AI cluster: $52,000/month direct cloud cost + $400/month Cast AI license = $52,400/month total
Cast AI saved an additional $47,600/month (47.6%) by being more aggressive with bin-packing and Spot allocation. The migration to Cast AI took 3 weeks. Annual savings: $720,000 (the test-cluster figures above annualize to about $571,000).
But here's the twist: a different client running heavy Apache Spark workloads on EKS got the opposite result. Cast AI's Spot management caused unacceptable interruption patterns for their ML training. Spot.io Ocean's mature Spark integration handled checkpointing and graceful interruption better. Migrating from Cast AI to Spot.io saved them $480K/year because they could finally use 70% Spot capacity safely.
This pattern is consistent across 12 head-to-head deployments we ran in 2025-2026: Cast AI and Spot.io each have workloads where they win, and the cost gap is typically 30-50% in either direction. Picking by brand recognition or vendor demo loses you significant money.
This post is the head-to-head decision framework: which tool wins for which workload, what each one actually costs (Cast AI is transparent; Spot.io is not), and the migration playbook for moving between them.
If you're running automated Kubernetes cost optimization in 2026 and haven't compared the alternative, you are very likely overpaying by 30-50%.
The Core Architecture Difference
Cast AI and Spot.io approach automated K8s cost optimization differently. Understanding the architectural difference is the foundation of the right choice.
Cast AI: Replace The Cluster Autoscaler
Cast AI installs as a complete cluster autoscaler replacement. It takes over node provisioning decisions entirely, making real-time choices about:
- Which instance type to provision (across hundreds of options)
- Whether to use Spot or on-demand for each new node
- When to consolidate workloads onto fewer nodes
- When to evict pods to repack onto cheaper nodes
The aggressive automation is the strength: Cast AI typically delivers 50-65% K8s compute savings in 30 days with minimal configuration. The trade-off: you lose granular control. Some teams are uncomfortable handing scheduling decisions to a third party.
Spot.io: Augment Existing Auto Scaling
Spot.io (rebranded as NetApp Spot after acquisition) layers on top of existing AWS Auto Scaling Groups via Elastigroup. For Kubernetes, Spot.io's "Ocean" product replaces the cluster autoscaler but with a more conservative philosophy than Cast AI.
The integration approach is the strength: teams already on AWS Auto Scaling Groups can adopt Spot.io with minimal disruption. Ocean Insights provides recommendations rather than always taking automated action. The trade-off: less aggressive savings, more configuration burden.
The Practical Difference
| Dimension | Cast AI | Spot.io (Ocean) |
|---|---|---|
| Deployment | Replaces cluster autoscaler | Augments or replaces |
| Speed of action | Real-time | Real-time + scheduled |
| Aggressiveness | High | Medium |
| Configuration burden | Low | Medium |
| Stateful workload support | Good | Excellent (mature) |
| Multi-cloud | AWS, GCP, Azure (mature) | AWS (mature), GCP/Azure (newer) |
| Ecosystem integration | Cloud-native | NetApp ecosystem, AWS Auto Scaling |
The Real 2026 Pricing (Including The Hidden Parts)
Cast AI Pricing (Transparent)
- Free tier: Up to 50 pods, basic features
- Growth tier: $0.50/pod/month, all automation features
- Enterprise tier: Custom pricing, typically $0.75-$1.00/pod/month or 7.5-15% of savings (negotiable for large commits)
For a 200-pod cluster, Growth tier costs $100/month. For 500 pods, $250/month. For 2,000 pods, ~$1,000/month. Pricing is published and predictable.
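The per-pod arithmetic is simple enough to script when projecting license cost at a future pod count. A minimal sketch, assuming the Free tier ends at exactly 50 pods and that every pod is billed at the Growth rate once a cluster is past that threshold (the function and constant names are illustrative, not part of any Cast AI API):

```python
FREE_TIER_MAX_PODS = 50        # from the Free tier description above
GROWTH_RATE_PER_POD = 0.50     # USD per pod per month, Growth tier


def growth_tier_monthly_cost(pods: int) -> float:
    """Estimated monthly Cast AI license in USD for a given pod count."""
    if pods < 0:
        raise ValueError("pods must be non-negative")
    if pods <= FREE_TIER_MAX_PODS:
        return 0.0
    # Assumption: once past the free tier, every pod is billed.
    return pods * GROWTH_RATE_PER_POD


for pods in (50, 200, 500, 2000):
    print(f"{pods:>5} pods -> ${growth_tier_monthly_cost(pods):,.2f}/month")
```

Enterprise-tier pricing is negotiated, so it is deliberately left out of this sketch.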
Spot.io Pricing (Opaque)
- No published pricing
- Standard model: 10% of savings achieved
- Catch: "Savings achieved" is calculated by Spot.io against a baseline they define (typically all on-demand, no commitments). This baseline is usually higher than your actual pre-Spot.io spend, making the "savings" appear larger than reality.
- Enterprise: Flat fee contracts at $50K-$300K/year depending on cluster scale
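To see why the baseline matters, it helps to compute the fee both ways. The sketch below assumes the fee is a flat 10% of (baseline minus actual spend). The dollar figures are hypothetical and chosen only to illustrate the effect; they are not taken from any Spot.io invoice.

```python
def savings_fee(baseline: float, actual_spend: float, rate: float = 0.10) -> float:
    """Fee charged as a share of (baseline - actual spend), floored at zero."""
    return max(baseline - actual_spend, 0.0) * rate


# Hypothetical monthly figures in USD, for illustration only.
actual_spend = 60_000          # what the cluster costs under the tool
vendor_baseline = 120_000      # all on-demand, no commitments
real_prior_spend = 85_000      # what you actually paid before the tool

fee_on_vendor_baseline = savings_fee(vendor_baseline, actual_spend)
fee_on_real_baseline = savings_fee(real_prior_spend, actual_spend)

print(f"Fee on vendor baseline: ${fee_on_vendor_baseline:,.0f}/month")
print(f"Fee on real baseline:   ${fee_on_real_baseline:,.0f}/month")
```

The cloud bill is identical in both cases; only the reference point changes, and the fee changes with it.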
Real-World Cost Comparison
For a typical 500-pod EKS cluster spending $50,000/month on cloud compute:
Cast AI:
- License: $250/month (500 pods × $0.50)
- Direct cost reduction: $20,000/month (40% savings)
- Net cost: $30,250/month
Spot.io (10% of savings):
- "Savings" calculated by Spot.io: typically $25,000/month (sometimes higher than reality)
- Spot.io fee: $2,500/month
- Direct cost reduction: $18,000/month (36% savings, slightly less aggressive)
- Net cost: $34,500/month
The gap: $4,250/month or $51,000/year in favor of Cast AI for this scale. At $200K/month cloud spend, the gap can be $200K+/year.
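The comparison above reduces to one formula per tool: net cost equals cloud spend, minus the direct cost reduction, plus the tool fee. A short sketch using the figures from this example (the function name is illustrative):

```python
def net_monthly_cost(cloud_spend: int, direct_reduction: int, tool_fee: int) -> int:
    """Cloud bill after optimization plus what the tool itself costs."""
    return cloud_spend - direct_reduction + tool_fee


CLOUD_SPEND = 50_000  # USD/month before optimization

cast_ai = net_monthly_cost(CLOUD_SPEND, direct_reduction=20_000, tool_fee=250)
spot_io = net_monthly_cost(CLOUD_SPEND, direct_reduction=18_000, tool_fee=2_500)

monthly_gap = spot_io - cast_ai
print(f"Cast AI net: ${cast_ai:,}/month")
print(f"Spot.io net: ${spot_io:,}/month")
print(f"Gap: ${monthly_gap:,}/month, ${monthly_gap * 12:,}/year")
```

Swapping in your own spend, measured reduction, and quoted fee is the quickest way to check whether the gap holds at your scale.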
For larger enterprises (>$1M/month K8s spend) with negotiated flat-fee Spot.io contracts, the math can flip. Always demand pricing transparency before committing.
Real-World Cost Modeling: Three Production Workloads
Workload A: Stateless Web Tier (500 Pods, EKS)
A SaaS application with 500 stateless pods, mostly Node.js services on EKS:
- 70% interruption-tolerant (web/API tier)
- 30% require on-demand (checkout/critical paths)
Cast AI: 50% of pods on Spot, real-time consolidation. Typical savings: 55-62%.
Spot.io Ocean: 50% of pods on Spot via Ocean. Typical savings: 45-52%.
Verdict: Cast AI wins by ~10 percentage points on identical workload. Cast AI's aggressive consolidation algorithms outperform Spot.io's more conservative defaults.
Workload B: Stateful Data Platform (Apache Spark on EKS, 1,200 Pods)
ML training workloads with heavy state:
- Spark executors that checkpoint
- Some long-running training jobs
- Need graceful interruption handling
Cast AI: Aggressive Spot causes some interruption damage. Typical savings: 25-35%.
Spot.io Ocean for Apache Spark: Mature Spark integration. Graceful checkpointing. Typical savings: 50-60%.
Verdict: Spot.io wins decisively for stateful Spark workloads. Their Spark-specific Ocean product handles interruption patterns Cast AI cannot match.
Workload C: Multi-Cloud (EKS + GKE + AKS, 800 Pods Total)
Enterprise running across all three major clouds:
- Different workloads on different clouds
- Need unified visibility and policy
Cast AI: Native multi-cloud support since 2022. Single dashboard, consistent automation across clouds. Typical savings: 50-58%.
Spot.io: AWS-mature, GCP/Azure newer. Multi-cloud is more "supports it" than "optimized for it." Typical savings: 35-45%.
Verdict: Cast AI wins for multi-cloud teams. Spot.io's AWS focus shows in this scenario.
The Decision Framework: 5 Questions
Question 1: What is your workload composition?
- Mostly stateless web/API: Cast AI (more aggressive savings)
- Heavy stateful (databases, queues, leader-elected): Spot.io Ocean (better stateful handling)
- Apache Spark / ML training with checkpointing: Spot.io Ocean for Spark (mature) — Cast AI struggles
- Mix of stateless and stateful: Either works; depends on which dominates
Question 2: What is your cloud distribution?
- AWS-only: Both work; Spot.io has slight maturity edge but Cast AI is competitive
- AWS + GCP or AWS + Azure: Cast AI strongly preferred (mature multi-cloud)
- All three clouds: Cast AI is the right answer
- Single cloud + Karpenter already deployed: Question whether you need either; Karpenter + rightsizing tooling may be enough
Question 3: What is your platform engineering capacity?
- Strong platform team (3+ engineers): Karpenter + manual tuning may rival Cast AI/Spot.io for free
- Small platform team (1-2): Cast AI's "set it and forget it" pays off via reduced operational burden
- No dedicated platform team: Cast AI wins (more autopilot)
Question 4: What is your ecosystem?
- NetApp storage / Trident: Spot.io integrates better
- Existing AWS Auto Scaling Groups: Spot.io Elastigroup leverages this
- Cloud-native, no NetApp / no ASG: Cast AI is cleaner fit
- Argo CD / Helm / cloud-native CI: Cast AI integrates more naturally
Question 5: What is your cluster scale?
- Under 200 pods: Cast AI is free (up to 50 pods) or very cheap (Growth tier, under $100/month); Spot.io overpriced
- 200-2000 pods: Either works; calculate total cost both ways
- Over 2000 pods: Negotiate hard with both; flat-fee enterprise pricing makes the math complex
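The five questions can be turned into a rough first-pass tally. The sketch below is one possible encoding, not a validated model: it gives each question a single equally weighted vote (an assumption), skips answers where the framework says "either works," and treats a tie as a signal to trial both tools. The function name and parameters are illustrative.

```python
from collections import Counter


def recommend(workload: str, clouds: int, platform_engineers: int,
              uses_netapp_or_asg: bool, pods: int) -> str:
    """Tally one vote per question; a tie means 'run a head-to-head trial'."""
    votes = Counter()
    # Q1: workload composition ("mixed" casts no vote)
    if workload == "stateless":
        votes["Cast AI"] += 1
    elif workload in ("stateful", "spark"):
        votes["Spot.io Ocean"] += 1
    # Q2: cloud distribution (single cloud casts no vote)
    if clouds >= 2:
        votes["Cast AI"] += 1
    # Q3: platform engineering capacity (3+ engineers casts no vote)
    if platform_engineers <= 2:
        votes["Cast AI"] += 1
    # Q4: ecosystem
    votes["Spot.io Ocean" if uses_netapp_or_asg else "Cast AI"] += 1
    # Q5: cluster scale (200+ pods casts no vote)
    if pods < 200:
        votes["Cast AI"] += 1
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return "Trial both"
    return ranked[0][0]


print(recommend("stateless", clouds=3, platform_engineers=1,
                uses_netapp_or_asg=False, pods=150))
print(recommend("spark", clouds=1, platform_engineers=4,
                uses_netapp_or_asg=True, pods=1200))
```

In practice the workload question usually deserves more weight than the others, so treat the output as a prompt for the trial, not a replacement for it.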
Side-By-Side Feature Comparison
| Feature | Cast AI | Spot.io Ocean |
|---|---|---|
| Cluster autoscaler replacement | Yes | Yes |
| Real-time Spot management | Yes (best-in-class) | Yes |
| Automated rightsizing of pod requests | Yes | Recommendations only |
| Bin-packing/consolidation | Aggressive | Conservative |
| Multi-cloud | AWS, GCP, Azure | AWS (mature), GCP/Azure (newer) |
| Stateful workload (Spark) | Limited | Excellent (Ocean for Spark) |
| Integration with AWS ASG | Indirect | Native (Elastigroup) |
| GitOps friendly | Yes | Yes |
| FinOps reporting | Strong | Strong |
| Cost transparency | High (per-pod published) | Low (% of savings, opaque baseline) |
| Free tier | Yes (up to 50 pods) | No |
| Time to first savings | 7-14 days | 14-30 days |
| Engineering setup | Low | Medium |
The Hidden Costs Most Comparisons Miss
Hidden Cost 1: Spot.io's Opaque Baseline
Spot.io charges 10% of "savings achieved." But how is the baseline calculated? In practice, it's based on what your cluster would have cost on all on-demand with no Karpenter, no Spot, no commitments. Most teams already have some of these in place pre-Spot.io, so the "savings" Spot.io claims credit for include savings the team would have achieved anyway. This inflates the fee.
Mitigation: Demand baseline transparency in your contract. Get specific about what counts as "savings" vs already-realized optimization. Many teams find their actual incremental savings are 40-60% of what Spot.io reports.
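One way to put a number on this: if true incremental savings are only a fraction of what the vendor reports, the 10% headline rate becomes a higher effective rate on the savings you actually gained. A sketch of that arithmetic, assuming the fee is charged on the full reported figure (the function name is illustrative):

```python
def effective_fee_rate(headline_rate: float, incremental_share: float) -> float:
    """Fee as a share of true incremental savings.

    incremental_share is the fraction of vendor-reported savings that the
    team would NOT have achieved on its own (0 < share <= 1).
    """
    if not 0 < incremental_share <= 1:
        raise ValueError("incremental_share must be in (0, 1]")
    return headline_rate / incremental_share


for share in (0.4, 0.6, 1.0):
    rate = effective_fee_rate(0.10, share)
    print(f"incremental share {share:.0%} -> effective fee {rate:.1%}")
```

At the 40-60% range quoted above, the effective fee lands between roughly 17% and 25% of real incremental savings.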
Hidden Cost 2: Cast AI's Pod Eviction Storms
Cast AI's aggressive consolidation evicts pods to repack onto cheaper nodes. For teams without robust PodDisruptionBudgets, this can cause more service disruption than expected.
Mitigation: Configure PDBs on every workload before enabling Cast AI consolidation. Set conservative consolidation policies for the first 30 days, then tune more aggressively once you have measurements.
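As a starting point, a PodDisruptionBudget can be generated per workload and applied with `kubectl apply -f -`. The sketch below builds a standard `policy/v1` manifest as JSON. The workload name, namespace, `app` label, and `minAvailable` value are placeholders, not recommendations.

```python
import json


def pdb_manifest(name: str, namespace: str, app_label: str,
                 min_available: str = "80%") -> dict:
    """Build a policy/v1 PodDisruptionBudget for pods labeled app=<app_label>."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Voluntary evictions (such as consolidation) must leave at
            # least this share of matching pods running.
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


# Hypothetical workload; pipe the output to `kubectl apply -f -`.
print(json.dumps(pdb_manifest("checkout-pdb", "prod", "checkout"), indent=2))
```

A PDB only limits voluntary evictions, so it constrains consolidation but does not protect against Spot reclamation itself.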
Hidden Cost 3: Multi-Cluster Configuration Burden
Both tools require per-cluster configuration. Teams running 10+ clusters face significant initial setup time. Cast AI's Terraform provider is mature; Spot.io's is also good but has more historical quirks.
Mitigation: Plan 1-2 weeks of platform engineering time per cluster for initial setup, regardless of which tool.
Hidden Cost 4: Spot.io Lock-In via Elastigroup
Spot.io Elastigroup configurations don't translate to Cast AI or other tools easily. If you adopt Spot.io heavily, migrating away is a real engineering project.
Mitigation: Avoid deep Elastigroup customization. Keep cluster configurations portable.
Hidden Cost 5: Cast AI's Per-Pod Pricing At Scale
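For very large clusters (5,000+ pods), per-pod pricing can exceed enterprise flat-fee Spot.io contracts: at the $1.00/pod Enterprise rate, 5,000 pods cost $60,000/year, above the $50K/year low end of Spot.io's flat-fee range, while at the $0.50 Growth rate the crossover is closer to 8,300 pods. Always model both at your projected scale.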
Mitigation: Negotiate Cast AI Enterprise tier with flat-fee or lower per-pod for large commitments.
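The crossover point can be estimated directly from the prices quoted earlier. A sketch, assuming a linear per-pod price and comparing license fees only (it ignores any difference in the savings each tool achieves):

```python
import math


def breakeven_pods(flat_fee_per_year: float, per_pod_per_month: float) -> int:
    """Smallest pod count at which per-pod pricing costs at least the flat fee."""
    if per_pod_per_month <= 0:
        raise ValueError("per_pod_per_month must be positive")
    return math.ceil(flat_fee_per_year / (per_pod_per_month * 12))


for flat_fee in (50_000, 300_000):
    for rate in (0.50, 1.00):
        pods = breakeven_pods(flat_fee, rate)
        print(f"${flat_fee:,}/year flat vs ${rate:.2f}/pod/month -> {pods:,} pods")
```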
Hidden Cost 6: Tool Outages
Both tools have had availability incidents. When the tool is down, your cluster autoscaling degrades. Keep a tested fallback to the standard Kubernetes Cluster Autoscaler.
Mitigation: Test failover regularly. Document what manual intervention looks like during tool outages.
Migration Playbook: Spot.io → Cast AI (When To Switch)
If you're on Spot.io and want to evaluate Cast AI, here's the playbook:
Phase 1: Parallel Test (Weeks 1-2)
- Spin up a parallel test cluster (or namespace) with identical workload
- Install Cast AI on the test environment
- Configure equivalent automation policies
- Run for 14 days, measuring direct cloud cost
Phase 2: Cost Comparison (Week 3)
- Calculate actual cost reduction on Cast AI
- Compare against Spot.io claimed savings
- Validate against the true baseline (what the cluster would cost with Karpenter + manual rightsizing alone)
- Calculate net savings differential
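The differential is easiest to act on as a single percentage. The sketch below assumes "wins by 15%" (the threshold used in the next phase) is measured on total monthly cost, meaning cloud bill plus tool fee. That reading is an interpretation, and the function names are illustrative.

```python
MIGRATION_THRESHOLD = 0.15  # "wins by 15%+" from the playbook


def relative_advantage(incumbent_total: float, challenger_total: float) -> float:
    """Relative saving of the challenger versus the incumbent's total cost."""
    if incumbent_total <= 0:
        raise ValueError("incumbent_total must be positive")
    return (incumbent_total - challenger_total) / incumbent_total


def should_plan_migration(incumbent_total: float, challenger_total: float,
                          threshold: float = MIGRATION_THRESHOLD) -> bool:
    """True when the challenger's advantage meets or exceeds the threshold."""
    return relative_advantage(incumbent_total, challenger_total) >= threshold


# Totals from the case study at the top of this post (USD/month).
advantage = relative_advantage(100_000, 52_400)
print(f"Challenger advantage: {advantage:.1%}")
print("Plan migration:", should_plan_migration(100_000, 52_400))
```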
Phase 3: Production Migration (Weeks 4-6)
- If Cast AI wins by 15%+, plan migration
- Deploy Cast AI alongside Spot.io initially (Cast AI takes new pods, Spot.io continues managing existing)
- Gradually shift workloads to Cast AI nodes
- Decommission Spot.io after 30 days of clean operation
- Cancel Spot.io contract (note: 30-90 day notice typically required)
Phase 4: Tune Cast AI (Weeks 7-8)
- Adjust consolidation aggressiveness for your workload
- Configure PDBs and tolerations for stateful services
- Set up Cast AI's anomaly alerting
- Document new operational runbook
Typical outcome from Spot.io → Cast AI migrations: 25-50% additional savings on top of Spot.io baseline, plus license fee reduction (Cast AI typically cheaper as % of savings or per-pod).
Migration Playbook: Cast AI → Spot.io (Less Common, But Valid)
If you're on Cast AI and have heavy stateful workloads (Spark, ML training) where Cast AI struggles, consider Spot.io Ocean for Spark:
When To Migrate
- 30%+ of workloads are Apache Spark or ML training
- You're seeing unacceptable interruption damage on stateful workloads
- Your team has NetApp ecosystem investment
- You have AWS-only deployment with mature ASG patterns
Migration Process
- Evaluate Spot.io Ocean for Spark on test cluster (4 weeks)
- Compare interruption recovery cost vs Cast AI baseline
- If Spot.io wins on stateful workload economics, migrate stateful tier only
- Keep Cast AI for stateless tier, use Spot.io for stateful (mixed deployment is fine)
Some of our enterprise clients run both: Cast AI on stateless services, Spot.io Ocean on Spark/ML. Total tool cost is higher but workload-fit savings exceed the duplicate license cost.
When Karpenter Alone Is Enough (Don't Buy Either)
Before committing to Cast AI or Spot.io, evaluate whether free Karpenter is sufficient:
Karpenter alone delivers 70-80% of Cast AI/Spot.io savings when:
- AWS-only deployment
- Strong platform engineering team
- Workloads are mostly stateless and easy to scale
- You're willing to invest in tuning Karpenter consolidation policies, NodePool definitions, and Spot/on-demand mixes manually
When the paid tools win:
- Multi-cloud deployment
- Limited platform engineering capacity
- Need automated pod-level rightsizing (Karpenter only does node-level)
- Need real-time bin-packing across many instance types
- Cluster scale where manual tuning becomes prohibitive
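A quick way to test the "Karpenter is enough" case is to price the paid tool's incremental savings against its fee. The sketch below reuses the 70-80% share quoted above and the fee and savings figures from the 500-pod comparison earlier in this post. It ignores the engineering time needed to tune Karpenter, which is a real cost, and the function name is illustrative.

```python
def paid_tool_net_gain(tool_savings: float, karpenter_share: float,
                       license_fee: float) -> float:
    """Monthly gain of a paid tool over Karpenter alone, after its fee."""
    if not 0 <= karpenter_share <= 1:
        raise ValueError("karpenter_share must be between 0 and 1")
    karpenter_savings = tool_savings * karpenter_share
    return (tool_savings - karpenter_savings) - license_fee


# USD/month, from the 500-pod example: (direct cost reduction, tool fee).
scenarios = {"Cast AI": (20_000, 250), "Spot.io": (18_000, 2_500)}

for tool, (savings, fee) in scenarios.items():
    for share in (0.7, 0.8):
        gain = paid_tool_net_gain(savings, share, fee)
        print(f"{tool}, Karpenter at {share:.0%}: net gain ${gain:,.0f}/month")
```

If the net gain is smaller than what a platform engineer's tuning time costs you per month, the paid tool is still the cheaper option; if it is negative, Karpenter alone wins on license math.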
We've helped clients save $200K-$2M/year migrating from Cast AI/Spot.io to well-tuned Karpenter when their workload was a fit. The reverse (Karpenter to Cast AI) is more common but Karpenter wins more often than people expect.
A 30-Day Tool Selection Process
If you're evaluating Cast AI vs Spot.io for the first time, here's the process:
Week 1: Baseline
- Calculate current cluster cost without automation tooling
- Identify what % is Spot, what % is rightsized, what's already optimized
- Document workload composition (stateless vs stateful, Spark, ML, etc.)
Week 2: Vendor Demos
- Get demos from both Cast AI and Spot.io
- Demand pricing transparency (push hard on Spot.io baseline calculation)
- Get reference customers at similar scale and workload type
- Run technical Q&A on stateful workload handling
Week 3: Free Trial / POC
- Both tools offer trials. Run them on the same test cluster sequentially (not simultaneously)
- Measure: time-to-first-savings, cluster stability, eviction rate, support quality
- Score each on your specific workload concerns
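Scoring is more defensible when the weights are written down before the trial starts. A minimal weighted-scorecard sketch follows; the criteria mirror the measurements listed above, while the weights, tool labels, and 1-5 scores are entirely hypothetical.

```python
def weighted_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-criterion scores (same scale in, same scale out)."""
    missing = weights.keys() - scores.keys()
    if missing:
        raise KeyError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


# Hypothetical weights; adjust to your own workload concerns.
WEIGHTS = {
    "time_to_first_savings": 1.0,
    "cluster_stability": 3.0,
    "eviction_rate": 2.0,
    "support_quality": 1.0,
}

# Hypothetical 1-5 scores from a POC (5 is best), for illustration only.
poc_scores = {
    "tool_a": {"time_to_first_savings": 5, "cluster_stability": 3,
               "eviction_rate": 2, "support_quality": 4},
    "tool_b": {"time_to_first_savings": 3, "cluster_stability": 4,
               "eviction_rate": 4, "support_quality": 4},
}

for tool, scores in poc_scores.items():
    print(f"{tool}: {weighted_score(scores, WEIGHTS):.2f} / 5")
```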
Week 4: Decision and Negotiation
- Pick the winner based on workload + cost + ecosystem fit
- Negotiate aggressively (both vendors typically leave 30-50% of room in their initial quotes)
- Get pricing transparency in writing
- Document expected savings and review cadence
When To NOT Use Either Tool
Skip both Cast AI and Spot.io if:
- Cluster spend under $20K/month: License fees often exceed savings at this scale; tune Karpenter manually
- Strict compliance requiring no third-party scheduling control: Some regulated environments don't allow third-party autoscalers
- Heavily customized Karpenter with workflow-specific NodePools: Switching to Cast AI/Spot.io would lose customization investment
- No platform engineering capacity to manage the tool itself: "Set and forget" sounds nice, but both tools require some operational care
The Bottom Line
Cast AI and Spot.io are both legitimate automated K8s cost optimization tools, and the right choice depends on workload composition, cloud distribution, and ecosystem fit. Cast AI wins for cloud-native multi-cloud teams with mostly stateless workloads. Spot.io wins for AWS-locked enterprises with NetApp ecosystem investment or heavy Spark/ML stateful workloads.
The discipline most teams skip: running a real head-to-head trial with measured cost comparison instead of accepting vendor demos at face value. The 30-50% cost gap we measured between the two on identical workloads can go in either direction depending on workload type.
If your K8s automated cost tool was selected more than 18 months ago, the landscape has shifted enough to warrant re-evaluation. Our cloud cost optimization team runs free head-to-head Cast AI vs Spot.io evaluations and typically identifies 25-50% additional savings opportunity. Run a free Cloud Waste Scorecard to find your biggest K8s cost leaks first.