Cloud Cost Optimization
Apr 3, 2026
By Ravi Kanani

Zilliz Cloud vs Self-Hosted Milvus: 2-4x Cost Gap at 100M+ Vectors (Modeled)

Key Takeaway

Zilliz Cloud (managed Milvus) charges $0.096 per CU-hour for compute and $0.02/GB/month for storage in 2026. For 1 million vectors at 1536 dimensions, expect $80-150/month on Zilliz Serverless. At 10M vectors with moderate traffic, costs run $250-500/month. At 100M+ vectors, self-hosted Milvus on Kubernetes ($300-600/month) is 2-4x cheaper than Zilliz Dedicated ($600-1,200/month). Zilliz shines for distributed workloads above 50M vectors where its segment-based architecture outperforms competitors on horizontal scaling.

The Distributed Vector Database Built for Billions of Vectors (Priced Like It Too)

Milvus is the vector database you graduate to. Most teams start with Pinecone (simple, managed, fast to set up) or Qdrant (cheap, fast, Rust-native). Then their dataset hits 50 million vectors, they need multi-node distribution, and they discover that neither Pinecone nor Qdrant was designed for true horizontal scaling the way Milvus was.

Milvus was built from day one as a distributed system. It separates storage (MinIO/S3), metadata (etcd), message queuing (Pulsar/Kafka), and query/index nodes. Each layer scales independently. That architecture is overkill for 1 million vectors. It is exactly right for 1 billion.

The managed version is Zilliz Cloud, built by the same team that maintains the open-source Milvus project. Zilliz Cloud handles all the distributed complexity (etcd clusters, MinIO backends, node orchestration) and charges you in Compute Units.

We have deployed Milvus at LeanOps for clients running large-scale recommendation systems, image similarity search, and multi-modal RAG pipelines. The pattern is consistent: teams below 20M vectors are usually better served by Pinecone or Qdrant. Teams above 50M vectors where horizontal scaling and high availability matter? Milvus wins on architecture and total cost of ownership.

This post gives you the complete Zilliz Cloud pricing in 2026, models real costs at production scale, compares against every alternative, and provides a framework for deciding when Milvus justifies its added complexity.


Zilliz Cloud Pricing in 2026: Complete Breakdown

Zilliz Cloud offers three deployment models: Serverless, Dedicated, and Enterprise. Each uses Compute Units (CUs) as the core billing metric.

Deployment Options

| Plan | Target | Architecture | Starting Cost |
| --- | --- | --- | --- |
| Free Tier | Prototyping | Serverless, shared | $0 (100 CU-hours + 5GB) |
| Serverless | Variable workloads | Auto-scaling, multi-tenant | $0.096/CU-hour + storage |
| Dedicated | Production | Single-tenant, fixed capacity | From 1 CU ($70/month) |
| Enterprise | Large-scale | Custom, SLA, VPC | Custom pricing |

Compute Pricing (CU-Hours)

| Component | Serverless Rate | Dedicated Rate | Notes |
| --- | --- | --- | --- |
| Query CU | $0.096/CU-hour | $0.096/CU-hour (reserved) | Scales with query volume and complexity |
| Index CU | $0.096/CU-hour | Included in cluster | Consumed during index building |
| Ingestion CU | $0.096/CU-hour | Included in cluster | Consumed during data loading |

How Compute Units work: A CU represents a unit of compute capacity (roughly equivalent to 1 vCPU + 4GB RAM). On Serverless, CUs scale automatically based on demand. On Dedicated, you provision a fixed number of CUs that run continuously.

The math:

  • 1 CU running 24/7 for a month = 720 CU-hours = $69.12/month
  • 2 CUs for a month = $138.24/month
  • 4 CUs for a month = $276.48/month

On Serverless, you only pay for CU-hours consumed during active queries and indexing. Idle time costs nothing for compute (you still pay storage).
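The billing arithmetic above is easy to encode. A minimal sketch (the $0.096/CU-hour rate and 720-hour month come from the tables above; the function names are our own):

```python
def dedicated_monthly_cost(cus, rate=0.096, hours=720):
    """Fixed CUs bill for every hour of the month, busy or idle."""
    return cus * hours * rate

def serverless_monthly_cost(cu_hours_per_day, rate=0.096, days=30):
    """Serverless bills only the CU-hours actually metered."""
    return cu_hours_per_day * days * rate

print(round(dedicated_monthly_cost(1), 2))  # 69.12
print(round(dedicated_monthly_cost(4), 2))  # 276.48
```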

Storage Pricing

| Storage Type | Rate | Notes |
| --- | --- | --- |
| Standard (SSD) | $0.02/GB/month | Vector data + metadata + indexes |
| Warm storage | $0.008/GB/month | Infrequently accessed collections |
| Backup storage | $0.015/GB/month | Automated snapshots |

Storage math for vectors (1536 dimensions, float32):

| Vector Count | Raw Data Size | With Index Overhead | Monthly Storage Cost |
| --- | --- | --- | --- |
| 1M | 6.1 GB | ~12 GB | $0.24 |
| 10M | 61 GB | ~120 GB | $2.40 |
| 50M | 305 GB | ~600 GB | $12.00 |
| 100M | 610 GB | ~1.2 TB | $24.00 |

Index overhead roughly doubles the raw vector size because Milvus stores both the raw vectors and the index structures (IVF, HNSW, or DiskANN depending on configuration).
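The storage table generalizes to a few lines of Python (a sketch; the 2x index overhead is the rough multiplier stated above, not an exact figure):

```python
BYTES_PER_FLOAT32 = 4

def vector_storage_gb(num_vectors, dims=1536, index_overhead=2.0):
    """Raw float32 footprint, and the total with ~2x overhead for
    raw vectors + index structures (IVF, HNSW, or DiskANN)."""
    raw_gb = num_vectors * dims * BYTES_PER_FLOAT32 / 1e9
    return raw_gb, raw_gb * index_overhead

def monthly_storage_cost(total_gb, rate_per_gb=0.02):
    """Standard SSD storage at the published $0.02/GB/month rate."""
    return total_gb * rate_per_gb
```

For 1M vectors this reproduces the first table row: ~6.1 GB raw, ~12 GB with index overhead, about a quarter of a dollar per month.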

Free Tier Details

| Feature | Limit |
| --- | --- |
| Compute | 100 CU-hours/month |
| Storage | 5 GB |
| Vector capacity | ~820K vectors (1536-dim) |
| Collections | Unlimited |
| API rate limit | Moderate |
| Duration | Permanent (no expiration) |
| Regions | 1 (US, EU, or Asia) |

The free tier is generous compared to competitors. Pinecone gives 2GB of storage. Qdrant gives 1GB of RAM. Zilliz gives 5GB of storage plus 100 CU-hours of compute. For prototyping a RAG application with under 500K vectors, the free tier lasts months without hitting limits.

Additional Costs

| Feature | Cost |
| --- | --- |
| Cross-region replication | 2x compute + storage in each region |
| GPU-accelerated search | $0.35/GPU-CU-hour (T4) to $1.20/GPU-CU-hour (A100) |
| VPC peering | Enterprise plan only |
| Priority support | Enterprise plan (custom) |
| RBAC/Security features | Included in all plans |
| Monitoring (Grafana dashboards) | Included |

Real-World Cost Modeling: What Zilliz Cloud Actually Costs

Scenario 1: Small RAG Application (1M Vectors, Low Traffic)

A knowledge base, FAQ search, or small product catalog with 1M vectors at 1536 dimensions, queried a few hundred times per day.

| Component | Calculation | Monthly Cost |
| --- | --- | --- |
| Storage | 12 GB x $0.02 | $0.24 |
| Compute (Serverless) | ~3-5 CU-hours/day x 30 x $0.096 | $9-14 |
| Backups | 12 GB x $0.015 | $0.18 |
| Total (Serverless, minimal traffic) | | ~$10-15/month |
| Total (Serverless, moderate traffic) | | ~$80-150/month |

The range is wide because compute dominates and scales directly with query volume and complexity. A few hundred simple searches per day stays in the $10-15 range. Add filtering, hybrid search, or 10,000+ daily queries and costs jump to $80-150.

Comparison at this scale:

  • Pinecone Serverless: $35-60/month
  • Qdrant Cloud: $25-45/month (or free tier for very low traffic)
  • Weaviate Cloud: $45-80/month
  • Self-hosted Milvus: $40-80/month (but massive overengineering for 1M vectors)

Verdict: Milvus is overbuilt for 1M vectors. Use Pinecone or Qdrant instead.

Scenario 2: Production Workload (10M Vectors, Moderate Traffic)

A recommendation engine, semantic search platform, or enterprise RAG with 10M vectors and 10,000-50,000 queries per day.

| Component | Calculation | Monthly Cost |
| --- | --- | --- |
| Storage | 120 GB x $0.02 | $2.40 |
| Compute (Dedicated, 2 CU) | 2 x 720 hours x $0.096 | $138 |
| Backups | 120 GB x $0.015 | $1.80 |
| Total (Dedicated) | | ~$140-300/month |
| Total (Serverless, moderate) | | ~$250-500/month |

Why is Serverless more expensive here? Because sustained moderate traffic on Serverless accumulates more CU-hours than a fixed Dedicated cluster. If your traffic is predictable and steady, Dedicated saves money. If your traffic is bursty with idle periods, Serverless wins.

Comparison at this scale:

  • Pinecone Serverless: $170-370/month
  • Qdrant Cloud: $120-180/month
  • Weaviate Cloud: $200-400/month
  • Self-hosted Milvus: $100-200/month (3-node cluster on EKS)

Verdict: At 10M vectors, Qdrant Cloud is cheapest for pure vector search. Zilliz Dedicated is competitive if you need Milvus-specific features (GPU search, DiskANN, streaming inserts at scale).

Scenario 3: Large-Scale Production (100M+ Vectors, High Traffic)

An enterprise search platform, large-scale recommendation system, or multi-tenant SaaS serving 100M+ vectors with 500,000+ queries per day.

| Component | Calculation | Monthly Cost |
| --- | --- | --- |
| Storage | 1.2 TB x $0.02 | $24 |
| Compute (Dedicated, 8 CU) | 8 x 720 hours x $0.096 | $553 |
| Backups | 1.2 TB x $0.015 | $18 |
| Total (Dedicated) | | ~$600-1,200/month |
| Total (Enterprise, high availability) | | ~$1,500-3,500/month |

At this scale, Milvus architecture begins to shine. Its segment-based storage distributes data across nodes and allows independent scaling of query and indexing capacity.

Comparison at this scale:

  • Pinecone Pods (p2.x4 x 3): $5,760/month
  • Pinecone Serverless: $800-1,500/month
  • Qdrant Cloud (XLarge x 2): $1,800-2,500/month
  • Self-hosted Milvus on EKS: $300-600/month

Verdict: Self-hosted Milvus wins decisively at 100M+ vectors. If you must use managed, Zilliz Dedicated is 2-3x cheaper than Pinecone Pods.


Self-Hosted Milvus: Production Architecture at $300-600/Month

The $300-600/month number above is not hypothetical. Here is the exact 3-node EKS cluster architecture we deploy for clients running 100M+ vectors in production.

Infrastructure Breakdown (100M Vectors, 1536 Dimensions)

| Component | Instance/Service | Pricing | Monthly Cost |
| --- | --- | --- | --- |
| Query node | 1x r6g.xlarge (32GB RAM) | $0.20/hr Spot | $144 |
| Index node | 1x r6g.large (16GB RAM) | $0.10/hr Spot | $72 |
| Coordinator + etcd + proxy | 1x m6g.large (8GB RAM) | $0.077/hr On-Demand | $56 |
| EKS control plane | Managed | Flat fee | $73 |
| S3 storage (segments + indexes) | ~1.2TB | $0.023/GB | $25 |
| Total infrastructure | | | $370 |

The query node runs on r6g.xlarge because it needs RAM for loaded segments. With DiskANN, 32GB handles 100M vectors comfortably (the compressed graph needs only 12-16GB resident). The index node only bursts during index builds, so it can be scaled down between builds for further savings. The coordinator node runs On-Demand because etcd availability is critical.
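The table's total is easy to sanity-check (Spot prices are the modeled estimates above; dictionary keys are illustrative):

```python
# Line items from the infrastructure table, in USD/month.
HOURS_PER_MONTH = 720
infra = {
    "query_node_r6g_xlarge_spot": 0.20 * HOURS_PER_MONTH,  # $144
    "index_node_r6g_large_spot": 0.10 * HOURS_PER_MONTH,   # $72
    "coordinator_m6g_large_od": 56,                        # ~$0.077/hr On-Demand
    "eks_control_plane": 73,                               # flat fee
    "s3_segments_indexes": 25,                             # ~1.2TB at $0.023/GB
}
print(round(sum(infra.values())))  # 370
```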

Why This Beats Zilliz Dedicated

| Metric | Self-Hosted ($370/mo) | Zilliz Dedicated ($600-1,200/mo) |
| --- | --- | --- |
| Cost savings | Baseline | 1.6-3.2x more expensive |
| Query latency (p99) | 8-15ms | 5-10ms |
| Scaling flexibility | Full (add nodes) | CU increments only |
| GPU support | Yes (add GPU node) | Yes (at $0.35-1.20/GPU-CU-hr) |
| Version control | Pin any release | Zilliz-managed upgrades |

The 1.6-3.2x savings are real, but they come with operational cost.

The Operational Reality

Budget 8-16 hours/month of SRE time for:

  • etcd maintenance (4-6 hrs/month): Compaction, defragmentation, backup verification. etcd is the single point of failure in Milvus. If it corrupts, you lose collection metadata.
  • Compaction monitoring (2-4 hrs/month): Milvus segments grow and need periodic compaction. Uncompacted segments degrade query performance and waste storage.
  • Version upgrades (2-4 hrs/month averaged): Milvus releases monthly. Not every release requires immediate upgrade, but falling more than 2 versions behind makes upgrades painful.
  • Capacity planning (1-2 hrs/month): Monitor segment loading times, query latency percentiles, and memory utilization trends to plan scaling.

At $150/hour SRE cost, 12 hours/month = $1,800/month in labor. This makes self-hosting economically viable only when the infrastructure savings exceed the labor cost. The break-even: if Zilliz Dedicated would cost $2,200+/month, self-hosting saves money even with SRE labor factored in. At 100M vectors, Zilliz Dedicated runs $600-1,200/month — self-hosting wins only if your SRE time is already allocated (existing platform team) or if you value the control regardless of labor cost.
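That labor break-even can be captured in one hedged helper (the $370 infra figure, $150/hour rate, and 12 hours/month come from the analysis above):

```python
def self_hosting_saves(managed_monthly, infra_monthly=370,
                       sre_hours=12, sre_rate=150):
    """True when the managed bill exceeds self-hosted infra plus SRE labor."""
    return managed_monthly > infra_monthly + sre_hours * sre_rate
```

At these defaults the threshold sits at $2,170/month, which is why a $600-1,200 Zilliz Dedicated bill only loses to self-hosting when the SRE time is already paid for.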

Spot Instance Strategy

The query and index nodes use Spot pricing (60-70% discount). To handle Spot interruptions:

  1. Run the query node in a 2-instance Auto Scaling Group across 3 AZs
  2. Enable Milvus replica groups so a second query node can serve traffic during Spot reclamation
  3. Keep the coordinator on On-Demand (etcd cannot tolerate interruptions)

This adds ~$72/month (second query node Spot) but eliminates availability risk. Total with HA: $442/month.


DiskANN: The Game-Changer for 100M+ Vectors

At 100M vectors and above, the choice of index algorithm determines whether your infrastructure bill is $400/month or $4,000/month. DiskANN is what makes large-scale vector search economically viable.

How DiskANN Works

Traditional HNSW stores the entire graph structure and all vectors in RAM. At 100M vectors (1536 dimensions, float32), the raw vectors alone occupy roughly 614GB (the same ~610GB from the storage table earlier), plus another 20-30GB for the HNSW graph. Total: roughly 640GB of RAM.

DiskANN takes a different approach:

  1. Vectors are stored on NVMe SSD (not RAM)
  2. A compressed navigation graph (PQ-compressed) stays in RAM for traversal
  3. During search, the algorithm navigates the in-memory graph to identify candidate vectors, then fetches full vectors from SSD for final distance computation

The RAM footprint drops from 80-90GB to 12-16GB because only the compressed graph (product quantization) lives in memory. The full vectors sit on fast NVMe storage.
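A rough memory model makes the difference concrete. The per-vector byte counts below are illustrative assumptions (256 bytes of graph links per vector for HNSW; 64-byte PQ codes plus a 64-byte compressed graph entry for DiskANN), not measured Milvus figures:

```python
def hnsw_ram_gb(n, dims=1536, graph_bytes=256):
    """Everything resident: full float32 vectors plus per-vector graph links."""
    return n * (dims * 4 + graph_bytes) / 1e9

def diskann_ram_gb(n, pq_code_bytes=64, graph_bytes=64):
    """Only PQ codes and a compressed graph stay in RAM; vectors live on NVMe."""
    return n * (pq_code_bytes + graph_bytes) / 1e9
```

With those assumptions, 100M vectors land at ~640GB resident for in-memory HNSW versus ~13GB for DiskANN, consistent with the 12-16GB figure above.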

Cost Impact at Scale

| Scale | HNSW (all RAM) | DiskANN (RAM + NVMe) | Savings |
| --- | --- | --- | --- |
| 100M vectors | ~640GB RAM ≈ $4,600/month | 16GB RAM + 500GB NVMe = $180 | ~96% |
| 250M vectors | ~1.6TB RAM ≈ $11,500/month | 32GB RAM + 1.2TB NVMe = $320 | ~97% |
| 500M vectors | ~3.2TB RAM ≈ $23,000/month | 64GB RAM + 2.5TB NVMe = $580 | ~97% |
| 1B vectors | ~6.4TB RAM ≈ $46,000/month | 128GB RAM + 5TB NVMe = $1,100 | ~98% |

(RAM costs modeled at roughly $7/GB/month for memory-optimized instances.)

At 500M vectors, DiskANN is the only option that stays under $1,000/month. HNSW at that scale needs roughly 3TB of RAM spread across a fleet of memory-optimized instances (r6g.16xlarge class), running well into five figures per month.

Latency Tradeoff

DiskANN is not a free lunch. The SSD reads add latency:

| Metric | HNSW (in-memory) | DiskANN (NVMe SSD) | Acceptable For |
| --- | --- | --- | --- |
| p50 latency | 1-3ms | 3-8ms | All use cases |
| p99 latency | 2-5ms | 5-15ms | Search, recommendations, RAG |
| p999 latency | 5-10ms | 15-40ms | Anything except real-time bidding |

For most search use cases (semantic search, RAG, product recommendations, content discovery), 5-15ms p99 is indistinguishable from 2-5ms in user experience. The 3-10ms difference is invisible when the LLM response takes 500-2000ms anyway.

When DiskANN Does Not Work

  • Real-time bidding / ad serving: Needs sub-2ms p99. Stay with HNSW in RAM.
  • Extremely high QPS (100K+ queries/second): SSD IOPS become the bottleneck. At 100K QPS, you need multiple NVMe drives in RAID-0 or fall back to in-memory.
  • Frequent updates: DiskANN index rebuilds are slower than HNSW. If your vectors change hourly, the rebuild overhead makes DiskANN impractical.

DiskANN on Self-Hosted Milvus: The $180/Month 100M-Vector Setup

| Component | Spec | Monthly Cost |
| --- | --- | --- |
| Query node (r6g.xlarge) | 32GB RAM (16GB for PQ) | $144 (Spot) |
| NVMe storage (i3.large) | 475GB NVMe local | $108 (Spot) |
| OR: EBS io2 (500GB, 10K IOPS) | Provisioned IOPS | $65 |
| Coordinator + etcd | m6g.large | $56 |
| S3 (backup/cold segments) | 1.2TB | $25 |
| Total (EBS path) | | $290 |
| Total (i3 NVMe path) | | $333 |

The EBS io2 path is cheaper but gives lower IOPS (10K provisioned vs 100K+ on NVMe). For most workloads under 50K QPS, EBS io2 is sufficient. For higher throughput, the i3 instance with local NVMe wins on raw performance.

Either way: 100M vectors, sub-15ms p99, under $350/month. Compare to Zilliz Dedicated at $600-1,200/month or Pinecone at $800-1,500/month for the same scale.


Zilliz Cloud vs Self-Hosted Milvus: The Architecture and Cost Trade-Off

Milvus is fully open source (Apache 2.0 license). Running it yourself is a legitimate production option. But Milvus self-hosting is not like deploying a single binary. It is a distributed system with multiple components.

Milvus Architecture (What You Are Managing)

| Component | Purpose | Infrastructure Needed |
| --- | --- | --- |
| Query Nodes | Execute searches | 1-N pods (scale with QPS) |
| Data Nodes | Handle inserts/deletes | 1-N pods (scale with write throughput) |
| Index Nodes | Build and maintain indexes | 1-N pods (CPU/GPU intensive, burst) |
| Proxy | API gateway and routing | 1-2 pods |
| Root Coord | Metadata coordination | 1 pod |
| etcd | Metadata storage | 3-pod cluster (HA) |
| MinIO/S3 | Object storage for segments | S3 bucket or MinIO cluster |
| Pulsar/Kafka | Message queue for WAL | 3-node cluster (or managed Kafka) |

That is at minimum 10-15 pods for a production deployment, a serious operational footprint.

Self-Hosted Cost Model (AWS EKS)

| Scale | Nodes | Monthly Infra Cost | Compared to Zilliz |
| --- | --- | --- | --- |
| 10M vectors | 3 x m5.xlarge (4 vCPU, 16GB) | $150-200/month | Zilliz: $140-300 |
| 50M vectors | 3 x m5.2xlarge (8 vCPU, 32GB) | $300-450/month | Zilliz: $400-800 |
| 100M vectors | 5 x m5.2xlarge + 2 x m5.4xlarge | $500-800/month | Zilliz: $600-1,200 |
| 500M vectors | 8-12 nodes mixed sizes | $1,500-3,000/month | Zilliz: $3,000-6,000 |

The hidden costs of self-hosting Milvus:

  1. etcd is fragile. If your etcd cluster goes down, Milvus cannot serve queries. etcd requires careful monitoring, backup, and occasional compaction.

  2. Index building is expensive. Building an IVF or HNSW index on 100M vectors requires significant CPU (or GPU) for hours. On Zilliz, this happens transparently. Self-hosted, you provision index nodes and manage the scheduling.

  3. Version upgrades are complex. Milvus has a rapid release cycle. Upgrading a distributed system with etcd schema changes, storage format changes, and API changes requires planning and testing.

  4. MinIO/S3 costs add up. At 100M vectors, your segment storage in S3 can reach 1-2TB. S3 costs are minimal ($23/TB/month), but the GET/PUT operations during compaction and segment loading add $10-50/month.

Break-Even Analysis

| Factor | Zilliz Cloud | Self-Hosted Milvus |
| --- | --- | --- |
| < 10M vectors | Usually cheaper (less overhead) | Overengineered, not worth it |
| 10-50M vectors | Comparable (convenience premium) | 30-50% cheaper in infra |
| 50M+ vectors | Expensive at scale | 50-70% cheaper in infra |
| Engineering time | Zero | 8-20 hours/month minimum |
| Time to production | Hours | Days to weeks |
| Disaster recovery | Built-in | You build and test it |

The simple rule: if your monthly Zilliz bill exceeds $500 and your team has Kubernetes expertise, evaluate self-hosting. Below $500/month, the managed convenience is almost always worth it.


Milvus/Zilliz vs Competitors: When Each Wins

Cost and Feature Matrix at 10M Vectors

| Factor | Zilliz Cloud | Pinecone Serverless | Qdrant Cloud | Weaviate Cloud |
| --- | --- | --- | --- | --- |
| Monthly cost (10M vectors) | $250-500 | $170-370 | $120-180 | $200-400 |
| Architecture | Distributed (multi-node) | Serverless (proprietary) | Single-node or replicated | Single-node or replicated |
| Max vector count | Billions | Hundreds of millions | Hundreds of millions | Tens of millions |
| GPU-accelerated search | Yes (NVIDIA) | No | No | No |
| DiskANN (on-disk vectors) | Yes | No | Limited | No |
| Streaming inserts | Yes (Kafka/Pulsar WAL) | Yes (limited throughput) | Yes | Yes |
| Hybrid search (sparse+dense) | Yes | Limited | Yes (sparse vectors) | Yes (BM25) |
| Multi-vector (ColBERT) | Yes | No | No | No |
| Scale-to-zero | Serverless only | Yes | No | Near-zero |
| Open source | Apache 2.0 | No | Apache 2.0 | BSD-3 |
| Self-host path | Full feature parity | N/A | Full parity | Full parity |

Choose Milvus/Zilliz When:

  1. Your dataset will exceed 100M vectors. Milvus distributed architecture scales horizontally better than any competitor. Adding query nodes increases QPS linearly.

  2. You need GPU-accelerated search. For sub-millisecond latency at scale, Milvus supports NVIDIA GPUs for index building and query execution. No competitor offers this.

  3. You want DiskANN. Storing vectors on disk instead of RAM reduces memory costs by 10x. Milvus DiskANN maintains >95% recall with disk-based indexes.

  4. Streaming inserts matter. If your data changes continuously (real-time recommendations, live content indexing), Milvus WAL-based architecture handles concurrent reads and writes better than eventually-consistent alternatives.

  5. You plan to self-host at scale. Milvus on Kubernetes gives you complete control, full feature parity with managed, and 50-70% cost savings at scale.

Do Not Choose Milvus When:

  1. Your dataset is under 10M vectors. The distributed architecture is overkill. You will pay more and get less simplicity.

  2. Your team lacks Kubernetes expertise. Self-hosted Milvus is complex. Zilliz Cloud removes this, but then you pay a premium over simpler alternatives.

  3. You want the simplest API. Pinecone's API is simpler. Qdrant's API is simpler. Milvus has more concepts (collections, partitions, segments, fields, indexes) that take time to learn.

  4. Cost at small scale matters most. Under 10M vectors, Qdrant Cloud ($120-180/month) beats Zilliz ($250-500/month) consistently.


Zilliz Cloud Cost Optimization: 6 Strategies

1. Use DiskANN for Large Collections

If recall > 95% is acceptable, DiskANN stores vectors on SSD instead of RAM. At 100M vectors:

  • HNSW (in-memory): full-precision vectors alone occupy ~614GB, which at roughly 4GB per CU means well over 100 CUs ($10,000+/month in compute)
  • DiskANN (on-disk): requires 2-4 CUs ($140-280/month in compute) + extra SSD storage

That is a compute reduction of more than 90% with minimal latency impact for most workloads.

2. Choose Dedicated Over Serverless for Steady Workloads

If your queries run 16+ hours per day at consistent volume, Dedicated clusters are cheaper than Serverless pay-per-CU-hour. The crossover:

  • Serverless at 15 CU-hours/day = $43/month
  • Dedicated 1 CU (24/7) = $69/month
  • Serverless at 30 CU-hours/day = $86/month
  • Dedicated 2 CU (24/7) = $138/month

The break-even sits around 24 metered CU-hours per day per Dedicated CU ($69.12 / $0.096 / 30 days). Because sustained traffic on Serverless tends to meter more CU-hours than a right-sized fixed cluster actually consumes, steady workloads favor Dedicated; bursty workloads with long idle stretches favor Serverless.
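That crossover is a one-line calculation using the rates above (a sketch; real Serverless metering also depends on query complexity):

```python
def dedicated_break_even_cu_hours_per_day(cu_monthly=69.12,
                                          serverless_rate=0.096, days=30):
    """Metered Serverless CU-hours/day at which one Dedicated CU costs the same."""
    return cu_monthly / (serverless_rate * days)
```

With the published rates this returns 24 CU-hours/day: below that, Serverless bills less than a dedicated CU; at or above it, the fixed allocation is the better buy.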

3. Partition Your Collections

Milvus supports partition keys that physically separate data. Benefits:

  • Queries against a single partition skip all other data (less compute)
  • Partitions can be loaded/released independently (memory savings)
  • Ideal for multi-tenant workloads (partition per customer)

At 50M vectors with 100 partitions, queries that target one partition search 500K vectors instead of 50M. That is a 100x reduction in compute cost per query.
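The scan reduction is simple division (a sketch; integer division assumes vectors are spread evenly across partitions):

```python
def vectors_scanned(total_vectors, num_partitions, partitions_hit=1):
    """Vectors a query touches when the partition key isolates its data."""
    return total_vectors // num_partitions * partitions_hit
```

For the example above, 50M vectors across 100 partitions means a single-partition query scans 500K vectors, a 100x reduction.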

4. Use Scalar Quantization

Milvus supports IVF_SQ8 (scalar quantization) that compresses vectors from float32 to int8:

  • Memory reduction: 4x
  • Recall impact: typically less than 2% loss
  • Cost impact: fewer CUs needed for the same dataset size

At 100M vectors, SQ8 means you can run on 4 CUs instead of 16 CUs. That saves 12 CUs x $69.12/month, roughly $10,000 per year.
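The savings arithmetic as a hedged helper (the CU counts are the modeled estimates above; $69.12 is one CU running 24/7 at $0.096/CU-hour):

```python
def sq8_annual_compute_savings(cus_before=16, cus_after=4, cu_monthly=69.12):
    """Annual dollars saved by dropping from float32 to int8 (SQ8) indexing."""
    return (cus_before - cus_after) * cu_monthly * 12
```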

5. Offload Cold Collections to Warm Storage

Collections not queried in the last 30+ days should move to warm storage ($0.008/GB vs $0.02/GB). That is a 60% reduction in storage costs for archival data that you still want available for occasional queries.

6. Right-Size Index Parameters

Milvus index parameters dramatically affect compute consumption:

| Parameter | Effect of Over-Sizing | Recommended Approach |
| --- | --- | --- |
| nlist (IVF) | Higher = more centroids to train, slower builds | Start at sqrt(N), tune up if recall is low |
| ef_construction (HNSW) | Higher = slower index build, better recall | 128-256 for most workloads |
| M (HNSW) | Higher = more memory per vector | 16-32 for most workloads |
| nprobe (query) | Higher = more clusters searched per query | Start at nlist/10, tune for latency budget |

Over-provisioning these parameters wastes CU-hours on both index building and query execution.
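The two "start at" rules from the table can be sketched directly (rules of thumb only; always tune against measured recall and your latency budget):

```python
import math

def ivf_starting_params(num_vectors):
    """Rule-of-thumb IVF starting points: nlist = sqrt(N), nprobe = nlist / 10."""
    nlist = int(math.sqrt(num_vectors))
    return {"nlist": nlist, "nprobe": max(1, nlist // 10)}
```

For a 100M-vector collection this suggests nlist=10000 and nprobe=1000 as a first pass before tuning.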


The Bottom Line

Milvus is not a vector database for everyone. It is a vector database for teams building at scale with demanding requirements around distribution, GPU acceleration, and streaming ingestion. At 100M+ vectors, nothing matches its architecture.

At smaller scales (under 10M vectors), you are paying a complexity premium. The distributed architecture that makes Milvus shine at billions of vectors is unnecessary overhead at millions. Use Qdrant Cloud for the cheapest managed vector search, or Pinecone Serverless for the simplest developer experience.

If you are running vector database infrastructure that costs more than $500/month and growing, our team at LeanOps specializes in AI infrastructure cost optimization. We audit vector database deployments, recommend architecture changes, and typically cut costs by 40-60% within 60 days. Get a free Cloud Waste Assessment to find out where your AI infrastructure budget is going.



Stop Overpaying for Cloud Infrastructure

Our clients save 30-60% on their cloud bill within 90 days. Get a free Cloud Waste Assessment and see exactly where your money is going.