pgvector vs pgvectorscale: When Vanilla Isn't Enough
The decision usually shows up the same way. You are running embeddings inside PostgreSQL with pgvector, things are working, and then someone drops a link in the channel: "PostgreSQL and pgvector — now faster than Pinecone, 75% cheaper, 100% open source." The benchmark is powered by an extension called pgvectorscale, and the obvious question follows: should you bolt it onto your database too?
This is an honest attempt to answer that for your situation. One thing worth saying before anything else, because it is the kind of disclosure that should change how you read the rest: Rivestack runs vanilla pgvector, not pgvectorscale. So this is not us talking up our own index. It is us telling you where the line actually falls — including the cases where the answer is "what you want is not what we run."
The short version: for the workloads most AI teams actually have — under roughly a million vectors — tuned vanilla pgvector on the right hardware is simpler and entirely sufficient. pgvectorscale's StreamingDiskANN index earns its keep at very large scale and in memory-constrained setups. The longer version is about one number: whether your index fits in RAM.
Quick Comparison
| Dimension | pgvector | pgvectorscale |
|---|---|---|
| Type | PostgreSQL extension (C) | PostgreSQL extension (Rust/PGRX) that builds on pgvector |
| License | PostgreSQL license (open source) | PostgreSQL license (open source) |
| Maintainer | pgvector community | Timescale / TigerData |
| Index types | HNSW, IVFFlat | StreamingDiskANN (+ inherits pgvector's) |
| Where the index lives | Stored on disk, but must be cached in RAM to be fast (HNSW) | Partly on disk by design (DiskANN) |
| Quantization | halfvec (fp16), bit (binary) | Statistical Binary Quantization (SBQ) + pgvector's |
| Filtered search | Iterative index scans (0.8.0+) | Label-based filtered DiskANN |
| Sweet spot | Indexes that fit in RAM | Very large or memory-constrained indexes |
| Requires pgvector | It is pgvector | Yes — builds on top of pgvector |
| Managed on Rivestack | Yes (0.8.x) | No |
What pgvector Is and When It Shines
pgvector is a PostgreSQL extension that adds a vector data type along with distance operators and index support. You install it, create a column of type vector(1536), build an HNSW or IVFFlat index on it, and you have a vector search engine running inside your existing database.
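As a minimal sketch of those steps, assuming a hypothetical `documents` table whose columns match the query shown in this section. The table layout and the index name are illustrative, and the `m` / `ef_construction` values are simply pgvector's defaults written out explicitly:

```sql
-- Enable pgvector (the extension itself is named "vector").
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical schema: embeddings sit in the same row as application data.
CREATE TABLE documents (
    id         bigserial PRIMARY KEY,
    user_id    bigint      NOT NULL,
    content    text        NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now(),
    embedding  vector(1536)
);

-- HNSW index for cosine distance, which is what the <=> operator computes.
CREATE INDEX documents_embedding_hnsw_idx
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```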
The part people underestimate is that it is not a separate system you query alongside PostgreSQL. It is PostgreSQL. Your vectors live in the same rows as your user IDs, timestamps, and application data, which means a single query can do real work:
SELECT content, embedding <=> $1 AS distance
FROM documents
WHERE user_id = $2 AND created_at > now() - interval '30 days'
ORDER BY distance
LIMIT 10;

That finds the semantically closest documents to an embedding, scoped to one user, filtered to the last 30 days, in one round trip and one transaction. Delete that user and their chunks cascade in the same commit: no orphaned vectors, no reconciliation job.
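The cascading delete relies on an ordinary foreign key rather than anything vector-specific. A sketch, assuming a hypothetical `users` table with an `id` primary key and the `documents` table queried above (the constraint name and the user id are placeholders):

```sql
-- Assumption: users(id) exists and documents.user_id refers to it.
ALTER TABLE documents
    ADD CONSTRAINT documents_user_id_fkey
    FOREIGN KEY (user_id) REFERENCES users (id)
    ON DELETE CASCADE;

-- Deleting the user removes their rows, embeddings included, atomically.
BEGIN;
DELETE FROM users WHERE id = 42;  -- 42 is a placeholder user id
COMMIT;
```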
The HNSW index is also genuinely fast when it fits in memory. On NVMe storage, a well-tuned index handles roughly 1,000 to 2,500 queries per second per node at low-single-digit-millisecond p50 latency for datasets that fit in RAM. In our own same-region measurements, a single $15 Solo node (2 vCPU / 4 GB) serves a 250k × 1536-dim index at ~1,000 QPS with recall@10 of 0.93 and a p50 of 3.7 ms, and the methodology is reproducible. The phrase "when it fits in memory" is doing the heavy lifting here, and it is exactly where pgvectorscale enters the story.
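One part of that tuning happens at query time, where recall is traded against latency. A sketch of the two session settings involved; the values are illustrative rather than recommendations, and `hnsw.iterative_scan` requires pgvector 0.8.0 or later:

```sql
-- Larger candidate list: better recall, slower queries (pgvector's default is 40).
SET hnsw.ef_search = 100;

-- With selective WHERE clauses, keep scanning the index until enough rows
-- pass the filter instead of returning fewer rows than LIMIT asks for.
SET hnsw.iterative_scan = strict_order;
```

Both settings are per-session, so they can be tried on a single connection before changing anything globally.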
What pgvectorscale Is and When It Shines
pgvectorscale is a separate open-source PostgreSQL extension from Timescale (now TigerData) that builds on top of pgvector; it does not replace it. When you run CREATE EXTENSION vectorscale CASCADE, the CASCADE clause installs pgvector underneath as a dependency if it is not already present. Where pgvector is written in C, pgvectorscale is written in Rust using the PGRX framework. It is released under the PostgreSQL license, the same permissive open-source license pgvector uses, and it is actively maintained (version 0.9.0 landed in November 2025). So the easy "proprietary lock-in" criticism does not apply here: this is open source from a credible Postgres team, and it is fair to say so.
It introduces three real innovations.
StreamingDiskANN. This is the headline. It is a new index type inspired by Microsoft's DiskANN research, and the key idea is in the name: unlike HNSW, which has to live in memory to be fast, DiskANN is designed to keep part of the index on disk and still serve low-latency queries. This is the "diskann postgres" capability people are usually after. When your index comfortably fits in RAM, this buys you little. When it does not, it is the difference between a graph that degrades gracefully and one that falls off a cliff.
Statistical Binary Quantization (SBQ). A compression method developed by Timescale's researchers that improves on standard binary quantization, letting you shrink vectors so more of the index stays resident. pgvector is not quantization-free — it has halfvec for fp16 and bit for binary — but SBQ is more specialized.
Label-based filtered search. Based on Microsoft's Filtered DiskANN work, it applies label filters during the index traversal rather than before or after it.
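For orientation, a sketch of what using the index with label filtering looks like, following the syntax documented in the pgvectorscale README. The `documents` table, the `labels` column, the index name and the label values are hypothetical, and label-filtered search depends on a pgvectorscale release that supports it, so check the README for your version:

```sql
-- Installs pgvectorscale and, via CASCADE, pgvector if it is missing.
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

-- Hypothetical label column used for filtered search.
ALTER TABLE documents ADD COLUMN labels smallint[];

-- StreamingDiskANN index over the embedding plus the label column.
CREATE INDEX documents_embedding_labels_diskann_idx
    ON documents
    USING diskann (embedding vector_cosine_ops, labels);

-- Rows tagged with label 1 or 3, nearest to the query embedding.
SELECT content
FROM documents
WHERE labels && ARRAY[1, 3]::smallint[]
ORDER BY embedding <=> $1
LIMIT 10;
```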
Now the benchmark, attributed precisely, because the attribution is the whole point. Timescale's widely shared figures come from a test on 50 million Cohere embeddings at 768 dimensions. Against Pinecone's storage-optimized s1 index at 99% recall, they report 28x lower p95 latency and 16x higher query throughput; against Pinecone's performance-optimized p2 index at 90% recall, a more modest 1.4x lower p95 and 1.5x higher throughput. On cost, they put self-hosted Postgres at roughly $835/month on AWS EC2 versus $3,241 (s1) and $3,889 (p2) for Pinecone — the source of the "75% cheaper" line.
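The percentage follows directly from those monthly figures. A quick check, which is plain arithmetic on the numbers quoted above and not a measurement of ours:

```sql
SELECT
    round(100 * (1 - 835.0 / 3241), 1) AS pct_cheaper_than_s1,  -- 74.2
    round(100 * (1 - 835.0 / 3889), 1) AS pct_cheaper_than_p2;  -- 78.5
```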
Two honest caveats on those numbers. First, they are Timescale's published claims, not something we measured, and the headline 28x/16x is against Pinecone, not against vanilla pgvector HNSW. There is no clean published "Nx faster than pgvector HNSW" headline, because the comparison that actually matters here is architectural, not a single multiplier. Second, the test that makes pgvectorscale look transformational uses 50 million vectors, a scale chosen precisely because it is past where in-memory HNSW is comfortable. That is the right lens for the whole question.
The Real Tradeoff: Does Your Index Fit in RAM?
Strip away the benchmarks and the decision reduces to one piece of arithmetic you can do yourself.
A raw float32 vector is 4 bytes per dimension. So:
- 1 million × 1536-dim vectors ≈ 6 GB of raw vectors. Add HNSW graph overhead and you want about 16 GB of RAM to build and serve it hot. That fits on a single mid-sized node.
- 50 million × 768-dim vectors ≈ 150 GB of raw vectors before graph overhead. There is no "just add RAM" answer to that on commodity hardware — keeping an HNSW graph of that size resident is expensive and operationally awkward.
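The same arithmetic as a query you can adapt to your own row count and dimensions. It uses decimal gigabytes and counts only the raw float32 payload; per-row and graph overhead come on top:

```sql
-- rows * dimensions * 4 bytes, expressed in GB.
-- The bigint cast avoids overflowing 32-bit integer arithmetic.
SELECT
    round(1000000::bigint  * 1536 * 4 / 1e9, 1) AS gb_1m_x_1536,  -- 6.1
    round(50000000::bigint * 768  * 4 / 1e9, 1) AS gb_50m_x_768;  -- 153.6
```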
That second regime is exactly what StreamingDiskANN is built for. HNSW assumes the graph lives in memory; once it spills, traversal starts hitting disk in a random-access pattern HNSW was never designed around, and latency degrades sharply. DiskANN was designed from the start to keep most of the index on disk and stay fast, so it degrades gracefully as the dataset outgrows RAM. Below the RAM line, that engineering buys you very little and adds a dependency. Above it, it is the reason the workload is viable at all.
So the framing is not "pgvectorscale is faster than pgvector." It is: pgvectorscale changes the shape of the curve when your index no longer fits in memory. If yours does fit — and for most teams it comfortably does — you are paying for a capability you are not using.
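To see which side of the line you are on, compare the index's on-disk size with the memory available to Postgres. A sketch, assuming a hypothetical index named `documents_embedding_hnsw_idx`; substitute your own:

```sql
-- Size of the HNSW index on disk.
SELECT pg_size_pretty(pg_relation_size('documents_embedding_hnsw_idx')) AS index_size;

-- Memory Postgres reserves for its own buffer cache.
SHOW shared_buffers;
```

The index can also be served from the operating system's page cache, so the node's total RAM matters as well as `shared_buffers`.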
One more variable that matters more than which extension you pick: the storage layer. On standard cloud SSDs like AWS gp3, HNSW traversal is bottlenecked by IOPS long before it is bottlenecked by the algorithm; on NVMe that bottleneck largely disappears. A lot of "pgvector is slow" experiences are really "pgvector on gp3 is slow." Before reaching for a new index type, it is worth knowing whether your current one is running on the right hardware. The HNSW vs IVFFlat breakdown covers how to tune what you already have.
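One way to check whether a query is being served from memory or from storage is to look at buffer statistics. A sketch against the `documents` table from the earlier query, using an existing row's embedding as a stand-in query vector:

```sql
EXPLAIN (ANALYZE, BUFFERS)
SELECT content
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents LIMIT 1)
LIMIT 10;
-- In the output, "shared hit" counts pages found in shared_buffers and
-- "read" counts pages fetched from outside it (OS cache or disk).
```

A high "read" count on repeated runs suggests the index is not staying resident, which is when storage speed starts to dominate.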
Cost: It Depends on Where You Already Are
The Timescale cost story compares self-hosted Postgres against managed Pinecone, which is a real saving but not an apples-to-apples one — part of that gap is the price of running the database yourself. Once you account for an engineer's time on backups, failover, upgrades, and monitoring, the self-host discount narrows.
For vanilla pgvector on a managed service like Rivestack, pricing is flat regardless of query volume. A Solo VM (2 vCPU, 4 GB RAM, 55 GB NVMe) is $15/month and comfortably serves a few hundred thousand 1536-dim vectors; the 1-million-vector index above lands on the Scale plan at $99/month, whether you run 100,000 queries or 10 million. pgvectorscale itself is free — it is open source — so its cost is not a license fee but operational: a newer Rust extension, another dependency to track, and, if you want it managed rather than self-run, a provider that actually offers it. We do not, and that is the honest seam in this post: at the scale where pgvectorscale clearly wins, the clean managed path is Timescale Cloud / TigerData, who build and run it, or a dedicated engine. If you are searching for "managed pgvectorscale," that is where to look — not here.
When to Choose Vanilla pgvector (with Rivestack)
Choose vanilla pgvector when your index fits in memory — realistically up to around a million 1536-dim vectors per node, more at lower dimensions — and your application has any relational structure at all. That describes the large majority of real AI products: RAG over a company's documents, semantic search over a catalog, recommendations scoped to a user. The vector layer is rarely the part that is straining.
In that regime, the StreamingDiskANN machinery is solving a problem you do not have, and vanilla pgvector is the simpler system: one extension instead of two, the standard C build everyone runs, your vectors queryable with psql and introspectable with EXPLAIN ANALYZE, backed up alongside the rest of your data. If you want that without operating the database yourself, managed pgvector on Rivestack handles backups, high availability, NVMe provisioning, and tuning at a flat price. The old argument against pgvector — that self-hosting Postgres is painful — stops applying when someone else runs it.
This is also the right call when you are not sure yet. Starting on vanilla pgvector costs you nothing if you later outgrow it: pgvectorscale builds on the same pgvector foundation, so adopting StreamingDiskANN down the line is an additive change, not a migration to a different database.
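A sketch of what that additive change could look like, assuming a hypothetical `documents` table with an existing HNSW index named `documents_embedding_hnsw_idx`. The table and its data are left untouched; only the index changes:

```sql
-- Add pgvectorscale on top of the pgvector you already run.
CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

-- Build a StreamingDiskANN index next to the existing HNSW one.
CREATE INDEX documents_embedding_diskann_idx
    ON documents
    USING diskann (embedding vector_cosine_ops);

-- Once satisfied with recall and latency, retire the old index.
DROP INDEX IF EXISTS documents_embedding_hnsw_idx;
```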
When to Choose pgvectorscale
Choose pgvectorscale when your index has outgrown RAM and HNSW is the thing that is actually hurting. Tens of millions of vectors and up, or smaller datasets at high dimensions on memory-constrained hardware — that is the regime StreamingDiskANN was built for, and below it the case is weak. Choose it when keeping the whole index resident has become the binding cost and SBQ's aggressive compression is what lets you fit more of it in the memory you can afford. And choose it when you want to stay in PostgreSQL at that scale rather than stand up a separate engine — which is a genuinely good reason, because it keeps your vectors next to your relational data even as the index design changes underneath.
If that is you, pgvectorscale is strong, credible engineering from a serious Postgres team, and the right move is to evaluate it on the path that ships it as a managed product — or to self-host it if your team already lives in that infrastructure. At the same scale you should also be weighing a dedicated vector engine; the pgvector vs Qdrant comparison covers where horizontal sharding starts to win, and the broader Postgres vs dedicated vector databases post lays out the full decision.
The Bottom Line
pgvector vs pgvectorscale is not really a head-to-head, because pgvectorscale is built on pgvector. The real question is whether you need StreamingDiskANN, and that comes down to one line: does your index fit in RAM? Below it — where most AI teams operate today — tuned vanilla pgvector on NVMe with enough memory is simpler, sufficient, and the thing we run. Above it, pgvectorscale's disk-based DiskANN index stops being a nice-to-have and becomes the reason the workload is viable in Postgres at all.
Timescale's 28x and 16x numbers are real claims, fairly published, and worth taking seriously — for what they measure: pgvector-plus-pgvectorscale against Pinecone at 50 million vectors, not against the vanilla HNSW index most teams are actually running at a fraction of that size. Read at the right scale, both sides of this are true at once: most teams do not need pgvectorscale yet, and the teams that do should reach for it without apology.
Try Rivestack for pgvector in Production
If vanilla pgvector is the right fit — and for sub-million-vector workloads it usually is — Rivestack removes the only real friction point: managing the database. You get PostgreSQL on NVMe with pgvector pre-configured, automated backups, and high availability, starting at $15/month. To be straight about the boundary: we run pgvector 0.8.x, not pgvectorscale, so if your index has truly outgrown RAM, the right tool is StreamingDiskANN on a provider that ships it. For everyone below that line, see pgvector hosting on Rivestack for benchmarks, plans, and migration paths.