HNSW vs IVFFlat in pgvector: How to Actually Choose

pgvector gives you two approximate-nearest-neighbour index types, and the official docs are diplomatically neutral about which to pick. I won't be. HNSW is the right answer about 90% of the time, but the remaining 10% is more interesting than most comparison posts admit, because the two indexes don't just differ in speed. They fail in completely different ways. Which failure you can live with depends on how your data changes and how much RAM you bought.

This is the long version of that argument, with build times and QPS numbers we actually measured instead of hand-waving.

How HNSW thinks: a greedy walk through a graph

HNSW (Hierarchical Navigable Small World) stores your vectors as nodes in a multi-layer graph. The top layers are sparse: a handful of nodes with long-range links that let a search cross the dataset in a few hops, like an express train. Lower layers get denser, down to the bottom layer. That layer holds every vector, each linked to a bounded set of close neighbours (m caps the links per node; pgvector allows 2 × m on the bottom layer).

A query starts at the top. It greedily hops toward whichever neighbour is closest to the query vector, and drops down a layer whenever it can't improve. On the bottom layer it maintains a candidate pool of size ef_search and keeps expanding the best candidates until none of them can beat the results it already holds.

CREATE INDEX ON docs USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);
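To make the walk concrete, here is a minimal single-layer sketch in Python of the bottom-layer search. It assumes a toy graph built by inserting points one at a time and linking each to its m nearest earlier points. That is a plain navigable small world, without HNSW's upper layers or neighbour pruning. The helper names build_nsw and search_layer are hypothetical, and distance is Euclidean for simplicity. This is not pgvector's implementation.

```python
import heapq
import math
import random


def build_nsw(points, m=8):
    """Toy single-layer graph: each new point links (both ways) to its m nearest
    already-inserted points. Real HNSW also caps and prunes neighbour lists."""
    graph = {i: set() for i in range(len(points))}
    for i in range(1, len(points)):
        nearest = sorted(range(i), key=lambda j: math.dist(points[i], points[j]))[:m]
        for j in nearest:
            graph[i].add(j)
            graph[j].add(i)
    return graph


def search_layer(points, graph, query, entry, ef, k):
    """Best-first walk keeping a pool of the ef best nodes seen (the ef_search idea)."""
    d0 = math.dist(query, points[entry])
    visited = {entry}
    candidates = [(d0, entry)]  # min-heap: closest unexpanded node first
    results = [(-d0, entry)]    # max-heap via negation: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break  # the best unexpanded node is worse than everything we kept
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            dn = math.dist(query, points[nb])
            if len(results) < ef or dn < -results[0][0]:
                heapq.heappush(candidates, (dn, nb))
                heapq.heappush(results, (-dn, nb))
                if len(results) > ef:
                    heapq.heappop(results)  # evict the current worst result
    ranked = sorted((-neg_d, i) for neg_d, i in results)
    return [i for _, i in ranked[:k]]


rng = random.Random(7)
dim, n, k = 8, 300, 10
points = [[rng.random() for _ in range(dim)] for _ in range(n)]
graph = build_nsw(points, m=8)
query = [rng.random() for _ in range(dim)]
exact = sorted(range(n), key=lambda i: math.dist(query, points[i]))[:k]

for ef in (10, 40, n):
    found = search_layer(points, graph, query, entry=0, ef=ef, k=k)
    print(f"ef={ef}: recall@{k} = {len(set(found) & set(exact)) / k:.2f}")
```

With ef equal to the number of points, the walk visits every node of this connected toy graph, so recall is exactly 1.0. Smaller pools trade recall for fewer distance computations, which is the ef_search dial in miniature.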

The intuition that matters: search cost grows roughly with the number of hops, and every hop is a random read into the graph. If the graph sits in RAM, a hop is a cheap memory access plus a handful of distance computations. If it doesn't, each hop can become a disk read, orders of magnitude slower, and your p50 falls off a cliff. Keep that in mind, because it explains both the memory appetite and the tuning trap later.

How IVFFlat thinks: pick buckets, then brute-force them

IVFFlat is conceptually simpler. At build time it runs k-means over your data to produce one centroid per list (the lists parameter), then files every vector into the bucket of its nearest centroid. At query time it compares the query against all the centroids, picks the closest buckets (as many as the probes setting allows), and brute-force scans everything inside them.

CREATE INDEX ON docs USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 500);

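-- at query time, inside the transaction that runs the search
-- (SET LOCAL outside a transaction block only raises a warning and does nothing)
BEGIN;
SET LOCAL ivfflat.probes = 20;
SELECT id FROM docs ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;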

Coarse partitioning plus exact scan inside the partitions. The failure mode follows directly: if a true neighbour lives in a bucket you didn't probe, you simply never see it. Query vectors that land near cluster boundaries suffer most, and recall drops silently, with no error and no warning.
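Here is a minimal Python sketch of the same build-then-probe cycle. It assumes plain Lloyd's k-means on small random vectors with Euclidean distance. pgvector's own k-means differs in its details, and the helper names here are hypothetical. Scanning more buckets can only add candidates, so recall never drops as probes rises. Probing every bucket degenerates into exact search.

```python
import math
import random


def nearest_centroid(p, centroids):
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def kmeans(points, n_lists, iters=10, seed=0):
    """Plain Lloyd's k-means, run once at 'build time'."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, n_lists)]
    for _ in range(iters):
        members = [[] for _ in range(n_lists)]
        for p in points:
            members[nearest_centroid(p, centroids)].append(p)
        for c, group in enumerate(members):
            if group:  # keep the old centroid if its bucket emptied out
                centroids[c] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def build_ivf(points, n_lists):
    centroids = kmeans(points, n_lists)
    buckets = [[] for _ in range(n_lists)]
    for i, p in enumerate(points):
        buckets[nearest_centroid(p, centroids)].append(i)
    return centroids, buckets


def ivf_search(points, centroids, buckets, query, probes, k):
    # Rank centroids by distance, then brute-force only the `probes` closest buckets.
    order = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))
    candidates = [i for c in order[:probes] for i in buckets[c]]
    return sorted(candidates, key=lambda i: math.dist(query, points[i]))[:k]


rng = random.Random(1)
dim, n, n_lists, k = 8, 1000, 20, 10
points = [[rng.random() for _ in range(dim)] for _ in range(n)]
centroids, buckets = build_ivf(points, n_lists)
query = [rng.random() for _ in range(dim)]
exact = set(sorted(range(n), key=lambda i: math.dist(query, points[i]))[:k])

recalls = {}
for probes in (1, 4, n_lists):
    found = ivf_search(points, centroids, buckets, query, probes, k)
    recalls[probes] = len(set(found) & exact) / k
    print(f"probes={probes}: recall@{k} = {recalls[probes]:.2f}")
```

Any true neighbour filed in a bucket outside the probed set simply never enters the candidate list, which is the silent boundary miss described above.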

One more property that drives most of this article's decision rule: the centroids are frozen at build time. K-means runs once, on the data present at CREATE INDEX. Everything you insert afterwards gets filed against those stale centroids. Hold that thought.

Build time and memory: where they diverge first

This is the comparison most posts get wrong by summarizing it as "HNSW builds slower." The real story is that HNSW builds are memory-bound. The per-vector cost of not fitting in memory isn't 2x; it's more like 20x.

During an HNSW build, pgvector constructs the graph in maintenance_work_mem. While the graph fits, the build is CPU-bound and brisk. The moment it doesn't, pgvector emits a NOTICE saying the graph no longer fits into maintenance_work_mem. From there, construction thrashes through storage and the build crawls. We measured this on our own tiers (1536-dim vectors, m=16, ef_construction=64):

Node	RAM	250k build	500k build	1M build
Starter (2 vCPU)	4 GB	~5 min	hours	fails
Growth (4 vCPU)	8 GB	~2 min	~40 min	fails (OOM)
Scale (8 vCPU)	16 GB	~76 s	—	~9 min

Read that Starter row again. Doubling the data from 250k to 500k didn't double the build time; it made the build roughly fifty times longer. The 500k build on the 4 GB box did finish, and the graph it produced was fine. The build path just isn't designed to spill efficiently. And 2M vectors won't build even on the 16 GB Scale node. There's a hard ceiling per machine size, and you want to know yours before the index is on the critical path of a launch.

IVFFlat has no equivalent cliff. The build is one k-means pass plus a bucket-assignment pass. That means far less memory, and a build that typically finishes in a fraction of the time HNSW needs on the same box. If you're indexing on a small box against a deadline, this is IVFFlat's genuinely strongest card.

The catch: IVFFlat's quality depends entirely on the data present at build time. Build it on an empty or half-loaded table and k-means trains on garbage, producing centroids that misfile everything you load afterwards. The order of operations is non-negotiable: load data first, index second. HNSW doesn't care; you can create it on an empty table and insert forever (inserts are slower than a bulk build, but recall holds).

Query knobs: ef_search vs probes, and the PgBouncer trap

Both indexes give you one runtime dial for the recall/latency trade.

hnsw.ef_search controls the candidate pool during the graph walk. It's a smooth dial: nudge it from 40 (pgvector's default) to 80 and recall climbs a few points while latency rises roughly in proportion. Our benchmarks land at recall@10 of 0.93 with ef_search = 80 on 250k vectors.

ivfflat.probes controls how many buckets get scanned. It's a chunky dial: each increment adds an entire bucket's worth of brute-force comparisons. The pgvector README gives the usual starting points. Set lists ≈ rows / 1000 up to 1M rows and sqrt(rows) beyond that, and set probes ≈ sqrt(lists). Then tune against a recall measurement, not against vibes.
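That rule of thumb is easy to encode. Below is a small Python sketch; the function name is hypothetical, and its output is only a starting point for a recall measurement, not a tuned setting.

```python
import math


def ivfflat_starting_params(rows: int) -> tuple[int, int]:
    """Rule-of-thumb starting values: lists = rows / 1000 up to 1M rows,
    sqrt(rows) beyond that; probes = sqrt(lists). Tune from here."""
    if rows <= 1_000_000:
        lists = max(1, rows // 1000)
    else:
        lists = round(math.sqrt(rows))
    probes = max(1, round(math.sqrt(lists)))
    return lists, probes


for rows in (250_000, 500_000, 1_000_000, 4_000_000):
    lists, probes = ivfflat_starting_params(rows)
    print(f"{rows:>9,} rows -> lists = {lists}, probes = {probes}")
```

For a 500-list index this suggests about 22 probes, close to the 20 used in the IVFFlat example earlier.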

Now the trap, and it bites both knobs equally because both are session-level GUCs. If you connect through PgBouncer in transaction-pooling mode (which is how our Solo tier connects, on port 6432), a bare SET doesn't stick. The pooler hands your next query to whatever server connection is free. Your setting stays behind on the old connection, where it can quietly apply to some other client's queries, and you get default recall while believing you tuned it. No error. It happens often enough that it belongs in every tuning conversation.
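Here is a toy model of that behaviour in Python, under loudly hypothetical assumptions. Each transaction borrows the next server connection in turn; a real pooler picks any idle one. Plain SET values live on the server connection, and SET LOCAL values vanish at commit. The class and its statement tuples are invented for illustration and are not PgBouncer's API.

```python
import itertools


class ToyTransactionPooler:
    """Hypothetical model of transaction pooling: every transaction borrows the
    next server connection in turn, and plain SET values live on that server
    connection, not on your client. (A real pooler picks any idle connection.)"""

    def __init__(self, n_servers=3, default_ef=40):
        self.servers = [{"hnsw.ef_search": default_ef} for _ in range(n_servers)]
        self._turn = itertools.cycle(range(n_servers))

    def transaction(self, statements):
        server = self.servers[next(self._turn)]
        local = {}  # SET LOCAL values vanish at COMMIT
        effective = None
        for kind, name, *value in statements:
            if kind == "SET":
                server[name] = value[0]
            elif kind == "SET LOCAL":
                local[name] = value[0]
            elif kind == "QUERY":  # report the value the search would run with
                effective = local.get(name, server[name])
        return effective


pool = ToyTransactionPooler()
pool.transaction([("SET", "hnsw.ef_search", 80)])  # autocommit SET lands on server 0
after_bare_set = pool.transaction([("QUERY", "hnsw.ef_search")])  # server 1: default
with_set_local = pool.transaction(
    [("SET LOCAL", "hnsw.ef_search", 80), ("QUERY", "hnsw.ef_search")]
)  # server 2: 80, for this transaction only
leaked = pool.transaction([("QUERY", "hnsw.ef_search")])  # server 0 again: the stray 80
print(after_bare_set, with_set_local, leaked)
```

The last call shows the second half of the problem. The stray value doesn't disappear; it waits on a server connection for whichever client gets that connection next.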

The fix is to scope the setting to the transaction:

BEGIN;
SET LOCAL hnsw.ef_search = 80;
SELECT id, title
FROM docs
ORDER BY embedding <=> $1
LIMIT 10;
COMMIT;

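Or, if one value suits the whole application, persist it server-side and stop thinking about it. The default applies to sessions opened after the change, so pooled server connections pick it up as they reconnect: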

ALTER DATABASE app SET hnsw.ef_search = 80;

One more knob warning while we're here: the dial can outrun your RAM. On a 4 GB node, chasing recall 0.99 with ef_search = 200 pushes the working set past what the cache holds. Throughput collapses to a small fraction of what the same node does at sane settings, and p50 jumps by roughly an order of magnitude. The knob didn't get slower; the machine ran out of memory to absorb the graph's random reads. More on tuning in the HNSW tuning guide.

What we measured

These are HNSW numbers from our June 2026 runs: same-region client, clustered 1536-dim embeddings, cosine distance, m=16 / ef_construction=64, recall@10 scored against exact KNN. The harness is open source (pgvector-bench) and the full methodology (including why NVMe vs network-attached SSD matters for a graph index built on random reads) is in the benchmarks post.

Node	Vectors	recall@10	QPS	p50	Clients
Starter (2 vCPU / 4 GB)	250k	0.93	~1,600	—	16
Growth (4 vCPU / 8 GB)	250k	0.94	~2,950	—	16
Scale (8 vCPU / 16 GB)	250k	0.95	~4,465	—	16
Scale	1M	0.74	~3,600	4.2 ms	16

Two honest notes. First, every QPS figure above is tied to a specific recall and client count: a "10,000 QPS" claim without those two numbers is marketing, not measurement. Second, we haven't published IVFFlat numbers from this harness, so I won't quote any; the qualitative picture (cheaper build, recall that's more sensitive to tuning and to data drift) is well-established, but I'd rather give you a reproducible CLI than an unverifiable table. If you want to feel the HNSW latency yourself, ask.rivestack.io is a live semantic search over Hacker News running on a real Rivestack database.
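To score your own index the way the recall column above is scored, against exact KNN, here is a minimal Python sketch. It assumes you already have the ids your index returned for each query. The helper names are hypothetical, and cosine distance mirrors pgvector's <=> operator.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def exact_knn(query, vectors, k=10):
    """Ground truth by brute force: ids of the k closest vectors."""
    return sorted(range(len(vectors)), key=lambda i: cosine_distance(query, vectors[i]))[:k]


def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k that the index returned in its own top k."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


def mean_recall_at_k(approx_results, queries, vectors, k=10):
    scores = [recall_at_k(ids, exact_knn(q, vectors, k), k)
              for ids, q in zip(approx_results, queries)]
    return sum(scores) / len(scores)


vectors = [[1, 0], [0, 1], [1, 1], [-1, 0]]
query = [1, 0.1]
print(exact_knn(query, vectors, k=2))                      # true top-2 ids
print(mean_recall_at_k([[0, 1]], [query], vectors, k=2))   # an index that found 1 of 2
```

In practice the exact pass is the expensive part, so compute it once for a fixed held-out query set and reuse it across every ef_search or probes setting you try.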

Inserts, deletes, and the drift problem

Here's where the architectures truly part ways, and where IVFFlat earns its "static datasets only" reputation.

HNSW under writes. New vectors get wired into the graph at insert time. It's more expensive per row than bulk-build insertion, but the graph stays navigable and recall holds steady. Deletes follow normal Postgres MVCC: tuples are marked dead, and vacuum cleans them out of the index and repairs the surrounding links. The index file rarely shrinks much, but search quality doesn't rot.

IVFFlat under writes. Every insert is filed against centroids that were computed once and never move. While your new data looks like your old data, fine. But embeddings drift (you add a new document category, switch embedding models, expand into a new language) and the frozen centroids stop describing the data. Buckets bloat unevenly, more true neighbours land outside the probed buckets, and recall decays. Quietly. No vacuum fixes it; vacuum removes dead tuples but never re-clusters. The only repair is REINDEX.

Silent decay is the worst failure class in production. Your dashboards are green, latency looks normal, and your RAG answers are just... a little worse every month. Nobody files a ticket for that. If your table takes continuous writes, this single section should settle the choice.
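Here is a toy simulation of that decay, in two dimensions instead of 1536 and under hypothetical assumptions. Four tidy clusters get centroids that are frozen at build time, and then a new document category is inserted between them. Queries from the old clusters still find their neighbours in a single bucket. Queries from the new category have neighbours scattered across buckets that probes = 1 never visits. The "rebuilt" step simply assumes a REINDEX would give the new cluster its own centroid, which real k-means may or may not do.

```python
import math
import random

rng = random.Random(42)


def blob(cx, cy, n, spread):
    """n 2-D points scattered around (cx, cy)."""
    return [(rng.gauss(cx, spread), rng.gauss(cy, spread)) for _ in range(n)]


def nearest(p, centroids):
    return min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))


def file_into_buckets(points, centroids):
    buckets = [[] for _ in centroids]
    for i, p in enumerate(points):
        buckets[nearest(p, centroids)].append(i)
    return buckets


def mean_recall(points, centroids, buckets, queries, probes=1, k=10):
    total = 0.0
    for q in queries:
        exact = set(sorted(range(len(points)), key=lambda i: math.dist(q, points[i]))[:k])
        order = sorted(range(len(centroids)), key=lambda c: math.dist(q, centroids[c]))
        candidates = [i for c in order[:probes] for i in buckets[c]]
        approx = set(sorted(candidates, key=lambda i: math.dist(q, points[i]))[:k])
        total += len(approx & exact) / k
    return total / len(queries)


# Build time (hypothetical): four clusters and the centroids k-means found for them.
frozen = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [p for c in frozen for p in blob(*c, 100, 1.0)]
# Months later: a new category lands between the old clusters.
points += blob(5, 5, 200, 1.0)
buckets = file_into_buckets(points, frozen)  # inserts filed against stale centroids

old_recall = mean_recall(points, frozen, buckets, blob(0, 0, 20, 1.0))
new_queries = blob(5, 5, 20, 0.3)
new_recall = mean_recall(points, frozen, buckets, new_queries)

# Assumed outcome of a REINDEX: k-means now gives the new category its own centroid.
rebuilt = frozen + [(5, 5)]
rebuilt_recall = mean_recall(points, rebuilt, file_into_buckets(points, rebuilt), new_queries)

print("bucket sizes:", [len(b) for b in buckets])
print(f"recall@10 old region {old_recall:.2f}, new region {new_recall:.2f}, "
      f"after rebuild {rebuilt_recall:.2f}")
```

Nothing in the simulation errors or warns. The only symptom is the lower recall number for the new region, which is exactly why this failure goes unnoticed in production.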

The decision rule

Default to HNSW when any of these is true:

the table takes ongoing inserts or updates,
you need recall ≥ ~0.9 at single-digit-millisecond p50,
nobody on the team will own re-tuning lists/probes and scheduling rebuilds.

Reach for IVFFlat only when all of these hold:

the corpus is static, or rebuilt wholesale on a schedule anyway (nightly batch pipelines qualify),
build time or build memory is your binding constraint, meaning you're on a small box, or your row count is past the HNSW build ceiling for your RAM,
moderate recall is acceptable and someone will actually verify it after each rebuild.
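The asymmetry matters: IVFFlat needs every condition, while any single trigger sends you to HNSW. Here is a small Python sketch of that logic. The Workload fields and the choose_index function are hypothetical names for the conditions in the two lists above, and the flags for the two example workloads are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    static_or_rebuilt_wholesale: bool    # no ongoing writes, or a scheduled full rebuild
    build_is_binding_constraint: bool    # small box, or past the HNSW build ceiling
    moderate_recall_ok: bool             # no need for recall >= ~0.9 at single-digit-ms p50
    recall_verified_after_rebuild: bool  # someone owns tuning and checks recall each rebuild


def choose_index(w: Workload) -> str:
    """IVFFlat only when every condition holds; otherwise default to HNSW."""
    if (w.static_or_rebuilt_wholesale and w.build_is_binding_constraint
            and w.moderate_recall_ok and w.recall_verified_after_rebuild):
        return "ivfflat"
    return "hnsw"


rag_ingest = Workload(False, False, False, False)
nightly_catalogue = Workload(True, True, True, True)
print(choose_index(rag_ingest), choose_index(nightly_catalogue))
```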

A concrete pair of examples. A RAG application ingesting user documents all day: HNSW, no debate, because drift would eat IVFFlat alive. A product catalogue re-embedded from scratch every night by a batch job on a memory-tight node: IVFFlat is defensible, because the nightly rebuild resets the centroids and the build savings are real.

There's a third option people forget: size the node so HNSW fits in RAM and skip the compromise entirely. For 1536-dim vectors the comfortable in-memory ceilings on our tiers are roughly 350k vectors on 4 GB, 600k on 8 GB, and 1M on 16 GB. On managed pgvector that's the first sizing question to settle for any new workload, and a node-size bump is usually cheaper than the engineering time spent babysitting an IVFFlat rebuild schedule.
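Here is a back-of-envelope check on those ceilings. It assumes 4-byte float32 components (pgvector's vector type) and counts only the raw vector payload. It treats each tier's GB as GiB for a rough ratio. Graph links, tuple and page overhead, the heap copy of each row, shared_buffers and the OS all come on top, which is why the comfortable ceiling sits well below total RAM. Treat it as a sketch, not a sizing tool.

```python
GIB = 1024 ** 3


def raw_vector_bytes(n_vectors: int, dim: int = 1536, bytes_per_dim: int = 4) -> int:
    """float32 payload only: ignores graph links, tuple headers and page overhead."""
    return n_vectors * dim * bytes_per_dim


# (RAM in GB, comfortable in-memory ceiling) per tier, from the paragraph above.
tiers = {"Starter": (4, 350_000), "Growth": (8, 600_000), "Scale": (16, 1_000_000)}
for name, (ram_gb, ceiling) in tiers.items():
    gib = raw_vector_bytes(ceiling) / GIB
    print(f"{name}: {ceiling:,} vectors ~ {gib:.2f} GiB of raw floats, "
          f"about {gib / ram_gb:.0%} of {ram_gb} GB RAM")
```

By the same arithmetic, 2M vectors need about 11.4 GiB for the raw floats alone. That helps explain why that build doesn't fit on the 16 GB Scale node.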

Migrating between them

Both index types can coexist on the same column, which makes migration low-drama. The pattern:

-- build the new index without blocking writes
CREATE INDEX CONCURRENTLY docs_embedding_hnsw
  ON docs USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- confirm the planner picks it (plain EXPLAIN can't bind $1, so borrow a stored vector)
EXPLAIN SELECT id FROM docs
ORDER BY embedding <=> (SELECT embedding FROM docs LIMIT 1)
LIMIT 10;

-- then, and only then
DROP INDEX CONCURRENTLY docs_embedding_ivfflat;

Direction-specific caveats:

IVFFlat → HNSW. Check the build ceiling first. If you have 500k vectors of 1536 dims on a 4 GB node, that CREATE INDEX is the four-hour build from the table above, so do it during a quiet window or upgrade the node first, then build. Measure recall before and after with a held-out query set; you'll usually gain.

HNSW → IVFFlat. Build only after the table is fully loaded, never ahead of a data import. And schedule the REINDEX cadence on day one, because the day you need it is the day nobody remembers why recall regressed.

If migrating also means moving providers, our free Vector Rescue migration help exists for exactly this: we move your data for you.

Frequently asked questions

Is IVFFlat ever actually faster than HNSW?

For queries at equal recall, almost never, because HNSW's graph walk examines far fewer vectors than IVFFlat's bucket scans. Where IVFFlat genuinely wins is build time and build memory. It needs a single k-means pass, while HNSW's graph construction is memory-bound: in our measurements it took hours for 500k 1536-dim vectors on a 4 GB node, versus ~5 minutes for 250k on the same machine. If "faster" means "indexed before the demo," IVFFlat can be the right call.

Why did my IVFFlat recall get worse over months of inserts?

Centroid drift. IVFFlat's clusters are computed once at build time and never updated, so as inserted data diverges from the original distribution, more true neighbours land in buckets your probes setting never reaches. Raising probes papers over it at a latency cost; the real fix is REINDEX to recompute the centroids, or switching to HNSW, which maintains its graph on every insert and doesn't have this failure mode.

Why does SET hnsw.ef_search seem to do nothing on my database?

Almost certainly transaction-pooled PgBouncer. A bare SET binds to one server connection, and the pooler routes your next query to a different one, silently discarding the setting. The same applies to ivfflat.probes. Use SET LOCAL inside the transaction that runs the query, or persist the value with ALTER DATABASE ... SET hnsw.ef_search = 80 so every connection inherits it.

Can I keep both an HNSW and an IVFFlat index on the same column?

Yes, and it's the standard migration technique: Postgres lets both exist and the planner picks one (verify with EXPLAIN). You pay double the write amplification and double the disk while both are alive, so treat it as a transition state: build the new one with CREATE INDEX CONCURRENTLY, compare recall and latency, then drop the loser.

How much RAM do I need for HNSW at my scale?

For 1536-dim embeddings, the in-RAM fast-search ceilings we've measured are roughly 350k vectors on a 4 GB node, 600k on 8 GB, and 1M on 16 GB. Builds are stricter than serving. 1M vectors will not build at all on 4 GB or 8 GB nodes and need 16 GB, where the build completes in about 9 minutes. Past 1M at this dimensionality on these tiers, you're into dimensionality reduction, quantization, or sharding territory.
