Rivestack Team

pgvector Hosting Guide: How to Choose Managed PostgreSQL for Vector Search

pgvector
hosting
PostgreSQL
HNSW

pgvector hosting sounds simple: find a managed PostgreSQL provider, enable the extension, create a vector column, and ship.

That is enough for a prototype. It is not enough for production.

Once you have real traffic, real row counts, and real latency targets, pgvector behaves like a database workload, not a library feature. HNSW indexes use memory, vector queries create random-read pressure, filters can change query plans, and every RAG request can become several database searches.

This guide is the practical checklist we use when evaluating whether a hosted pgvector setup is ready for production.

1. Start with the workload, not the provider list

Before comparing providers, write down the shape of the workload:

  • Current vector count.
  • Expected vector count in 3, 6, and 12 months.
  • Embedding dimension.
  • Distance metric: cosine, inner product, or L2.
  • Average and peak query volume.
  • Peak concurrency.
  • Typical LIMIT value.
  • Required filters, such as tenant, user, language, status, timestamp, or document type.
  • Latency target for p50, p95, and p99.
  • Recall target for the product experience.

This forces the evaluation to stay grounded. A provider that is fine for 50,000 vectors and ten searches per minute may not be right for 10 million vectors and user-facing RAG.

2. Check the storage layer

HNSW search jumps through graph nodes. That means random reads matter more than sequential throughput.

Ask the provider:

  • Is storage local NVMe, network block storage, or provider-dependent?
  • What random-read latency should I expect under load?
  • What happens when the HNSW index is larger than shared buffers?
  • Can I resize storage without changing the whole database shape?
  • Are IOPS included, provisioned separately, or burst-limited?

For small indexes that fit in memory, almost any decent storage layer can feel fast. The storage question becomes important when the index is larger than cache and every cache miss turns into a disk read.

For large HNSW indexes, storage latency can dominate query latency. This is why our pgvector NVMe benchmark focuses on random-read behavior instead of only CPU.
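
A quick way to see whether the index can even stay in cache is to compare its on-disk size with the configured buffer cache. A minimal sketch, assuming an index named documents_embedding_idx (the name is hypothetical; substitute your own):

```sql
-- Compare HNSW index size with the configured shared buffer cache.
SELECT pg_size_pretty(pg_relation_size('documents_embedding_idx')) AS index_size;

-- shared_buffers is only part of the available cache (the OS page cache
-- helps too), but an index much larger than this will generate disk reads
-- under sustained load.
SHOW shared_buffers;
```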

3. Size memory for the index, not just the table

A common mistake is calculating table size and forgetting index size.

For 1536-dimensional embeddings, the raw vector alone is large. Add table overhead, metadata, HNSW graph edges, regular PostgreSQL indexes, connection memory, and background activity, and memory pressure arrives earlier than expected.
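
The arithmetic is easy to run directly in SQL. Each pgvector value stores 4 bytes per dimension plus a small per-value header (about 8 bytes), so a back-of-envelope estimate for the heap data alone looks like:

```sql
-- Raw vector data estimate for 1M vectors at 1536 dimensions.
-- vector storage ≈ 4 bytes per dimension + 8-byte header per value.
SELECT pg_size_pretty(
  (1000000::bigint * (1536 * 4 + 8))::bigint
) AS raw_vector_data;
-- roughly 6 GB, before table overhead, TOAST, and any indexes
```

The index, metadata columns, and B-tree indexes all come on top of this number.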

Ask:

  • How many vectors will you store over the next year?
  • What embedding dimension are you using?
  • Do filters need extra B-tree, GIN, or partial indexes?
  • Can your hot index pages stay in memory?
  • What is the plan when the HNSW index grows beyond memory?

If the provider cannot help you reason about index memory, you are probably buying generic Postgres rather than managed pgvector hosting.
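
If you want to answer the "hot pages in memory" question empirically rather than by estimate, the pg_buffercache extension shows how much of an index currently sits in shared buffers. A sketch, again assuming a hypothetical index name (note this only sees shared buffers, not the OS page cache):

```sql
CREATE EXTENSION IF NOT EXISTS pg_buffercache;

-- How much of the HNSW index is currently cached in shared buffers?
SELECT pg_size_pretty(count(*) * 8192) AS cached_in_shared_buffers,
       pg_size_pretty(pg_relation_size('documents_embedding_idx')) AS index_total
FROM pg_buffercache
WHERE relfilenode = pg_relation_filenode('documents_embedding_idx');
```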

4. Confirm HNSW tuning access

HNSW has important knobs:

  • m
  • ef_construction
  • hnsw.ef_search
  • maintenance_work_mem
  • parallel maintenance workers

You do not always need to tune all of them. Defaults are fine for many workloads. But when recall, build time, or tail latency matters, the provider should be able to explain the tradeoffs.

hnsw.ef_search is usually the knob teams touch first because it changes query-time behavior. Higher values often improve recall and result quality, but they increase query work and latency.

For build-time tuning, m, ef_construction, and maintenance_work_mem matter more. These affect index size, build duration, and recall.
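
In SQL terms, the build-time knobs are fixed in the index definition, while ef_search is a session setting. A sketch with deliberately explicit values (m = 16 and ef_construction = 64 are pgvector's defaults; the other numbers are illustrative, not recommendations):

```sql
-- Build-time: m and ef_construction are set at index creation.
SET maintenance_work_mem = '4GB';  -- speeds up the build; size to the node
CREATE INDEX documents_embedding_idx
  ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Query-time: ef_search trades latency for recall, per session or transaction.
SET hnsw.ef_search = 100;  -- default is 40
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;
```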

For more detail, read pgvector HNSW tuning on managed PostgreSQL.

5. Test with your filters

Many vector benchmarks are too clean:

SELECT id, content
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;

Real products usually look more like this:

SELECT id, title, excerpt
FROM documents
WHERE account_id = $2
  AND status = 'published'
  AND language = 'en'
ORDER BY embedding <=> $1
LIMIT 10;

Filters change the shape of the problem. The database has to combine vector similarity with ordinary relational constraints. Depending on selectivity, you may need B-tree indexes, partial indexes, partitioning, query rewrites, or higher ef_search so enough candidates remain after filtering.
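
For the filtered query above, a couple of those remedies look like this in practice (index names and settings are illustrative):

```sql
-- B-tree index so the relational filter is cheap to apply.
CREATE INDEX documents_account_idx
  ON documents (account_id, status, language);

-- If the filter discards many candidates, raise ef_search so enough
-- neighbors survive the WHERE clause to fill LIMIT 10.
SET hnsw.ef_search = 200;

-- pgvector 0.8+ also supports iterative index scans, which keep fetching
-- candidates until the limit is satisfied after filtering.
SET hnsw.iterative_scan = 'relaxed_order';
```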

When evaluating pgvector hosting, run tests with production-shaped filters. A provider benchmark that only searches one unfiltered table is useful, but it is not enough.

6. Understand backup and restore

Vector data is application data. It needs the same backup discipline as the rest of your database.

At minimum, check:

  • Daily backups.
  • Point-in-time recovery.
  • Restore testing.
  • Backup retention.
  • Encryption at rest and in transit.
  • Whether backups live in the same region.
  • How large vector tables affect restore time.

Do not treat embeddings as disposable unless you can regenerate them quickly and cheaply. Re-embedding millions of documents after a failure can be expensive and slow, especially if the source content has changed.

7. Plan high availability before launch

Single-node Postgres is fine for prototypes. Production systems should know what happens when the primary node fails.

Ask:

  • Is failover automatic?
  • Are replicas synchronous or asynchronous?
  • How do writes resume after failover?
  • Does the connection string stay stable?
  • How often is failover tested?
  • What is the expected recovery time objective?
  • What is the expected recovery point objective?

If you use pgvector for RAG or user-facing search, database downtime becomes application downtime. Even if vector search is not the system of record, the application may feel broken without retrieval.

8. Compare fixed pricing with usage pricing

Usage-based pricing is attractive for low-volume workloads. Fixed-node pricing is easier to reason about when query volume is high or unpredictable.

For pgvector, query volume can grow quickly because every search, recommendation, autocomplete, or RAG request may run a vector query. Some features run multiple vector searches per user request.

Before choosing a provider, model:

  • Vector count.
  • Query volume.
  • Peak concurrency.
  • Storage growth.
  • Replica needs.
  • Backup retention.
  • Expected index rebuilds.
  • Support or migration costs.

Rivestack uses fixed per-node pricing so the database bill does not scale with every vector query. That is not always the cheapest model at the beginning, but it is easier to reason about when the workload becomes central.

9. Make migration boring

If you are already on PostgreSQL, migration should use normal PostgreSQL tools:

  • pg_dump
  • pg_restore
  • logical replication
  • replica and cutover planning
  • connection string rotation

If you are migrating from a dedicated vector database, export IDs, metadata, and embeddings, then load them into a PostgreSQL table with a vector column. You will also need to decide which relational fields belong beside the embedding, which filters need indexes, and how to validate recall after the move.
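
The destination side of that migration is plain SQL. A minimal sketch, assuming 1536-dimension embeddings exported to CSV (table, column, and file names are hypothetical):

```sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id         bigint PRIMARY KEY,
    title      text,
    account_id bigint,
    embedding  vector(1536)
);

-- pgvector accepts the '[0.1,0.2,...]' text format, so a CSV export from
-- the source system can be loaded with psql's \copy.
\copy documents FROM 'documents_export.csv' WITH (FORMAT csv, HEADER true)

-- Build the HNSW index after the bulk load, not before.
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
```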

A good managed pgvector provider should help you make this boring. The goal is not cleverness. The goal is a migration that is easy to test, easy to reverse during cutover, and easy to explain.

10. Use a production readiness checklist

Before launch, confirm:

  • Extension: Is the required pgvector version available?
  • Schema: Are vector dimensions and distance metrics correct?
  • Indexes: Are HNSW indexes built and monitored?
  • Filters: Are metadata filters indexed appropriately?
  • Latency: Have p50, p95, and p99 been tested under load?
  • Recall: Have results been compared with exact search or expected examples?
  • Backups: Has a restore been tested?
  • HA: Is failover behavior documented?
  • Cost: Has query growth been modeled?
  • Migration: Is there a rollback plan?

This checklist matters more than a provider logo page.
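
Recall is the row teams most often skip. One low-effort check: run the same query with and without the HNSW index and compare the returned IDs. A sketch:

```sql
-- Exact result: disable index scans so ORDER BY does a full exact search.
BEGIN;
SET LOCAL enable_indexscan = off;
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;
COMMIT;

-- Approximate result: the same query with the HNSW index enabled.
SELECT id FROM documents ORDER BY embedding <=> $1 LIMIT 10;

-- Recall = (size of the overlap between the two ID sets) / 10,
-- averaged over a sample of real production queries.
```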

FAQ

Is managed pgvector different from managed PostgreSQL?

Managed pgvector is managed PostgreSQL plus workload-specific operations for vector search. The difference is not the extension itself. The difference is storage, memory sizing, HNSW tuning, query plan support, benchmarks, migration help, and cost predictability.

Can I use normal PostgreSQL drivers?

Yes. One of the main advantages of pgvector is that it stays inside PostgreSQL. Your application can keep using standard drivers, SQL, migrations, backups, and metadata tables.

Should I choose a vector database instead?

Use a dedicated vector database if you want vector search as a separate specialized system and do not need PostgreSQL semantics. Use pgvector when relational data, metadata filters, SQL, transactions, and operational simplicity matter. Read pgvector vs Pinecone for a deeper comparison.

What size database do I need for pgvector?

Size for memory first. The HNSW index plus the vector column should fit comfortably in RAM, with room left for PostgreSQL's shared buffers, the OS page cache, and active connections. As a rough guide for 1536-dimension embeddings, the heap data alone is about 6 GB per million vectors, and the HNSW index adds a comparable amount on top, so budget on the order of 12 to 16 GB of RAM per million vectors you want fully cached. At tens of millions of vectors, plan for partitioning, quantization, or accepting some disk-bound reads. Storage type matters as much as size: NVMe random-read latency is what HNSW search hits hardest on cache misses.

How much memory does pgvector use?

The vector column stores raw float4 values plus a small per-value header, so 1M 1536-dimension vectors take about 6 GB of heap data. The HNSW index stores its own copy of each vector plus the neighbor lists for the graph, so expect the index to be at least as large as the vector data itself, growing with m. For predictable latency, plan for the index plus the working set of table pages to fit in memory.

Should I run pgvector on managed PostgreSQL or self-host?

Self-host when you have strong PostgreSQL operations and want full control over storage, kernel, and tuning. Use managed pgvector hosting when backups, monitoring, replication, failover, and version upgrades are not work you want to own. The technical surface is identical — same SQL, same extension, same query plans.

Does pgvector need NVMe storage?

It depends on dataset size and cache hit ratio. If the HNSW index fits in memory, almost any storage layer is fine because reads are served from cache. Once the index spills to disk, HNSW becomes a random-read workload, and NVMe latency (single-digit microseconds) is meaningfully better than typical cloud block storage (hundreds of microseconds to milliseconds).

How do I size pgvector for production?

Write down your vector count today and projected count in 12 months, embedding dimension, target p95 latency, peak QPS, expected concurrency, typical LIMIT, and required filters. Pick a node where the index plus working set fits in RAM with at least 25 percent headroom, set hnsw.ef_search for your recall target, and benchmark with realistic concurrency before committing to a plan. Re-evaluate when row count or dimension changes.

Can pgvector run multi-tenant workloads?

Yes. Add a tenant column with a B-tree index and use it in WHERE clauses alongside the vector search. The query planner combines the filter with the HNSW index using bitmap scans or index-only scans depending on selectivity. Keep an eye on per-tenant skew if a few tenants are much larger than others.
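
Concretely, the multi-tenant pattern is the filtered query from section 5 with the tenant column indexed (names are illustrative):

```sql
CREATE INDEX documents_tenant_idx ON documents (tenant_id);

SELECT id, content
FROM documents
WHERE tenant_id = $2
ORDER BY embedding <=> $1
LIMIT 10;

-- For a disproportionately large tenant, a partial HNSW index scoped to
-- that tenant can keep its searches fast:
CREATE INDEX documents_bigco_embedding_idx
  ON documents USING hnsw (embedding vector_cosine_ops)
  WHERE tenant_id = 42;
```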

Bottom line

Good pgvector hosting is not just extension availability. It is storage, memory, tuning, filters, backups, high availability, migration support, and predictable cost.

If you want that as a focused managed service, see Rivestack managed pgvector hosting.