Rivestack Team · 2 min read

What Is pgvector? A Practical Introduction

Tags: pgvector, PostgreSQL, vector search, AI

pgvector is an open-source extension that adds vector similarity search to PostgreSQL. In plain terms: it lets your database store the numeric "embeddings" that AI models produce and find the ones most similar to a given query — turning ordinary PostgreSQL into a capable vector database, with no separate system to run.

Why vectors matter

Modern AI models turn text, images, or audio into a list of numbers called an embedding — typically a few hundred to a couple thousand floating-point values. Similar items produce nearby vectors. "Find similar" becomes "find the nearest vectors," which is the foundation of semantic search, recommendations, and retrieval-augmented generation (RAG). pgvector gives PostgreSQL a native way to store and search them.

What pgvector adds

Three things:

  1. A vector column type: store an embedding as a first-class column, e.g. embedding vector(1536).
  2. Distance operators: <-> (Euclidean/L2 distance), <=> (cosine distance), and <#> (negative inner product) measure how close two vectors are.
  3. Approximate-nearest-neighbour indexes: HNSW and IVFFlat make search fast at scale instead of scanning every row.

Everything else is just PostgreSQL: transactions, joins, WHERE filters, backups, replication.
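
To make the operators concrete, here is a small sketch that applies each one to two literal vectors. It assumes only that the extension is enabled; the column aliases are illustrative names, not part of pgvector.

```sql
-- Compare two 2-dimensional vectors with each distance operator.
SELECT
  '[1,2]'::vector <-> '[3,4]'::vector AS l2_distance,        -- sqrt(8), about 2.83
  '[1,2]'::vector <=> '[3,4]'::vector AS cosine_distance,    -- about 0.016
  '[1,2]'::vector <#> '[3,4]'::vector AS neg_inner_product;  -- -11
```

Because <#> returns the negative of the inner product, a smaller value means "more similar" for all three operators, so each one works with a plain ascending ORDER BY.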

A first example

CREATE EXTENSION IF NOT EXISTS vector;   -- enable it (a no-op on Rivestack)

CREATE TABLE items (
  id        bigserial PRIMARY KEY,
  content   text,
  embedding vector(1536)
);

-- after inserting rows with embeddings:
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);

-- find the 5 most similar items to a query vector:
SELECT content
FROM items
ORDER BY embedding <=> $1
LIMIT 5;

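That ORDER BY embedding <=> $1 is the whole idea: rank rows by cosine distance to a query embedding. The HNSW index can serve the query because its operator class (vector_cosine_ops) matches the <=> operator, which keeps the search fast. The results are approximate nearest neighbours rather than an exact ranking.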
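
For a version you can run without an embedding model, the same pattern can be sketched on a tiny table with three-dimensional vectors. The table name demo_items and its rows are made up for illustration; real embeddings come from a model and have far more dimensions.

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Hypothetical demo table with 3-dimensional vectors.
CREATE TABLE demo_items (
  id        bigserial PRIMARY KEY,
  content   text,
  embedding vector(3)
);

INSERT INTO demo_items (content, embedding) VALUES
  ('apple',  '[1,0,0]'),
  ('banana', '[0.9,0.1,0]'),
  ('car',    '[0,0,1]');

-- Rank by cosine distance to the query vector [1,0,0].
SELECT content, embedding <=> '[1,0,0]' AS cosine_distance
FROM demo_items
ORDER BY cosine_distance
LIMIT 2;
-- returns apple, then banana
```

On a table this small PostgreSQL will simply scan every row, which gives exact results; the index only starts to matter as the row count grows.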

What pgvector is used for

  • RAG — retrieve the most relevant chunks to ground an LLM's answer.
  • Semantic search — search by meaning, not keywords.
  • Recommendations — "more like this" over products, articles, or media.
  • Deduplication and clustering — group near-identical content.

Because the vectors sit next to your relational data, you combine similarity with normal filters in one query — see building RAG on PostgreSQL.
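
As a sketch of that combination, the query below filters on an ordinary column and ranks the remaining rows by similarity. The category column and the value 'docs' are hypothetical additions to the items table from the first example; $1 is the query embedding, as before.

```sql
-- Hypothetical relational column to filter on.
ALTER TABLE items ADD COLUMN category text;

-- Optional, pgvector 0.8.0 and later: keep scanning the index when the
-- filter discards candidates, so LIMIT is more likely to be satisfied.
SET hnsw.iterative_scan = strict_order;

-- Five most similar items within one category.
SELECT id, content
FROM items
WHERE category = 'docs'
ORDER BY embedding <=> $1
LIMIT 5;
```

With an approximate index the filter is applied to the candidates the index returns, so a very selective filter can yield fewer rows than the LIMIT asks for. Iterative scans are one way to reduce that effect.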

How to start

Any PostgreSQL instance with the extension installed works. Most managed services offer it as an extension you can enable; on Rivestack, pgvector 0.8.x ships pre-installed and tuned, so CREATE EXTENSION IF NOT EXISTS vector is a no-op and you can insert embeddings immediately. From there, the getting-started guide walks through your first index and query.
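
If you are not sure what a given server offers, a quick check along these lines can help. It uses only standard PostgreSQL catalog views and makes no assumption about the hosting provider.

```sql
-- Is the extension available on this server, and in which version?
SELECT name, default_version, installed_version
FROM pg_available_extensions
WHERE name = 'vector';

-- Enable it in the current database if it is not enabled yet.
CREATE EXTENSION IF NOT EXISTS vector;

-- Confirm the version that is active in this database.
SELECT extversion FROM pg_extension WHERE extname = 'vector';
```

An empty result from the first query means the extension files are not installed on the server, so CREATE EXTENSION would fail there.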

FAQ

What is pgvector used for?

Storing embeddings and finding similar ones: retrieval-augmented generation (RAG), semantic search, recommendations, and deduplication. Anywhere you have AI-model embeddings and want nearest-neighbour search, optionally combined with SQL filters.

Is pgvector free?

Yes. pgvector is open-source (PostgreSQL license). You only pay for the PostgreSQL instance you run it on — and Rivestack's free tier includes pgvector at no cost.

What's the difference between HNSW and IVFFlat?

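They are the two index types pgvector offers. HNSW is a graph index with the better speed-recall trade-off, which makes it the usual production choice, at the cost of slower builds and more memory. IVFFlat partitions vectors into inverted lists; it is faster to build and uses less memory, but usually needs more tuning (the number of lists and probes) and should be created after the table has data. See our HNSW vs IVFFlat guide.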
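
The tuning knobs mentioned above look roughly like this. The index names and the specific numbers are placeholders for illustration, not recommendations; in practice you would normally create only one of the two indexes on a column.

```sql
-- HNSW: m and ef_construction shown at their pgvector defaults.
CREATE INDEX items_embedding_hnsw
  ON items USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

-- Query-time knob: larger values trade speed for recall (default 40).
SET hnsw.ef_search = 100;

-- IVFFlat: build after the table has data; lists = 100 is a placeholder.
CREATE INDEX items_embedding_ivfflat
  ON items USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- Query-time knob: number of lists to probe per query (default 1).
SET ivfflat.probes = 10;
```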


Learn more in the pgvector guide, or read about PostgreSQL for AI.