All posts PostgreSQL Full Text Search: A Complete Guide
·Rivestack Team

PostgreSQL Full Text Search: A Complete Guide

PostgreSQL
full text search
tsvector
tsquery
GIN index
search

There's a database already running in your stack. It has your users, your content, your transactions. And buried in that same PostgreSQL instance is a full text search engine you've probably never turned on.

PostgreSQL full text search has been production ready for over a decade. It handles stemming, stop words, multiple languages, weighted ranking, and trigram fuzzy matching. You don't need Elasticsearch for a search feature. You don't need Algolia if your data is already in Postgres. For most applications, especially those with under a few million documents, built-in full text search is the right call.

This guide covers everything you need to ship full text search in PostgreSQL: how the underlying model works, how to index correctly, how to rank results, and how it compares to vector search with pgvector.

How PostgreSQL Full Text Search Works

PostgreSQL doesn't search raw text. It converts text into a normalized representation called a tsvector, then matches queries expressed as tsquery objects. This two-step process is what makes it fast.

A tsvector is a sorted list of lexemes: normalized word forms that strip suffixes and reduce words to their base form. The word "running" becomes "run". "Postgres" becomes "postgr". Stop words like "the", "a", "an" are dropped entirely. Each lexeme stores a list of positions (which word position in the document it came from), which enables phrase search.

SELECT to_tsvector('english', 'The quick brown fox jumps over the lazy dog');

This returns:

'brown':3 'dog':9 'fox':4 'jump':5 'lazi':8 'quick':2

Stop words are gone. "jumps" became "jump". "lazy" became "lazi". PostgreSQL did all of that with the English text search configuration.

A tsquery is what you match against. The simplest form is a keyword:

SELECT to_tsquery('english', 'jumping');
-- Returns: 'jump'

You can use boolean operators:

-- AND: both words must appear
SELECT to_tsquery('english', 'postgres & search');

-- OR: either word
SELECT to_tsquery('english', 'postgres | mysql');

-- NOT: must not appear
SELECT to_tsquery('english', 'database & !oracle');

-- Phrase: words must appear adjacent
SELECT to_tsquery('english', '''full text''');

The match operator is @@:

SELECT to_tsvector('english', 'PostgreSQL is a powerful database')
  @@ to_tsquery('english', 'powerful');
-- Returns: true

Setting Up Full Text Search

Let's build a working example. Say you have a table of articles:

CREATE TABLE articles (
  id SERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW()
);

Option 1: Compute tsvector on the fly

The simplest approach queries without any precomputation:

SELECT id, title
FROM articles
WHERE to_tsvector('english', title || ' ' || body) @@ to_tsquery('english', 'postgresql')
ORDER BY ts_rank(to_tsvector('english', title || ' ' || body), to_tsquery('english', 'postgresql')) DESC
LIMIT 20;

This works, but computing the tsvector on every query is slow at scale. You're doing the same text processing on every row, on every search.

Option 2: Generated column (recommended for PostgreSQL 12+)

Add a generated column that stores the precomputed tsvector:

ALTER TABLE articles
  ADD COLUMN search_vector TSVECTOR
  GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED;

Notice setweight: it assigns weight 'A' (highest) to the title and 'B' to the body. Documents where the search term appears in the title rank higher than those where it only appears in the body.

Now query using the stored column:

SELECT id, title,
  ts_rank(search_vector, query) AS rank
FROM articles,
  to_tsquery('english', 'postgresql') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

Option 3: Trigger-maintained column

Before generated columns existed, the pattern was a trigger. This approach gives you more control and works on older PostgreSQL versions:

ALTER TABLE articles ADD COLUMN search_vector TSVECTOR;

CREATE FUNCTION articles_search_vector_trigger() RETURNS TRIGGER AS $$
BEGIN
  NEW.search_vector :=
    setweight(to_tsvector('english', coalesce(NEW.title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(NEW.body, '')), 'B');
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER articles_search_vector_update
  BEFORE INSERT OR UPDATE ON articles
  FOR EACH ROW EXECUTE FUNCTION articles_search_vector_trigger();

Indexing for Performance

Without an index, full text search does a full table scan. For any table with more than a few thousand rows, you need a GIN index.

GIN (Generalized Inverted Index) is purpose built for composite values like tsvectors. It maintains an inverted list: for each lexeme, a list of which rows contain it. This makes @@ queries extremely fast.

CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

With a GIN index on the stored column, full text search queries will use an index scan and return in milliseconds even on tables with millions of rows.

If you're computing the tsvector on the fly and can't add a column, you can create a functional GIN index:

CREATE INDEX articles_fts_idx ON articles
  USING GIN (to_tsvector('english', title || ' ' || body));

The generated column approach is cleaner and more maintainable for most cases.

Ranking Search Results

Raw @@ matching tells you what matches. ts_rank tells you how well it matches.

SELECT
  id,
  title,
  ts_rank(search_vector, query) AS rank,
  ts_rank_cd(search_vector, query) AS rank_cover_density
FROM articles, to_tsquery('english', 'postgresql & database') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 10;

ts_rank scores based on how often the terms appear and where (using the weights from setweight). ts_rank_cd uses cover density, which also considers how close the matching terms are to each other. For most use cases, ts_rank is fine.

You can also normalize rank by document length to prevent long documents from dominating:

SELECT id, title,
  ts_rank(search_vector, query, 1) AS rank  -- normalization: divide by document length
FROM articles, to_tsquery('english', 'database') query
WHERE search_vector @@ query
ORDER BY rank DESC;

Snippet Generation

Showing a snippet of matching text is standard in search UIs. ts_headline generates one:

SELECT
  id,
  title,
  ts_headline(
    'english',
    body,
    to_tsquery('english', 'postgresql'),
    'MaxWords=50, MinWords=15, StartSel=<mark>, StopSel=</mark>'
  ) AS snippet
FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql')
LIMIT 5;

This returns the most relevant fragment of body with the matching terms wrapped in <mark> tags, ready to render in a browser.

One thing to watch: ts_headline does not use the GIN index. It processes the raw text column to generate the snippet. For large result sets, call it only on the final page of results, not on everything before pagination.

Handling User Input Safely

Users don't type to_tsquery syntax. They type "postgres full text search tutorial". Use plainto_tsquery or websearch_to_tsquery to handle this:

-- plainto_tsquery: treats input as AND of words, no special characters needed
SELECT id, title
FROM articles
WHERE search_vector @@ plainto_tsquery('english', 'postgres full text search');
-- Equivalent to: 'postgr' & 'full' & 'text' & 'search'

-- websearch_to_tsquery: supports Google-style syntax (quotes for phrases, minus for exclusion)
SELECT id, title
FROM articles
WHERE search_vector @@ websearch_to_tsquery('english', '"postgres" full text -oracle');

websearch_to_tsquery is available in PostgreSQL 11+ and is the best default for any search box that accepts user input. It's safe (no injection), it handles partial syntax gracefully, and it supports quoted phrases.

Text Search Configurations and Languages

The first argument to to_tsvector, to_tsquery, and related functions is the text search configuration. The English configuration knows English stop words and uses the Snowball stemmer for English.

PostgreSQL ships with configurations for many languages:

-- List all available configurations
SELECT cfgname FROM pg_ts_config;

-- French
SELECT to_tsvector('french', 'La recherche plein texte dans PostgreSQL est rapide');

-- Spanish
SELECT to_tsvector('spanish', 'PostgreSQL tiene búsqueda de texto completo nativa');

If your application is multilingual, store the language in a column and use it dynamically:

SELECT id, title
FROM articles
WHERE to_tsvector(language::regconfig, body) @@ to_tsquery(language::regconfig, $1);

For languages without a built-in configuration (Chinese, Japanese, Korean), you'll need an extension like zhparser or a trigram approach instead.

Fuzzy Search with pg_trgm

Standard full text search won't match typos. "PostgreSQL" won't match "Prostgresql". For fuzzy matching, combine FTS with the pg_trgm extension:

CREATE EXTENSION pg_trgm;

-- Add a trigram index for fuzzy matching
CREATE INDEX articles_title_trgm_idx ON articles USING GIN (title gin_trgm_ops);

-- Find articles with similar titles (similarity > 0.3)
SELECT id, title, similarity(title, 'postgresql tutorial') AS sim
FROM articles
WHERE title % 'postgresql tutorial'
ORDER BY sim DESC
LIMIT 10;

A common pattern is to run FTS first (fast, precise) and fall back to trigram similarity for zero-result queries.

PostgreSQL Full Text Search vs pgvector

This question comes up a lot if you're building a search feature for an AI application: should you use FTS, pgvector, or both?

The short answer: they solve different problems.

Full text search finds documents containing specific words or phrases. A query for "database replication" will match documents that literally contain those words. It's fast, precise, and requires no embedding model.

Vector search with pgvector finds documents that are semantically similar, even if they don't share keywords. A query for "database replication" might surface a document about "PostgreSQL streaming standbys" because the embedding captures meaning, not exact terms. Vector search requires computing embeddings (a call to an embedding model for every query and every document).

The choice depends on your search intent:

Use case Best approach
Blog search, documentation search Full text search
Semantic Q&A, RAG pipelines pgvector
E-commerce product search Often a combination
Code search pgvector or trigrams

Many production systems use both. FTS handles keyword search; pgvector handles semantic search. Some merge the two scores using Reciprocal Rank Fusion to combine results from both into a single ranked list.

If you're building a RAG pipeline on PostgreSQL, you probably need pgvector for the semantic retrieval step. Check out our guide to getting started with pgvector and how to use pgvector with Python for the setup.

A Complete Production Setup

Here's a full, production ready setup for an articles table with full text search:

-- Table with generated tsvector column
CREATE TABLE articles (
  id BIGSERIAL PRIMARY KEY,
  title TEXT NOT NULL,
  body TEXT NOT NULL,
  created_at TIMESTAMPTZ DEFAULT NOW(),
  search_vector TSVECTOR GENERATED ALWAYS AS (
    setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(body, '')), 'B')
  ) STORED
);

-- GIN index for fast search
CREATE INDEX articles_search_idx ON articles USING GIN (search_vector);

-- Search function with ranking and snippets
CREATE FUNCTION search_articles(
  query_text TEXT,
  page_size INT DEFAULT 20,
  page_offset INT DEFAULT 0
)
RETURNS TABLE (
  id BIGINT,
  title TEXT,
  snippet TEXT,
  rank FLOAT4
) AS $$
DECLARE
  q TSQUERY := websearch_to_tsquery('english', query_text);
BEGIN
  RETURN QUERY
  SELECT
    a.id,
    a.title,
    ts_headline(
      'english', a.body, q,
      'MaxWords=40, MinWords=10, StartSel=<mark>, StopSel=</mark>'
    ),
    ts_rank(a.search_vector, q)
  FROM articles a
  WHERE a.search_vector @@ q
  ORDER BY ts_rank(a.search_vector, q) DESC
  LIMIT page_size OFFSET page_offset;
END;
$$ LANGUAGE plpgsql;

-- Usage
SELECT * FROM search_articles('postgresql replication', 20, 0);

This gives you keyword based search, relevance ranking, HTML ready snippets, and pagination, all within PostgreSQL. No extra infrastructure, no separate search service, no synchronization to manage.

Production Checklist

Before shipping full text search to production:

Verify the index is being used:

EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM articles
WHERE search_vector @@ to_tsquery('english', 'postgresql');

Look for "Bitmap Index Scan" or "Index Scan" on articles_search_idx. If you see "Seq Scan", the index isn't being used; check that the column types match.

Backfill after bulk inserts: If you insert many rows at once (data migrations, imports), confirm the generated column was populated correctly:

-- Check for null tsvectors (shouldn't happen with GENERATED ALWAYS, but verify)
SELECT count(*) FROM articles WHERE search_vector IS NULL;

-- Manual backfill for trigger based setups
UPDATE articles SET search_vector =
  setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
  setweight(to_tsvector('english', coalesce(body, '')), 'B');

Run VACUUM regularly: GIN indexes accumulate pending entries with fastupdate = on (the default). Regular autovacuum will flush these, but after large bulk inserts you may want to run VACUUM articles manually to keep query performance consistent.

Paginate carefully: OFFSET 10000 is slow regardless of indexing because PostgreSQL has to scan and discard rows. For deep pagination, use keyset pagination on rank + id:

SELECT id, title, ts_rank(search_vector, q) AS rank
FROM articles, to_tsquery('english', 'postgresql') q
WHERE search_vector @@ q
  AND (ts_rank(search_vector, q), id) < ($last_rank, $last_id)
ORDER BY rank DESC, id DESC
LIMIT 20;

Try It on Rivestack

PostgreSQL full text search and pgvector work side by side on the same database. You don't need separate infrastructure for keyword search and semantic search; you can run both queries against the same table, combining results however makes sense for your application.

If you want to run this in production without managing PostgreSQL yourself, Rivestack gives you a fully managed PostgreSQL instance with pgvector, pg_trgm, and all standard extensions enabled by default. No extension installation, no configuration tuning. Just a connection string and you're building.

Try Rivestack for free, no credit card required.