
PostgreSQL JSONB: A Complete Guide to Storing and Querying JSON Data
Every application reaches a point where the relational model gets awkward. User preferences. API responses. Feature flags. Event payloads. You could create a new column for each field, run a migration every time the schema changes, and normalize everything into lookup tables. Or you could store the data as JSON.
PostgreSQL gives you both options, and unlike databases that bolt on JSON support as an afterthought, PostgreSQL treats JSONB as a first class data type with real indexing, operators, and query planning. You get the flexibility of a document store with the guarantees of a relational database: transactions, constraints, joins, and a query planner that actually knows what it's doing.
This guide covers everything you need to work with JSONB in production: when to use it, how to query it, how to index it, and where teams get tripped up.
JSON vs JSONB: Which One to Use
PostgreSQL has two JSON types. The naming is unfortunate because it makes you think they're interchangeable. They're not.
json stores the raw text exactly as you inserted it. Whitespace, key order, duplicate keys, all preserved. Every time you query it, PostgreSQL has to reparse the text. You cannot index it. There is almost no reason to use it.
jsonb stores data in a decomposed binary format. It strips whitespace, deduplicates keys (keeping the last value), and sorts object keys. Insertion is slightly slower because of the conversion, but reads are significantly faster because the database doesn't reparse anything. More importantly, JSONB supports GIN indexes, which make containment and existence queries fast even on large tables.
-- json: stores raw text, reparsed on every query
CREATE TABLE events_json (data json);
-- jsonb: binary format, indexable, fast to query
CREATE TABLE events_jsonb (data jsonb);

The short version: always use jsonb unless you have a specific reason to preserve the exact text representation (like a compliance audit trail where byte for byte fidelity matters). For everything else, JSONB is the correct choice.
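You can see the same normalization behavior jsonb applies (last duplicate key wins, key order not preserved) using Python's json module as a stand-in. This is an analogy, not PostgreSQL's actual code path:

```python
import json

# jsonb-style normalization, approximated: duplicate keys keep the
# last value, and key order is not significant after parsing.
raw = '{"b": 1, "a": 2, "a": 3}'

parsed = json.loads(raw)  # duplicate "a" keys: the last one wins
print(parsed)

# jsonb also stores object keys in sorted order; sort_keys mimics that
normalized = json.dumps(parsed, sort_keys=True)
print(normalized)  # {"a": 3, "b": 1}
```

This is why a round trip through a jsonb column can return a document that looks different from what you inserted, even though it is logically equal.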
Creating Tables with JSONB Columns
A JSONB column works like any other column. You can add it to an existing table, set a default value, and apply constraints.
CREATE TABLE products (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text NOT NULL,
price numeric(10,2) NOT NULL,
metadata jsonb DEFAULT '{}'::jsonb,
created_at timestamptz DEFAULT now()
);

This is the most common pattern: structured columns for the data you query constantly (name, price, dates), and a JSONB column for everything else. The metadata column might store things like dimensions, color options, supplier data, or feature flags that vary by product and don't justify their own columns.
INSERT INTO products (name, price, metadata) VALUES
('Wireless Mouse', 29.99, '{
"brand": "Logitech",
"color": "black",
"wireless": true,
"dpi": [800, 1200, 1600],
"dimensions": {"width": 6.1, "height": 3.8, "depth": 11.5}
}'),
('Mechanical Keyboard', 89.99, '{
"brand": "Keychron",
"color": "white",
"switches": "brown",
"layout": "75%",
"features": ["hot-swappable", "bluetooth", "rgb"]
}'),
('USB-C Hub', 45.00, '{
"brand": "Anker",
"ports": {"usb_a": 2, "usb_c": 1, "hdmi": 1, "ethernet": 1},
"max_power_delivery": 100
}');

Notice that JSONB handles nested objects, arrays, numbers, booleans, and strings without any schema definition. That's the point. The structure can vary between rows without breaking anything.
Querying JSONB Data
PostgreSQL provides a set of operators for extracting values from JSONB. The two you'll use most are -> and ->>.
Extracting Values
-- -> returns jsonb (preserves type)
SELECT metadata -> 'brand' FROM products;
-- Result: "Logitech" (as jsonb, with quotes)
-- ->> returns text (casts to string)
SELECT metadata ->> 'brand' FROM products;
-- Result: Logitech (as text, no quotes)

The difference matters. Use -> when you need to chain operators or when the value is an object/array you want to keep as JSONB. Use ->> when you want a text result for display, comparison, or casting to another type.
-- Chain operators to reach nested values
SELECT metadata -> 'dimensions' ->> 'width' FROM products
WHERE name = 'Wireless Mouse';
-- Result: 6.1
-- The #> and #>> operators use a path array for deep access
SELECT metadata #>> '{dimensions,width}' FROM products
WHERE name = 'Wireless Mouse';
-- Result: 6.1

Filtering with WHERE Clauses
-- Text comparison (use ->> to get text)
SELECT name, price FROM products
WHERE metadata ->> 'brand' = 'Logitech';
-- Numeric comparison (cast the text result)
SELECT name FROM products
WHERE (metadata ->> 'max_power_delivery')::int >= 60;
-- Boolean check
SELECT name FROM products
WHERE (metadata ->> 'wireless')::boolean = true;

One thing that catches people: ->> always returns text, so you need to cast when comparing to numbers or booleans. If you forget the cast, PostgreSQL will do a text comparison, which gives wrong results for numbers ('9' > '10' in text sorting).
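The text-sorting trap is easy to reproduce outside the database. Plain Python string comparison behaves the same way SQL does when you skip the cast:

```python
# Lexicographic vs numeric comparison: the same gotcha as an
# uncast ->> result in SQL.
assert '9' > '10'            # text comparison: '9' sorts after '1...'
assert not (9 > 10)          # numeric comparison: what you actually wanted
assert int('9') < int('10')  # casting first gives the right answer
```

The assertion `'9' > '10'` passing is exactly the bug you get from `metadata ->> 'count' > '9'` without a cast.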
Containment Operators
This is where JSONB gets interesting. The @> operator checks whether the left value contains the right value.
-- Find products where metadata contains {"brand": "Keychron"}
SELECT name FROM products
WHERE metadata @> '{"brand": "Keychron"}';
-- Check for nested containment
SELECT name FROM products
WHERE metadata @> '{"dimensions": {"width": 6.1}}';
-- The reverse: does the right side contain the left?
SELECT name FROM products
WHERE '{"brand": "Anker", "color": "black"}' <@ metadata;

Containment queries are powerful because they can use GIN indexes. A query like WHERE metadata @> '{"brand": "Keychron"}' can be answered directly from an index without scanning the table. We'll cover that in the indexing section.
Existence Operators
-- Does the key exist?
SELECT name FROM products
WHERE metadata ? 'wireless';
-- Do any of these keys exist?
SELECT name FROM products
WHERE metadata ?| array['wireless', 'switches'];
-- Do all of these keys exist?
SELECT name FROM products
WHERE metadata ?& array['brand', 'color'];

Working with Arrays
JSONB arrays support the same containment operators, plus dedicated functions for unpacking.
-- Check if array contains a value
SELECT name FROM products
WHERE metadata -> 'features' @> '"bluetooth"';
-- Expand array elements into rows
SELECT name, jsonb_array_elements_text(metadata -> 'features') AS feature
FROM products
WHERE metadata ? 'features';
-- Count array elements
SELECT name, jsonb_array_length(metadata -> 'features') AS feature_count
FROM products
WHERE metadata ? 'features';

Indexing JSONB for Performance
Without an index, every JSONB query does a sequential scan. That's fine for small tables, but once you're past a few hundred thousand rows, you need indexes.
GIN Indexes
GIN (Generalized Inverted Index) is the standard index type for JSONB. It supports containment (@>), existence (?, ?|, ?&), and key/value lookups.
-- Default GIN index: supports @>, ?, ?|, ?&
CREATE INDEX idx_products_metadata ON products USING gin (metadata);

This single index covers a wide range of queries:
-- All of these use the GIN index
SELECT * FROM products WHERE metadata @> '{"brand": "Logitech"}';
SELECT * FROM products WHERE metadata ? 'wireless';
SELECT * FROM products WHERE metadata ?& array['brand', 'color'];

The default operator class (jsonb_ops) indexes every key and value in the document. This is flexible but creates a large index. If your documents are big, consider the jsonb_path_ops class instead.
-- jsonb_path_ops: smaller index, only supports @>
CREATE INDEX idx_products_metadata_path ON products
USING gin (metadata jsonb_path_ops);

jsonb_path_ops creates indexes roughly 2 to 3 times smaller than jsonb_ops and can be faster for containment queries. The tradeoff: it only supports @>, not existence operators like ? or ?|.
Expression Indexes for Specific Fields
If you query a specific JSONB field frequently, a B-tree expression index is often better than a GIN index.
-- B-tree index on a specific field
CREATE INDEX idx_products_brand ON products ((metadata ->> 'brand'));
-- Now this query uses a standard B-tree index scan
SELECT * FROM products WHERE metadata ->> 'brand' = 'Logitech';

Expression indexes work well for fields you filter, sort, or join on regularly. They're smaller than GIN indexes and support equality, range, and sorting operations. The downside is that you need one per field, whereas a single GIN index covers all keys.
-- Numeric expression index (note the extra parentheses: an index
-- expression must be wrapped in its own parens, so a cast needs three)
CREATE INDEX idx_products_power ON products
(((metadata ->> 'max_power_delivery')::int))
WHERE metadata ? 'max_power_delivery';
-- The WHERE clause makes this a partial index: only rows where the
-- key exists are indexed, which keeps the index small and focused

Choosing the Right Index
| Query pattern | Index type | Operator class |
|---|---|---|
| metadata @> '{"key": "value"}' | GIN | jsonb_ops or jsonb_path_ops |
| metadata ? 'key' | GIN | jsonb_ops |
| metadata ->> 'key' = 'value' | B-tree expression | N/A |
| (metadata ->> 'key')::int > 100 | B-tree expression | N/A |
| Full text search inside JSONB values | GIN with to_tsvector | tsvector_ops |
A common production setup: one GIN index for flexible queries, plus B-tree expression indexes on the 2 or 3 fields you query most.
Modifying JSONB Data
JSONB values are immutable at the storage level. When you "update" a JSONB field, PostgreSQL writes a new copy of the entire value (in fact, a new version of the whole row, because of MVCC). This is important to understand because it means updating a single key in a large JSONB document rewrites the whole thing.
Setting and Updating Keys
-- Set or update a single key
UPDATE products
SET metadata = jsonb_set(metadata, '{color}', '"red"')
WHERE name = 'Wireless Mouse';
-- Set a nested key (creates intermediate objects if create_missing = true)
UPDATE products
SET metadata = jsonb_set(metadata, '{dimensions,weight}', '85.5', true)
WHERE name = 'Wireless Mouse';
-- The || operator merges two JSONB values (right side wins on conflicts)
UPDATE products
SET metadata = metadata || '{"color": "blue", "in_stock": true}'
WHERE name = 'Mechanical Keyboard';

The || (concatenation) operator is the clearest way to add or update multiple keys at once. It's also the most readable in application code.
Removing Keys
-- Remove a single key
UPDATE products
SET metadata = metadata - 'color'
WHERE name = 'Wireless Mouse';
-- Remove a nested key using #-
UPDATE products
SET metadata = metadata #- '{dimensions,depth}'
WHERE name = 'Wireless Mouse';
-- Remove multiple keys
UPDATE products
SET metadata = metadata - 'color' - 'wireless'
WHERE name = 'Wireless Mouse';

Appending to Arrays
-- Append to an existing array
UPDATE products
SET metadata = jsonb_set(
metadata,
'{features}',
(metadata -> 'features') || '"usb-c"'
)
WHERE name = 'Mechanical Keyboard';
-- Insert at a specific position (0-indexed)
UPDATE products
SET metadata = jsonb_insert(metadata, '{features,0}', '"wireless"')
WHERE name = 'Mechanical Keyboard';

JSON Path Queries (PostgreSQL 12+)
PostgreSQL 12 introduced the SQL/JSON path language, which gives you a more expressive way to query JSONB data. Think of it as XPath for JSON.
-- Basic path query
SELECT jsonb_path_query(metadata, '$.brand') FROM products;
-- Filter with path predicates
SELECT name FROM products
WHERE jsonb_path_exists(metadata, '$.dpi[*] ? (@ > 1000)');
-- Finds products with any DPI value above 1000
-- Extract matching values
SELECT name, jsonb_path_query(metadata, '$.dpi[*] ? (@ >= 1200)') AS high_dpi
FROM products
WHERE metadata ? 'dpi';
-- Path exists check in WHERE
SELECT name FROM products
WHERE jsonb_path_exists(metadata, '$.ports.hdmi');

JSON path is particularly useful for array filtering, where the traditional operator approach gets verbose. Instead of unpacking the array with jsonb_array_elements, filtering, and re-aggregating, you write a path expression.
-- Traditional: find products with features containing "bluetooth"
SELECT name FROM products
WHERE metadata -> 'features' @> '"bluetooth"';
-- JSON path equivalent
SELECT name FROM products
WHERE jsonb_path_exists(metadata, '$.features[*] ? (@ == "bluetooth")');

Both work. The path version becomes more useful when your predicates get complex, like filtering array elements by multiple conditions or navigating deeply nested structures.
Real World Patterns
Pattern 1: Event Storage
CREATE TABLE events (
id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
event_type text NOT NULL,
payload jsonb NOT NULL,
created_at timestamptz DEFAULT now()
);
CREATE INDEX idx_events_type ON events (event_type);
CREATE INDEX idx_events_payload ON events USING gin (payload jsonb_path_ops);
CREATE INDEX idx_events_created ON events (created_at);
-- Insert events with varying structures
INSERT INTO events (event_type, payload) VALUES
('user.signup', '{"user_id": 42, "source": "google", "plan": "free"}'),
('order.completed', '{"order_id": 1001, "total": 59.99, "items": 3}'),
('user.signup', '{"user_id": 43, "source": "github", "plan": "pro"}');
-- Query by type and payload
SELECT payload ->> 'user_id' AS user_id, payload ->> 'source' AS source
FROM events
WHERE event_type = 'user.signup'
AND payload @> '{"plan": "pro"}';

This is the pattern you see in analytics systems, audit logs, and webhook storage. The event type gives you a fast filter column, and the GIN index handles arbitrary payload queries.
Pattern 2: User Preferences
ALTER TABLE users ADD COLUMN preferences jsonb DEFAULT '{
"theme": "system",
"notifications": {"email": true, "push": true, "sms": false},
"language": "en",
"timezone": "UTC"
}'::jsonb;
-- Update a single preference
UPDATE users
SET preferences = jsonb_set(preferences, '{theme}', '"dark"')
WHERE id = 1;
-- Toggle a nested preference
UPDATE users
SET preferences = jsonb_set(
preferences,
'{notifications,push}',
to_jsonb(NOT (preferences #>> '{notifications,push}')::boolean)
)
WHERE id = 1;

Default values mean new users get sensible settings without application code handling null checks. Updates target specific keys without touching the rest.
Pattern 3: API Response Caching
CREATE TABLE api_cache (
endpoint text NOT NULL,
params jsonb NOT NULL,
response jsonb NOT NULL,
cached_at timestamptz DEFAULT now(),
expires_at timestamptz NOT NULL,
PRIMARY KEY (endpoint, params)
);
-- The primary key on (endpoint, params) ensures one cache entry
-- per unique combination. JSONB comparison in the PK works because
-- JSONB has a deterministic binary representation.
-- Upsert cache entry
INSERT INTO api_cache (endpoint, params, response, expires_at)
VALUES (
'/api/search',
'{"q": "postgresql jsonb", "page": 1}',
'{"results": [...], "total": 42}',
now() + interval '1 hour'
)
ON CONFLICT (endpoint, params) DO UPDATE
SET response = EXCLUDED.response,
cached_at = now(),
expires_at = EXCLUDED.expires_at;

Pattern 4: Feature Flags
CREATE TABLE feature_flags (
flag_name text PRIMARY KEY,
config jsonb NOT NULL DEFAULT '{}'
);
INSERT INTO feature_flags VALUES
('dark_mode', '{"enabled": true, "rollout_percentage": 100}'),
('new_checkout', '{
"enabled": true,
"rollout_percentage": 25,
"allowed_users": [1, 2, 42],
"excluded_regions": ["EU"]
}');
-- Check if a feature is enabled
SELECT config ->> 'enabled' AS enabled,
config ->> 'rollout_percentage' AS rollout
FROM feature_flags
WHERE flag_name = 'new_checkout';
-- Check if user is in the allowlist
SELECT flag_name FROM feature_flags
WHERE config -> 'allowed_users' @> '42';

Performance Considerations
Document Size
JSONB columns can store up to 1 GB per value (compressed via TOAST). In practice, keep documents under a few kilobytes. Large documents have two costs: every update rewrites the entire value, and TOAST decompression adds overhead on reads.
If your JSONB documents are growing past 10 KB regularly, consider whether some of the data belongs in its own table. A common refactoring is moving arrays of related items into a separate table with a foreign key.
Index Bloat with GIN
GIN indexes on frequently updated JSONB columns can bloat faster than B-tree indexes. This happens because GIN uses a "pending list" for writes, and cleanup happens during vacuum. If your workload is write heavy, monitor index size and consider adjusting gin_pending_list_limit.
-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_products_metadata')) AS index_size;
-- Reindex if bloated
REINDEX INDEX CONCURRENTLY idx_products_metadata;

When to Use JSONB vs Columns
JSONB is the right choice when:
- The schema varies between rows (products with different attributes)
- You're storing third party data you don't control (API responses, webhooks)
- The data is read heavy with infrequent updates
- You need to prototype quickly and the schema isn't settled yet
Regular columns are better when:
- You query the field in most of your queries
- The field participates in joins or foreign keys
- You need column level constraints (NOT NULL, UNIQUE, CHECK)
- You need to sort or aggregate on the field regularly
The hybrid approach (structured columns for core data, JSONB for flexible metadata) is the most common pattern in production and usually the right answer.
Common Mistakes
Storing everything in JSONB. Some teams put their entire data model into a single JSONB column, effectively using PostgreSQL as a document store. This defeats the purpose of using a relational database. You lose type safety, constraints, foreign keys, and join performance. Use JSONB for the parts of your schema that actually need flexibility.
Forgetting to cast ->> results. The ->> operator returns text. Comparing (metadata ->> 'price')::numeric > 50 works correctly, but metadata ->> 'count' > '9' does a text comparison where '9' is greater than '10'. Always cast when comparing to non-text types.
Creating GIN indexes on tiny tables. GIN indexes have overhead. On a table with a few thousand rows, a sequential scan is faster. Don't add a GIN index until you've confirmed with EXPLAIN ANALYZE that the query actually needs one.
Not using partial indexes. If only 10% of rows have a particular JSONB key, a partial index with a WHERE clause is dramatically smaller and faster than indexing the entire table.
-- Instead of indexing everything
CREATE INDEX idx_all ON events USING gin (payload);
-- Index only what you query
CREATE INDEX idx_signup ON events USING gin (payload jsonb_path_ops)
WHERE event_type = 'user.signup';

Deeply nested updates. Each call to jsonb_set creates a new copy of the document. Chaining multiple jsonb_set calls in a single UPDATE is fine, but if you're doing it in a loop or across many keys, consider building the final object in application code and writing it in one shot.
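A minimal sketch of the "build it in application code" approach, assuming Python on the client side (the table, column, and driver call are illustrative, not from the original):

```python
import json

# Fetch the document once, apply all changes in memory, write once.
# This replaces a chain of jsonb_set calls with a single UPDATE.
metadata = json.loads('{"brand": "Keychron", "color": "white"}')

changes = {"color": "blue", "in_stock": True, "layout": "75%"}
metadata.update(changes)  # one in-memory merge instead of N jsonb_set calls

new_value = json.dumps(metadata)
# Then one parameterized statement, e.g. with a driver like psycopg:
#   cur.execute("UPDATE products SET metadata = %s WHERE id = %s",
#               (new_value, product_id))
print(new_value)
```

The tradeoff: read-modify-write from the application loses the atomicity of a single server-side jsonb_set, so wrap it in a transaction (or use SELECT ... FOR UPDATE) if concurrent writers touch the same row.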
Conclusion
PostgreSQL JSONB gives you document store flexibility inside a relational database. You get real indexing, a mature query planner, and the full power of SQL for combining structured and semi structured data.
The key principles: use JSONB for the flexible parts of your schema, not as a replacement for columns. Index with GIN for containment queries and B-tree expressions for specific field lookups. Keep documents small. Cast your comparisons.
If you're running PostgreSQL on Rivestack, JSONB works out of the box with no configuration. Your GIN indexes will perform well on NVMe storage, and connection pooling handles the concurrent query load. Check out our PostgreSQL indexing guide for more on GIN and other index types, or our full text search guide for combining tsvector with JSONB queries.