PostgreSQL Backup and Recovery: A Complete Guide

Rivestack Team

Tags: PostgreSQL, backup, recovery, PITR, pgBackRest, production, database

Your database has never failed. That's not because you're careful. It's because you haven't been running long enough.

Hardware dies. Disk controllers corrupt data silently for weeks before anyone notices. An engineer runs a DELETE without a WHERE clause on production. A cloud provider has a multi-zone outage. The only thing standing between you and catastrophic data loss is a backup you can actually restore from.

This guide covers how PostgreSQL backup actually works: the difference between logical and physical backups, how point-in-time recovery lets you undo bad queries, and what a production-grade backup strategy looks like. We will go through the tools, the tradeoffs, and the one mistake most teams make that turns a "we have backups" story into a "we lost two days of data" incident.

Logical vs Physical Backups

Before touching any tools, you need to understand the two categories of PostgreSQL backup:

Logical backups capture the data as SQL. The output is a file of CREATE TABLE, INSERT, and COPY statements that recreate your schema and data from scratch.

Physical backups copy the raw files that PostgreSQL uses to store data on disk, including the base directory and Write-Ahead Log (WAL) files.

Each has a different purpose:

|                        | Logical (pg_dump)                             | Physical (pg_basebackup)       |
| ---------------------- | --------------------------------------------- | ------------------------------ |
| Portability            | High — restore to different Postgres versions | Low — must match major version |
| Granularity            | Table, schema, database                       | Entire cluster                 |
| Restore time           | Slow for large databases                      | Fast — just copy files back    |
| Point-in-time recovery | No                                            | Yes (with WAL archiving)       |
| Consistency            | Always consistent                             | Consistent (backup mode is coordinated automatically) |

For small databases or specific table exports, logical backups with pg_dump are perfectly reasonable. For production databases over a few gigabytes, or whenever you need point-in-time recovery, physical backups are the right foundation.

pg_dump: Logical Backups for Specific Needs

pg_dump is a utility that comes bundled with PostgreSQL. It connects to a running database and produces a self-consistent snapshot you can restore elsewhere.

Basic usage

# Dump a database to a plain SQL file
pg_dump -h localhost -U postgres mydb > mydb.sql

# Dump in custom format (recommended — faster restore, selective restore)
pg_dump -h localhost -U postgres -Fc mydb > mydb.dump

# Dump to a directory with parallel workers
pg_dump -h localhost -U postgres -Fd -j 4 -f mydb_dir/ mydb

The custom format (-Fc) is almost always better than plain SQL. It compresses the output, supports parallel restore, and lets you restore individual tables or schemas selectively.
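Selective restore comes down to the flags you pass to pg_restore. As an illustration, here is a small helper that assembles the invocation; build_restore_cmd is a hypothetical name for this sketch, not part of PostgreSQL:

```python
import shlex

def build_restore_cmd(dump_path, dbname, tables=None, jobs=1,
                      host="localhost", user="postgres"):
    """Assemble a pg_restore invocation for a custom-format dump.

    tables: optional list of table names to restore selectively
    (pg_restore's --table flag may be repeated).
    jobs: parallel restore workers (-j).
    """
    cmd = ["pg_restore", "-h", host, "-U", user, "-d", dbname]
    if jobs > 1:
        cmd += ["-j", str(jobs)]
    for table in tables or []:
        cmd += ["--table", table]
    cmd.append(dump_path)
    return cmd

# Restore only the orders table, with 4 workers:
print(shlex.join(build_restore_cmd("mydb.dump", "newdb",
                                   tables=["orders"], jobs=4)))
```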

Restoring a pg_dump backup

# Restore a plain SQL dump
psql -h localhost -U postgres newdb < mydb.sql

# Restore a custom format dump
pg_restore -h localhost -U postgres -d newdb mydb.dump

# Restore with parallel workers (much faster for large databases)
pg_restore -h localhost -U postgres -j 4 -d newdb mydb.dump

Limitations of pg_dump

pg_dump runs inside a single long-lived transaction so the snapshot stays consistent, and for large databases that dump can take hours. Restoring is slower still: if you need to restore 500 GB with pg_restore, you are looking at several hours of downtime.

More importantly, pg_dump gives you a snapshot at a point in time. If you run it at midnight and your application corrupts data at 10 PM the following day, you lose 22 hours of data. That might be acceptable for some workloads. For most production databases, it is not.

pg_basebackup: Physical Backups of the Entire Cluster

pg_basebackup creates a physical copy of the PostgreSQL data directory. Combined with WAL archiving, it forms the foundation of production backup strategies.

What pg_basebackup does

PostgreSQL writes every change to the Write-Ahead Log before applying it to the actual data files. This means a physical backup taken at any moment is internally consistent if you also keep the WAL generated during the backup. pg_basebackup handles the coordination automatically.

# Take a base backup (tar format, gzip compression, progress output,
# WAL streamed concurrently during the backup)
pg_basebackup \
  -h localhost \
  -U replicator \
  -D /backups/base \
  -Ft \
  -z \
  -P \
  --wal-method=stream

# Or with a fast checkpoint, so the backup starts immediately instead
# of waiting for the next scheduled checkpoint
pg_basebackup \
  -h localhost \
  -U replicator \
  -D /backups/base \
  -Ft -z \
  --checkpoint=fast \
  --wal-method=stream \
  -P

The replicator user needs REPLICATION privilege:

CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'yourpassword';

And pg_hba.conf needs an entry to allow this user to connect for replication:

host replication replicator 10.0.0.0/24 scram-sha-256

Restoring from pg_basebackup

# Stop PostgreSQL
sudo systemctl stop postgresql

# Clear the data directory
sudo rm -rf /var/lib/postgresql/16/main/*

# Extract the base backup, then the streamed WAL into pg_wal
sudo tar -xzf /backups/base/base.tar.gz -C /var/lib/postgresql/16/main/
sudo tar -xzf /backups/base/pg_wal.tar.gz -C /var/lib/postgresql/16/main/pg_wal/

# Start PostgreSQL
sudo systemctl start postgresql

This gives you the database as it was at the time of the backup. To recover to a point after the backup, you need WAL archiving.

WAL Archiving and Point-In-Time Recovery

Point-in-time recovery (PITR) lets you restore a database to any moment between your last base backup and now. It works by replaying WAL files on top of a base backup.

Enabling WAL archiving

In postgresql.conf:

wal_level = replica
archive_mode = on
archive_command = 'cp %p /mnt/wal-archive/%f'

The archive_command runs every time PostgreSQL completes a WAL segment (typically 16 MB). The %p is the path to the WAL file, and %f is just the filename. In production, you would replace the cp with a command that ships to object storage:

archive_command = 'aws s3 cp %p s3://my-wal-bucket/wal/%f'

Or with pgBackRest (more on that shortly):

archive_command = 'pgbackrest --stanza=main archive-push %p'

After changing these settings, reload PostgreSQL:

sudo systemctl reload postgresql
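PostgreSQL treats any nonzero exit status from archive_command as a failure and retries the segment, so a wrapper script can add safety checks that a bare cp lacks, such as refusing to overwrite an existing archive file. A sketch, assuming a hypothetical script name (archive_wal.py) wired in as archive_command = 'archive_wal.py %p %f':

```python
#!/usr/bin/env python3
"""Hypothetical WAL archive wrapper for archive_command = 'archive_wal.py %p %f'."""
import os
import shutil
import sys

ARCHIVE_DIR = "/mnt/wal-archive"  # assumed destination

def archive_wal(wal_path, wal_name, archive_dir=ARCHIVE_DIR):
    """Copy one WAL segment into the archive.

    Returns 0 on success; any nonzero value makes PostgreSQL
    treat the archive attempt as failed and retry the segment.
    """
    dest = os.path.join(archive_dir, wal_name)
    if os.path.exists(dest):
        # Re-archiving the identical segment is harmless (e.g. a
        # retry after a crash), but a different file under the same
        # name suggests two clusters sharing one archive: refuse.
        if os.path.getsize(dest) == os.path.getsize(wal_path):
            return 0
        return 1
    # Copy to a temp name, then rename, so readers never see a
    # partially written segment in the archive.
    tmp = dest + ".tmp"
    shutil.copy2(wal_path, tmp)
    os.rename(tmp, dest)
    return 0

if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(archive_wal(sys.argv[1], sys.argv[2]))
```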

Performing a point-in-time recovery

Create a recovery.conf file (PostgreSQL 11 and earlier) or add parameters to postgresql.conf (PostgreSQL 12+):

# PostgreSQL 12+: create a recovery signal file
touch /var/lib/postgresql/16/main/recovery.signal

# Add to postgresql.conf
restore_command = 'cp /mnt/wal-archive/%f %p'
recovery_target_time = '2026-04-18 14:30:00'
recovery_target_action = 'promote'

PostgreSQL will replay WAL files until it reaches the target time, then promote itself to a primary. This lets you undo a bad migration, a dropped table, or any other data-destroying event, as long as you have WAL covering the window you need to recover.

Recovery target options

# Recover to a specific time
recovery_target_time = '2026-04-18 14:30:00 UTC'

# Recover to a specific transaction ID
recovery_target_xid = '12345678'

# Recover to a named restore point you created
recovery_target_name = 'before_migration'

# Stop as soon as the restored backup reaches a consistent state
recovery_target = 'immediate'

If you set no target at all, PostgreSQL replays to the end of the available WAL, which is the default behavior.

You create named restore points in SQL:

SELECT pg_create_restore_point('before_migration');

This is worth doing before any significant schema migration. If the migration goes badly, you have a clean recovery target.

pgBackRest: Production-Grade Backup Management

Running pg_basebackup and archiving WAL yourself works, but it requires you to manage retention, expiration, backup catalogs, parallel processing, and restore orchestration. For production databases, teams typically use a dedicated backup tool.

pgBackRest is the most widely used option. It handles everything from incremental backups to parallel compression and verification.

Installing pgBackRest

# Ubuntu/Debian
sudo apt-get install pgbackrest

# RHEL/CentOS
sudo yum install pgbackrest

Minimal pgBackRest configuration

/etc/pgbackrest/pgbackrest.conf:

[global]
repo1-path=/var/lib/pgbackrest
repo1-retention-full=2
repo1-retention-diff=7
process-max=4

[main]
pg1-path=/var/lib/postgresql/16/main

Initialize the stanza (tells pgBackRest about your Postgres cluster):

sudo -u postgres pgbackrest --stanza=main stanza-create

In postgresql.conf:

archive_command = 'pgbackrest --stanza=main archive-push %p'
archive_mode = on
wal_level = replica

Running backups

# Full backup
sudo -u postgres pgbackrest --stanza=main --type=full backup

# Differential backup (since last full)
sudo -u postgres pgbackrest --stanza=main --type=diff backup

# Incremental backup (since last backup of any type)
sudo -u postgres pgbackrest --stanza=main --type=incr backup

Restoring with pgBackRest

# Stop PostgreSQL
sudo systemctl stop postgresql

# Restore the latest backup
sudo -u postgres pgbackrest --stanza=main restore

# Restore to a specific point in time
sudo -u postgres pgbackrest --stanza=main restore \
  --type=time \
  --target="2026-04-18 14:30:00"

# Start PostgreSQL
sudo systemctl start postgresql

pgBackRest handles the restore command setup automatically — you do not need to manually configure restore_command in postgresql.conf.

For remote and cloud storage, pgBackRest supports S3, GCS, and Azure Blob Storage with a simple config section:

[global]
repo1-type=s3
repo1-path=/pgbackrest
repo1-s3-bucket=my-backup-bucket
repo1-s3-region=us-east-1
repo1-s3-endpoint=s3.amazonaws.com

Backup Verification: The Step Most Teams Skip

Here is the uncomfortable truth about backup strategies: an untested backup is not a backup. It is an assumption.

WAL segments can be silently corrupted. Disk writes can fail without errors. Archive commands can silently succeed (exit 0) while producing empty files. pg_dump output can be truncated. You will not know until you try to restore.

The only way to know your backups work is to restore from them, regularly, on a real PostgreSQL instance.

How to verify logical backups

# Restore the dump to a test database
createdb testdb
pg_restore -h localhost -U postgres -d testdb mydb.dump

# Run a quick sanity check
psql -h localhost -U postgres -d testdb -c "
SELECT
  schemaname,
  tablename,
  n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC
LIMIT 20;
"

Compare the row counts against your production database. Keep in mind that n_live_tup is a statistics estimate, so expect small differences on busy tables; for exact verification, run SELECT count(*) on your most important tables.

How to verify pgBackRest backups

# Verify backup integrity
sudo -u postgres pgbackrest --stanza=main verify

This checks that all WAL segments and backup files are present and have the expected checksums. It does not do a full restore, but it catches the most common failure modes.

For mission-critical databases, the only reliable verification is a full restore drill. Set up an automated job that:

  1. Takes a production backup
  2. Restores it to an isolated test instance
  3. Runs a row count check against a known reference
  4. Alerts if anything diverges

Automated restore testing sounds like overhead. It is overhead. It is also the thing that has saved teams from finding out their backups were corrupted during an actual disaster.
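The comparison at the heart of such a drill can be a small pure function over row-count snapshots. check_divergence below is a hypothetical sketch, not a standard tool; the tolerance parameter allows for counts taken while production is still accepting writes:

```python
def check_divergence(reference, restored, tolerance=0.0):
    """Compare table row counts from a production reference snapshot
    against a restored test instance.

    reference, restored: dicts mapping table name -> row count.
    tolerance: allowed fractional difference (0.05 = 5%).
    Returns a list of human-readable problems; empty means the
    drill passed.
    """
    problems = []
    for table, expected in reference.items():
        actual = restored.get(table)
        if actual is None:
            problems.append(f"missing table: {table}")
            continue
        if abs(actual - expected) > expected * tolerance:
            problems.append(
                f"{table}: expected {expected}, restored has {actual}")
    # Tables in the restore that the reference never had are also suspicious.
    for table in restored.keys() - reference.keys():
        problems.append(f"unexpected table: {table}")
    return problems
```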

Backup Retention and Storage

How long should you keep backups?

A common production policy:

  • Daily full backups, kept for 7 days
  • Continuous WAL archiving (allows PITR to any point within the retention window)
  • Weekly backups, kept for 4 weeks
  • Monthly snapshots, kept for 12 months
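A policy like the one above can be sketched as a rotation function that decides which backup dates to retain. This is an illustration only; pgBackRest enforces retention itself through its repo1-retention-* settings:

```python
import datetime

def backups_to_keep(dates, today, daily=7, weekly=4, monthly=12):
    """Grandfather-father-son style rotation over daily backup dates.

    Keeps: the last `daily` days, the most recent backup in each of
    the last `weekly` ISO weeks, and the earliest backup in each of
    the last `monthly` calendar months. Returns a set of dates.
    """
    keep = set()
    newest_first = sorted(dates, reverse=True)
    # Most recent daily backups
    keep.update(d for d in newest_first if (today - d).days < daily)
    # Most recent backup per ISO week
    per_week = {}
    for d in newest_first:
        per_week.setdefault(d.isocalendar()[:2], d)
    keep.update(sorted(per_week.values(), reverse=True)[:weekly])
    # Earliest backup per calendar month
    per_month = {}
    for d in sorted(dates):
        per_month.setdefault((d.year, d.month), d)
    keep.update(sorted(per_month.values(), reverse=True)[:monthly])
    return keep
```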

The right numbers depend on your recovery time objective (RTO) and recovery point objective (RPO). If regulators require you to restore data from 90 days ago, keep 90 days of backups. If you can tolerate 1 hour of data loss, hourly base backups may not be necessary as long as WAL archiving covers the gaps.

Storage costs are real. A 100 GB database with continuous WAL archiving might generate 20-50 GB of WAL per day under moderate write load. Multiply by retention and you are looking at substantial storage bills. pgBackRest's compression (gzip by default, with lz4 and zstd available) typically reduces this by 60-70%, and incremental backups reduce the base backup storage significantly.
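The arithmetic is worth making explicit. A back-of-envelope estimator using the figures above; every parameter is an assumption to replace with your own measurements:

```python
def backup_storage_gb(base_gb, wal_gb_per_day, retention_days,
                      fulls_kept=2, compression_ratio=0.35):
    """Rough storage estimate: retained full base backups plus the
    WAL archive for the retention window, after compression.

    compression_ratio is compressed/original size; 0.3-0.4
    corresponds to the 60-70% reduction mentioned above.
    """
    raw = base_gb * fulls_kept + wal_gb_per_day * retention_days
    return raw * compression_ratio

# 100 GB base, 30 GB of WAL per day, 7-day window, 2 fulls kept:
print(round(backup_storage_gb(100, 30, 7), 1))  # 143.5 GB after compression
```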

Recovery Time in Practice

Recovery time is where theories about backups meet the reality of downtime.

For a logical restore from pg_dump: plan for roughly 1 GB of data per minute on typical hardware with a single-worker restore. A 100 GB database means roughly 100 minutes of restore time, plus any indexing overhead. With parallel restore (-j 4), that drops to 30-40 minutes if I/O keeps up.

For a physical restore from pgBackRest with local storage: you are limited by I/O throughput. Restoring from cloud object storage adds network throughput as a bottleneck. A 100 GB restore over a 1 Gbps connection takes under 15 minutes for the base backup, plus time to replay WAL to your target time.

This matters for your incident response plan. If your database is down and you are restoring from a backup, every minute of restore time is a minute of outage. Know your restore time before the incident, not during it.
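The throughput figures above can be turned into a rough planning estimate. These are assumptions, not measurements; only an actual restore drill gives you real numbers:

```python
def restore_minutes(size_gb, throughput_mb_s, wal_replay_minutes=0.0):
    """Estimate restore wall-clock time: copying size_gb at the
    bottleneck throughput (disk or network, whichever is slower),
    plus WAL replay time to reach the recovery target.
    """
    copy_seconds = size_gb * 1024 / throughput_mb_s
    return copy_seconds / 60 + wal_replay_minutes

# 100 GB over a 1 Gbps link (~125 MB/s), plus 5 minutes of WAL replay:
print(round(restore_minutes(100, 125, wal_replay_minutes=5), 1))  # about 18.7 minutes
```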

Backup on Managed PostgreSQL

If you are running PostgreSQL yourself, implementing this backup infrastructure requires:

  • A WAL archive destination (S3 or similar) with appropriate retention policies
  • pgBackRest or Barman installed and configured
  • Monitoring for archive failures
  • Automated verification jobs
  • Runbook documentation for on-call engineers

That is a reasonable investment for a team whose core competency is database operations. For product teams building applications, it is substantial infrastructure to maintain alongside the application itself.

Rivestack handles this automatically. Every database includes continuous WAL archiving, encrypted off-site backups, and one-click point-in-time recovery to any second within the retention window. If you need to undo a bad migration or recover from data loss, you pick a timestamp and restore — the infrastructure is already there.

The retention policy, backup verification, and storage costs are included. The on-call page about a corrupted WAL segment at 3 AM is not.

If you are managing your own PostgreSQL backup strategy and hitting the limitations, it is worth reading through how PostgreSQL high availability fits into the picture before deciding whether to build or use managed infrastructure.

A Practical Backup Checklist

Before you call your backup strategy production-ready:

  • Base backups running on a schedule (at least daily)
  • WAL archiving enabled and the archive destination confirmed (no stuck .ready files piling up in pg_wal/archive_status)
  • Backup verification job running at least weekly
  • Restore procedure documented and tested by someone other than the person who wrote it
  • Retention policy defined and enforced (not just "backups exist somewhere")
  • Monitoring for archive failures (check pg_stat_archiver)
  • Offsite storage for base backups and WAL (not on the same machine or availability zone as the primary)
  • Recovery time measured on actual restore drill, not estimated

Monitor archive health with:

SELECT
  archived_count,
  last_archived_wal,
  last_archived_time,
  failed_count,
  last_failed_wal,
  last_failed_time,
  stats_reset
FROM pg_stat_archiver;

If failed_count is climbing and last_archived_time is stale, your WAL archive is silently broken. This is one of the most common ways teams discover their backup strategy does not work.
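That failure signature can be encoded directly into a monitoring check. archiver_alerts below is a hypothetical sketch fed from a periodic pg_stat_archiver query; the staleness threshold is an assumption to tune to your WAL segment turnover:

```python
import datetime

def archiver_alerts(failed_count, prev_failed_count,
                    last_archived_time, now, stale_minutes=15):
    """Flag the two failure signatures described above: a climbing
    failed_count between polls, and a stale last_archived_time.
    Returns a list of alert strings; empty means healthy.
    """
    alerts = []
    if failed_count > prev_failed_count:
        alerts.append(f"archive failures climbing: "
                      f"{prev_failed_count} -> {failed_count}")
    if last_archived_time is None:
        alerts.append("no WAL segment has ever been archived")
    elif now - last_archived_time > datetime.timedelta(minutes=stale_minutes):
        alerts.append("last successful archive is stale")
    return alerts
```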

Summary

PostgreSQL gives you the primitives for a solid backup strategy, but assembling them correctly takes real work:

  • Use pg_dump for logical backups, table exports, and database migrations
  • Use pg_basebackup or pgBackRest for production physical backups with point-in-time recovery
  • Enable WAL archiving to support PITR and reduce your recovery point window
  • Test your restores. Regularly. On real data.
  • Measure your actual restore time before you need it

The tools exist and they work. The gap between "we have backups" and "we can actually recover from them" is almost always process, not tooling: untested restores, silent archive failures, or retention policies that were set once and never revisited.

If you are building on PostgreSQL and want backups that work without the operational overhead, try Rivestack. Continuous backup, point-in-time recovery, and restore in a few clicks.

For related reading on production PostgreSQL, check out our guides on connection pooling with PgBouncer and PostgreSQL indexing strategies.