PostgreSQL LISTEN/NOTIFY Is Not Durable — Here's the Fix with pg_durable

Why LISTEN/NOTIFY drops messages when nobody is listening, and how pg_durable's durable signals and workflow engine keep long-running work alive across restarts.

Dinesh

June 14, 2026 8 min read

What if PostgreSQL could run a multi-step workflow, pause it for days, survive a restart, and continue from the exact point where it stopped — with no external queue or worker service? That is what pg_durable does.

The Microsoft Azure team announced pg_durable in June 2026 (still in preview). It is not a replacement for scheduled-job extensions like pg_cron. Its purpose is different: it runs long-running workflows durably inside the database. It is built on duroxide, a Rust framework for durable workflows.

Why orchestration inside the database?

We can already chain steps inside a single PL/pgSQL function, so why do we need more?

Because a PL/pgSQL function runs as one transaction. It is all-or-nothing, and it must finish quickly — you cannot hold a transaction open for hours while you wait for a payment, an approval, or an external API.

pg_durable is a full orchestration system. Each step is saved (checkpointed) as it completes, so a workflow can wait, survive a crash, and resume without repeating work that is already done.

Get it running in under a minute

pg_durable ships as a ready-to-run Docker image (PostgreSQL 17 with the extension built in).

# 1. Start PostgreSQL with pg_durable
docker run -d --name pg_durable \
  -e POSTGRES_PASSWORD=secret -p 5434:5432 \
  ghcr.io/microsoft/pg_durable:0.2.2-pg17

# 2. Connect
psql "postgresql://postgres:secret@localhost:5434/postgres"

-- 3. Enable the extension (functions live in the df schema)
CREATE EXTENSION IF NOT EXISTS pg_durable;

-- 4. Verify
SELECT df.version();

-- 5. Run your first workflow:
--    save a step's result, then reuse it with $total
SELECT df.start(
  'SELECT 100 AS amount' |=> 'total'
  ~> 'SELECT $total * 2 AS doubled'
);

-- 6. Check the result
SELECT * FROM df.list_instances(limit_count => 1);

What CREATE EXTENSION actually starts

pg_durable is not only a set of functions. It ships a background worker that runs all the time. Two things must be in place.

1. The library must be preloaded. The worker is registered before the database accepts connections, so the library loads at startup through shared_preload_libraries:

SHOW shared_preload_libraries;   -- pg_durable

The Docker image sets this for you. On your own server, add pg_durable to shared_preload_libraries in postgresql.conf and restart.

2. CREATE EXTENSION sets up the engine. It then:

Creates the df schema — the functions you call (df.start, df.sql, df.seq, and so on).
Creates the internal duroxide schema — the polling functions and the bookkeeping tables (df.instances, df.nodes, df.vars).
Starts a single background worker that runs the duroxide engine.

That one worker opens several normal client connections to poll for and run work. So if pg_stat_activity shows a few sessions running duroxide.fetch_* queries, those are the engine’s own connections — not separate workers.

Settings you can tune

All settings use the pg_durable. prefix:

SELECT name, setting FROM pg_settings WHERE name LIKE 'pg_durable.%';

Setting	Default	What it controls
`pg_durable.database`	`postgres`	The database the worker attaches to
`pg_durable.worker_role`	`postgres`	The role the worker runs as
`pg_durable.max_duroxide_connections`	`10`	Pool for orchestration state and the listener
`pg_durable.max_user_connections`	`10`	Pool for running your SQL steps
`pg_durable.max_management_connections`	`6`	Pool for lifecycle and status updates
`pg_durable.execution_acquire_timeout`	`30`	Seconds to wait for a free slot before a step fails
`pg_durable.enable_superuser_instances`	`on`	Allow workflows submitted by superusers

These settings need a restart, not a reload. They are read once at startup, so pg_reload_conf() will not apply them — you have to restart PostgreSQL for a change to take effect.

Keep in mind that the three connection pools use real backend slots. Make sure their total stays well below your server’s max_connections.

Each step is its own transaction

It is natural to assume a workflow is one atomic transaction — all steps commit, or none do. That is not how it works. Each step runs and commits in its own transaction.

Here is a three-step workflow. Step 3 fails on purpose:

CREATE TABLE orders (id int, status text, at_step text);

-- step 3 fails on purpose
SELECT df.start(
  'INSERT INTO orders VALUES (1, ''created'', ''step1'')'
  ~> 'UPDATE orders SET status = ''paid'', at_step = ''step2'' WHERE id = 1'
  ~> 'SELECT 1/0 AS this_step_fails'
);

If the workflow were one atomic transaction, step 3’s failure would roll back everything. Instead, the workflow is marked failed, but the first two steps remain committed:

SELECT status FROM df.list_instances(limit_count => 1);
-- failed

SELECT * FROM orders;
-- 1 | paid | step2

The order was created and marked paid even though the workflow stopped at the next step, because each step had already committed in its own transaction.

This is the foundation of durable execution: a crash never re-runs a completed step, so the workflow resumes from the last checkpoint. A single large transaction could not do this, and could not stay open for a workflow that waits for hours.

If you need a later failure to undo earlier work, add a compensating step (the saga pattern) yourself. The engine makes each step durable; rolling back across steps is the application’s job.

Durable signals: what LISTEN/NOTIFY cannot do

pg_durable has its own signals (df.signal and df.wait_for_signal), so a fair question is whether these are just NOTIFY in another form. They are not. NOTIFY is a transient broadcast to whoever is connected at that moment. A signal is a durable resume sent to one specific suspended workflow, even if it has been waiting for days.

	`LISTEN` / `NOTIFY`	pg_durable signal
Durability	None — in memory	Checkpointed to disk; survives restarts
Target	Every listening connection	One specific workflow instance
If nobody is listening	The message is lost	The workflow resumes when the signal arrives
Holds a connection while waiting?	Yes	No — the workflow holds nothing
Best for	Fast, transient broadcasts	Durable coordination: approvals, sagas

A workflow that submits a document, waits for approval, then publishes:

SELECT df.start(
  'INSERT INTO doc_review VALUES (1, ''submitted'')'
  ~> df.wait_for_signal('approved')
  ~> 'UPDATE doc_review SET state = ''published'' WHERE id = 1'
);

While it waits, only step 1 has run and the workflow holds no open connection. A reviewer resumes it by signaling that instance:

SELECT df.signal('04bf5dac', 'approved', '{"by":"alice"}');

The approval could arrive five seconds or five days later, even across a restart, and the result is the same. NOTIFY cannot do this: if no session is listening when the event is sent, the message is gone.

Note (0.2.2, preview): a signal is not a mailbox. A signal sent before the workflow reaches its wait_for_signal step is dropped. For human-driven flows this is fine; for fast machine events, plan for the race.

Use them together. They are not competitors: NOTIFY for the fast, transient “something happened” message, df.signal for the durable “resume this workflow” that holds while the workflow waits. A common design bridges the two — a listener catches the notification and turns it into a durable signal:

NOTIFY (fast, transient)  -->  app / listener  -->  df.signal() (durable resume)

The rest of the toolbox

Signals are only one part. pg_durable gives you a full workflow language inside SQL. Here is a short tour, grouped by what each piece is for.

Control flow

Function or operator	What it does
`~>`	Run one step, then the next
`df.join()`, `df.join3()`	Run steps at the same time and wait for all
`df.race()`	Run steps together; the first to finish wins
`df.if()`, `df.if_rows()`	Branch on a condition
`df.loop()`, `df.break()`	Loop while a condition holds, and exit early

Activities (the actual work)

Function	What it does
SQL (auto-wrapped) or `df.sql()`	Run any PostgreSQL query as a durable step
`df.http()`	Call an external API with method, body, headers, and timeout

Results and variables

Syntax	What it does
`$name`, `$name.column`	Reuse a saved result, or one of its columns
`$name?`	Use NULL instead of failing when data is missing
`df.setvar()`, `df.getvar()`	Set and read durable variables

Time and events

Function	What it does
`df.sleep(seconds)`	Pause for a fixed time
`df.wait_for_schedule(cron)`	Wait until a cron schedule matches
`df.wait_for_signal()`, `df.signal()`	Wait for, and send, durable signals
`df.wait_for_completion(id)`	Wait for another workflow to finish

Lifecycle, observability, and security

Area	Functions
Lifecycle	`df.start()` (with a label and target database), `df.cancel(id, reason)`
Observability	`df.status()`, `df.result()`, `df.list_instances()`, `df.instance_nodes()`, `df.metrics()`, `df.explain()`
Security	`df.grant_usage()`, `df.revoke_usage()`, per-user isolation, and running as the submitting user

Built-in durability ties it together: failed SQL and HTTP steps retry automatically, and failed instances stay queryable so you can see what happened.

Where to learn more

This article only scratches the surface. For the full DSL reference and every feature in detail, see the official guide: pg_durable USER_GUIDE.md.

Why orchestration inside the database?#

Get it running in under a minute#

What CREATE EXTENSION actually starts#

Settings you can tune#

Each step is its own transaction#

Durable signals: what LISTEN/NOTIFY cannot do#

The rest of the toolbox#

Control flow#

Activities (the actual work)#

Results and variables#

Time and events#

Lifecycle, observability, and security#

Where to learn more#