Skip to content

Commit 228c370

Browse files
author
Amit Kapila
committed
Preserve conflict-relevant data during logical replication.
Logical replication requires reliable conflict detection to maintain data consistency across nodes. To achieve this, we must prevent premature removal of tuples deleted by other origins and their associated commit_ts data by VACUUM, which could otherwise lead to incorrect conflict reporting and resolution. This patch introduces a mechanism to retain deleted tuples on the subscriber during the application of concurrent transactions from remote nodes. Retaining these tuples allows us to correctly ignore concurrent updates to the same tuple. Without this, an UPDATE might be misinterpreted as an INSERT during resolutions due to the absence of the original tuple. Additionally, we ensure that origin metadata is not prematurely removed by vacuum freeze, which is essential for detecting update_origin_differs and delete_origin_differs conflicts. To support this, a new replication slot named pg_conflict_detection is created and maintained by the launcher on the subscriber. Each apply worker tracks its own non-removable transaction ID, which the launcher aggregates to determine the appropriate xmin for the slot, thereby retaining necessary tuples. Conflict information retention (deleted tuples and commit_ts) can be enabled per subscription via the retain_conflict_info option. This is disabled by default to avoid unnecessary overhead for configurations that do not require conflict resolution or logging. During upgrades, if any subscription on the old cluster has retain_conflict_info enabled, a conflict detection slot will be created to protect relevant tuples from deletion when the new cluster starts. This is a foundational work to correctly detect update_deleted conflict which will be done in a follow-up patch. Author: Zhijie Hou <houzj.fnst@fujitsu.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Nisha Moond <nisha.moond412@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
1 parent 039f7ee commit 228c370

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

45 files changed

+2233
-220
lines changed

doc/src/sgml/catalogs.sgml

Lines changed: 11 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8082,6 +8082,17 @@ SCRAM-SHA-256$<replaceable>&lt;iteration count&gt;</replaceable>:<replaceable>&l
80828082
</para></entry>
80838083
</row>
80848084

8085+
<row>
8086+
<entry role="catalog_table_entry"><para role="column_definition">
8087+
<structfield>subretaindeadtuples</structfield> <type>bool</type>
8088+
</para>
8089+
<para>
8090+
If true, the information (e.g., dead tuples, commit timestamps, and
8091+
origins) on the subscriber that is useful for conflict detection is
8092+
retained.
8093+
</para></entry>
8094+
</row>
8095+
80858096
<row>
80868097
<entry role="catalog_table_entry"><para role="column_definition">
80878098
<structfield>subconninfo</structfield> <type>text</type>

doc/src/sgml/config.sgml

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4965,6 +4965,8 @@ ANY <replaceable class="parameter">num_sync</replaceable> ( <replaceable class="
49654965
new setting.
49664966
This setting has no effect if <varname>primary_conninfo</varname> is not
49674967
set or the server is not in standby mode.
4968+
The name cannot be <literal>pg_conflict_detection</literal> as it is
4969+
reserved for the conflict detection slot.
49684970
</para>
49694971
</listitem>
49704972
</varlistentry>

doc/src/sgml/func.sgml

Lines changed: 12 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -29592,7 +29592,9 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
2959229592
</para>
2959329593
<para>
2959429594
Creates a new physical replication slot named
29595-
<parameter>slot_name</parameter>. The optional second parameter,
29595+
<parameter>slot_name</parameter>. The name cannot be
29596+
<literal>pg_conflict_detection</literal> as it is reserved for the
29597+
conflict detection slot. The optional second parameter,
2959629598
when <literal>true</literal>, specifies that the <acronym>LSN</acronym> for this
2959729599
replication slot be reserved immediately; otherwise
2959829600
the <acronym>LSN</acronym> is reserved on first connection from a streaming
@@ -29636,7 +29638,9 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
2963629638
<para>
2963729639
Creates a new logical (decoding) replication slot named
2963829640
<parameter>slot_name</parameter> using the output plugin
29639-
<parameter>plugin</parameter>. The optional third
29641+
<parameter>plugin</parameter>. The name cannot be
29642+
<literal>pg_conflict_detection</literal> as it is reserved for
29643+
the conflict detection slot. The optional third
2964029644
parameter, <parameter>temporary</parameter>, when set to true, specifies that
2964129645
the slot should not be permanently stored to disk and is only meant
2964229646
for use by the current session. Temporary slots are also
@@ -29666,6 +29670,8 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
2966629670
<para>
2966729671
Copies an existing physical replication slot named <parameter>src_slot_name</parameter>
2966829672
to a physical replication slot named <parameter>dst_slot_name</parameter>.
29673+
The new slot name cannot be <literal>pg_conflict_detection</literal>,
29674+
as it is reserved for the conflict detection.
2966929675
The copied physical slot starts to reserve WAL from the same <acronym>LSN</acronym> as the
2967029676
source slot.
2967129677
<parameter>temporary</parameter> is optional. If <parameter>temporary</parameter>
@@ -29688,8 +29694,10 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset
2968829694
Copies an existing logical replication slot
2968929695
named <parameter>src_slot_name</parameter> to a logical replication
2969029696
slot named <parameter>dst_slot_name</parameter>, optionally changing
29691-
the output plugin and persistence. The copied logical slot starts
29692-
from the same <acronym>LSN</acronym> as the source logical slot. Both
29697+
the output plugin and persistence. The new slot name cannot be
29698+
<literal>pg_conflict_detection</literal> as it is reserved for
29699+
the conflict detection. The copied logical slot starts from the same
29700+
<acronym>LSN</acronym> as the source logical slot. Both
2969329701
<parameter>temporary</parameter> and <parameter>plugin</parameter> are
2969429702
optional; if they are omitted, the values of the source slot are used.
2969529703
The <literal>failover</literal> option of the source logical slot

doc/src/sgml/logical-replication.sgml

Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2396,6 +2396,12 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER
23962396
the subscriber, plus some reserve for table synchronization.
23972397
</para>
23982398

2399+
<para>
2400+
<link linkend="guc-max-replication-slots"><varname>max_replication_slots</varname></link>
2401+
must be set to at least 1 when <link linkend="sql-createsubscription-params-with-retain-dead-tuples"><literal>retain_dead_tuples</literal></link>
2402+
is enabled for any subscription.
2403+
</para>
2404+
23992405
<para>
24002406
<link linkend="guc-max-logical-replication-workers"><varname>max_logical_replication_workers</varname></link>
24012407
must be set to at least the number of subscriptions (for leader apply
@@ -2532,6 +2538,22 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER
25322538
dependencies on clusters before version 17.0 will silently be ignored.
25332539
</para>
25342540

2541+
<note>
2542+
<para>
2543+
Commit timestamps and origin data are not preserved during the upgrade.
2544+
As a result, even if
2545+
<link linkend="sql-createsubscription-params-with-retain-dead-tuples"><literal>retain_dead_tuples</literal></link>
2546+
is enabled, the upgraded subscriber may be unable to detect conflicts or
2547+
log relevant commit timestamps and origins when applying changes from the
2548+
publisher occurred before the upgrade. Additionally, immediately after the
2549+
upgrade, the vacuum may remove the deleted rows that are required for
2550+
conflict detection. This can affect the changes that were not replicated
2551+
before the upgrade. To ensure consistent conflict tracking, users should
2552+
ensure that all potentially conflicting changes are replicated to the
2553+
subscriber before initiating the upgrade.
2554+
</para>
2555+
</note>
2556+
25352557
<para>
25362558
There are some prerequisites for <application>pg_upgrade</application> to
25372559
be able to upgrade the subscriptions. If these are not met an error
@@ -2563,6 +2585,16 @@ CONTEXT: processing remote data for replication origin "pg_16395" during "INSER
25632585
subscriptions present in the old cluster.
25642586
</para>
25652587
</listitem>
2588+
<listitem>
2589+
<para>
2590+
If there are subscriptions with retain_dead_tuples enabled, the reserved
2591+
replication slot <quote><literal>pg_conflict_detection</literal></quote>
2592+
must not exist on the new cluster. Additionally, the
2593+
<link linkend="guc-wal-level"><varname>wal_level</varname></link> on the
2594+
new cluster must be set to <literal>replica</literal> or
2595+
<literal>logical</literal>.
2596+
</para>
2597+
</listitem>
25662598
</itemizedlist>
25672599
</sect2>
25682600

doc/src/sgml/protocol.sgml

Lines changed: 88 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2235,6 +2235,8 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
22352235
<para>
22362236
The name of the slot to create. Must be a valid replication slot
22372237
name (see <xref linkend="streaming-replication-slots-manipulation"/>).
2238+
The name cannot be <literal>pg_conflict_detection</literal> as it
2239+
is reserved for the conflict detection.
22382240
</para>
22392241
</listitem>
22402242
</varlistentry>
@@ -2653,6 +2655,65 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
26532655
</variablelist>
26542656
</listitem>
26552657
</varlistentry>
2658+
2659+
<varlistentry id="protocol-replication-primary-status-update">
2660+
<term>Primary status update (B)</term>
2661+
<listitem>
2662+
<variablelist>
2663+
<varlistentry>
2664+
<term>Byte1('s')</term>
2665+
<listitem>
2666+
<para>
2667+
Identifies the message as a primary status update.
2668+
</para>
2669+
</listitem>
2670+
</varlistentry>
2671+
2672+
<varlistentry>
2673+
<term>Int64</term>
2674+
<listitem>
2675+
<para>
2676+
The latest WAL write position on the server.
2677+
</para>
2678+
</listitem>
2679+
</varlistentry>
2680+
2681+
<varlistentry>
2682+
<term>Int64</term>
2683+
<listitem>
2684+
<para>
2685+
The oldest transaction ID that is currently in the commit phase on
2686+
the server, along with its epoch. The most significant 32 bits are
2687+
the epoch. The least significant 32 bits are the transaction ID.
2688+
If no transactions are active on the server, this number will be
2689+
the next transaction ID to be assigned.
2690+
</para>
2691+
</listitem>
2692+
</varlistentry>
2693+
2694+
<varlistentry>
2695+
<term>Int64</term>
2696+
<listitem>
2697+
<para>
2698+
The next transaction ID to be assigned on the server, along with
2699+
its epoch. The most significant 32 bits are the epoch. The least
2700+
significant 32 bits are the transaction ID.
2701+
</para>
2702+
</listitem>
2703+
</varlistentry>
2704+
2705+
<varlistentry>
2706+
<term>Int64</term>
2707+
<listitem>
2708+
<para>
2709+
The server's system clock at the time of transmission, as
2710+
microseconds since midnight on 2000-01-01.
2711+
</para>
2712+
</listitem>
2713+
</varlistentry>
2714+
</variablelist>
2715+
</listitem>
2716+
</varlistentry>
26562717
</variablelist>
26572718

26582719
<para>
@@ -2797,6 +2858,33 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
27972858
</variablelist>
27982859
</listitem>
27992860
</varlistentry>
2861+
2862+
<varlistentry id="protocol-replication-standby-wal-status-request">
2863+
<term>Request primary status update (F)</term>
2864+
<listitem>
2865+
<variablelist>
2866+
<varlistentry>
2867+
<term>Byte1('p')</term>
2868+
<listitem>
2869+
<para>
2870+
Identifies the message as a request for a primary status update.
2871+
</para>
2872+
</listitem>
2873+
</varlistentry>
2874+
2875+
<varlistentry>
2876+
<term>Int64</term>
2877+
<listitem>
2878+
<para>
2879+
The client's system clock at the time of transmission, as
2880+
microseconds since midnight on 2000-01-01.
2881+
</para>
2882+
</listitem>
2883+
</varlistentry>
2884+
</variablelist>
2885+
</listitem>
2886+
</varlistentry>
2887+
28002888
</variablelist>
28012889
</listitem>
28022890
</varlistentry>

doc/src/sgml/ref/alter_subscription.sgml

Lines changed: 14 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -235,8 +235,9 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
235235
<link linkend="sql-createsubscription-params-with-password-required"><literal>password_required</literal></link>,
236236
<link linkend="sql-createsubscription-params-with-run-as-owner"><literal>run_as_owner</literal></link>,
237237
<link linkend="sql-createsubscription-params-with-origin"><literal>origin</literal></link>,
238-
<link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>, and
239-
<link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>.
238+
<link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>,
239+
<link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>, and
240+
<link linkend="sql-createsubscription-params-with-retain-dead-tuples"><literal>retain_dead_tuples</literal></link>.
240241
Only a superuser can set <literal>password_required = false</literal>.
241242
</para>
242243

@@ -261,8 +262,9 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
261262
</para>
262263

263264
<para>
264-
The <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>
265-
and <link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>
265+
The <link linkend="sql-createsubscription-params-with-failover"><literal>failover</literal></link>,
266+
<link linkend="sql-createsubscription-params-with-two-phase"><literal>two_phase</literal></link>, and
267+
<link linkend="sql-createsubscription-params-with-retain-dead-tuples"><literal>retain_dead_tuples</literal></link>
266268
parameters can only be altered when the subscription is disabled.
267269
</para>
268270

@@ -285,6 +287,14 @@ ALTER SUBSCRIPTION <replaceable class="parameter">name</replaceable> RENAME TO <
285287
option is changed from <literal>true</literal> to <literal>false</literal>,
286288
the publisher will replicate the transactions again when they are committed.
287289
</para>
290+
291+
<para>
292+
If the <link linkend="sql-createsubscription-params-with-retain-dead-tuples"><literal>retain_dead_tuples</literal></link>
293+
option is altered to <literal>false</literal> and no other subscription
294+
has this option enabled, the replication slot named
295+
<quote><literal>pg_conflict_detection</literal></quote>, created to retain
296+
dead tuples for conflict detection, will be dropped.
297+
</para>
288298
</listitem>
289299
</varlistentry>
290300

doc/src/sgml/ref/create_subscription.sgml

Lines changed: 86 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -169,7 +169,9 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
169169
<listitem>
170170
<para>
171171
Name of the publisher's replication slot to use. The default is
172-
to use the name of the subscription for the slot name.
172+
to use the name of the subscription for the slot name. The name cannot
173+
be <literal>pg_conflict_detection</literal> as it is reserved for the
174+
conflict detection.
173175
</para>
174176

175177
<para>
@@ -435,6 +437,89 @@ CREATE SUBSCRIPTION <replaceable class="parameter">subscription_name</replaceabl
435437
</para>
436438
</listitem>
437439
</varlistentry>
440+
441+
<varlistentry id="sql-createsubscription-params-with-retain-dead-tuples">
442+
<term><literal>retain_dead_tuples</literal> (<type>boolean</type>)</term>
443+
<listitem>
444+
<para>
445+
Specifies whether the information (e.g., dead tuples, commit
446+
timestamps, and origins) required for conflict detection on the
447+
subscriber is retained. The default is <literal>false</literal>.
448+
If set to <literal>true</literal>, a physical replication slot named
449+
<quote><literal>pg_conflict_detection</literal></quote> will be
450+
created on the subscriber to prevent the conflict information from
451+
being removed.
452+
</para>
453+
454+
<para>
455+
Note that the information useful for conflict detection is retained
456+
only after the creation of the slot. You can verify the existence of
457+
this slot by querying <link linkend="view-pg-replication-slots">pg_replication_slots</link>.
458+
And even if multiple subscriptions on one node enable this option,
459+
only one replication slot will be created. Also,
460+
<varname>wal_level</varname> must be set to <literal>replica</literal>
461+
or higher to allow the replication slot to be used.
462+
</para>
463+
464+
<caution>
465+
<para>
466+
Note that the information for conflict detection cannot be purged if
467+
the subscription is disabled; thus, the information will accumulate
468+
until the subscription is enabled. To prevent excessive accumulation,
469+
it is recommended to disable <literal>retain_dead_tuples</literal>
470+
if the subscription will be inactive for an extended period.
471+
</para>
472+
473+
<para>
474+
Additionally when enabling <literal>retain_dead_tuples</literal> for
475+
conflict detection in logical replication, it is important to design the
476+
replication topology to balance data retention requirements with
477+
overall system performance. This option provides minimal performance
478+
overhead when applied appropriately. The following scenarios illustrate
479+
effective usage patterns when enabling this option.
480+
</para>
481+
482+
<para>
483+
a. Large Tables with Bidirectional Writes:
484+
For large tables subject to concurrent writes on both publisher and
485+
subscriber nodes, publishers can define row filters when creating
486+
publications to segment data. This allows multiple subscriptions
487+
to replicate exclusive subsets of the table in parallel, optimizing
488+
the throughput.
489+
</para>
490+
491+
<para>
492+
b. Write-Enabled Subscribers:
493+
If a subscriber node is expected to perform write operations, replication
494+
can be structured using multiple publications and subscriptions. By
495+
distributing tables across these publications, the workload is spread among
496+
several apply workers, improving concurrency and reducing contention.
497+
</para>
498+
499+
<para>
500+
c. Read-Only Subscribers:
501+
In configurations involving single or multiple publisher nodes
502+
performing concurrent write operations, read-only subscriber nodes may
503+
replicate changes without seeing a performance impact if it does index
504+
scan. However, if the subscriber is impacted due to replication lag or
505+
scan performance (say due to sequential scans), it needs to follow one
506+
of the two previous strategies to distribute the workload on the
507+
subscriber.
508+
</para>
509+
</caution>
510+
511+
<para>
512+
This option cannot be enabled if the publisher is a physical standby.
513+
</para>
514+
515+
<para>
516+
Enabling this option ensures retention of information useful for
517+
conflict detection solely for changes occurring locally on the
518+
publisher. For the changes originating from different origins,
519+
reliable conflict detection cannot be guaranteed.
520+
</para>
521+
</listitem>
522+
</varlistentry>
438523
</variablelist></para>
439524

440525
</listitem>

0 commit comments

Comments
 (0)