Re: Long running query causing XID limit breach

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Torsten Förtsch <[email protected]>
To: sud <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: Long running query causing XID limit breach
Date: Sun, 26 May 2024 19:48:25 +0200
Message-ID: <CAKkG4_npZ8Mn-KeT2=vyOgbr3B37bL5ZfbA5y=Yv3RwhBioHXQ@mail.gmail.com> (raw)
In-Reply-To: <CAD=mzVUzDD_XDa+BJ8fH96pEwgTG1NHUvaoyVePLhp3xEN=J9A@mail.gmail.com>
References: <CAD=mzVXR3GjM0vcthMBwEdbOKqSKcv8oojSS9coczWRi9BRYTA@mail.gmail.com>
	<[email protected]>
	<CAD=mzVVvK8xk-9m8h3Xu27cGN7BW329HKYdO+0EMXfWvSD3AGA@mail.gmail.com>
	<[email protected]>
	<CAD=mzVVqR-mKUFHetsejFWSPQPbLjTVhCmBebJTFX5XmYp+nGg@mail.gmail.com>
	<[email protected]>
	<CAD=mzVXRkNM6ATTtnCsZeA0sfD6S_UPU=i6vfMTfoTBuT0pKTw@mail.gmail.com>
	<CAEzWdqdix_ftiUuPJp_LZ3QjB6rDmHVfxtdVMOn+akhMAWEOGw@mail.gmail.com>
	<CAD=mzVX1HuqQuMZx2QCy8ybJCG43zzxY4mqY4pM_gpxKscadBw@mail.gmail.com>
	<CAKkG4_nLNSDSDd7emC+p4rEicmeCBCBJozzy3QRFvO7BWh3SRA@mail.gmail.com>
	<CAD=mzVUzDD_XDa+BJ8fH96pEwgTG1NHUvaoyVePLhp3xEN=J9A@mail.gmail.com>

On Sun, May 26, 2024 at 11:16 AM sud <[email protected]> wrote:

> On Sun, May 26, 2024 at 1:43 PM Torsten Förtsch <[email protected]>
> wrote:
>
>> On Sat, May 25, 2024 at 11:00 PM sud <[email protected]> wrote:
>>
>>>
>>> But i have one question here , does max_standby_streaming_delay = 14 ,
>>> means the queries on the standby will get cancelled after 14 seconds?
>>>
>>
>> No, your query gets cancelled when it stalls replication for >14 sec. If
>> your master is idle and does not send any WAL and the replica has
>> caught up, the query can take as long as it wants.
>>
>
> Thank you so much.
> For example , in below scenario,
> if i have insert query going on on primary instance on table  25th may
> partition of TABLE1, and at same time we are selecting data from 24th May
> partition , then with "max_standby_streaming_delay = 14" setup , it just
> allows the select query to run for any duration without any restriction
> even if the WAL gets applied on the standby regularly. Also INSERT query in
> primary won't make the standby SELECT queries to cancel as because the WAL
> record of INSERT queries on the primary instance is not conflicting to the
> exact rows those were being read by the standby. Is my understanding
> correct here?
>
> However, if i have Update/Delete query going on on primary instance on
> table  25th may partition of TABLE1 and on the exact same set of rows which
> were being read by the standby instance by the SELECT query, then the
> application of such WAL record to standby can max wait for 14 seconds and
> thus those select query are prone to be cancelled after 14 seconds. Is this
> understanding correct?
>
> If the above is true then it doesn't look good, as because in an OLTP
> system there will be a lot of DMLS happening on the writer instances and
> there may be many queries running on the reader/standby instances which are
> meant to run for hours. And if we say making those SELECT queries run for
> hours means compromising an hour of "high availability"/RPO or a lag of an
> hour between primary and standby , that doesn't look good. Please
> correct me if I am missing something here.
>

Each query on the replica has a backend_xmin. You can see that in
pg_stat_activity. From that backend's perspective, tuples marked as deleted
by any transaction greater or equal to backend_xmin are still needed. This
does not depend on the table.

Now, vacuum writes to the WAL up to which point it has vacuumed on the
master. In pg_waldump this looks like so:

PRUNE snapshotConflictHorizon: 774, nredirected: 0, ndead: 5, nunused: 0,
redirected: [], dead: [2, 4, 6, 8, 10], unused: [], blkref #0: rel
1663/5/16430 blk 0

That snapshotConflictHorizon is also a transaction id. If the backend_xmin
of all backends running transactions in the same database (the 5 in 16
63/5/16430) -as the vacuum WAL record is greater than vacuum's
snapshotConflictHorizon, then there is no conflict. If any of the
backend_xmin's is less, then there is a conflict.

This type of conflict is determined by just 2 numbers, the conflict horizon
sent by the master in the WAL, and the minimum of all backend_xmins. For
your case this means a long running transaction querying table t1 might
have a backend_xmin of 223. On the master update and delete operations
happen on table T2. Since all the transactions on the master are fast, when
vacuum hits T2, the minimum of all backend_xmins on the master might
already be 425. So, garbage left over by all transactions up to 424 can be
cleaned up. Now that cleanup record reaches the replica. It compares
223>425 which is false. So, there is a conflict. Now the replica can wait
until its own horizon reaches 425 or it can kill all backends with a lower
backend_xmin.

As I understand, hot_standby_feedback does not work for you. Not sure if
you can run the query on the master? That would resolve the issues but
might generate the same bloat on the master as hot_standby_feedback.
Another option I can see is to run long running queries on a dedicated
replica with max_standby_streaming_delay set to infinity or something large
enough. If you go that way, you could also fetch the WAL from your
WAL archive instead of replicating from the master. That way the replica
has absolutely no chance to affect the master.

Good Luck!
Torsten

view thread (24+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Long running query causing XID limit breach
  In-Reply-To: <CAKkG4_npZ8Mn-KeT2=vyOgbr3B37bL5ZfbA5y=Yv3RwhBioHXQ@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox