MIME-Version: 1.0
References: <CAD=mzVXR3GjM0vcthMBwEdbOKqSKcv8oojSS9coczWRi9BRYTA@mail.gmail.com>
 <abd3bc064d16bc93a2d8661a692903da97d2c154.camel@cybertec.at>
 <CAD=mzVVvK8xk-9m8h3Xu27cGN7BW329HKYdO+0EMXfWvSD3AGA@mail.gmail.com>
 <bed28c629a839b1f354e18f416a87fd5f4f78ba7.camel@cybertec.at>
 <CAD=mzVVqR-mKUFHetsejFWSPQPbLjTVhCmBebJTFX5XmYp+nGg@mail.gmail.com>
 <891bcfec74f7358ef0212caf6565a35153dd2941.camel@cybertec.at>
 <CAD=mzVXRkNM6ATTtnCsZeA0sfD6S_UPU=i6vfMTfoTBuT0pKTw@mail.gmail.com>
 <CAEzWdqdix_ftiUuPJp_LZ3QjB6rDmHVfxtdVMOn+akhMAWEOGw@mail.gmail.com>
 <CAD=mzVX1HuqQuMZx2QCy8ybJCG43zzxY4mqY4pM_gpxKscadBw@mail.gmail.com>
 <CAKkG4_nLNSDSDd7emC+p4rEicmeCBCBJozzy3QRFvO7BWh3SRA@mail.gmail.com>
 <CAD=mzVUzDD_XDa+BJ8fH96pEwgTG1NHUvaoyVePLhp3xEN=J9A@mail.gmail.com> <CAKkG4_npZ8Mn-KeT2=vyOgbr3B37bL5ZfbA5y=Yv3RwhBioHXQ@mail.gmail.com>
In-Reply-To: <CAKkG4_npZ8Mn-KeT2=vyOgbr3B37bL5ZfbA5y=Yv3RwhBioHXQ@mail.gmail.com>
From: sud <suds1434@gmail.com>
Date: Mon, 27 May 2024 00:16:40 +0530
Message-ID: <CAD=mzVUGzU3vEMp4AY17vG1xDkVjOCUhuXnLt3NnAB9jhSyfRA@mail.gmail.com>
Subject: Re: Long running query causing XID limit breach
To: =?UTF-8?Q?Torsten_F=C3=B6rtsch?= <tfoertsch123@gmail.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000f833f106195fd2ee"
Archived-At: <https://www.postgresql.org/message-id/CAD%3DmzVUGzU3vEMp4AY17vG1xDkVjOCUhuXnLt3NnAB9jhSyfRA%40mail.gmail.com>
Precedence: bulk

--000000000000f833f106195fd2ee
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sun, May 26, 2024 at 11:18 PM Torsten Förtsch <tfoertsch123@gmail.com>
wrote:

> Each query on the replica has a backend_xmin. You can see that in
> pg_stat_activity. From that backend's perspective, tuples marked as deleted
> by any transaction greater than or equal to backend_xmin are still needed.
> This does not depend on the table.
>
> Now, vacuum writes to the WAL up to which point it has vacuumed on the
> master. In pg_waldump this looks like so:
>
> PRUNE snapshotConflictHorizon: 774, nredirected: 0, ndead: 5, nunused: 0,
> redirected: [], dead: [2, 4, 6, 8, 10], unused: [], blkref #0: rel
> 1663/5/16430 blk 0
>
> That snapshotConflictHorizon is also a transaction id. If the backend_xmin
> of every backend running a transaction in the same database as the vacuum
> WAL record (the 5 in 1663/5/16430) is greater than vacuum's
> snapshotConflictHorizon, then there is no conflict. If any of the
> backend_xmins is less, then there is a conflict.
>
> This type of conflict is determined by just 2 numbers, the conflict
> horizon sent by the master in the WAL, and the minimum of all
> backend_xmins. For your case this means a long running transaction querying
> table t1 might have a backend_xmin of 223. On the master update and delete
> operations happen on table T2. Since all the transactions on the master are
> fast, when vacuum hits T2, the minimum of all backend_xmins on the master
> might already be 425. So, garbage left over by all transactions up to 424
> can be cleaned up. Now that cleanup record reaches the replica. It compares
> 223>425 which is false. So, there is a conflict. Now the replica can wait
> until its own horizon reaches 425 or it can kill all backends with a lower
> backend_xmin.
>
> As I understand, hot_standby_feedback does not work for you. Not sure if
> you can run the query on the master? That would resolve the issues but
> might generate the same bloat on the master as hot_standby_feedback.
> Another option I can see is to run long running queries on a dedicated
> replica with max_standby_streaming_delay set to infinity or something large
> enough. If you go that way, you could also fetch the WAL from your
> WAL archive instead of replicating from the master. That way the replica
> has absolutely no chance to affect the master.
>
>
Thank you so much.
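
To check my understanding of the conflict rule described above, here is a
minimal Python sketch. It is only a model of the comparison, not PostgreSQL's
actual code: the function names and the pid values are made up for
illustration, plain integer comparison is assumed (real XIDs are compared
modulo wraparound), and a backend_xmin equal to the horizon is assumed to
conflict.

```python
def conflicting_backends(backend_xmins, snapshot_conflict_horizon):
    """Return the pids whose backend_xmin is not greater than the horizon.

    backend_xmins: dict mapping a (hypothetical) backend pid to its
    backend_xmin, for backends connected to the database named in the
    WAL record.
    """
    return sorted(
        pid
        for pid, xmin in backend_xmins.items()
        if xmin <= snapshot_conflict_horizon  # assumption: equality conflicts
    )


def has_conflict(backend_xmins, snapshot_conflict_horizon):
    """True if at least one backend still needs rows the record removes."""
    return bool(conflicting_backends(backend_xmins, snapshot_conflict_horizon))


# Numbers from the example above: a long running query with backend_xmin 223
# and a cleanup record whose horizon is 425.
backends = {101: 223, 102: 500}
print(conflicting_backends(backends, 425))
```

With these numbers only the backend with backend_xmin 223 is reported, which
is the one the replica would have to wait for or cancel.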

Would you agree that we should have two standbys? One would have a short
max_standby_streaming_delay (say 10 sec; the default is 30 sec), would be
used mainly for high availability, and would therefore have minimal lag.
The other would have max_standby_streaming_delay set to "-1", i.e. it would
wait indefinitely for SELECT queries to finish regardless of the lag, and
would be used for the long running SELECT queries.

We would keep hot_standby_feedback ON for the first standby, which is used
for HA, and OFF for the second standby, which is used for the long running
SELECT queries. That way the primary would not have to take the second
standby's feedback into account before vacuuming old row versions, and the
long running SELECT queries would not push the primary toward transaction ID
wraparound.
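
The plan can be written down as a small Python sketch. The standby names
(ha_standby, reporting_standby) and the helper functions are hypothetical;
only the two parameter names and their values come from this thread. It is
a way to spell out the intended settings, not a tested deployment recipe.

```python
# Intended settings per standby; the standby names are placeholders.
STANDBYS = {
    "ha_standby": {
        "max_standby_streaming_delay": "10s",
        "hot_standby_feedback": "on",
    },
    "reporting_standby": {
        "max_standby_streaming_delay": "-1",
        "hot_standby_feedback": "off",
    },
}


def render_conf(settings):
    """Format the settings as postgresql.conf style 'name = value' lines."""
    return "\n".join(f"{name} = {value}" for name, value in settings.items())


def reports_xmin_to_primary(settings):
    """With hot_standby_feedback on, the standby's queries can delay vacuum."""
    return settings["hot_standby_feedback"] == "on"


def may_cancel_queries(settings):
    """A delay of -1 means WAL replay waits instead of cancelling queries."""
    return settings["max_standby_streaming_delay"] != "-1"


for name, settings in STANDBYS.items():
    print(f"# {name}")
    print(render_conf(settings))
```

One thing the sketch makes visible: only the second standby is fully
decoupled from the primary. A long running query on the first standby could
still delay vacuum on the primary, because hot_standby_feedback is on there.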

--000000000000f833f106195fd2ee--