MIME-Version: 1.0
References: <CAD=mzVXR3GjM0vcthMBwEdbOKqSKcv8oojSS9coczWRi9BRYTA@mail.gmail.com>
 <abd3bc064d16bc93a2d8661a692903da97d2c154.camel@cybertec.at>
 <CAD=mzVVvK8xk-9m8h3Xu27cGN7BW329HKYdO+0EMXfWvSD3AGA@mail.gmail.com>
 <bed28c629a839b1f354e18f416a87fd5f4f78ba7.camel@cybertec.at>
 <CAD=mzVVqR-mKUFHetsejFWSPQPbLjTVhCmBebJTFX5XmYp+nGg@mail.gmail.com>
 <891bcfec74f7358ef0212caf6565a35153dd2941.camel@cybertec.at>
 <CAD=mzVXRkNM6ATTtnCsZeA0sfD6S_UPU=i6vfMTfoTBuT0pKTw@mail.gmail.com>
 <CAEzWdqdix_ftiUuPJp_LZ3QjB6rDmHVfxtdVMOn+akhMAWEOGw@mail.gmail.com>
 <CAD=mzVX1HuqQuMZx2QCy8ybJCG43zzxY4mqY4pM_gpxKscadBw@mail.gmail.com> <CAKkG4_nLNSDSDd7emC+p4rEicmeCBCBJozzy3QRFvO7BWh3SRA@mail.gmail.com>
In-Reply-To: <CAKkG4_nLNSDSDd7emC+p4rEicmeCBCBJozzy3QRFvO7BWh3SRA@mail.gmail.com>
From: sud <suds1434@gmail.com>
Date: Sun, 26 May 2024 14:45:50 +0530
Message-ID: <CAD=mzVUzDD_XDa+BJ8fH96pEwgTG1NHUvaoyVePLhp3xEN=J9A@mail.gmail.com>
Subject: Re: Long running query causing XID limit breach
To: =?UTF-8?Q?Torsten_F=C3=B6rtsch?= <tfoertsch123@gmail.com>
Cc: pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000008d18de061957d923"
Archived-At: <https://www.postgresql.org/message-id/CAD%3DmzVUzDD_XDa%2BBJ8fH96pEwgTG1NHUvaoyVePLhp3xEN%3DJ9A%40mail.gmail.com>
Precedence: bulk

--0000000000008d18de061957d923
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sun, May 26, 2024 at 1:43=E2=80=AFPM Torsten F=C3=B6rtsch <tfoertsch123@=
gmail.com>
wrote:

> On Sat, May 25, 2024 at 11:00=E2=80=AFPM sud <suds1434@gmail.com> wrote:
>
>>
>> But i have one question here , does max_standby_streaming_delay =3D 14 ,
>> means the queries on the standby will get cancelled after 14 seconds?
>>
>
> No, your query gets cancelled when it stalls replication for >14 sec. If
> your master is idle and does not send any WAL and the replica has
> caught up, the query can take as long as it wants.
>

Thank you so much.
For example, consider the scenario below.
If I have an INSERT running on the primary instance against the 25th May
partition of TABLE1, and at the same time we are selecting data from the
24th May partition on the standby, then with
"max_standby_streaming_delay =3D 14" the SELECT is allowed to run for any
duration without restriction, even though WAL keeps getting applied on
the standby. Also, the INSERT on the primary won't cause the standby
SELECT to be cancelled, because the WAL records of the INSERT do not
conflict with the rows being read on the standby. Is my understanding
correct here?

However, if I have an UPDATE/DELETE running on the primary instance
against the 25th May partition of TABLE1, touching the exact same set of
rows that the SELECT query is reading on the standby, then the
application of those WAL records on the standby can wait for at most 14
seconds, and so those SELECT queries are prone to being cancelled after
14 seconds. Is this understanding correct?
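
To make the two scenarios above concrete, here is a minimal sketch of the
rule as I understand it from your reply. The function name and its
arguments are made up for illustration, and this is only my mental model,
not how PostgreSQL actually implements it:

```python
def query_cancelled(conflicts, stall_secs, max_delay_secs):
    """Return True if, under my reading, the standby cancels the query."""
    # No conflicting WAL: replay is never blocked by the query,
    # so the query can run for any duration.
    if not conflicts:
        return False
    # Conflicting WAL: replay waits up to the limit, then the
    # query that blocks it gets cancelled.
    return stall_secs > max_delay_secs


# Scenario 1: INSERT on the 25th May partition, SELECT on the 24th May one.
print(query_cancelled(False, 3600, 14))  # prints False
# Scenario 2: UPDATE/DELETE on the rows being read, replay stalled 15 sec.
print(query_cancelled(True, 15, 14))  # prints True
```

If this model is wrong, in particular the assumption that only changes to
the rows being read count as a conflict, please point out where.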

If the above is true, then it doesn't look good, because in an OLTP
system there will be a lot of DML happening on the writer instance, and
there may be many queries on the reader/standby instances that are meant
to run for hours. And if letting those SELECT queries run for hours means
compromising an hour of "high availability"/RPO, or a lag of an hour
between primary and standby, that doesn't look good either. Please
correct me if I am missing something here.

--0000000000008d18de061957d923--