MIME-Version: 1.0
References: <CAD=mzVXR3GjM0vcthMBwEdbOKqSKcv8oojSS9coczWRi9BRYTA@mail.gmail.com>
 <abd3bc064d16bc93a2d8661a692903da97d2c154.camel@cybertec.at>
 <CAD=mzVVvK8xk-9m8h3Xu27cGN7BW329HKYdO+0EMXfWvSD3AGA@mail.gmail.com>
 <bed28c629a839b1f354e18f416a87fd5f4f78ba7.camel@cybertec.at>
 <CAD=mzVVqR-mKUFHetsejFWSPQPbLjTVhCmBebJTFX5XmYp+nGg@mail.gmail.com>
 <891bcfec74f7358ef0212caf6565a35153dd2941.camel@cybertec.at>
 <CAD=mzVXRkNM6ATTtnCsZeA0sfD6S_UPU=i6vfMTfoTBuT0pKTw@mail.gmail.com>
 <CAEzWdqdix_ftiUuPJp_LZ3QjB6rDmHVfxtdVMOn+akhMAWEOGw@mail.gmail.com>
 <d18f56f9e8aec98b981ade94f300ec7473ec0cce.camel@cybertec.at>
 <CAEzWdqdPnErdeg6xe=zf7aF-fGy0Z42vXEm6zE6Ok25o=f6a7Q@mail.gmail.com>
 <56ad97911d83f721dd872e8ee68cd77d50d3eef6.camel@cybertec.at>
 <CAD=mzVU7Ry7xhZ=Kra4N87ugvAUubwGFqnLtXbcvy8yJasOVPQ@mail.gmail.com>
 <CAPOUM=cxpEaN9kSHnBAQFuiMKJ7iyD7+u4wS5djY-ZWRpo_Log@mail.gmail.com>
 <CAD=mzVVhE4Y637XOkqK1yMzP6U2aRhBi5xTcEZw6_Ppppb6mqA@mail.gmail.com>
 <c1d238173bf8ba94018b65beb4e1670f0d37f766.camel@cybertec.at>
 <CAEzWdqfzp7+buzXkpKnHqVpq=a2kaE33fWD-nXVeEdGiTBwGEQ@mail.gmail.com> <CAD=mzVWy=0HtzkjtqqCZktRDA7w0AL+mVW5+dz_E1EHqiyRYkA@mail.gmail.com>
In-Reply-To: <CAD=mzVWy=0HtzkjtqqCZktRDA7w0AL+mVW5+dz_E1EHqiyRYkA@mail.gmail.com>
From: yudhi s <learnerdatabase99@gmail.com>
Date: Thu, 13 Jun 2024 13:13:06 +0530
Message-ID: <CAEzWdqfF3Vk4OYE6DPP241rHV1D27vT82VjLu_Qju1bX_PiSdw@mail.gmail.com>
Subject: Re: Long running query causing XID limit breach
To: sud <suds1434@gmail.com>, Laurenz Albe <laurenz.albe@cybertec.at>
Cc: Simon Elbaz <elbazsimon9@gmail.com>, 
	pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000f846b1061ac0a6f2"
Archived-At: <https://www.postgresql.org/message-id/CAEzWdqfF3Vk4OYE6DPP241rHV1D27vT82VjLu_Qju1bX_PiSdw%40mail.gmail.com>
Precedence: bulk

--000000000000f846b1061ac0a6f2
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Sat, Jun 8, 2024 at 2:51 PM sud <suds1434@gmail.com> wrote:

>
> Thank You so much Laurenz and Yudhi.
>
> Yes, it's RDS, and as you mentioned there is a storage limit of ~64TB. But
> as Laurenz mentioned, the only time the second standby may crash would
> probably be because of storage space saturation, so we need appropriate
> monitoring in place to detect this and get alerted beforehand. We also need
> monitoring to see how much WAL gets generated per hour/day to get an idea
> of the usage. I am not sure how to do it, but will check on this.
>
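On the WAL-per-hour question, one possible approach (a rough sketch, not
something tested on RDS) is to sample pg_current_wal_lsn() on the primary at
two points in time and convert the difference to bytes. On the server,
pg_wal_lsn_diff() does the subtraction directly; the Python below does the
same arithmetic client-side. The function names and the sample LSN values
are made up for illustration.

```python
from datetime import datetime


def lsn_to_bytes(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '16/B374D848' to an absolute byte position."""
    high, low = lsn.split("/")
    # An LSN is a 64-bit position printed as two 32-bit hexadecimal halves.
    return (int(high, 16) << 32) + int(low, 16)


def wal_bytes_per_hour(lsn_start: str, t_start: datetime,
                       lsn_end: str, t_end: datetime) -> float:
    """Average WAL generation rate between two samples, in bytes per hour."""
    elapsed_hours = (t_end - t_start).total_seconds() / 3600
    if elapsed_hours <= 0:
        raise ValueError("t_end must be later than t_start")
    return (lsn_to_bytes(lsn_end) - lsn_to_bytes(lsn_start)) / elapsed_hours


# Hypothetical samples of pg_current_wal_lsn(), taken one hour apart.
rate = wal_bytes_per_hour(
    "16/B0000000", datetime(2024, 6, 13, 10, 0),
    "16/F0000000", datetime(2024, 6, 13, 11, 0),
)
print(f"{rate / 1024**3:.2f} GiB/hour")
```

Storing one sample per hour in a small table would give the hourly and daily
trend to alert on.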


Not exactly related, but for our information: while going through the Aurora
PostgreSQL docs on concepts similar to the ones discussed here, I found
something interesting.

https://aws.amazon.com/blogs/database/manage-long-running-read-queries-on-amazon-aurora-postgresql-compatible-edition/


*Cancel the conflicting query on the reader node if the conflict lasts
longer than max_standby_streaming_delay (maximum 30 seconds). This is
different from Amazon RDS or self-managed PostgreSQL. With Amazon RDS or
self-managed PostgreSQL, the instance has its own physical copy of the
database, and you're able to set the parameter max_standby_streaming_delay
as high as you want to prevent query cancellation.*

*If the conflicting query can't cancel in time, or if multiple long-running
queries are causing the replication lag to go beyond 60 seconds, Aurora
restarts the reader node to ensure it's not lagging far behind the primary
node.*

So if I understand this correctly, even when hot_standby_feedback is set to
OFF, the max_standby_streaming_delay cap (30 seconds) and the 60-second
replication lag limit still apply. Aurora may therefore cancel long-running
queries or restart reader nodes to stay in sync, even for a query that runs
only slightly longer than 60 seconds. That seems really odd: does it mean
there is no way to guarantee that a query can run for more than 60 seconds
on a read replica in Aurora PostgreSQL?
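
To make my reading of the quoted rule concrete, here is a toy model of it in
Python. It only encodes the two thresholds from the blog text and is not
Aurora's actual implementation; the names Action, reader_action and the two
constants are invented for illustration. Note that the inputs are the
conflict duration and the replication lag, as the quote words it, not the
total runtime of the query.

```python
from enum import Enum


class Action(Enum):
    NONE = "let the query run"
    CANCEL_QUERY = "cancel the conflicting query"
    RESTART_READER = "restart the reader node"


# Limits as quoted from the AWS blog post; the constant names are illustrative.
MAX_STANDBY_STREAMING_DELAY_S = 30
MAX_REPLICATION_LAG_S = 60


def reader_action(conflict_seconds: float, replication_lag_seconds: float) -> Action:
    """Return what the quoted rule implies for one conflicting query on a reader."""
    # Lag beyond 60 seconds: the quote says the reader node is restarted.
    if replication_lag_seconds > MAX_REPLICATION_LAG_S:
        return Action.RESTART_READER
    # Conflict outlasting max_standby_streaming_delay: the query is cancelled.
    if conflict_seconds > MAX_STANDBY_STREAMING_DELAY_S:
        return Action.CANCEL_QUERY
    return Action.NONE


print(reader_action(10, 5).value)
print(reader_action(45, 20).value)
print(reader_action(45, 75).value)
```

Whether a long query that never conflicts with WAL replay is left alone is
the part I would like to have confirmed.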

--000000000000f846b1061ac0a6f2--