Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sAyQX-00DQ6k-8T for pgsql-general@arkaria.postgresql.org; Sat, 25 May 2024 20:55:42 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sAyPX-000oW9-60 for pgsql-general@arkaria.postgresql.org; Sat, 25 May 2024 20:54:39 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sAyPW-000oVk-Pv for pgsql-general@lists.postgresql.org; Sat, 25 May 2024 20:54:38 +0000 Received: from mail-qv1-xf34.google.com ([2607:f8b0:4864:20::f34]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sAyPS-000kkM-4L for pgsql-general@lists.postgresql.org; Sat, 25 May 2024 20:54:36 +0000 Received: by mail-qv1-xf34.google.com with SMTP id 6a1803df08f44-6ab8ab4ca0bso18916246d6.0 for ; Sat, 25 May 2024 13:54:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1716670472; x=1717275272; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=2geTSog6sg1dUL+veR9xhFJP+NNhIX1Qgl8Q47SxI7k=; b=JFffQbuf0cKeKKe8G8Y8y/lEIFCEzy4qduvNMNLvu5F15X3DjUiaKcoypw7xeoniQl GakU6Chot6entWz4ZRJuH7OgIXqo/g6bc8d5tdUIz9JyrCVU9ya8tAVFyg4Zd+yxo7SE yuvIih+thIZJsMDHpVT24W3VnsTKA02pJADimJ7qpk8RZDooFouRhAAcKdVGHDoRGtDZ cCcqsfby5iXUT8QhqGqR8TodINQEYCUcx8LlM7qnr9HB5cG7MO1709yWE3upS23GPk2Q nNfoRi2J1D6c2mVwNEv578UJMsNSNHgBCGil0BS7f/1/N7NJClWWEaasUujAixumgf/T CzbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716670472; x=1717275272; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=2geTSog6sg1dUL+veR9xhFJP+NNhIX1Qgl8Q47SxI7k=; b=uTw1/aJOlqIHF3iw//Mz8EPoPA2hohj9bAejrLXCIdI1cRXkr4FwF7YMC1+Ab/XxI4 Dqb8G7JZtlKBoENjOuFinzvI4FyC+iVoDGqqueMYm6D8WdfPzWhVpGnBM58TpnBTB+fh WhkcvMEBWbNJqIBtQMVbXNMtAKkoM5gPLjjplioLucL2BsgQ5HcYGO9qZ5zMaxz4TlNI bc47/1QSSoc5mJaRUaJfWFsvPyG5uH1pl8andaFsBBb6yQM03Zp66X64NymHERzLWjns fxzcpSeHLnypEWv47+PXsvaVTotmnekHDanWo2FNkHIZpHeI0orbvHsypv3qCL7hKKlH CDrw== X-Forwarded-Encrypted: i=1; AJvYcCVa6VEmVTWfxPKmoQI4nxoEC+II6P4coOYS6MVyis/UMzUikjgdMxI8ciIfFok6QwcPdOV+v5PKeauUguiGZXrUFmOHNOb+/PkZ+0ZMtXZeuf5B X-Gm-Message-State: AOJu0Yxh6f5rzxx1XZn7/GrjkywmBFH4eDU7yOYrhM+tzLbAG20i/Wzo +zWBf9gr2cu97INFejYMQMUOOa3lHolmxL+rPIiE/qpLYmXiSm+YiTZZ2n5G9BiXMurHomQeVi9 ZYWNmN8DfWqYHh6WBUuqAscMUqKQ= X-Google-Smtp-Source: AGHT+IG3CcA076P70aePWWO01qBE0Zwz3OQRcdfIxNvNi6J9/7K0VQRmQGFirAEVG7+GZ9kn2hC/rgspn4f4z6TWNiI= X-Received: by 2002:a05:6214:5b82:b0:6a8:efd0:9bf3 with SMTP id 6a1803df08f44-6abcd0fc203mr60866526d6.54.1716670472120; Sat, 25 May 2024 13:54:32 -0700 (PDT) MIME-Version: 1.0 References: <891bcfec74f7358ef0212caf6565a35153dd2941.camel@cybertec.at> In-Reply-To: From: yudhi s Date: Sun, 26 May 2024 02:24:20 +0530 Message-ID: Subject: Re: Long running query causing XID limit breach To: sud Cc: Laurenz Albe , pgsql-general Content-Type: multipart/alternative; boundary="000000000000afede006194d7d81" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000afede006194d7d81 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, May 24, 2024 at 10:34=E2=80=AFAM sud wrote: > I am trying to understand these two parameters and each time it looks a > bit confusing to me. If These two parameters complement or conflict with > each other. > > Say for example, If we set hot_feedback_standby to ON (which is currently > set as default ON by the way), it will make the primary wait till the que= ry > completion at standby and can cause such a high bump in XID in scenarios > where the query on standby runs for days(like in our current scenario whi= ch > happens). So we were thinking of setting it as OFF, to avoid > the transaction ID wrap around issue.. > > But as you also mentioned to set the "max_standby_streaming_delay" to -1 > (which is currently set as 14 second in our case) ,it will wait infinitel= y > , till the query completes on the standby and wont apply the WAL which ca= n > cause override of the XID which the standby query is reading from. But wo= nt > this same behaviour be happening while we have hot_feedback_standby set a= s > "ON"? > > But again for HA , in case primary down we should not be in big lag for > the standby and thus we want the standby also with minimal lag. And as yo= u > mentioned there will never be incorrect results but at amx it will be que= ry > cancellation, so I was thinking , if it's fine to just keep the > "hot_feedback_standby" as OFF and let the max_standby_streaming_delay set > as it is like 14 sec. Let me know your thoughts. > > Basically below are the combinations, i am confused between.. > > hot_feedback_stanby ON and max_standby_streaming_delay=3D-1 > or > hot_feedback_stanby OFF and max_standby_streaming_delay=3D-1 > Or > hot_feedback_stanby ON and max_standby_streaming_delay=3D14 sec > Or > hot_feedback_stanby OFF and max_standby_streaming_delay=3D14 sec > As per my understanding here, this would be the behaviour. Others may comment.. *hot_standby_feedback ON and max_standby_streaming_delay =3D -1:* Ensures that long-running queries on the standby are not interrupted. The primary waits indefinitely to avoid vacuuming rows needed by standby queries. But Can lead to significant replication lag and increased XID consumption on the primary, potentially causing transaction ID wraparound issues. *hot_standby_feedback OFF and max_standby_streaming_delay =3D -1:* Ensures long-running queries on the standby are not interrupted. No feedback is sent to the primary, reducing the risk of XID wraparound. But The standby may fall significantly behind the primary, resulting in high replication lag. *hot_standby_feedback ON and max_standby_streaming_delay =3D 14 seconds:* The primary prevents vacuuming rows needed by standby queries, reducing query cancellations on the standby. The replication lag is limited to 14 seconds. But Long-running queries on the standby that exceed 14 seconds may be canceled, and the primary can still experience increased XID consumption. *hot_standby_feedback OFF and max_standby_streaming_delay =3D 14 seconds:* Limits replication lag to 14 seconds and reduces XID consumption on the primary. Queries on the standby exceeding 14 seconds are canceled. but Long-running queries on the standby are more likely to be canceled due to the lack of feedback to the primary. --000000000000afede006194d7d81 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable


=
On Fri, May 24, 2024 at 10:34=E2=80= =AFAM sud <suds1434@gmail.com&= gt; wrote:
I am trying to understand these two parameters= and each time it looks a bit confusing to me. If These two parameters comp= lement or conflict with each other.

Say for example, If we set hot_feedback_standby to ON (whi= ch is currently set as default ON by the way), it will make the primary wai= t till the query completion at standby and can cause such a high bump in XI= D in scenarios where the query on standby runs for days(like in our current= scenario which happens). So we were thinking of setting it as OFF,=C2=A0to= avoid the=C2=A0transaction ID wrap around issue..

But as you also mentioned to set the "max_standby_streaming_delay&quo= t; to -1 (which is currently set as 14 second=C2=A0in our case) ,it will wa= it infinitely , till the query completes=C2=A0on the standby and wont apply= the WAL which can cause override of the XID which the standby query is rea= ding from. But wont this same behaviour be happening while we have hot_feed= back_standby set as "ON"?

But again for = HA , in case primary down we should not be in big lag for the standby and t= hus we want the standby also with minimal lag. And as you mentioned there w= ill never be incorrect results but at amx it will be query cancellation, so= I was thinking , if it's=C2=A0fine to just keep the "hot_feedback= _standby" as OFF and let the max_standby_streaming_delay set as it is = like 14 sec. Let me know your thoughts.

Basically = below are the combinations, i am confused between..=C2=A0

hot_feedback_stanby ON and max_standby_streaming_delay=3D-1
or
hot_feedback_stanby OFF and max_standby_streaming_delay=3D-= 1
Or
hot_feedback_stanby ON and max_standby_streami= ng_delay=3D14 sec
Or
hot_feedback_stanby OFF and ma= x_standby_streaming_delay=3D14 sec
<= br>
As per my understanding here, this would be the behaviour. Ot= hers may comment..


hot_standby_feedback ON = and max_standby_streaming_delay =3D -1:

Ensures that long-runnin= g queries on the standby are not interrupted. The primary waits indefinitel= y to avoid vacuuming rows needed by standby queries.
But Can lead to sig= nificant replication lag and increased XID consumption on the primary, pote= ntially causing transaction ID wraparound issues.

hot_standby_fee= dback OFF and max_standby_streaming_delay =3D -1:

Ensures long-r= unning queries on the standby are not interrupted. No feedback is sent to t= he primary, reducing the risk of XID wraparound.
But The standby may fal= l significantly behind the primary, resulting in high replication lag.
<= br>
hot_standby_feedback ON and max_standby_streaming_delay =3D 14 se= conds:

The primary prevents vacuuming rows needed by standby que= ries, reducing query cancellations on the standby. The replication lag is l= imited to 14 seconds.
But Long-running queries on the standby that excee= d 14 seconds may be canceled, and the primary can still experience increase= d XID consumption.

hot_standby_feedback OFF and max_standby_strea= ming_delay =3D 14 seconds:

=C2=A0Limits replication lag to 14 se= conds and reduces XID consumption on the primary. Queries on the standby ex= ceeding 14 seconds are canceled.
but Long-running queries on the standby= are more likely to be canceled due to the lack of feedback to the primary.=

=C2=A0
--000000000000afede006194d7d81--