Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sBItb-00Fi0T-Dl for pgsql-general@arkaria.postgresql.org; Sun, 26 May 2024 18:47:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sBIta-004TrW-CR for pgsql-general@arkaria.postgresql.org; Sun, 26 May 2024 18:47:02 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sBItZ-004TrO-RZ for pgsql-general@lists.postgresql.org; Sun, 26 May 2024 18:47:01 +0000 Received: from mail-vk1-xa32.google.com ([2607:f8b0:4864:20::a32]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sBItS-0026oO-FP for pgsql-general@lists.postgresql.org; Sun, 26 May 2024 18:47:00 +0000 Received: by mail-vk1-xa32.google.com with SMTP id 71dfb90a1353d-4e4effccfccso771896e0c.1 for ; Sun, 26 May 2024 11:46:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1716749213; x=1717354013; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=qRIEcaKwWl7buplR0VdBchYVJuELHq+uKOcRdO8ak78=; b=SQq+YkAaiVniFc54yQ13DM/lPhgpii6NpJtkxwTVheIPlp9eXX9pJz3W02V9ngYjhH GQUZVG+GuLfIDf723LBZrCyL6hwMuEpYKDOXQsR4OBkSfRqHqWAwGZ8tsm+zKgF81nEQ O+G3nu7BlsIYdFEK2gYX2f8IbUrIi40EG94iMt/O81DfndJZ/vuB8XtZv1WSjSWWBGFy uvij4GPOf6qtfRoyTgx+XbsfN/lfHw6AJ878w0WSwrVrr+bha4d+2QNRwXNcrhT2hP3v H/JnM3GijK4CS6oSS1VFRqR1KdCUSKmAp439kkay6Z9dIqxxJNJpG/HCpoDMs+WmadTg kYlQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716749213; x=1717354013; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=qRIEcaKwWl7buplR0VdBchYVJuELHq+uKOcRdO8ak78=; b=DUq7C9Vzp/l+8HsMIePQOO/bfhIBLdmbPC4HZqLRckTrTLZIF+nE7hsuBciJSs9cZs SezsSXGCnX0uz1ZkPSnCC9DIrZPHT8humgq8NSt7c/VxpFgXD1UWTwKn5mg6Qqf1qKB6 bS+HIm8KVeqarGyGghoty/iFOoa8rpP85kF/e5yUE20NKChSm7jkm9X1dDY+1Vfqzi3J qgBJ+cUkDZJ7IE8MR/ua40Tcbmj8xeZCIUFyxVCfAOAjH4NC4ykmd5X9gKmLHx0VEPxf tpFlLN5qctlMdpXo6Q9gzAbL8YEOQn/GQd+vbCd3uAlgAJoskZ5i+pmap5Dw3s55qx3U dw7A== X-Gm-Message-State: AOJu0Ywzqkzv7JV/Yv8+QeeYwxqqj2ZVEFAa6oU567eJI6X7XBWHVA7l OftirVzq5dtlUya6pmxAkrMOWIlx25XDfaRmhM5Wk99Wie/7MhyY1MokYOp5qRcAV8cojfM5PIk gT7SPkfJec8IvtBoANgL4LD9Tqw0= X-Google-Smtp-Source: AGHT+IGfbk68IR27fTvIyRbGDqvRfyuLSVPDwt/AcqBvk0vz17lE+R03teMPY1Xpefj9iepTeoUnqlPraZoXZ4CL9AQ= X-Received: by 2002:a05:6122:919:b0:4da:ced8:b09a with SMTP id 71dfb90a1353d-4e4f012a446mr7673307e0c.0.1716749212330; Sun, 26 May 2024 11:46:52 -0700 (PDT) MIME-Version: 1.0 References: <891bcfec74f7358ef0212caf6565a35153dd2941.camel@cybertec.at> In-Reply-To: From: sud Date: Mon, 27 May 2024 00:16:40 +0530 Message-ID: Subject: Re: Long running query causing XID limit breach To: =?UTF-8?Q?Torsten_F=C3=B6rtsch?= Cc: pgsql-general Content-Type: multipart/alternative; boundary="000000000000f833f106195fd2ee" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f833f106195fd2ee Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sun, May 26, 2024 at 11:18=E2=80=AFPM Torsten F=C3=B6rtsch wrote: > Each query on the replica has a backend_xmin. You can see that in > pg_stat_activity. From that backend's perspective, tuples marked as delet= ed > by any transaction greater or equal to backend_xmin are still needed. Thi= s > does not depend on the table. > > Now, vacuum writes to the WAL up to which point it has vacuumed on the > master. In pg_waldump this looks like so: > > PRUNE snapshotConflictHorizon: 774, nredirected: 0, ndead: 5, nunused: 0, > redirected: [], dead: [2, 4, 6, 8, 10], unused: [], blkref #0: rel > 1663/5/16430 blk 0 > > That snapshotConflictHorizon is also a transaction id. If the backend_xmi= n > of all backends running transactions in the same database (the 5 in 16 > 63/5/16430) -as the vacuum WAL record is greater than vacuum's > snapshotConflictHorizon, then there is no conflict. If any of the > backend_xmin's is less, then there is a conflict. > > This type of conflict is determined by just 2 numbers, the conflict > horizon sent by the master in the WAL, and the minimum of all > backend_xmins. For your case this means a long running transaction queryi= ng > table t1 might have a backend_xmin of 223. On the master update and delet= e > operations happen on table T2. Since all the transactions on the master a= re > fast, when vacuum hits T2, the minimum of all backend_xmins on the master > might already be 425. So, garbage left over by all transactions up to 424 > can be cleaned up. Now that cleanup record reaches the replica. It compar= es > 223>425 which is false. So, there is a conflict. Now the replica can wait > until its own horizon reaches 425 or it can kill all backends with a lowe= r > backend_xmin. > > As I understand, hot_standby_feedback does not work for you. Not sure if > you can run the query on the master? That would resolve the issues but > might generate the same bloat on the master as hot_standby_feedback. > Another option I can see is to run long running queries on a dedicated > replica with max_standby_streaming_delay set to infinity or something lar= ge > enough. If you go that way, you could also fetch the WAL from your > WAL archive instead of replicating from the master. That way the replica > has absolutely no chance to affect the master. > > Thank you so much. Would you agree that we should have two standby, one with default max_standby_streaming_delay (say 10 sec ) which will be mainly used as high availability and thus will be having minimal lag. and another standby with max_standby_streaming_delay as "-1" i.e. it will wait indefinitely for the SELECT queries to finish without caring about the lag, which will be utilized for the long running SELECT queries. And keep the hot_standby_feedback as ON for the first standby which is used as HA/high availability. And keep the hot_standby_feedback as OFF for the second standby which is utilized for long running SELECT queries, so that primary won't be waiting for the response/feedback from this standby to vacuum its old transactions and that will keep the transaction id wrap around issue from not happening because of the Read/Select queries on any of the standby. --000000000000f833f106195fd2ee Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Sun, May 26, 2024 at 11:18=E2=80=AFPM = Torsten F=C3=B6rtsch <tfoertsc= h123@gmail.com> wrote:
= Each query on the replica has a backend_xmin. You can see that in pg_stat_a= ctivity. From that backend's perspective, tuples marked as deleted by a= ny transaction greater or equal to backend_xmin are still needed. This does= not depend on the table.

Now, vacuum writes to the WAL up to which point it has vacuumed on t= he master. In pg_waldump this looks like so:

PRUNE sna= pshotConflictHorizon: 774, nredirected: 0, ndead: 5, nunused: 0, redirected= : [], dead: [2, 4, 6, 8, 10], unused: [], blkref #0: rel 1663/5/1643= 0 blk 0

That snapshotConflictHorizon is= also a transaction id. If the backend_xmin of all backends running transac= tions in the same database (the 5 in=C2=A01663/5/1= 6430) -as the vacuum WAL record is greater than vacuum's snapsho= tConflictHorizon, then there is no conflict. If any of the backend_xmin'= ;s is less, then there is a conflict.

This type of= conflict is determined by just 2 numbers, the conflict horizon sent by the= master in the WAL, and the minimum of all backend_xmins. For your case thi= s means a long running transaction querying table t1 might have a backend_x= min of 223. On the master update and delete operations happen on table T2. = Since all the transactions on the master are fast, when vacuum hits T2, the= minimum of all backend_xmins on the master might already be 425. So, garba= ge left over by all transactions up to 424 can be cleaned up. Now that clea= nup record reaches the replica. It compares 223>425 which is false. So, = there is a conflict. Now the replica can wait until its own horizon reaches= 425 or it can kill all backends with a lower backend_xmin.

<= /div>
As I understand, hot_standby_feedback does not work for you. Not = sure if you can run the query on the master? That would resolve the issues = but might generate the same bloat on the master as hot_standby_feedback. An= other option I can see is to run long running queries on a dedicated replic= a with max_standby_streaming_delay set to infinity or something large enoug= h. If you go that way, you could also fetch the WAL from your WAL=C2=A0arch= ive instead of replicating from the master. That way the replica has absolu= tely no chance to affect the master.


Thank you so much.

Would you= agree that we should have two standby, one with default max_standby_stream= ing_delay (say 10 sec ) which will be mainly used as high availability and = thus will be having minimal lag. and another standby with max_standby_strea= ming_delay as "-1" i.e. it will wait indefinitely for the SELECT = queries to finish without caring about the lag, which will be utilized for = the long running SELECT queries.

And keep the hot_standby_feed= back as ON for the first standby which is used as HA/high availability. And= keep the hot_standby_feedback as OFF for the second standby which is utili= zed for long running SELECT queries, so that primary won't be waiting f= or the response/feedback from this standby to vacuum its old transactions a= nd that will keep the transaction id wrap around issue from not happening b= ecause of the Read/Select queries on any of the standby.=C2=A0
<= /div> --000000000000f833f106195fd2ee--