Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sB9zC-00EgRP-IN for pgsql-general@arkaria.postgresql.org; Sun, 26 May 2024 09:16:16 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sB9zA-003ljh-Co for pgsql-general@arkaria.postgresql.org; Sun, 26 May 2024 09:16:12 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sB9z9-003ljY-RE for pgsql-general@lists.postgresql.org; Sun, 26 May 2024 09:16:12 +0000 Received: from mail-ua1-x936.google.com ([2607:f8b0:4864:20::936]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sB9z3-0023I3-Hh for pgsql-general@lists.postgresql.org; Sun, 26 May 2024 09:16:10 +0000 Received: by mail-ua1-x936.google.com with SMTP id a1e0cc1a2514c-804e25cdd65so335713241.0 for ; Sun, 26 May 2024 02:16:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1716714964; x=1717319764; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Bs4HRbVquyYtY031bGboOB6hNZw6fQVjw1eKi8mJp5Y=; b=hy25LGcdk5CZNz37bFL33rBOC8Tucby7TH3m68c+b2Yr4S7BELJWNYSaBTfmUAWjip qpNxiH5N77MtR0HuXnklpPQVkbBSw1a9qZHkaRx4TnxlG8uzIsL/5bXQ23gPOIT3TqBV p8HcVyHo5gcF3wBHGa4//1OgkBzf+0R3zp+ePiotWMMh/niC7EkwvNBKLzUVrEMwY+OR /uaxFTQN4iP79Oh3CWSfouS9k9CCnCdnfWbEYBs/wlz67ywdXSrKDpI+4hy95lPL15TZ Mb5ZYwZjGR8l/fVxfVQiddkcj2NV0nBer+JRkDtepbX1hqCqSTO48Yw6+iF/mz1V+yi6 PD+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716714964; x=1717319764; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Bs4HRbVquyYtY031bGboOB6hNZw6fQVjw1eKi8mJp5Y=; b=iJ8Jm/FxohSw4qJ6xhkBrrPxXPHWc7srub5t8xRAQfqyik/0OSOhk+ywZkR4MQlRM/ Ab7ALtSCw6qSf5cL7anRDyCgtbHsq9nBLGm9ptqlQjB9UV7hMTWHVsDxMr5L0FKFSjsc Bk5ljaOAX8Eu2yNWqriyqMGrR2zdhu295a46kwYhm92ipO2rx3icFm1cGgnv94qlkRBG 5Ju1QgXWiumsedBId/EJIkBNwsQHnwEH9y3n56fCzef8mnuZ0FMUPA/CXrpQmsCjAgmn egyoznsPtdTYoliCocKJQt3mchHJpOp4KmlxuW6CGRzHrmyv/q3XUqTtJzZP4QW1OGji qpqg== X-Gm-Message-State: AOJu0YwonYhPXQhIUrTacmE5ev66qBYxBgkVvWuSgeMWyJMj8SkKBZFf oCurbMEi840A5eCYbyYblleClzUnPQnz9SdRbVwR9o4N9j6f9kcloJDjsqW26HzzDM6MPxTSr5p 7Nfv2CE7ySHNM0GKU40jaIWK1aKI= X-Google-Smtp-Source: AGHT+IEZMkJRmlgww5QknT119EwEnBFHkTfDZg7xQ3fXwmTz/GwQRSbDSjtza4J/1sX12PV+ZMqNkJ+bRSWmu7rBYyQ= X-Received: by 2002:a05:6102:6d5:b0:47e:f109:ea33 with SMTP id ada2fe7eead31-48a385b8a73mr7980922137.19.1716714963014; Sun, 26 May 2024 02:16:03 -0700 (PDT) MIME-Version: 1.0 References: <891bcfec74f7358ef0212caf6565a35153dd2941.camel@cybertec.at> In-Reply-To: From: sud Date: Sun, 26 May 2024 14:45:50 +0530 Message-ID: Subject: Re: Long running query causing XID limit breach To: =?UTF-8?Q?Torsten_F=C3=B6rtsch?= Cc: pgsql-general Content-Type: multipart/alternative; boundary="0000000000008d18de061957d923" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000008d18de061957d923 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sun, May 26, 2024 at 1:43=E2=80=AFPM Torsten F=C3=B6rtsch wrote: > On Sat, May 25, 2024 at 11:00=E2=80=AFPM sud wrote: > >> >> But i have one question here , does max_standby_streaming_delay =3D 14 , >> means the queries on the standby will get cancelled after 14 seconds? >> > > No, your query gets cancelled when it stalls replication for >14 sec. If > your master is idle and does not send any WAL and the replica has > caught up, the query can take as long as it wants. > Thank you so much. For example , in below scenario, if i have insert query going on on primary instance on table 25th may partition of TABLE1, and at same time we are selecting data from 24th May partition , then with "max_standby_streaming_delay =3D 14" setup , it just allows the select query to run for any duration without any restriction even if the WAL gets applied on the standby regularly. Also INSERT query in primary won't make the standby SELECT queries to cancel as because the WAL record of INSERT queries on the primary instance is not conflicting to the exact rows those were being read by the standby. Is my understanding correct here? However, if i have Update/Delete query going on on primary instance on table 25th may partition of TABLE1 and on the exact same set of rows which were being read by the standby instance by the SELECT query, then the application of such WAL record to standby can max wait for 14 seconds and thus those select query are prone to be cancelled after 14 seconds. Is this understanding correct? If the above is true then it doesn't look good, as because in an OLTP system there will be a lot of DMLS happening on the writer instances and there may be many queries running on the reader/standby instances which are meant to run for hours. And if we say making those SELECT queries run for hours means compromising an hour of "high availability"/RPO or a lag of an hour between primary and standby , that doesn't look good. Please correct me if I am missing something here. --0000000000008d18de061957d923 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Sun, May 26, 2024 at 1:43=E2=80=AFPM T= orsten F=C3=B6rtsch <tfoertsch= 123@gmail.com> wrote:
O= n Sat, May 25, 2024 at 11:00=E2=80=AFPM sud <suds1434@gmail.com> wrote:

But i have one question here , does=C2=A0max_standby_strea= ming_delay =3D 14 , means the queries on the standby will get cancelled aft= er 14 seconds?

No, your q= uery gets cancelled when it stalls replication for >14 sec. If your mast= er is idle and does not send any WAL and the replica has caught=C2=A0up, th= e query can take as long as it wants.=C2=A0

Thank you so much.=C2=A0
For example , i= n below scenario,
if i have insert query going on on primary inst= ance on table=C2=A0 25th may partition of TABLE1, and at same time we are s= electing data from 24th May partition , then with "max_standby_streami= ng_delay =3D 14" setup , it just allows the select query to run for an= y duration without any restriction even if the WAL=C2=A0gets applied on the= standby regularly. Also INSERT query in primary won't=C2=A0make the st= andby SELECT queries to cancel as because the WAL record of INSERT queries = on the primary instance is not conflicting to the exact=C2=A0rows those wer= e being read by the standby. Is my understanding correct here?
However, if i have Update/Delete query going on on primar= y instance on table=C2=A0 25th may partition of TABLE1 and on the exact sam= e set of rows which were being read=C2=A0by the standby instance by the SEL= ECT query, then the application of such=C2=A0WAL=C2=A0record to standby can= max wait for 14 seconds and thus those select query are prone to be cancel= led=C2=A0after 14 seconds. Is this understanding correct?

If the above is true then it doesn't=C2=A0look good, as because= in an OLTP system there will be a lot of DMLS happening on the writer inst= ances and there may be many queries running on the reader/standby instances= which are meant to run for hours. And if we say making those SELECT querie= s run for hours means compromising an hour of "high availability"= /RPO or a lag of an hour between primary and standby , that doesn't=C2= =A0look good. Please correct=C2=A0me if I am missing something here.
<= div>

--0000000000008d18de061957d923--