Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sA1th-0074Rw-BQ for pgsql-general@arkaria.postgresql.org; Thu, 23 May 2024 06:25:54 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sA1tg-008ilY-6D for pgsql-general@arkaria.postgresql.org; Thu, 23 May 2024 06:25:52 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sA1tf-008ikR-N8 for pgsql-general@lists.postgresql.org; Thu, 23 May 2024 06:25:51 +0000 Received: from mail-ua1-x92a.google.com ([2607:f8b0:4864:20::92a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sA1td-001ZmQ-26 for pgsql-general@lists.postgresql.org; Thu, 23 May 2024 06:25:50 +0000 Received: by mail-ua1-x92a.google.com with SMTP id a1e0cc1a2514c-80312cb791dso533366241.3 for ; Wed, 22 May 2024 23:25:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1716445548; x=1717050348; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=4ldSApuUnepInI+WNANdem6bML1lytEauOaw/tGSSbk=; b=B/wxpv+x8apXMbN7YOsxVe1KywZ35+8cCUx6HHgSMCFELoNHARAf899sjD5YGrKKzA pihx6Cnk1Mp5M168DsJSQekl7oblAhAj2bhF8M3qmwPrJ2wI4YWtQidQHweMLqDNYyJX qwmMZV9UxBZhzOLgQH7Jviti3fcNeMfY5Aa8NBlbCcGbDo7rVj7BoGk6782Ul8law6vT a8pePeeRjbcbLjGDk3Oy/aM8y+3LJbgS0cqzsFnKAHIOuqGUY/pEtttZrDJphWR6s/uT ppRJi2yaXKBnESGDz4CIT1N9KagPNU3afSaB5jQZhhyY/k+yEyG4TEoWYH6/fdp8OW4h +4Ug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1716445548; x=1717050348; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=4ldSApuUnepInI+WNANdem6bML1lytEauOaw/tGSSbk=; b=FcxliyYA1rq/+gIEqdOWH795ad7GnW6BKrrMOHdPhN5MK1nWCqpuKuAU5c78v+9v2z lfbNUYQw/y1QwpF+F9NL94NeWe8LO7irEnpYVP/VghjaKk5H7yLOBTiImVtbEf7txIZ3 03ShOTyQ8NrxM+50mEdB9iSP3qko5K1JsiX5VEQ3NRHH2FKVn0AdIfBinz06n1poRYVg x8A1D9yDuLS1yWpMF72pM0AkEmb8Z7KWvA8rw7q1GnuGFvItBbrOdFA7EvYdFfJ5ej2w rwBusyPph/oVVPWZCycXzJ0OzNbkLBvLGH1zeqiQ1kpr4KMD/QHH9rN4RXJuE2NqTLOQ X09A== X-Gm-Message-State: AOJu0Yzhi6X9PdWtrNmZHKVDi6KhkRdkysDM+Wq7XpX451pRy+spdNa/ +r/qsIHVrhYqXIRvbu7UNJunb+xzzeTJEZPrs7su+rf6DY86YfrMAx9oMx7F0ZrJFMCs7VIbOtk 3dlPtWPWBgmjFqUBtTzN1rmyFkUw= X-Google-Smtp-Source: AGHT+IHetaBWR7JQdMjbnzKFpnch9aQ7b34IwZ/6AsV3lds0YcJRfBQladBRW3wrSKSxkQAzIlTCT7TQLrCRJOM4k1s= X-Received: by 2002:a05:6122:1da6:b0:4cd:20ea:35aa with SMTP id 71dfb90a1353d-4e2185483edmr4205343e0c.8.1716445548114; Wed, 22 May 2024 23:25:48 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: sud Date: Thu, 23 May 2024 11:55:37 +0530 Message-ID: Subject: Re: Long running query causing XID limit breach To: Muhammad Salahuddin Manzoor Cc: pgsql-general Content-Type: multipart/alternative; boundary="0000000000002bf8740619191f77" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000002bf8740619191f77 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Also,if i am getting it correct, it means we should not run any transaction (even if it's legitimate one like for e.g. a big Reporting "SELECT" query) beyond 10hrs, as that will end up consuming 10*200million XID per hour=3D 2billion XID limit saturation and thus causing system failure. Hope my understanding is correct here. On Thu, May 23, 2024 at 11:41=E2=80=AFAM sud wrote: > > On Thu, May 23, 2024 at 10:42=E2=80=AFAM Muhammad Salahuddin Manzoor < > salahuddin.m@bitnine.net> wrote: > >> Greetings, >> >> Running `VACUUM table_name;` on a partitioned table will vacuum each >> partition individually, not the whole table as a single unit. >> >> Yes, running `VACUUM table_name;` frequently on tables or partitions wit= h >> heavy DML is recommended. >> >> Regular `VACUUM` does not lock the table for reads or writes, so it won'= t >> disrupt ongoing 24/7 data operations. >> >> "optimize autovacuum" >> Yes. Adjust following parameters as per your system/environment >> requirement >> autovacuum_max_workers, >> autovacuum_freeze_max_age , >> autovacuum_vacuum_cost_delay >> >> Following need to be first tested thoroughly in a test environment. >> Recommended Alert Threshold >> Alert at 50% Usage: Set the alert threshold at 1 billion used XIDs. This >> provides a significant buffer, giving you ample time to take corrective >> action before reaching the critical limit. >> >> Calculation Rationale >> Daily XID Usage: Approximately 4 billion rows per day implies high XID >> consumption. >> Buffer Time: At 1 billion XIDs, you would still have 1 billion XIDs >> remaining, giving you roughly 12 hours to address the issue if your syst= em >> consumes 200 million XIDs per hour. >> >> >> > Thank you so much. That helps. > So apart from setting these alerts on "Maximumusedtxnids" and making the > vacuum optimized by tweaking above parameters, should we also need to hav= e > monitoring in place to ensure the Vacuum is not taking longer as compared > to its normal runtime and also if it's getting blocked/failed by somethin= g? > Like for example in our case where the select query was running longer , = so > the vacuum must not be able to succeed every time it attempts, so is it > really worth having that level of alerting? and also how can we get an > idea regarding if the vacuum is not succeeding or getting failed etc to > avoid such upcoming issues? > > --0000000000002bf8740619191f77 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Also,if i am getting it correct, it means we should not ru= n any transaction (even if it's legitimate one like for e.g. a big Repo= rting "SELECT" query) beyond 10hrs, as that will end up consuming= 10*200million XID per hour=3D 2billion XID limit saturation and thus causi= ng system failure. Hope my understanding=C2=A0is correct here.

On Thu, May 2= 3, 2024 at 11:41=E2=80=AFAM sud <s= uds1434@gmail.com> wrote:

On Thu, May 23, 2024 at = 10:42=E2=80=AFAM Muhammad Salahuddin Manzoor <salahuddin.m@bitnine.net> wrote:=
Greetings,

Running `VACUUM table_name;` on a partition= ed table will vacuum each partition individually, not the whole table as a = single unit.

Yes, running `VACUUM table_name;` frequently on tables = or partitions with heavy DML is recommended.

Regular `VACUUM` does n= ot lock the table for reads or writes, so it won't disrupt ongoing 24/7= data operations.

"optimize autovacuum"
Yes. Adjust fol= lowing parameters as per your system/environment requirement
autovacuum= _max_workers,=C2=A0
autovacuum_freeze_max_age ,=C2=A0
a= utovacuum_vacuum_cost_delay

Following need to be first tested thorou= ghly in a test environment.
Recommended Alert Threshold
Alert at 50% = Usage: Set the alert threshold at 1 billion used XIDs. This provides a sign= ificant buffer, giving you ample time to take corrective action before reac= hing the critical limit.

Calculation Rationale
Daily XID Usage: A= pproximately 4 billion rows per day implies high XID consumption.
Buffer= Time: At 1 billion XIDs, you would still have 1 billion XIDs remaining, gi= ving you roughly 12 hours to address the issue if your system consumes 200 = million XIDs per hour.


Thank you so much. That helps.
So apart from setting these alert= s on "Maximumusedtxnids" and making the vacuum optimized by tweak= ing above parameters, should we also need to have monitoring in place to en= sure=C2=A0the Vacuum is not taking longer as compared to its normal runtime= and also if it's getting blocked/failed by something? Like for example= in our case where the select query was running longer , so the vacuum=C2= =A0must not be able to succeed every time it attempts, so is it really wort= h having=C2=A0that level of alerting?=C2=A0 and also how can we get an idea= regarding if the vacuum is not succeeding or getting failed etc to avoid s= uch upcoming issues?=C2=A0

--0000000000002bf8740619191f77--