Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vXG3z-005fvg-1z for pgsql-general@arkaria.postgresql.org; Sun, 21 Dec 2025 09:49:20 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vXG3w-00BWj2-2G for pgsql-general@arkaria.postgresql.org; Sun, 21 Dec 2025 09:49:17 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vXG3w-00BWir-0h for pgsql-general@lists.postgresql.org; Sun, 21 Dec 2025 09:49:16 +0000 Received: from mail.hjp.at ([212.17.106.138] helo=rorschach.hjp.at) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vXG3r-001nCO-2U for pgsql-general@lists.postgresql.org; Sun, 21 Dec 2025 09:49:15 +0000 Received: by rorschach.hjp.at (Postfix, from userid 1000) id 364E44E11; Sun, 21 Dec 2025 10:49:08 +0100 (CET) Date: Sun, 21 Dec 2025 10:49:08 +0100 From: "Peter J. Holzer" To: pgsql-general@lists.postgresql.org Subject: Re: Dealing with SeqScans when Time-based Partitions Cut Over Message-ID: Mail-Followup-To: pgsql-general@lists.postgresql.org References: <875xa31v2x.fsf@gmail.com> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha512; protocol="application/pgp-signature"; boundary="cedqcoenflsansza" Content-Disposition: inline In-Reply-To: <875xa31v2x.fsf@gmail.com> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --cedqcoenflsansza Content-Type: text/plain; protected-headers=v1; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Subject: Re: Dealing with SeqScans when Time-based Partitions Cut Over MIME-Version: 1.0 On 2025-12-18 13:52:22 -0500, Matthew Planchard wrote: >=20 > In a table with high insert frequency (~1.5k rows/s) and high query > frequency (~1k queries/s), partitioned by record creation time, we have > observed the following behavior: >=20 > * When the current time crosses a partition boundary, all new records > are written to the new partition, which was previously empty, as > expected >=20 > * Because the planner's latest knowledge of the partition was based on > its state prior to the cutover, it assumes the partition is empty and > creates plans that use sequential scans >=20 > * The table accumulates tens to hundreds of thousands of rows, and the > sequentail scans start to use nearly 100% of available database CPU >=20 > * Eventually the planner updates thee stats and all is well, but the > cycle repeats the next time the partitions cut over. If I understand you correctly, the stats are updated after "tens to hundreds of thousands of rows" which at that insert rate would translate to several minutes. > We have tried setting up a cron job that runs ANALYZE on the most recent > partition of the table every 15 seconds at the start of the hour, and > while this does help in reducing the magnitude and duration of the > problem, it is insufficient to fully resolve it (our engineers are still > getting daily pages for high DB CPU utilization). Since nobody seems to have an idea how to prevent the seqscans I'd go back one step: What is the real problem here? 1) The database is sluggish during that time and users are negatively affected by the sluggishness. 2) Your engineers get spurious pages for something which isn't really a=20 problem and which they can do nothing about. 3) Something else. Note that I don't consider "high CPU utilization" a real problem. That's what the CPU is there for. It might cause problems or be a symptom of a problem, but it isn't a problem in itself. If the problem is 1) I agree that a database-related solution should be found. I think it is possible to explicitely set the statistics on a table. So maybe that would be good enough until auto-analyze kicks in. But if 2) is the real problem I think you should change the ruleset in your monitoring. Pages should only be sent out for conditions which actually impact the service and which can (probably) be fixed or at least investigated by the engineer on call. Something which happens every day at the same time and resolves itself within a few minutes doesn't satisfy these criteria. Maybe you can just add a "downtime" to that rule. Something like "between 00:00 and 00:07, don't send out notifications for this condition". hjp --=20 _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | hjp@hjp.at | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" --cedqcoenflsansza Content-Type: application/pgp-signature; name="signature.asc" -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEETtJbRjyPwVTYGJ5k8g5IURL+KF0FAmlHwosACgkQ8g5IURL+ KF1dFBAAlz1vZ86BPQF4aXNRcFBtQg/1q5sx7nZDCowP7zIE1dIHxvrg33STSP0j 8W61GYMrDXAp+LrdI6tCq8H6LILftjBxQyrhn2GcSm+vBmghG7yoawgM8K0sTB+O YmeVb+ygp9mkelcf2QtG9QgWc23dam26nsJoUTAY6f7UuIjSNRovnfSfeNhnft7W WHmrDyMhAuFxjBsoKWHXM/EibE+tEF6Y9KvkpjAKwo+rlTtj/cY6cTDGayhXDuAJ 05yXNne+IsSRT0pPxS9qTr38Vpc0mhvYi5kQzGPXHLt5Qsr5vf7EoFHuqXV2/2LT hCiB9PiJCUkHb/uawFJZzG3rrNljkIEXkEuug9hr1WXT0u04VegmkyACJMFYEXHa wbcfuziv+GKzFuvwtd714E7hOUluAbFndMxsMJSbchfedBFTsIU0+MTNyF1eQnU7 CFR+ouOUsWOAzxG8VIriCPUFHXIj4bTNZCI6UFZVLK+783Ok9MrJvnjsY428QgG+ xim3/qj2iS1qZ3ff+M1IQOPNj0AUiksnrre/uuV9+EBSBycFyUpw0ixGJYqwY1Al Td4+dRx9tgKChzLrD+HZMyn/sYps9zYSXJpM/k0PImyySVBB/4vE4HIgcIwkNj2o BQy4Qsg3+x4k0rppGrVpmQWWQEkmx3CpGyO1OGF6Vkkdz+QkYBw= =omuw -----END PGP SIGNATURE----- --cedqcoenflsansza--