Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6rhq-002BRo-Fd for pgsql-general@arkaria.postgresql.org; Tue, 14 May 2024 12:56:36 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1s6rgr-00EJiH-VF for pgsql-general@arkaria.postgresql.org; Tue, 14 May 2024 12:55:33 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6rgr-00EJi7-GS for pgsql-general@lists.postgresql.org; Tue, 14 May 2024 12:55:33 +0000 Received: from mout.gmx.net ([212.227.15.15]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6rgn-000C4f-La for pgsql-general@lists.postgresql.org; Tue, 14 May 2024 12:55:32 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmx.net; s=s31663417; t=1715691327; x=1716296127; i=jimis@gmx.net; bh=wqE5QmHnHQXazjIJuXfkqDtyHj2+mttgCypkKHLdqis=; h=X-UI-Sender-Class:Date:From:To:cc:Subject:In-Reply-To:Message-ID: References:MIME-Version:Content-Type:Content-Transfer-Encoding:cc: content-transfer-encoding:content-type:date:from:message-id: mime-version:reply-to:subject:to; b=rc4/XJ/x9EvL0Zu/DTH6RkGrBhQpSjFgBKFDjVTNj7MsiGSwVvTd5Mhbe2WtYeMk guwJtmIDpiGF5wKIQi2OoblSBbX0N+Qr6UVxkrWXrUpmjwP557A1TWMAvDEYCc2lV M5mXiWQDhH6Uiy2Y4OW5v2KnS1crgFatO6MX+w9YHti9FJHhN8352xn7kdaLwzfY4 yHX7U9BeCBuTsTwZ+Z6WjyVDx2Ghzy8gnE6I/6txFn86yMggQ7q7JNX52LROy6U3q dzASaPIagfDBhvjGD26LEeciuezx0/9sjPhFaRfOl0/eFJv3ynoPQG5jc1X0qB43/ L2WxrLdUOhewyy/isA== X-UI-Sender-Class: 724b4f7f-cbec-4199-ad4e-598c01a50d3a Received: from [10.9.70.35] ([185.55.106.54]) by mail.gmx.net (mrgmx005 [212.227.17.190]) with ESMTPSA (Nemesis) id 1MhD6W-1sk1zJ1has-00eIR5; Tue, 14 May 2024 14:55:27 +0200 Date: Tue, 14 May 2024 14:55:26 +0200 (CEST) From: Dimitrios Apostolou To: Tom Lane cc: pgsql-general@lists.postgresql.org, David Rowley Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions In-Reply-To: <1629463.1715372568@sss.pgh.pa.us> Message-ID: <660a8477-4130-40da-3492-f8827c5c3596@gmx.net> References: <7886a68f-b466-2131-1747-f69f0fb71a37@gmx.net> <69077f15-4125-2d63-733f-21ce6eac4f01@gmx.net> <559b0e40-63e6-fa9a-6b03-d1eba10f30f8@gmx.net> <1629463.1715372568@sss.pgh.pa.us> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed X-Provags-ID: V03:K1:Nx3OFJ6EA0FfbTP02HcL24gIeBnAhoZw/bU/7QjpchSwNckuk8/ /RM8d5WbmRfV5R7w4JtdkSR3cdHCT9wjVBs7wbuk1RkHxfF8Ig+eHkkNUxRlWK1zAibxalt Gl3/Ifb9ziWvdtw6xDaoyf6ptUOfqGhjEcFKr8/1+XBrZ7atimLvQoHimoorY8Xg8IasLnE QgXJXMhzZd1AP7GKIjLiQ== X-Spam-Flag: NO UI-OutboundReport: notjunk:1;M01:P0:xGxwux+K5QU=;fY81U4y0Xg5YqAD6L5Vq4Q4gVYb NiUd04kFPLRvyP5QE91FPnB0n415LWjrvwuA+L4EY1Promn/w2EJJnj7nKOV4+LmIWY15bC6I iHH91gQKtIch6n53htQwQfmPWPpaFmO6dEjZgcFs11HgpEmSe39SIEef45oYJXwMe69BAN5WI PzhWYopKadH17xnEnXWyCt6DWLI9gGhMzBD10DpeQI9aCRMvfTTcyXRsTxY5owAx7aBGJluYT BsJNA0ie8dbD58cyzCmDTcENw9kcsMhLZBN63OXXORC7zuOQ2QuwRVtjCJ7KPLPJQ/Eol9DQa hvFqfCN+03CCHxCoFhCTPCNM4+ChfhlXQFpRMA51Tyal1kFZycsKt9OFClGamL2FDSdLcdkEp 9A7Im5aJ5CSNSxp05LcB8v5bXx95TTL7wg0S4TlN3SdH6xvBbnu1ibeJiZF7rSpD38OinAd+l lUyNc4845sTKDwfCPL/iVy0AwvfNmXBRfZc1kBt5Oqq8/gsbNAkYhB7H0r/rrgkygh5WzYw9/ Ks87rtL7TpoQPrfN+t97mAnjVVDEig/w+n9uFJOZLVM2LR6M9BuBuej7/J6wC/IS0E3de+igJ tQuJgrOeiyCBaClcXyC+TqXrTJSX66ShO2ZN8d7QCl2YGnwVzXM+DUHXjY5za1gvN7rfTyV6m 1hSF0GLHi65TiAs9edjr4kldqyU4SjK729Czpp8VJ1zGTWQ36m2+/axCvvDqS2EE39fr1V/da uHtTcnvq909HSjxZK6qVD/cFNtKmBtPfd5xpJWc4ikJ8JJwTOtl6Ll4h/tpgsJlKIs8XY/m8n P413xBL1EqsqT6m8PYOrclyCJHUqOAANL5Psud6jAsmLA= Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, 10 May 2024, Tom Lane wrote: > Dimitrios Apostolou writes: >> Further digging into this simple query, if I force the non-parallel pla= n >> by setting max_parallel_workers_per_gather TO 0, I see that the query >> planner comes up with a cost much higher: > >> Limit (cost=3D363.84..1134528847.47 rows=3D10 width=3D4) >> -> Unique (cost=3D363.84..22690570036.41 rows=3D200 width=3D4) >> -> Append (cost=3D363.84..22527480551.58 rows=3D65235793929= width=3D4) >> ... > >> The total cost on the 1st line (cost=3D363.84..1134528847.47) has a muc= h >> higher upper limit than the total cost when >> max_parallel_workers_per_gather is 4 (cost=3D853891608.79..853891608.99= ). >> This explains the planner's choice. But I wonder why the cost estimatio= n >> is so far away from reality. > > I'd say the blame lies with that (probably-default) estimate of > just 200 distinct rows. That means the planner expects to have > to read about 5% (10/200) of the tables to get the result, and > that's making fast-start plans look bad. > > Possibly an explicit ANALYZE on the partitioned table would help. It took long but if finished: ANALYZE Time: 19177398.025 ms (05:19:37.398) And it made a difference indeed, the serial plan is chosen now: EXPLAIN SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_n = DESC LIMIT 10; Limit (cost=3D364.29..1835512.29 rows=3D10 width=3D4) -> Unique (cost=3D364.29..22701882164.56 rows=3D123706 width=3D4) -> Append (cost=3D364.29..22538472401.60 rows=3D65363905182 wi= dth=3D4) -> Index Only Scan Backward using test_runs_raw__part_max= 20000k_pkey on test_runs_raw__part_max20000k test_runs_raw_1000 (cost=3D0= .12..2.34 rows=3D1 width=3D4) -> Index Only Scan Backward using test_runs_raw__part_max= 19980k_pkey on test_runs_raw__part_max19980k test_runs_raw_999 (cost=3D0.= 12..2.34 rows=3D1 width=3D4) -> Index Only Scan Backward using test_runs_raw__part_max= 19960k_pkey on test_runs_raw__part_max19960k test_runs_raw_998 (cost=3D0.= 12..2.34 rows=3D1 width=3D4) [...] -> Index Only Scan Backward using test_runs_raw__part_max= 12460k_pkey on test_runs_raw__part_max12460k test_runs_raw_623 (cost=3D0.= 57..12329614.53 rows=3D121368496 width=3D4) -> Index Only Scan Backward using test_runs_raw__part_max= 12440k_pkey on test_runs_raw__part_max12440k test_runs_raw_622 (cost=3D0.= 57..5180832.16 rows=3D184927264 width=3D4) -> Index Only Scan Backward using test_runs_raw__part_max= 12420k_pkey on test_runs_raw__part_max12420k test_runs_raw_621 (cost=3D0.= 57..4544964.21 rows=3D82018824 width=3D4) [...] Overall I think there are two issues that postgres could handle better here: 1. Avoid the need for manual ANALYZE on partitioned table 2. Create a different parallel plan, one that can exit early, when the LIMIT is proportionally low. I feel the partitions could be parallel-scanned in-order, so that the whole thing stops when one partition has been read. Thank you! Dimitris