Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6wcg-002cvP-Qg for pgsql-general@arkaria.postgresql.org; Tue, 14 May 2024 18:11:36 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1s6wcg-000nHX-Rv for pgsql-general@arkaria.postgresql.org; Tue, 14 May 2024 18:11:34 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6wcg-000nHN-F5 for pgsql-general@lists.postgresql.org; Tue, 14 May 2024 18:11:34 +0000 Received: from mout.gmx.net ([212.227.15.15]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6wcb-000EGd-9f for pgsql-general@lists.postgresql.org; Tue, 14 May 2024 18:11:33 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmx.net; s=s31663417; t=1715710288; x=1716315088; i=jimis@gmx.net; bh=HEtbchxa9X3yJ2+PfAN38VNx0Rd9zLtQpH7ngSj0gz4=; h=X-UI-Sender-Class:Date:From:To:cc:Subject:In-Reply-To:Message-ID: References:MIME-Version:Content-Type:Content-Transfer-Encoding:cc: content-transfer-encoding:content-type:date:from:message-id: mime-version:reply-to:subject:to; b=E3mCsb41ZgoejySO1sy6j4Q1IL0rcJ5Dy2SgsTwydxl6Utmj1ds8BTqJoFymgguD AqrqQ396ZYs2KoUYano8oIgEuKiUeqyOv0KjsUM/pL1g5KNTFnAIQ+QtfaLO0JMjp DQxWdn3BkirGJvnDW98FgC+huG6fbTBIvDgCG4lzUALn0c+xXLPoUClnEIjNGvTAf 5tFNkPEDJZ0LGFskUfs979CjTL/zBA1cVSo55WacV9by19aGrTu3V7GevrdcPtjZW 4EesyMiVv9U6qjcvYBkfK7xIltSgwWUw/OnW/dut8Kshvh2uecqcJcMJeps3NkqlG Dzra561sn5xyVflMkw== X-UI-Sender-Class: 724b4f7f-cbec-4199-ad4e-598c01a50d3a Received: from [10.9.70.35] ([185.55.106.54]) by mail.gmx.net (mrgmx004 [212.227.17.190]) with ESMTPSA (Nemesis) id 1M2wGs-1sAEwe0jDo-0005er; Tue, 14 May 2024 20:11:28 +0200 Date: Tue, 14 May 2024 20:11:26 +0200 (CEST) From: Dimitrios Apostolou To: David Rowley cc: Tom Lane , pgsql-general@lists.postgresql.org Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions In-Reply-To: Message-ID: <46f9dc8a-ff83-d086-f726-9df0a0d39c97@gmx.net> References: <7886a68f-b466-2131-1747-f69f0fb71a37@gmx.net> <69077f15-4125-2d63-733f-21ce6eac4f01@gmx.net> <559b0e40-63e6-fa9a-6b03-d1eba10f30f8@gmx.net> <1629463.1715372568@sss.pgh.pa.us> <6a2f3906-3d7a-6924-7403-8f77d57a18e4@gmx.net> <3efce60b-48c0-160d-0444-474b02f76739@gmx.net> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed X-Provags-ID: V03:K1:US79BM3PBi/jZcFE0h1r4jAOZ3+voFgV+U1WfYtCtPIUcdCrCYf dnfc8S4nEDa7RiYxMi6xza1nqnQvIsUeboj/BiTwa57npkP8OHzoR7VUpGagtlCdYyjV59r OPTe8FJgwJoPci6NcmAOl523urAAtia82Kz2NaE92bKk4k9tnoR/kv2jzaGhj5PjAuxej7Q v3bUHV9rFFno4UGjGzngg== X-Spam-Flag: NO UI-OutboundReport: notjunk:1;M01:P0:iKgZ/aZU1T0=;Bx4OgOki1tFNXHmkKliSdj8yk3p bRACdGpF6RTCS1Ba/9+i0Rb5d8ukO1l7qH0xD4FjV8Lc1NP8IeeG8ujEElObCLb2a6Hturkgy EGO3Y/u69N1/SWOmHDje8SzRJiSChZRg+HDQr0OGejvuCFYU/KgbLTxZEz6SLPTPsBdATyd3B QpWO5+X/RubtNbVpLEzl09MMb4rvGYT9CyOWse57erF+Hg32EQdwunKHKwoLMPTZu052awL7V jDjIrHoIdZX5ECJtaLztvp7OP3O7AaYfgVCK5YGcZkDznsA5EAWP6x8YFqjwozmfpfH+XIOQo 2BSX8fO0SyipIK0/13mEnaA3hwsos5z5ZrQoB2af9K0wxmiMz4fRFXlJCWYEkp5v5xvNps8Mk JuFag90PEUnY7iIb2zl5fHq4xAqokSWBwvfyVZjhw5vxX70D3LE5J5FHI3VYEbS3zGjNjPLQL kzqcwJouSfX/81vsRMEa5PZtaHwUuRDmGOth6RolOKybwx8USYejMcUHbRLGCtd/sIA1J7ztd 3wLwStEsoPvxbK5k9LiNPFM3MjWd8ZguMAU2pAcB4BDnXbzbU69rua3F8dimegxKIKYLby3Tn X305DcRSCQcM14vZ+gcItElVrx+C5jXD022tboRsERG3TD74bOZNawVFfGa6doY8eujUAd62I JpbkOMjIUeN0se63PI0d5EYuy37CFCM2rw7ssIYNVypbvhxn8DWIu23gMhX8PUSkPh8tQYcqk kxYNuXb62KJ6hs49qSTRK86gQCkPpUWiQgDJ4LEWivJh5xeSC1Xpp/Q9mRkRRuMEpi0Kl2etb zVjKN5NgiXdyDz1TB0sdxEfhKVo+aJKdSPD4tOBZmrJjE= Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, 14 May 2024, David Rowley wrote: > > If you were to put the n_distinct_inherited estimate back to 200 and > disable sort, you should see the costs are higher for the index plan. > If that's not the case then there might be a bug. It seems more > likely that due to the n_distinct estimate being so low that the > planner thought that a large enough fraction of the rows needed to be > read and that made the non-index plan appear cheaper. > > I'd be interested in seeing what the costs are for the index plan. I > think the following will give you that (untested): > > alter table test_runs_raw alter column workitem_n set > (n_distinct_inherited=3D200); > analyze test_runs_raw; I had to stop this step because it was taking too long going through all partitions again. But it seems it had the desired effect. > set enable_sort=3D0; > explain SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_= n DESC LIMIT 10; It chooses the non-parallel index plan: Limit (cost=3D365.17..1135517462.36 rows=3D10 width=3D4) -> Unique (cost=3D365.17..22710342308.83 rows=3D200 width=3D4) -> Append (cost=3D365.17..22546660655.46 rows=3D65472661350 wi= dth=3D4) -> Index Only Scan Backward using test_runs_raw__part_max= 20000k_pkey on test_runs_raw__part_max20000k test_runs_raw_1000 (cost=3D0= .12..2.34 rows=3D1 width=3D4) -> Index Only Scan Backward using test_runs_raw__part_max= 19980k_pkey on test_runs_raw__part_max19980k test_runs_raw_999 (cost=3D0.= 12..2.34 rows=3D1 width=3D4) [... only index scans follow] LIMIT 100 goes for the parallel seqscan plan, that even contains a sort! But it seams to me that the extra upper level HashAggregate step raises the cost by an order of magnitude, from 800M to 10G, in comparison to doing (Unique->Sort) - see plan in the next paragraph. Limit (cost=3D10857220388.76..10857220389.01 rows=3D100 width=3D4) -> Sort (cost=3D10857220388.76..10857220389.26 rows=3D200 width=3D4) Sort Key: test_runs_raw.workitem_n DESC -> HashAggregate (cost=3D857220379.12..857220381.12 rows=3D200= width=3D4) Group Key: test_runs_raw.workitem_n -> Gather (cost=3D857220295.12..857220377.12 rows=3D800 = width=3D4) Workers Planned: 4 -> HashAggregate (cost=3D857219295.12..857219297.1= 2 rows=3D200 width=3D4) Group Key: test_runs_raw.workitem_n -> Parallel Append (cost=3D0.00..816295259.2= 1 rows=3D16369614363 width=3D4) -> Parallel Index Only Scan Backward us= ing test_runs_raw__part_max9600k_pkey on test_runs_raw__part_max9600k test= _runs_raw_480 (cost=3D0.57..1597356.30 rows=3D33623360 width=3D4) -> Parallel Index Only Scan Backward us= ing test_runs_raw__part_max10140k_pkey on test_runs_raw__part_max10140k te= st_runs_raw_507 (cost=3D0.57..1210806.37 rows=3D25794030 width=3D4) -> Parallel Seq Scan on test_runs_raw__= part_max9500k test_runs_raw_475 (cost=3D0.00..3037800.88 rows=3D64122388 = width=3D4) -> Parallel Seq Scan on test_runs_raw__= part_max11180k test_runs_raw_559 (cost=3D0.00..2918865.36 rows=3D61611136= width=3D4) [... only seqscans follow] If I re-enable sort, then it goes for the parallel seqscan plan even with = LIMIT 10: SET SESSION enable_sort TO TRUE; EXPLAIN SELECT DISTINCT workitem_n FROM test_runs_raw ORDER BY workitem_n= DESC LIMIT 10; Limit (cost=3D857166256.39..857166256.59 rows=3D10 width=3D4) -> Unique (cost=3D857166256.39..857166260.39 rows=3D200 width=3D4) -> Sort (cost=3D857166256.39..857166258.39 rows=3D800 width=3D= 4) Sort Key: test_runs_raw.workitem_n DESC -> Gather (cost=3D857166135.82..857166217.82 rows=3D800 = width=3D4) Workers Planned: 4 -> HashAggregate (cost=3D857165135.82..857165137.8= 2 rows=3D200 width=3D4) Group Key: test_runs_raw.workitem_n -> Parallel Append (cost=3D0.00..816243567.2= 4 rows=3D16368627432 width=3D4) -> Parallel Index Only Scan Backward us= ing test_runs_raw__part_max9600k_pkey on test_runs_raw__part_max9600k test= _runs_raw_480 (cost=3D0.57..1597356.30 rows=3D33623360 width=3D4) -> Parallel Index Only Scan Backward us= ing test_runs_raw__part_max10140k_pkey on test_runs_raw__part_max10140k te= st_runs_raw_507 (cost=3D0.57..1210806.37 rows=3D25794030 width=3D4) -> Parallel Seq Scan on test_runs_raw__= part_max9500k test_runs_raw_475 (cost=3D0.00..3037800.88 rows=3D64122388 = width=3D4) -> Parallel Seq Scan on test_runs_raw__= part_max11180k test_runs_raw_559 (cost=3D0.00..2918865.36 rows=3D61611136= width=3D4) [... only seqscans follow] So in order of higher to lower cost, we have the following alternatives: 1. non-parallel index scan (800M) 2. parallel seqscan with sort (1.3G) 3. parallel seqscan without sort but actually has a sort (10G assuming it= 's the same as for LIMIT 100) > > -- undo > alter table test_runs_raw alter column workitem_n set (n_distinct_inheri= ted=3D-1); I believe I need to set it to 0 to be back to defaults. > reset enable_sort; > Regards, Dimitris