Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s5bGQ-006Zi9-IN for pgsql-general@arkaria.postgresql.org; Sat, 11 May 2024 01:11:02 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1s5bGN-002N1S-P6 for pgsql-general@arkaria.postgresql.org; Sat, 11 May 2024 01:11:00 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s5bGN-002N1I-E8 for pgsql-general@lists.postgresql.org; Sat, 11 May 2024 01:10:59 +0000 Received: from mout.gmx.net ([212.227.17.22]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s5bGJ-000Psp-8x for pgsql-general@lists.postgresql.org; Sat, 11 May 2024 01:10:58 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmx.net; s=s31663417; t=1715389852; x=1715994652; i=jimis@gmx.net; bh=k9qdnJnZOLZOmI8zqJC+4sYEF6RkVRoFXRWwfp6mlu8=; h=X-UI-Sender-Class:Date:From:To:cc:Subject:In-Reply-To:Message-ID: References:MIME-Version:Content-Type:Content-Transfer-Encoding:cc: content-transfer-encoding:content-type:date:from:message-id: mime-version:reply-to:subject:to; b=HkIutP2/gx1RQOOZEYLPIJHlSsTwLKo57/k9zMLCNkCF/IEGIbTT11v3aYw8tyYr Xrk018U7NRJT7W1vsCaUCDozrB6G9gDemGHWZJalCr0jrdIVFq0MGaawxP3usne+U HgklOQzPMhNHQ2RMRlE4R7kYPYsNjBK10q3QMwxlT+RuN8FIOzt30xvNtKiEerbXS Zsx/FHYkZXGD5cosECwxh/26PfG0sZAzNpqunUmaq9FNb7k6ZRFuA1mRh8F6GefV8 8Gh9MTwaNd5o7+Cfy3Ojn1DKLPo2//eSGPh4t/AdfIgMx2DDbCjueqHNmrFWIJa13 ++ICrZ60ugBUKkf6EA== X-UI-Sender-Class: 724b4f7f-cbec-4199-ad4e-598c01a50d3a Received: from [192.168.0.83] ([178.232.0.21]) by mail.gmx.net (mrgmx105 [212.227.17.168]) with ESMTPSA (Nemesis) id 1MybGX-1sommn0KdR-00yypk; Sat, 11 May 2024 03:10:52 +0200 Date: Sat, 11 May 2024 03:10:50 +0200 (CEST) From: Dimitrios Apostolou To: Tom Lane cc: pgsql-general@lists.postgresql.org Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions In-Reply-To: <1629463.1715372568@sss.pgh.pa.us> Message-ID: <6a2f3906-3d7a-6924-7403-8f77d57a18e4@gmx.net> References: <7886a68f-b466-2131-1747-f69f0fb71a37@gmx.net> <69077f15-4125-2d63-733f-21ce6eac4f01@gmx.net> <559b0e40-63e6-fa9a-6b03-d1eba10f30f8@gmx.net> <1629463.1715372568@sss.pgh.pa.us> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed X-Provags-ID: V03:K1:hj9ZPkogCzuqPBqSTWelHcLxiLx8eoQDvFjvVa2dZfTixuE328L xDHReYTbrDdd8NtmhpGvoQyDiUyOzK38+lVHJEvoFbPWn0gttkeZal2WJvaLgUetLfxEknS 5JZrBBRrD0QqhSsT51gfD6ppSHsAI1DgV3Z31acw2zk0vu3lwLXIKPQmgAw0jEe/+cvhWnK 2ycIx3xA7BEhrVft+wrJg== X-Spam-Flag: NO UI-OutboundReport: notjunk:1;M01:P0:RQ2rb0MNo3c=;EiJTYYC+jtz1/3s9xKH4Z6W3eSe iFlCWjD+GqpD7+uNQ+qV1nW/jGxzvU2Ri63Ujxut8tdjdglCOS9bXY25CVa0G7aJt3sazSDbb pxw+oFplW6DYhaLW6Cfl52GiMF4JW4PLg5hXfCoNnf2/Ti3ow+4OfWXYGNtKbmWZ+mcSChG6K 5xmgNj6WasHcRzm0+vcbGVWkzdQGdKSaVamvg9Mq7Vw5SUW0aQJxYDYoW+Kyvk2IHP4AVGxXj KdU3b+iI5Gvkc79tGywAbjAWnATZxWaIddLlrOhBFG5MLJ+p0UiZwM6pUNWmvVZ+Lnf2H8ZdL xktT1lT7NlDEzRM77tIjY3Ci89tx5jCDvn7x+gtDZTo31zxx792OFAFUXrEvxzEww2ijVE8y/ XuzVcumzO/3uVKcrxfLpKaLp3exkBJOGOAThEQK+0N+ztXtkOURlMqiSVkNmux93IdgW4RD9d 0DuT2Jg6Pg3IOrWw2zJydu8hYMftQwUu9YDTxo76p8hSFJnRCUk9oSu9evCfYSV5fJHPe7Kim D9sBb3xoTyROPHU4LqT0CYcC2h7qHtVjEfkKtc8OwiklXJNHGKID243cFDkeuCZf4NlEouyOG V9JvBFEPzJ+osXTcy13w4Z0tnaaJdAzScspoCR2HnuUFr2WJ2JOjjrsjJaM2KBDj8QRSPwagS EyvkIajr49y7I8Ktr2lTWTK/RJhtbe33D49nmIWwjA15pjjhXe4vpTeF49CdqD++3pl56FwA2 S7YSxxheM7sfs0qugLXG7fNNLmPMv3iGM/BpL9N/HIWtQq1QzPLwvJOtUcbIQMGgY8FvW/HJr MlzPcu5Je6KZxtAm7ZAGhGFoyOxtZxpJ4TugFycWkE1JI= Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, 10 May 2024, Tom Lane wrote: > Dimitrios Apostolou writes: >> Further digging into this simple query, if I force the non-parallel pla= n >> by setting max_parallel_workers_per_gather TO 0, I see that the query >> planner comes up with a cost much higher: > >> Limit (cost=3D363.84..1134528847.47 rows=3D10 width=3D4) >> -> Unique (cost=3D363.84..22690570036.41 rows=3D200 width=3D4) >> -> Append (cost=3D363.84..22527480551.58 rows=3D65235793929= width=3D4) >> ... > >> The total cost on the 1st line (cost=3D363.84..1134528847.47) has a muc= h >> higher upper limit than the total cost when >> max_parallel_workers_per_gather is 4 (cost=3D853891608.79..853891608.99= ). >> This explains the planner's choice. But I wonder why the cost estimatio= n >> is so far away from reality. > > I'd say the blame lies with that (probably-default) estimate of > just 200 distinct rows. That means the planner expects to have > to read about 5% (10/200) of the tables to get the result, and > that's making fast-start plans look bad. Indeed that's an awful estimate, the table has more than 1M of unique values in that column. Looking into pg_stat_user_tables, I can't see the partitions having been vacuum'd or analyzed at all. I think they should have been auto-analyzed, since they get a ton of INSERTs (no deletes/updates though) and I have the default autovacuum settings. Could it be that autovacuum starts, but never finishes? I can't find something in the logs. In any case, even after the planner decides to execute the terrible plan with the parallel seqscans, why doesn't it finish right when it finds 10 distinct values? > > Possibly an explicit ANALYZE on the partitioned table would help. Thanks, I'll save the ANALYZE as the last step; I feel it's a good opportunity to figure out more details about how postgres works. Plus I expect ANALYZE to last a couple of days, so I should first find quiet time for that. :-) Dimitris