Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6UzT-00CMGr-Ax for pgsql-general@arkaria.postgresql.org; Mon, 13 May 2024 12:41:15 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1s6UzR-000nVS-In for pgsql-general@arkaria.postgresql.org; Mon, 13 May 2024 12:41:13 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6UzR-000nVK-7o for pgsql-general@lists.postgresql.org; Mon, 13 May 2024 12:41:13 +0000 Received: from mout.gmx.net ([212.227.15.15]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1s6UzO-000oCq-Ig for pgsql-general@lists.postgresql.org; Mon, 13 May 2024 12:41:12 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmx.net; s=s31663417; t=1715604067; x=1716208867; i=jimis@gmx.net; bh=K2zi6CmL/2itM7LXWU9WhfDNWs79v2jIjjyHn3lRJ+o=; h=X-UI-Sender-Class:Date:From:To:cc:Subject:In-Reply-To:Message-ID: References:MIME-Version:Content-Type:Content-Transfer-Encoding:cc: content-transfer-encoding:content-type:date:from:message-id: mime-version:reply-to:subject:to; b=m/UEmwSPJfzE8Dp5dDlre1qBy7EC83uGbrku6eFB7F6/NUza9Jg6naPm0mxIYt+1 RWZb5Ty4A5XO1gtYcPZTvja27io2Me43HG0VXKbKX2rXGuqLOwhHI5YT6nVdoPPpu RRWyWsU61llkY5qqDLW5T6YFMrisc+VRJ4VZ5nZ+ZVHmFnOrEtObL3bNXVvCsZ5Pj QGrVUndU9Ea3SzJOb50X5ZKEnoYnKlz+TcQ/++5fJYKsGfNVcIXxXzaoOCV7Li1ja CN4KjRb+d6VTEuQQA1O0J9f5xGpn2ZlLesIjGt/nHWaXujTvCcLR0FTTzqFt3UXxR caxOmVMTAikypFyFfg== X-UI-Sender-Class: 724b4f7f-cbec-4199-ad4e-598c01a50d3a Received: from [10.9.70.35] ([185.55.106.54]) by mail.gmx.net (mrgmx005 [212.227.17.190]) with ESMTPSA (Nemesis) id 1Mi2Jn-1skWm2085x-00e1WQ; Mon, 13 May 2024 14:41:07 +0200 Date: Mon, 13 May 2024 14:41:05 +0200 (CEST) From: Dimitrios Apostolou To: David Rowley cc: Tom Lane , pgsql-general@lists.postgresql.org Subject: Re: SELECT DISTINCT chooses parallel seqscan instead of indexscan on huge table with 1000 partitions In-Reply-To: Message-ID: <3efce60b-48c0-160d-0444-474b02f76739@gmx.net> References: <7886a68f-b466-2131-1747-f69f0fb71a37@gmx.net> <69077f15-4125-2d63-733f-21ce6eac4f01@gmx.net> <559b0e40-63e6-fa9a-6b03-d1eba10f30f8@gmx.net> <1629463.1715372568@sss.pgh.pa.us> <6a2f3906-3d7a-6924-7403-8f77d57a18e4@gmx.net> MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII; format=flowed X-Provags-ID: V03:K1:p1XDpvcmoE9PyG67vL6q7XfpG6CB1gDX1rJrlGAEMAsBZlAQaaL EjJE3ANsvHv3LOk+G9JBf5oa76MmV9aA8jVW0a8tzcWJaryhcYB+IOxgq9UVB9zJxCB0KbA 30ktmXg3pguf0ms5/u981ItceG7KA5OKvXJPVOSOXaKilWCeMn2FUINFfdFAukeHljtrStU AIgiWksywjAIuTmQ27BMg== X-Spam-Flag: NO UI-OutboundReport: notjunk:1;M01:P0:WEg81dUSzAs=;KswOlTi4oO/F9IFBooESJJyWkMn CslJuIwjkHBaovdy4/iR8w/lZCnMrHWnGNmDPmyBbf2XTlEbwoF4RvTfp+SM0gDchUPLPJN8C P2MAUR+NBZmVCUhONv1VOZYNVfycjiZVeueCkUw+T0OwoQenvYvH5R1FDie87UlBKehjspdF6 D+yhXFnhyKLgUpiid1pNKe4MNEk+MIm3Fn7FGlFu+z2HhB60FmXOQbS78jbPm3CQip1opnNx/ IT9exn+myJkjFtpyFbmri8O7qRYukvBIIXSvbQhQJaDimAMUiI6rU/riI/k8UN4irP9Z50PzM jlU25Of3b3R8WgRRNlxopJEEjL2QTrgETZjS0lVFKQ0lZHD6ZoHy5e+0d866I33Jz6p5RykLT qUKHJN08cwaZIEnCQaCXhfUWOaJXMvnWjGKGo/Hw1twDgIyYo+LGBi/cJCqrt1kfW37wSWkTJ 8DaZz5JPZ6D8+OpIVDOXCFQGyVLqyYjmDM58RmIGzdFZghTgJLMn42VnD/UjqsQfU9T+GwX3g zo1/a0lcerkU3piKclH47KtANtTHmgsWl00h7L2fdP4oID63h2Ucu6VgokjGN8lzSTCo1vdFC GtspBg27d+5M0eyxb0/HZrlh9+u6CgFYuzB3DGtMG+zhSzxVkIhg9r5Ks8CqOcaiwotuWrEU3 SGPULkzMTRgIjOw3HaJQgNgHX5PAepVrarK5VMcoyOO0LbMum9epUI0aeD/9HxmWu3Knkp9Sa 9bUpRSRGgXg0Kfa/+20MIWO+2f20g7gjaDqDvKhpwiTdlt5lrbSIaT55C6NYdOGGjbEkymhNJ Od2pLHW55A6by5ynvr600Yk8LOz4677uPNJiIFeoZHGhE= Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sat, 11 May 2024, David Rowley wrote: > On Sat, 11 May 2024 at 13:11, Dimitrios Apostolou wrote: >> Indeed that's an awful estimate, the table has more than 1M of unique >> values in that column. Looking into pg_stat_user_tables, I can't see th= e >> partitions having been vacuum'd or analyzed at all. I think they should >> have been auto-analyzed, since they get a ton of INSERTs >> (no deletes/updates though) and I have the default autovacuum settings. >> Could it be that autovacuum starts, but never >> finishes? I can't find something in the logs. > > It's not the partitions getting analyzed you need to worry about for > an ndistinct estimate on the partitioned table. It's auto-analyze or > ANALYZE on the partitioned table itself that you should care about. > > If you look at [1], it says "Tuples changed in partitions and > inheritance children do not trigger analyze on the parent table." Thanks > >> In any case, even after the planner decides to execute the terrible pla= n >> with the parallel seqscans, why doesn't it finish right when it finds 1= 0 >> distinct values? > > It will. It's just that Sorting requires fetching everything from its su= bnode. Isn't it plain wrong to have a sort step in the plan than? The different partitions contain different value ranges with no overlap, and the last query I posted doesn't even contain an ORDER BY clause, just a DISTINCT clause on an indexed column. Even with bad estimates, even with seq scan instead of index scan, the plan should be such that it concludes all parallel work as soon as it finds the 10 distinct values. And this is actually achieved if I disable parallel plans. Could it be a bug in the parallel plan generation? Dimitris