Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sfKYj-004usJ-AP for pgsql-general@arkaria.postgresql.org; Sat, 17 Aug 2024 14:37:37 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sfKYf-0040GA-Nt for pgsql-general@arkaria.postgresql.org; Sat, 17 Aug 2024 14:37:34 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sfKYf-0040G1-9w for pgsql-general@lists.postgresql.org; Sat, 17 Aug 2024 14:37:34 +0000 Received: from mout.gmx.net ([212.227.17.20]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sfKYZ-00092w-5Y for pgsql-general@lists.postgresql.org; Sat, 17 Aug 2024 14:37:33 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmx.net; s=s31663417; t=1723905446; x=1724510246; i=jimis@gmx.net; bh=sRqp8FJTRN/3gkkS9pyxYLS4/7g076ac7aQfZcdLYug=; h=X-UI-Sender-Class:Date:From:To:Subject:Message-ID:MIME-Version: Content-Type:Content-Transfer-Encoding:cc: content-transfer-encoding:content-type:date:from:message-id: mime-version:reply-to:subject:to; b=PzhDkVTCtzgT5OdbOVat+37bAJfTEiEDQWGNmE2ffKzYZwL/lJj7GiwQOh3+ejER lrI6ktqw1xLeTd8rNfZN7m9wZA+1rGyTvEgSJCtIwBpVV2pijGhW+HYUfKrMn4ORb AMUS8pWzArGIg+yq0iNPYGb8hy7ZKtS4zNafenJqv+uMPQUFflyfl8anIZLqEtPf3 GBQ3XLohaJFn4EtnUjCro6ngi+X32Cxcq6qMaKkZLnU241EnluaCzx3uG6qRpcEGu mLbEErEohjz8q/4Jtii0yuSStKUYOMRf73lMkccg0NRf5NpM2dDs/IZKdkGpGdNd6 L0YfGDtV9Nf1GfZZPQ== X-UI-Sender-Class: 724b4f7f-cbec-4199-ad4e-598c01a50d3a Received: from [192.168.0.96] ([178.232.0.21]) by mail.gmx.net (mrgmx104 [212.227.17.168]) with ESMTPSA (Nemesis) id 1MjS54-1sHWJw2bsI-00pDwn for ; Sat, 17 Aug 2024 16:37:26 +0200 Date: Sat, 17 Aug 2024 16:37:25 +0200 (CEST) From: Dimitrios Apostolou To: pgsql-general@lists.postgresql.org Subject: array_agg() does not stop aggregating according to HAVING clause Message-ID: <215d6efa-bb4b-a76e-6066-7a83ccdb55e3@gmx.net> MIME-Version: 1.0 Content-Type: text/plain; format=flowed; charset=US-ASCII X-Provags-ID: V03:K1:xsYi3ICHcUhV9qILZLPbX9bBEiU8djluM3GWqrkdVuKVPUInlMw cGcfL1xCG0EL4ahMt8ARcI0ulXneMaW/DOg6UrtvXXmEdKVsNNXvnWLy6/1sCReZWqlra9d HmypaDiQlcKUqMhQj2e6iiUTQTr6EjKEoYoeUyyS2tGgTY+gl+6xJw8OVjemGm8nPw/B5jX T+Sib2OpVfAvHYHAtAbRQ== X-Spam-Flag: NO UI-OutboundReport: notjunk:1;M01:P0:6fCkWtVl/AY=;nRdiC4fbAK7Pxw30taw8fAoZgr0 ZdvWgJwK7QmClFa87PHyB47cOrhLREhVtitIHaCHQf46Eg+RuLCjN5B+J0t69cMJmAFt6nSyu cWBRAZVXcvnZ31DXZM3i2GuvN1DHHny93inQvqjFVZBhuPOoDvJo/b8zT10KkdOoab8PSoiM4 8/eeVqfZ5rzphW1+jr3HWNmNjreSBge4eEJUrKjIMpBRbX/Jo+uFgWOQ4xQpqsgzkP0awpDDx mfMVdnj8Q2pA6aWK/TKZZ7HMlurvmkHggTcLIo3uQxoZtaoNyIeu1FJ3Y6AAhFt/srdtUG+M8 mRQDCOjPKwnzJDrr8qd/gKtVPgljCBBNJm3n5TSpKLUyG8Xjy3lzBY3mQd6VnzYuFrT4CggmS HsOH/hz984UOpn+vKFKF6rulTKE+SIcVizMnAYtqyk5mvPIDn5qUDyPnBy9DuShoC3lMRac6S JZLFSGFZcdSVcdPUdlkocTrO82peUAPBlR+LyeRGtjferpKijRTlwgLSGM0R6gqLQ25WmOfjN PC4Z648eqj3KwOJFUZE1Su7KvHFSTElqA7ePX3SOjTGrsmyo8XlYfUPWqHY0T/a36WRlxSoHs nneuI0xPJ1wGdUhwksxmTb49yAK8afdPL+TSu9YQBGuh+Fu5h4KYiN4CrYOMT7LTEBmlKkfGU KNO5Aq2xt6/w3oQMyRTFT3HrA2kvWaaAPvYXHoEAezi/x7dbQJIHFyYw4rO2zYhHg6Zktk0+z BnWg33UFLAnSVT1Wx96UJ05UCs49lLvo/aigOOELLchuxuWwTwFOR1YlB5Frj6dKdBtH7x4lm V6NicR0sahAQQeDpUvBTD6/A== Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hello list, I have a query that goes through *billions* of rows and for the columns that have an infrequent "datatag" (HAVING count(test_datatag_n)<10) it selects all the IDs of the entries (array_agg(run_n)). Here is the full query: INSERT INTO infrequent_datatags_in_this_chunk SELECT datatag, datatags.datatag_n, array_agg(run_n) FROM runs_raw JOIN datatags USING(datatag_n) WHERE workitem_n >=3D 295 AND workitem_n < 714218 AND datatag IS NOT NULL GROUP BY datatags.datatag_n HAVING count(datatag_n) < 10 AND count(datatag_n) > 0 -- Not really needed because of the JOI= N above ; The runs_raw table has run_n as the primary key id, and an index on workitem_n. The datatags table is a key value store with datatag_n as primary key. The problem is that this is extremely slow (5 hours), most likely because it creates tens of gigabytes of temporary files as I see in the logs. I suspect that it is writing to disk the array_agg(run_n) of all entries and not only those HAVING count(datatag_n)<10. (I might be wrong though, as this is only an assumption based on the amount of data written; I don't know of any way to examine the temporary files written). While this query is going through billions of rows, the ones with infrequent datatags are maybe 10M. How do I tell postgres to stop aggregating when count>=3D10? Thank you in advance, Dimitris