Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vSXJB-00HSMi-0v for pgsql-hackers@arkaria.postgresql.org; Mon, 08 Dec 2025 09:13:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vSXJ9-00FvlF-2A for pgsql-hackers@arkaria.postgresql.org; Mon, 08 Dec 2025 09:13:27 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vSXJ9-00Fvl6-0w for pgsql-hackers@lists.postgresql.org; Mon, 08 Dec 2025 09:13:27 +0000 Received: from gmmr-4.centrum.cz ([2a00:da80:1:502::8]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vSXJ7-003oq9-1h for pgsql-hackers@lists.postgresql.org; Mon, 08 Dec 2025 09:13:27 +0000 Received: from gmmr-4.centrum.cz (localhost [127.0.0.1]) by gmmr-4.centrum.cz (Postfix) with ESMTP id E94E365C56 for ; Mon, 8 Dec 2025 10:13:22 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=atlas.cz; s=mail; t=1765185202; bh=MvxFwmZtRd+ZRPpP+qp7KhgL3+OhKRsGBOiGKqVUbnw=; h=Date:Subject:To:Cc:References:From:In-Reply-To:From; b=lIYXbOGF0suJrlDwrSIwMuvG1WDlOkdp2tvNJ+noaSA9H1fEZDFSwc6bqJtENePMa c1wBf+O0a3tzGvZWOuUnPr3NR6WbqE5RiWw1RA+taR0N80D2jy30otlLGr/a6AVGOc WO5WnffxN/K4JUg8qXVWD075G3ZKarn9g3iS+BDs= Received: from antispam69.centrum.cz (antispam69.cent [10.30.208.69]) by gmmr-4.centrum.cz (Postfix) with ESMTP id E72272015411 for ; Mon, 8 Dec 2025 10:13:22 +0100 (CET) X-CSE-ConnectionGUID: 2ykwnj8QQ7C/GEGI8ocRvg== X-CSE-MsgGUID: aWv0rXYnQ2+lxfx9LKK+xg== X-ThreatScanner-Verdict: Negative X-IPAS-Result: =?us-ascii?q?A2GSGQAvlTZp/0vj/y5QAgiBCQmLTJF0A4NVZoc8hjWKG?= =?us-ascii?q?YNUDwEBAQEBAQEBAQlRBAEBhQcCjGgnOBMBAgQBAQEBAwIDAQEBAQEBAQEBA?= =?us-ascii?q?QELAQEGAQEBAQEBBgYBAoEdhglTgluEBwEBAQECASMPAQVBBQsLDQsCAiYCA?= =?us-ascii?q?lYGEQQBAYJ+gjoBEiexUnqBMhoCZdx1AoElH4FtgR0uhW6CShsBc4lzNoFVR?= =?us-ascii?q?IE8C4J4PoQWBoQCgmkEgg0VgQ6GH4tLh1xSeBwDWSwBVRMXCwcFgSBDA4ELI?= =?us-ascii?q?0sFLR2BIyEdFxNgVECDSRgGaA8GgREZSQICAgUCQDqBZgYcBhwSAgMBAgI6V?= =?us-ascii?q?QyBdwICBIIYfoIKD4dFgQgFLoEsAgELbT03CQsbBj2XIGECPFITEiY0MUAIM?= =?us-ascii?q?CcXPJJrgyyvbDQHhB+BXgYMgl6dDgYPBC+XZhcDkUWBJpkGozwsCIVQgX+Bf?= =?us-ascii?q?zMag1tRGY48Fsh/SWoCBwEKAQEDCZIegUsBAQ?= IronPort-PHdr: A9a23:/L2cahQ6UBQmpZCBAXYtfPkUrNpsosSZAWYlg6HPa5pwe6iut67vI FbYra00ygOSAcODs7kd1bCJ++C4ACpcus3H6CtDOLV3FDY7yuwu1zQ6B8CEDUCpZNXLVAcdW OlkahpO0kr/D3JoHt3jbUbZuHy44G1aMBz+MQ1oOra9QdaK3Iy42O+o5pLcfRhDiiajbrNuN hW2qhjautULjYd4Jas8xBTErmFUd+hKym9kOFOekwj+68yt+JJv7zlct+gu+sNOX6j2Zbg0Q LJdAD87L207+crnuAXMQgeI+nYSSHgdnABSAwjC4hH6X5PxvTX/u+FgxiSUPtD4QK4tVzi64 KllUBDnhD4ZOjA27W/bksN/gr5VrxKnvBx/3pXbYJqJO/pkf6PSZs0VSGxcVctMTSNBA4e8b 4wSD+odOOtTsofwq0cSoRa4GAKhAv7kxD1ViX/sxaA00OovHg7I0wI9AdwDrnrUotX3O6gOX u6417XIwDbZYv9KxTvx9orFfxY8qv+MR7Jwds/RxFEhGQzfkFWQqJHqNC6Q2OQJsmib7PBvX v+ohmE6sQ5xvDmvydo2honKgoIVy1PE+j9izYkvOd24TFR0YcW+HJRMtiCXLo17Sd4tTG90o ig10KEGuYKlcygQzpQq3xDSZviGfoWU4x/uVuScLCt7iX9qeLyzmwi+/FSgxOD8VMS63lZHo Cpbn9fMuX4A1hje5MiIRPdg40qsxDaC2g7V5+pZIk40jbLWJpEgz7IqiJYesUTOEjXolEnoj qKabEcp9+iw5+njbbjqvIKQO5Jqhg3kMKkihtazDOQ3PwUIQmOV4/6z1Kf58k38WLhKi/o2n bTHv53CPsQbo7K5AxdS0oY+9xazFzem38ocnXkANF9FdxeHg5HyO1HLPPD4Feqzj0m2nDh32 f/GOKHhAo/QLnjGirvuYbd851ZdyAo1099f+4pZB74ALf7pREP8tNzVAgUnPwCqwevrEtdw2 4MGVWKKGKCZMafSsVGS5uIoJumBfJIatyzjK/gk+vHil3w5mVscfamqx5Qac2q3HuhiI0qDe nrgmMsOEGYSsgokUOPqkEGCUSJUZ3uqRa084i87CIK4AofZXICinqeM0z2/Hp1IZmBKEFaME XDxe4WFQ/cAciWSItVukjAcT7iuV5ch1Q2ytA/907doNOrU9zYEupPjz9d6+ffTlQ0s+jNoF ciSyX2NQ3lunmwUXz82wLx/oUtlx1eB0Kh3mfpYGcZO5/NPUwc6K4DTz/ZhC9/oXgLBf8uJS Va9T9WpBDE+VNMxw9sUb0lhHNWiiwjP3zC2DL8Ni7yLGJs0/7rZ33jwJ8Zy0XfG1K0nj1knQ ctAL2mmibR/9gfNGo7EiEqZl6OweqQd2C7B7nuMzW2LvE1ASg5/Tb3FXWwDZkvRtdn5/kzCT 6S3CbQjKwdBydSNJbdJatL3l1VGX+zuONDfY2Kqh2ewHgyExquLbIrxZ2Ud0z/dB1YfkwAK+ XaKLw8+Bj2uo27GFjxhCUrvY1/w8el5sH67VEg0wBqRYkxu1ru1+xoVhfiGRP8J2LILpjshp y1yHFmmx93WEcaPpxJ9fKVAZtMw+FZK2XrEuAx+P5ygKrhviEYRfQt5oU3gywl6BIMT2fQt+ dou0gNoLuq43UlbeimRlcT5PqPXJ3Po8Timb7DWnFfZldeOvKAV56JrhU/kuVSFH1Qj/z1E1 9hW2jPI6pzUDQZUUpbwUU8f/gR+rqqcaTtrtNCc7mFlLaTh6myK4NkuHuZwkn6d IronPort-Data: A9a23:+gLsk65p25JMsAVn5LN/9gxRtADGchMFZxGqfqrLsTDasY5as4F+v jdLCGiBOaqPMDD0KIt2Pdu18BkOsMKAytE3GQE6rSsxZn8b8sCt6fZ1j6vT04F+CuWZESqLO u1HMoGowPgcFyGa/lH3dOG49xGQ7InQLpLkEunIJyttcgFtTSYlmHpLlvUw6mJSqYHR7zil5 5Wo+6UzBHf/g2QpazhMu/rawP9SlK2aVA0w7wFWic9j4we2e0k9VPo3Oay3Jn3kdYhYdsbSb /rD1ryw4lTC9B4rDN6/+p6jGqHdauO60aCm0xK6aoD66vRwjnVaPpUTaJLwXXxqZwChxLid/ v0W7MDtFl15VkH7sL91vxFwS0mSNEDdkVPNCSDXXce7lyUqf5ZwqhnH4Y5f0YAwo45K7W9yG fMwciEOcRaB19uPn5n8ZcRDp5QkJ/P1M9ZK0p1g5Wmx4fcOTpWGWKDW/YYBmjw9gNxUAPOYb NhxhThHMEqGOUASfA1NV9RhwY9EhVGmG9FcgF6cu6s7pWvXzQ94+LHxNNPOPNeYLSlQth/B/ zKXoTykXnn2MvSW9iTd6EKdgNaV3ifrVpJDMJ6bte5l1Qj7Kms7TUd+uUGAifC9lE6zQchSA 0UQ5Cpoq6939VTtScPyNzWxoFaPrxQRQcZdGKsz9GmlyKvd5UCEAGkaQyRaYdgnqN4eSjkj1 1vPlNTsbRRqsb2cSHab97SdqTqjNDU9ImoLZCtCRgwAi/H7r5oqilfPR91gHbSuptn0Ai3rh SiNtzQxnbgIlMpN0L+0lXjdhDiEtorISgo8oAPaQgqYAhhRONDjPdH1rwKBsrAfce51U2W8g ZTNoODGhMhmMH1HvHblrDkldF1x28u4DQ== IronPort-HdrOrdr: A9a23:Q//CiKwcJ6cAOvFCtyCpKrPwL71zdoMgy1knxilNoNJuEvBw5P re/sjzsiWE7gr5OUtQ/uxoV5PsfZqxz+8R3WBVB8bHYOCEggeVxeNZh7cKqgeIc0bDH6xmpM RdmsNFZuEYY2IasS+32maFL+o= X-Talos-CUID: =?us-ascii?q?9a23=3A+aIFUmrLgXJKXjVGM+Wm1TrmUZo0cWLbzifpGhC?= =?us-ascii?q?TTklEZ5zFTlSq5qwxxg=3D=3D?= X-Talos-MUID: 9a23:xnXvyQmJiXtC8MjHzBGMdnprC9kw+ZSfDnlX0pEWtdnVEQ9TeCeS2WE= X-IronPort-Anti-Spam-Filtered: true X-IronPort-AV: E=Sophos;i="6.20,258,1758578400"; d="scan'208";a="111896054" Received: from unknown (HELO gm-smtp11.centrum.cz) ([46.255.227.75]) by antispam69.centrum.cz with ESMTP; 08 Dec 2025 10:13:22 +0100 Received: from [192.168.0.43] (ip-94-112-70-54.bb.vodafone.cz [94.112.70.54]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by gm-smtp11.centrum.cz (Postfix) with ESMTPSA id 651A6100AE114; Mon, 8 Dec 2025 10:13:22 +0100 (CET) Message-ID: <0ac6540e-c9f7-4918-913b-21288f6436cf@atlas.cz> Date: Mon, 8 Dec 2025 10:13:20 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Adding REPACK [concurrently] To: =?UTF-8?Q?=C3=81lvaro_Herrera?= Cc: ah@cybertec.at, jian.universality@gmail.com, pgsql-hackers@lists.postgresql.org, mihailnikalayeu@gmail.com, rob@xzilla.net References: <202512041531.4yz4szwnfsqk@alvherre.pgsql> Content-Language: cs From: David Klika In-Reply-To: <202512041531.4yz4szwnfsqk@alvherre.pgsql> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hello Alvaro Thank you for the detailed analysis. Dne 04.12.2025 v 16:43 Álvaro Herrera napsal(a): > Hello David, > > Thanks for your interest in this. > > On 2025-Dec-04, David Klika wrote: > >> Let's consider a large table where 80% blocks are fine (filled enough by >> live tuples). The table could be scanned from the beginning (left side) to >> identify "not enough filled" blocks and also from the end (right side) to >> process live tuples by moving them to the blocks identified by the left side >> scan. The work is over when both scan reaches the same position. > If you only have a small number of pages that have this problem, then > you don't actually need to do anything -- the pages will be marked free > by regular vacuuming, and future inserts or updates can make use of > those pages. It's not a problem to have a small number of pages in > empty state for some time. > > So if you're trying to do this, the number of problematic pages must be > large. I agree, I had in mind about 20-40% of the table that could have tenths of GB. > Now, the issue with what you propose is that you need to make either the > old tuples or the new tuples visible to concurrent transactions. If at > any point they are both visible, or none of them is visible, then you > have potentially corrupted the results that would be obtained by a query > that's scanning the table and halfway through. When performing a tuple movement from a (right) page to a (left) page, both of pages must be hold in shared buffers. I suppose the other processes scanning the table also access the table data through the shared buffers so the movement could be handled at this level. If the tuple movement does not change its xid, it wouldn't even have to be in conflict with other transactions that locked/modified the tuple (in buffer cache again, just changing the physical location). Looks like something dirty... > The other point is that you need to keep indexes updated. That is, you > need to make the indexes point to both the old and new, until you remove > the old tuples from the table, then remove those index pointers. > This process bloats the indexes, which is not insignificant, considering > that the number of tuples to process is large. If there are several > indexes, this makes your process take even longer. > > You can fix the concurrency problem by holding a lock on the table that > ensures nobody is reading the table until you've finished. But we don't > want to have to hold such a lock for long! And we already established > that the number of pages to check is large, which means you're going to > work for a long time. > So, I'm not really sure that it's practical to implement what you > suggest. I agree. Proposed tuple shuffle might work better compared to the current VACUUM FULL (i.e. blocking non-clustered maintenance) but I understand that you prefer an universal method of data files maintenance (the concurrent variant will be amazing). Regards David