Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u1q3K-008tD9-GY for pgsql-hackers@arkaria.postgresql.org; Mon, 07 Apr 2025 17:14:30 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1u1q3I-001cRk-HL for pgsql-hackers@arkaria.postgresql.org; Mon, 07 Apr 2025 17:14:28 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u1q3I-001cRc-7M for pgsql-hackers@lists.postgresql.org; Mon, 07 Apr 2025 17:14:28 +0000 Received: from relay6-d.mail.gandi.net ([217.70.183.198]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1u1q3G-003VgD-12 for pgsql-hackers@postgresql.org; Mon, 07 Apr 2025 17:14:27 +0000 Received: by mail.gandi.net (Postfix) with ESMTPSA id C7E0E42E79; Mon, 7 Apr 2025 17:14:22 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vondra.me; s=gm1; t=1744046063; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=05weec6TYKWume6vo6RCXJeu7loNvmvIcEyJhbWMl8w=; b=VJoeA2mEoHS2bqQnGasg0VGkZF3FfTXu4FdC6zvP6h0okdN+JzoUIJbte0LlfQJ/hp4jeL OE2DtFq1Oqf4PYKnN8sCGZJH4QX5mv24Z70NOfiKCrA+LKCun+K+AgrIPL1FE+b6/WtDS9 qFZP3UFbxH8vYEAdLfxrmQ529LcKp0eWwpFhvdjX0QuebMp8yylk7Qt+M0JwAna/h0Dx7Q iIHDQrqZ8y488wjAU8K2oqodlwhcxNHLuD92fZHh+e5EWOeLeZhQTA5yeiEzZeWABhHBSB 6qoAdBfvzIVCbm+Ca6q/0Dwq24jtdBExvZoe7ZQRF18l4xLfaxkSuby87pgKJA== Message-ID: <3ff847de-de78-4933-8566-78335da29883@vondra.me> Date: Mon, 7 Apr 2025 19:14:21 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Improve monitoring of shared memory allocations To: Rahila Syed Cc: Nazir Bilal Yavuz , Andres Freund , PostgreSQL-development References: <3e40eeec-d8bf-4496-854e-485dd901f6a2@vondra.me> <6bf7194e-4c34-4e6d-8215-f6acf8903974@vondra.me> <6705dbd2-060b-4f3c-9fcb-1c7f10880b26@vondra.me> <66283922-0448-44e7-bbdd-38aaf2be34b9@vondra.me> <7987a0b5-9933-457a-b5ba-d28e9cce883b@vondra.me> <4e9a124d-29cb-4fd5-99fb-a5016e508f20@vondra.me> Content-Language: en-US From: Tomas Vondra In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-GND-State: clean X-GND-Score: -100 X-GND-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvtddtjeejucetufdoteggodetrfdotffvucfrrhhofhhilhgvmecuifetpfffkfdpucggtfgfnhhsuhgsshgtrhhisggvnecuuegrihhlohhuthemuceftddunecusecvtfgvtghiphhivghnthhsucdlqddutddtmdenucfjughrpefkffggfgfuvfevfhfhjggtgfesthekredttddvjeenucfhrhhomhepvfhomhgrshcugghonhgurhgruceothhomhgrshesvhhonhgurhgrrdhmvgeqnecuggftrfgrthhtvghrnheptedvfefgveffgfeuhfeuueeuteegffetleevueeivdevhfefhfehteejgedutedtnecuffhomhgrihhnpehpohhsthhgrhgvshhqlhdrohhrghenucfkphepkeeirdegledrvddvledrudelfeenucevlhhushhtvghrufhiiigvpedtnecurfgrrhgrmhepihhnvghtpeekiedrgeelrddvvdelrdduleefpdhhvghloheplgdutddrudefjedrtddrvdgnpdhmrghilhhfrhhomhepthhomhgrshesvhhonhgurhgrrdhmvgdpnhgspghrtghpthhtohepgedprhgtphhtthhopehrrghhihhlrghshigvugeltdesghhmrghilhdrtghomhdprhgtphhtthhopegshigrvhhuiiekudesghhmrghilhdrtghomhdprhgtphhtthhopegrnhgurhgvshesrghnrghrrgiivghlrdguvgdprhgtphhtthhopehpghhsqhhlqdhhrggtkhgvrhhssehpohhsthhgrhgvshhqlhdrohhrgh X-GND-Sasl: tomas@vondra.me List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, I'm sorry, but I'm afraid this won't make it into PG18 :-( AFAICS the updated patch is correct / not buggy for non-shared hash tables, but considering I missed a pretty serious flaw before pushing the original patch, and that the tests didn't catch that either ... The risk/benefit is not really worth it for PG18. I think this needs a bit more work/testing. The dynahash is a bit too critical piece, I don't want to be reverting a patch a second time. Sorry, I realize it's not a great outcome ... On 4/4/25 17:45, Rahila Syed wrote: > Hi, > > Analysis of the Bug id/3d36f787-8ba5-4aa2-abb1-7ade129a7b0e%40vondra.me> in 0002 reported by > David Rowley : > The 0001* patch allocates memory for the hash header, directory, segments, > and elements collectively for both shared and non-shared hash tables. While >  this approach works well for shared hash tables, it presents an issue > for non- > shared hash tables. Specifically, during the expand_table() process, > non-shared hash tables may reallocate a new directory and free the old one. > Since the directory's memory is no longer allocated individually, it > cannot be freed > separately. This results in the following error: > > ERROR: pfree called with invalid pointer 0x60a15edc44e0 (header > 0x0000002000000008) > > These allocations are done together to improve reporting of shared > memory allocations > for shared hash tables. Similar change is done for non-shared hash > tables only to maintain > consistency since hash_create code is shared between both types of hash > tables. > Thanks, that matches my conclusions after looking into this. > One solution could be separating allocation of directory from rest of > the allocations for  > the non-shared hash tables, but this approach would undermine the > purpose of > doing the change for a non-shared hash table. > I don't follow. Why would that undermine the change for non-shared hash tables? Why should this affect non-shared hash tables at all? The goal was to improve accounting for shared hash tables. > A better/safer solution would be to do this change only for shared hash > tables and > exclude the non-shared hash tables. > > I believe it's acceptable to allocate everything in a single block > provided we are not trying > to individually free any of these, which we do only for the directory > pointer in dir_realloc. > Additional segment allocation goes through seg_alloc and element > allocations through > element_alloc, which do not free existing chunks but instead allocate > new ones with > pointers in existing directories and segments. > Thus, as long as we don't reallocate the directory, which we avoid in > the case of shared > hash tables, it should be safe to proceed with this change. > Right, I believe that's correct. But I admit I find the current dynahash code a bit confusing, especially in how it mixes handling of non-shared and shared hash tables. (Not the fault of this patch, ofc.) > Please find attached the patch which removes the changes for non-shared > hash tables > and keeps them for shared hash tables. > > I tested this by running make-check, make-check world and the reproducer > script shared > by David. I also ran pgbench to test creation and expansion of some of the > shared hash tables. > Thanks. I'm afraid the existing regression tests can't tell us much, considering it didn't fail once with the reverted patch :-( I did check the coverage in: https://coverage.postgresql.org/src/backend/utils/hash/dynahash.c.gcov.html and sure enough, dir_realloc() is not executed once. And there's a couple more pieces that have the same issue (e.g. hash_freeze or parts of get_hash_entry). I realize that's not the fault of this patch, but would you be willing to improve that? It'd certainly make me less concerned if we improved this first. regards -- Tomas Vondra