Subject: Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup
From: Tomas Vondra
To: Tom Lane
Cc: Jelte Fennema-Nio, PostgreSQL-development, Andres Freund
Date: Tue, 11 Feb 2025 23:33:45 +0100

On 2/11/25 21:18, Tom Lane wrote:
> Tomas Vondra writes:
>> I did run into bottlenecks due to "too few file descriptors" during a
>> recent experiments with partitioning, which made it pretty trivial to
>> get into a situation when we start trashing the VfdCache. I have a
>> half-written draft of a blog post about that somewhere.
>
>> But my conclusion was that it's damn difficult to even realize that's
>> happening, especially if you don't have access to the OS / perf, etc.
>
> Yeah.
> fd.c does its level best to keep going even with only a few FDs
> available, and it's hard to tell that you have a performance problem
> arising from that.  (Although I recall old war stories about Postgres
> continuing to chug along just fine after it'd run the kernel out of
> FDs, although every other service on the system was crashing left and
> right, making it difficult e.g. even to log in.  That scenario is why
> I'm resistant to pushing our allowed number of FDs to the moon...)
>
>> So
>> my takeaway was we should improve that first, so that people have a
>> chance to realize they have this issue, and can do the tuning. The
>> improvements I thought about were:
>
>> - track hits/misses for the VfdCache (and add a system view for that)
>
> I think what we actually would like to know is how often we have to
> close an open FD in order to make room to open a different file.
> Maybe that's the same thing you mean by "cache miss", but it doesn't
> seem like quite the right terminology.  Anyway, +1 for adding some way
> to discover how often that's happening.
>

We can count the evictions (i.e. closing a file so that we can open a
new one) too, but AFAICS that's about the same as counting "misses"
(opening a file after not finding it in the cache). After the cache
warms up, those counts should be about the same, I think. Or am I
missing something?

>> - maybe have wait event for opening/closing file descriptors
>
> Not clear that that helps, at least for this specific issue.
>

I don't think Jelte described any specific issue, but the symptoms I've
observed were that a query was accessing a table with ~1000 relations
(partitions + indexes), thrashing the vfd cache, getting ~0% cache hits.
And the open/close calls were taking a lot of time (~25% CPU time).
That'd be very visible as a wait event, I believe.
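
To make the misses-vs-evictions point concrete, here is a toy model (not
the actual fd.c code, just a sketch assuming a plain LRU policy). The
function name simulate_lru, the capacity of 100 and the access pattern
are all made up for illustration; the ~1000 relations figure is the only
number taken from the case above.

```python
from collections import OrderedDict


def simulate_lru(capacity, accesses):
    """Replay file accesses against an LRU cache of open files.

    Hypothetical model for illustration only; it is not how fd.c is
    implemented, it merely assumes least-recently-used eviction.
    """
    cache = OrderedDict()  # file ids, ordered from least to most recently used
    hits = misses = evictions = 0
    for file_id in accesses:
        if file_id in cache:
            hits += 1
            cache.move_to_end(file_id)  # mark as most recently used
            continue
        misses += 1  # file is not open, we have to open it
        if len(cache) >= capacity:
            cache.popitem(last=False)  # close the least recently used file
            evictions += 1
        cache[file_id] = True
    return {"hits": hits, "misses": misses, "evictions": evictions}


# 1000 relations scanned in the same order 10 times, room for 100 open files
thrashing = simulate_lru(capacity=100, accesses=list(range(1000)) * 10)

# same workload, but with enough room for every relation
fitting = simulate_lru(capacity=1000, accesses=list(range(1000)) * 10)
```

In this model the two counters can only differ by the cache capacity:
the first misses fill empty slots, and every later miss has to evict
exactly one entry. With the undersized cache and a cyclic scan, each
file is evicted before it is needed again, so the hit rate is 0%.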
>> - show max_safe_fds value somewhere, not just max_files_per_process
>>   (which we may silently override and use a lower value)
>
> Maybe we should just assign max_safe_fds back to max_files_per_process
> after running set_max_safe_fds?  The existence of two variables is a
> bit confusing anyhow.  I vaguely recall that we had a reason for
> keeping them separate, but I can't think of the reasoning now.
>

That might work. I don't know what the reasons for keeping them separate
were, but I suppose there were some.


regards

-- 
Tomas Vondra