Date: Tue, 11 Feb 2025 17:27:09 -0500
From: Andres Freund
To: Tom Lane
Cc: Tomas Vondra, Jelte Fennema-Nio, PostgreSQL-development
Subject: Re: Bump soft open file limit (RLIMIT_NOFILE) to hard limit on startup
In-Reply-To: <3216369.1739308717@sss.pgh.pa.us>

Hi,

On 2025-02-11 16:18:37 -0500, Tom Lane wrote:
> Andres Freund writes:
> > And when using something like io_uring for AIO, it'd allow to
> > max_files_per_process in addition to the files requires for the io_uring
> > instances.
>
> Not following?  Surely we'd not be configuring that so early in
> postmaster start?

The issue is that, with io_uring, we need to create one FD for each possible
child process, so that one backend can wait for completions for IO issued by
another backend [1]. Those io_uring instances need to be created in
postmaster, so they're visible to each backend. Obviously that makes it much
easier to run into an unadjusted soft RLIMIT_NOFILE, particularly if
max_connections is set to a higher value.

In the current version of the AIO patchset, the creation of those io_uring
instances happens as part of a shmem init callback, as the io_uring creation
also sets up queues visible in shmem. And shmem init callbacks currently
happen *before* postmaster's set_max_safe_fds() call:

	/*
	 * Set up shared memory and semaphores.
	 *
	 * Note: if using SysV shmem and/or semas, each postmaster startup will
	 * normally choose the same IPC keys.  This helps ensure that we will
	 * clean up dead IPC objects if the postmaster crashes and is restarted.
	 */
	CreateSharedMemoryAndSemaphores();

	/*
	 * Estimate number of openable files.  This must happen after setting up
	 * semaphores, because on some platforms semaphores count as open files.
	 */
	set_max_safe_fds();

So the issue would actually be that we're currently doing set_max_safe_fds()
too late, not too early :/

Greetings,

Andres Freund

[1] Initially I tried to avoid that, by sharing a smaller number of io_uring
    instances across backends. Making that work was a fair bit of code *and*
    was considerably slower, due to now needing a lock around submission of
    IOs. Moving to one io_uring instance per backend fairly dramatically
    simplified the code while also speeding it up.
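
For readers following the thread, here is a minimal sketch of what "bump the soft RLIMIT_NOFILE to the hard limit" could look like when done before any per-backend FDs are created. It is not taken from the patch under discussion: the helper name raise_nofile_soft_limit() is hypothetical, and it uses only the POSIX getrlimit()/setrlimit() calls. Some platforms may refuse a soft limit equal to the reported hard limit, so the sketch re-reads the limit rather than assuming the request was honored.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper, not part of PostgreSQL or the AIO patchset:
 * raise the soft RLIMIT_NOFILE to the hard limit.
 *
 * On success returns 0 and stores the limits actually in effect in
 * *result; on failure returns -1 and leaves *result untouched.
 */
static int
raise_nofile_soft_limit(struct rlimit *result)
{
	struct rlimit rlim;

	if (getrlimit(RLIMIT_NOFILE, &rlim) != 0)
		return -1;

	/* Only call setrlimit() if the soft limit is below the hard limit. */
	if (rlim.rlim_cur < rlim.rlim_max)
	{
		rlim.rlim_cur = rlim.rlim_max;
		if (setrlimit(RLIMIT_NOFILE, &rlim) != 0)
			return -1;
	}

	/* Re-read, so the caller sees what the kernel really applied. */
	if (getrlimit(RLIMIT_NOFILE, &rlim) != 0)
		return -1;

	*result = rlim;
	return 0;
}
```

In terms of the ordering problem described above, a call like this would presumably have to run before CreateSharedMemoryAndSemaphores(), so that the io_uring instances created in the shmem init callbacks already benefit from the raised limit.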