Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vCHgv-00Bcbc-Bj for pgsql-general@arkaria.postgresql.org; Fri, 24 Oct 2025 13:18:48 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vCHgu-00GbSN-Al for pgsql-general@arkaria.postgresql.org; Fri, 24 Oct 2025 13:18:47 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vCHgu-00GbSF-0Y for pgsql-general@lists.postgresql.org; Fri, 24 Oct 2025 13:18:47 +0000 Received: from lana.depesz.com ([88.198.49.178] helo=depesz.com) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vCHgr-003ztt-0H for pgsql-general@lists.postgresql.org; Fri, 24 Oct 2025 13:18:46 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=depesz.com; s=20170201; h=In-Reply-To:Content-Transfer-Encoding:Content-Type:MIME-Version :References:Reply-To:Message-ID:Subject:Cc:To:Sender:From:Date:Content-ID: Content-Description; bh=BRglmScMTScR9cb/t1cJWNT3NZm1qC6CaL6dwg5F70c=; b=RKhY8 Jmbf+CULw9pj57A/iX9vybWdiqfQT0TCVmPwgYrnzfucw7pQfKAGpGplD82Gf2wDGxMAzZ1+mE/iw SnNJIM36OvV62H5I6538+kqq7Fnii3e2uuRxHKdgwQv01T8/cRB5RPtLPA7mM9EqwQO/f3ouYKcrM oG/C30n2vw78=; Received: from depesz by depesz.com with local (Exim 4.96) (envelope-from ) id 1vCHgq-00EX4G-0h; Fri, 24 Oct 2025 15:18:44 +0200 Date: Fri, 24 Oct 2025 15:18:44 +0200 From: hubert depesz lubaczewski Sender: depesz@depesz.com To: Tom Lane Cc: PostgreSQL General Subject: Re: Why is this query touching 4gb of buffers? Message-ID: Reply-To: depesz@depesz.com References: <1754333.1761310446@sss.pgh.pa.us> <1755687.1761310871@sss.pgh.pa.us> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <1755687.1761310871@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, Oct 24, 2025 at 09:01:11AM -0400, Tom Lane wrote: > hubert depesz lubaczewski writes: > > On Fri, Oct 24, 2025 at 08:54:06AM -0400, Tom Lane wrote: > >> The first execution probably had to set hint bits on a whole lot > >> of recently-deleted rows. > > > But why it doesn't happen/help on secondary? > > IIRC, secondaries aren't authorized to update hint bits for > themselves, they have to wait for the primary to do it and then > propagate the new data. There might also be some question of > what the oldest open transaction is ... OK. So tested this idea on yet another "setup". We have single primary, and two streaming replicas. Before test I ran: select now() - min(xact_start), now() - pg_last_xact_replay_timestamp() from pg_stat_activity on both replicas, and got: ?column? │ ?column? ═════════════════╪═════════════════ 00:00:00.003007 │ 00:00:00.003673 (1 row) and ?column? │ ?column? ══════════╪══════════════════ 00:00:00 │ -00:00:00.006129 (1 row) Then I ran the problematic query on replica 1 (the first one), and got numbers: Buffers: shared hit=21107 Execution Time: 18.621 ms Subsequent runs on the replica showed the same buffers usage, and similar time. Then I ran this query twice on primary, and noticed improvement: Buffers: shared hit=569 read=20927 dirtied=498 Execution Time: 2596.283 ms and on 2nd run: Buffers: shared hit=391 Execution Time: 2.015 ms Awesome. Well, mostly. Now, I waited ~ 1 minute, checked replication lag, and oldest transactions replies, where all lags were < 1s, and longest transaction across primary/secondary1/secondary2 was ~ 4s. Then I re-ran the query on secondary 1 and got: Buffers: shared hit=21107 Execution Time: 18.403 ms Interestingly, on one "setup" running vacuum analyze of the table helped execution on secondary, but on this one, it doesn't. Best regards, depesz