Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vCFYU-00B9ow-Mw for pgsql-general@arkaria.postgresql.org; Fri, 24 Oct 2025 11:01:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1vCFYS-00FM8I-1a for pgsql-general@arkaria.postgresql.org; Fri, 24 Oct 2025 11:01:55 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1vCFYR-00FM8A-JR for pgsql-general@lists.postgresql.org; Fri, 24 Oct 2025 11:01:54 +0000 Received: from lana.depesz.com ([88.198.49.178] helo=depesz.com) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vCFYN-003WYe-2I for pgsql-general@lists.postgresql.org; Fri, 24 Oct 2025 11:01:53 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=depesz.com; s=20170201; h=Content-Transfer-Encoding:Content-Type:MIME-Version:Reply-To: Message-ID:Subject:To:Sender:From:Date:Cc:Content-ID:Content-Description: In-Reply-To:References; bh=ug8r4SQyMkBN1SYZKCeWxJ8mPTm0pMp89CwZR8/H2ek=; b=Ca U7EpzzqMR6CRYlo7AR2qomNqDNEhbE2M+Yg6yE/JmRa6agPxDDFVhtzsSzXBHFXi7AsEetJZp0J4e 7RCSCwcr7proMPhfX0SngzhAwBK+nPcWo+2iOn9SvMS5+/Zqk+uKR4eYRu6Kvz9vLLv2JBUpeS6Gx +meXENt9AdNbF+c=; Received: from depesz by depesz.com with local (Exim 4.96) (envelope-from ) id 1vCFYK-00Dd3S-2B for pgsql-general@lists.postgresql.org; Fri, 24 Oct 2025 13:01:48 +0200 Date: Fri, 24 Oct 2025 13:01:48 +0200 From: hubert depesz lubaczewski Sender: depesz@depesz.com To: PostgreSQL General Subject: Why is this query touching 4gb of buffers? Message-ID: Reply-To: depesz@depesz.com MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi, I have weird-ish case, that I can't grok, or at least explain in hand-wavy way. Very simple query: SELECT some_table.communication_channel_id, some_table.root_account_id FROM some_schema.some_table WHERE workflow_state = 'pending' AND send_at <= '2025-10-23 12:01:13'; On complex-ish table: Table "some_schema.some_table" Column │ Type │ Collation │ Nullable │ Default ═════════════════════════════════╪═════════════════════════════╪═══════════╪══════════╪═══════════════════════════════════════════════════════════════════ id │ bigint │ │ not null │ nextval('some_schema.some_table_id_seq'::regclass) notification_id │ bigint │ │ │ notification_policy_id │ bigint │ │ │ context_id │ bigint │ │ │ context_type │ character varying(255) │ │ │ communication_channel_id │ bigint │ │ │ frequency │ character varying(255) │ │ │ workflow_state │ character varying(255) │ │ │ batched_at │ timestamp without time zone │ │ │ created_at │ timestamp without time zone │ │ │ updated_at │ timestamp without time zone │ │ │ send_at │ timestamp without time zone │ │ │ link │ text │ │ │ name_of_topic │ text │ │ │ summary │ text │ │ │ root_account_id │ bigint │ │ │ notification_policy_override_id │ bigint │ │ │ Indexes: "some_table_pkey" PRIMARY KEY, btree (id), tablespace "data1" … "index_some_table_pending" btree (send_at) WHERE workflow_state::text = 'pending'::text, tablespace "data1" … Running this via explain shows: Index Scan using index_some_table_pending on some_table (cost=0.43..399992.44 rows=2215063 width=16) (actual time=382.466..382.466 rows=0 loops=1) Index Cond: (send_at <= '2025-10-23 12:01:13'::timestamp without time zone) Buffers: shared hit=554347 Planning Time: 0.082 ms Execution Time: 382.482 ms (5 rows) Specifically, I see that it returned 0 rows, and to do it, it had to "touch" 554k buffers - ~ 4GB of data. Sizes of the relavant things: oid │ relkind │ pg_relation_size │ pg_table_size │ pg_total_relation_size ══════════════════════════════════════╪═════════╪══════════════════╪═══════════════╪════════════════════════ some_schema.some_table │ r │ 15339020288 │ 15376539648 │ 22347988992 some_schema.index_some_table_pending │ i │ 77127680 │ 77168640 │ 77168640 (2 rows) I didn't check analyze time, so can't comment on this, but I did reindex of this index, which shrunk it to 16973824 bytes, and now the query runs, as expected, in < 1ms: Index Scan using index_some_table_pending on some_table (cost=0.43..392423.37 rows=2215272 width=16) (actual time=0.005..0.005 rows=0 loops=1) Index Cond: (send_at <= '2025-10-23 12:01:13'::timestamp without time zone) Buffers: shared hit=3 Planning: Buffers: shared hit=3 Planning Time: 0.787 ms Execution Time: 0.016 ms (7 rows) While talking with others, there have been used certain terms, like "unbounded range", which I understand, but I still don't see why, with the same stats, and NO rows returned by index - before index, pg has to do stuff to 4GB of data?! Anyone could try to explain? If it matters, this is rather old Pg: PostgreSQL 14.7, and the query was running on streaming replica, but results on primary were basically the same. Best regards, depesz