Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vnOAW-006J0t-0G for pgsql-hackers@arkaria.postgresql.org; Tue, 03 Feb 2026 21:42:44 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vnOAU-007SnG-0i for pgsql-hackers@arkaria.postgresql.org; Tue, 03 Feb 2026 21:42:42 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vnOAT-007Sn5-2s for pgsql-hackers@lists.postgresql.org; Tue, 03 Feb 2026 21:42:41 +0000 Received: from mail-wr1-x42c.google.com ([2a00:1450:4864:20::42c]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vnOAR-00000000Q77-3GSZ for pgsql-hackers@postgresql.org; Tue, 03 Feb 2026 21:42:40 +0000 Received: by mail-wr1-x42c.google.com with SMTP id ffacd0b85a97d-42fbbc3df8fso4761093f8f.2 for ; Tue, 03 Feb 2026 13:42:39 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1770154958; cv=none; d=google.com; s=arc-20240605; b=evLuFFTjJvR4ESs8TS9Dfar6X7OKtkbv4e/uJ85X9MPTWQ/ULLVXYFhR0SlBj+ITSI scIfjdN1Wvbjf5jclemDKQOKuT9sz62gl/kZJiAp9lplA97q3MNzxuAxjzWZfoDj9xdA Lfv/sEZm6DKx+OgMEaFPyrOXnciYzW91EP2Wc3sxJ+b+QUSn1sQogib9ZUEgrXOmk0Vc gP6Zd/3QsP9WqRqYSS/BUu2byxibS74HkkBrhoLb85w/c1mMJ7UUkTzXopZvhT/c1QM4 esapDS99iKVjgx+J217Ant3EKWA56d/4sU4NC5ezgRgyEIFHBj2s9RkGCRxXQK6D/5pt wC7A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=8Bk99lplapICYTR3wuBkDNQdtndpnX/2/voVqygD5dY=; fh=tWzbxSi+SAovsZTRhCsf5GpEMsL1QNjUzmnr1HpOMJ0=; b=ciZ/uHAJzL8t+uQCGgq2diEh5eQXcOihrTMjg7OnzjSFc9WihcjcK1yHaWhZM+uEAk wfcNjkHrkwAqfnw6e+8822Ug4/7mkLokxyWu4VhbHmb+/5tjfsS6kbZtz/HZPQSqUyjT pavCnfOjOZAedMDiWeOPefWdYWqw7cokpLTVvKtWAhmhAwwtsAF7r2/cTBHmcSTpo00q epIKGcA2GVWtTgAd70Dl/U+aY0zr2Ryb9Hx4OZPpvKt6TMY+tAg6nKRDNX9I5fDrTZ4s Y/5GlE40BWHp+e3BRDRUhFlz+VFZ9780jmGlf4gTmA42EhXaVt3rq8/G4S6iGGAoy0Dr yv1w==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cybertec.at; s=google; t=1770154958; x=1770759758; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=8Bk99lplapICYTR3wuBkDNQdtndpnX/2/voVqygD5dY=; b=pBdl/h9aui8epmRvZYxSlehhHkzNS+eX3h62/vsDTJJLDzXsiOaSR/3TwpXQguqJsG q+BA7lXPtWceimeoFDG1BpK5+E5FzOvlANt4okiJ/8/klyV57WM6YHeyVVrap8cJGTf6 m0NFnkT9k7MuMVB5dhunw3w7KDravyNpn+rrziqb4iUfuMVEJdbhIpyaPPg+vuoR3FaZ bdKW4nECe3KU3h984I26JdS2WdLjxk31hWncJOjsFdsD5JiC+17oiRQ36vD0gz3K37bo Ilz4HM62uC/9KGEfLiqBpYeQh3w3oY0XJlG+VwgYVtu0GPTk98DKgtzc8h5Nb7BEGzmL iQHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1770154958; x=1770759758; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=8Bk99lplapICYTR3wuBkDNQdtndpnX/2/voVqygD5dY=; b=YxR7wK+tEhneQYI8IqVX8XfRcnFwpgOEeJ6Wk06d4dDfMrE6+LjnitlRLzzUDng+Br L+JRm0dyAjxpKvypVGscduyy+ZFnGwU8IHcsLf/2l/zdVcD6pDP8qX+3XwbRLBnUX+UJ 11NqRUITm/t0N3SdbH9wBI+M4OlV6t9p3LXLDoS3OXQbQ/6E0bFgJX71CR/5u5WVzH/2 Re1fyAXOaxHUm6DGSZEURpZm6uqgLBMkAMQshxC8IES726ZKoekrxDTj608mhcLaWxem UX4pCBBy7rvNWrGz0cKIbLCAGmorSRVudirlUyR/9fZ/KPazF8vMDrrxc5CaKUG+mZUe I43A== X-Forwarded-Encrypted: i=1; AJvYcCXbNx5nGQAI+NOdR74bRAx1cWDed75s73+ZYKkGvmhC+5diKegpD2jbPIs8MptxxvLBj2J4JlMwM6PVFqHe@postgresql.org X-Gm-Message-State: AOJu0YyrVKJlG31hzjoDFJS9lVNRN76bjbCSBR8HgFcGik7A24i2VF+A aENzAfnfAHQwdK7rbaYjL8QgJ/w3BO7MFYDx4/ddMInpX3L6L7JDE1/385dWCubskIT9DW2625s N5gbOnitV8NLIJbUxILZE/74WS+d3m79pLL2x/kINPg== X-Gm-Gg: AZuq6aLRMUPDWli4OnVc6M/zDSheRNIcJG1cJbCmsDFN6rbG2q5XF1q62kDT0z3EdVe ZyG9lvsuvpKtl8RcVvFeQbF6Xfoqhkb6XjSiv3MdQxoelq8hU9NPx8UIeWqw0m9ZoDGkLPur8sR hmX1sRt/G708gFb6bX18JHzB+ybS79aIdkSRLtIuHZ9UnOg7PwBQWw2xH5bd02gVSLCoFLuVFVy cCoCRj8oF6D36tFVbGuwWzr0q5Dcn1O8fewKbSSARWL5VSA73ulPnMGZK/N9dJ9uUKllezG X-Received: by 2002:a05:6000:200e:b0:42b:3b55:8929 with SMTP id ffacd0b85a97d-43617e42a35mr1223858f8f.19.1770154958134; Tue, 03 Feb 2026 13:42:38 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Ants Aasma Date: Tue, 3 Feb 2026 23:42:26 +0200 X-Gm-Features: AZwV_QiewWb_oiPdbqAniR3-qS3qGDHWt1oA7Hzs5QH3Vl8nqq28C7C5P-RbO80 Message-ID: Subject: Re: New access method for b-tree. To: Tomas Vondra Cc: Alexandre Felipe , "pgsql-hackers@postgresql.org" Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, 2 Feb 2026 at 01:54, Tomas Vondra wrote: > I'm also wondering how common is the targeted query pattern? How common > it is to have an IN condition on the leading column in an index, and > ORDER BY on the second one? I have seen this pattern multiple times. My nickname for it is the timeline view. Think of the social media timeline, showing posts from all followed accounts in timestamp order, returned in reasonably sized batches. The naive SQL query will have to scan all posts from all followed accounts and pass them through a top-N sort. When the total number of posts is much larger than the batch size this is much slower than what is proposed here (assuming I understand it correctly) - effectively equivalent to running N index scans through Merge Append. My workarounds I have proposed users have been either to rewrite the query as a UNION ALL of a set of single value prefix queries wrapped in an order by limit. This gives the exact needed merge append plan shape. But repeating the query N times can get unwieldy when the number of values grows, so the fallback is: SELECT * FROM unnest(:friends) id, LATERAL ( SELECT * FROM posts WHERE user_id = id ORDER BY tstamp DESC LIMIT 100) ORDER BY tstamp DESC LIMIT 100; The downside of this formulation is that we still have to fetch a batch worth of items from scans where we otherwise would have only had to look at one index tuple. The main problem I can see is that at planning time the cardinality of the prefix array might not be known, and in theory could be in the millions. Having millions of index scans open at the same time is not viable, so the method needs to somehow degrade gracefully. The idea I had is to pick some limit, based on work_mem and/or benchmarking, and one the limit is hit, populate the first batch and then run the next batch of index scans, merging with the first result. Or something like that, I can imagine a few different ways to handle it with different tradeoffs. I can imagine that this would really nicely benefit from ReadStream'ification. One other connection I see is with block nested loops. In a perfect future PostgreSQL could run the following as a set of merged index scans that terminate early: SELECT posts.* FROM follows f JOIN posts p ON f.followed_id = p.user_id WHERE f.follower_id = :userid ORDER BY p.tstamp DESC LIMIT 100; In practice this is not a huge issue - it's not that hard to transform this to array_agg and = ANY subqueries. Regards, Ants Aasma