Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w4iMN-002U40-0t for pgsql-general@arkaria.postgresql.org; Mon, 23 Mar 2026 16:42:35 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w4iML-001OOu-2D for pgsql-general@arkaria.postgresql.org; Mon, 23 Mar 2026 16:42:34 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w3a8t-006MkP-08 for pgsql-general@lists.postgresql.org; Fri, 20 Mar 2026 13:43:59 +0000 Received: from mail-ed1-x52f.google.com ([2a00:1450:4864:20::52f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w3a8q-00000000BVA-2ZvK for pgsql-general@postgresql.org; Fri, 20 Mar 2026 13:43:59 +0000 Received: by mail-ed1-x52f.google.com with SMTP id 4fb4d7f45d1cf-668d70fabc4so1217832a12.1 for ; Fri, 20 Mar 2026 06:43:57 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774014236; cv=none; d=google.com; s=arc-20240605; b=Nn2mBBzGA7IV7fLNR0P6qgPkQFnwIznQoyF/X7dziLRCY4I6DJwu2mCs4WZx0ZWCTA MZBhnLFlkr6ylRGr+mXnBHnsbVSz/arGNb4PASPa8exug7FNUHXTC9ft57hXDR7ewU8B Vi0ZQWw6B2roZpz/RoS+2rRibslwRlNUa1kkduX58n2H0ARgBInOvVN1gtreIipZeotL diWcj6m6xWmk7YvzzMjdWUMH4SAG/6PhncdFYzToC3/ykirWrd6VaPyfvvDKjEeN+ZXS +tqe3nmFidbmmRu/YMlHPzZQD8/2E/+X8xBc+T47qez4WJX/V2CPdAVd48PgiCi3u5lk mzzg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=kKeLTNEOPM904S5hXt+sPMsL1uD2zFPNxgQB3swFwh8=; fh=2J2KdXHU9XMQW0ovUIB7R0bLBVVVwHD9Br44WT2AkzM=; b=KhTwrMn+LETuMaXIwGdEU2YddI18HuJSVbthY1GRPKKLXfISxJERorNyDzmY3uvFvh K4sr7lpXCnUbaqMV4WUdHr+MoVxTkmRD4mK3grM08ndSxUWViPoj+OoO9PeuQ95iK9w9 L5OcggTt2rVY4ABODVwI3hwrPnauFTv83faeum5y2ig5irbHABHeUTzamO6dmcgPnNTF nWoZD0+PTNyQcdtvu0M4g6rTA8APxjJ9NI+DxBCiJTmcazZjx9aFg5oQcCKpW9q7V6gf 8pRKt80ECMSWKl5FW869g7JplqLZefcz8nwxlPu87ivYQCLh8cyPBWu1+GZReZNwDQIt UXRA==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1774014236; x=1774619036; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=kKeLTNEOPM904S5hXt+sPMsL1uD2zFPNxgQB3swFwh8=; b=hyXqJgHUQEt7ajI4A98EP2o7QcoXic2RPWELJnczv8lU9uBlQ2tcQwvZ7wP5dTyd7c zXg1DsAJ3rmIwO5gn4PTzpQrR3IHTE/2HDiiA0kYL24nhFAe02Xcqhc/RDbOvyHMVi75 t4yvUe0yX1oqJtY0W+uWnB0fdGh5Om9MZOb9PuKXPX++o/JU2XAqgVgm/Aose7XiAr8R lL9VOhx404a6WKmdJ2ARf3knx516DMalHWvPy8KVhBRRf7qYp1OjQywtlkMs7RuVWEdF fMoATjGLa/HRBm+9ZYz/9f2ipvLDyKp5VxTaT6Kd2l2pfSiCF31Rc88istzXfLhVCE0d qhMw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774014236; x=1774619036; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=kKeLTNEOPM904S5hXt+sPMsL1uD2zFPNxgQB3swFwh8=; b=R8cIO8iCw8yuo3ZOUCsRLcm1zjXC/zfhXBk2Atjl1m/5AaoY6be9MAoskiMp/0xeEh 5i5dQZketVAUmSlUfoSnYrvnYL01EIYi1SkixLBtz9rHvEECkpryu8RHKE4UleGwe9Vn l9yEfQI66F0Pt3Y8IgvLcz20uHHswjTeJ7+d5aSgV2ATIcSOvjVwUpJskPrJ1fwanDkB 1yxCU11QNLLLqju2N89Rl9jaGe7syImRoCQSyktlP3jl4vj3In5UMVvtvnKOwhCTPRz2 FFML7QcPxPF8hmH8SMNUsaMzBOmIDt8dt2qwaRkUd102NLJxnbILF6zFri6ELe9SmMwb VntA== X-Forwarded-Encrypted: i=1; AJvYcCU4GAUqg4ukYI45xtkyqsZ9FW4YAKl3pIUsABMXZFArHJOK/3OLXED/VidLyERMUrrzZPb34hd7yrpeRyrH@postgresql.org X-Gm-Message-State: AOJu0YylHa8Vp1Zjuj7ZmiT8p4P8L7b5Ob5Xu13HXW1zWHg56ByZOGLH IXeBmE68jgqdETi9IP4wp839otSPzO9lEn7w2PVCixGpcDbyt/H/Ho1a7fU9VsBpQjIwzQ6sF9y Ac30JY3ub5RFtrBb0kGwHOizUj5FYWcY= X-Gm-Gg: ATEYQzxqK5PoBNpEA7d+L6TIA3ksnsIHs/KE+RUS0fI4ycvuiBJ1IdYjq+diY36wb8j OnmR1ulzZHbZkaBdNxSTmSMBU55oY/Xu24cwIDbbyMh5vqrWjnW8upWS6gCK92J8Vwiis37crA0 VJK3Q12+ECxOUs+SPqpyU63GkFvnG7Ej0dvG8MpEFSD+eDQ7l1gAMlvt9UWc04fPn0ocxwQT97+ HpxJnfVYN5Z26pxfxVezRJ5MG/8ZAyuNUUWx/GKp8aK3j9pKcxh0yRa6tv7bHSR5HOjUqm71YHv 3aIZkhVMrF93rcuQf2Jbfamvoji6+iuxHM+/T9fECVh9bQ++DA== X-Received: by 2002:a05:6402:5287:b0:668:c2b6:f80 with SMTP id 4fb4d7f45d1cf-668c9b4291bmr2522289a12.19.1774014236023; Fri, 20 Mar 2026 06:43:56 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Alexandre Felipe Date: Fri, 20 Mar 2026 13:43:43 +0000 X-Gm-Features: AaiRm53DjED3NrpZuq4wjjNcl3KamvZgLXwYxhmZFIWAZm1Ks7qqAt0vWhiRe1o Message-ID: Subject: Re: Index scan with bitmap filter - has this been explored To: Tomas Vondra Cc: Laurence Newman , pgsql-general@postgresql.org, PostgreSQL Hackers Content-Type: multipart/alternative; boundary="0000000000005d6065064d74e0af" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000005d6065064d74e0af Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable That is a sound idea. There are some middle ground that this would cover some middle ground. Will we rewrite postgres in rust? No more need for resource owner tracking! > It's hard to give clear opinion without looking at this much deeper, but > it seems interesting and possibly worth exploring. I have a lower reputation at stake, so I can write some opinions :) Laurence, It might be obvious for some, but I will write it either way: This could be implemented as an execution node instead of an index access method. The next step would be to update the planner to choose the new node when it is advantageous doing so. My suggestion is to have a GUC parameter so that you can easily enable or disable it, then you can run benchmarks and see how much it helps. If it hurts you blame (or fix) the planner. Having one more tool can't make the chosen query execution worse. And the planning time difference should be negligible. As said in the repository README.md > Bitmap scan on the GIN index =E2=80=94 finds all matching rows and then > heap-fetch every match, sort them by created_at, and return the top > 10. If there are many matching rows then this can be expensive. > B-tree index scan on created_at =E2=80=94 walks rows in order and checks > each one against the text search condition. If there are many > non-matching rows then it can be expensive finding 10 hits. The proposed alternative is to access two indices before checking the table data. So the savings must come from either, (1) sorting cost, very small for top 10, (2) wasted table heap page access. #A GIN -> bitmap -> heap. (GIN + Bitmap + Heap) * NumMatches + TopNSort say (g + b + h) * n1 + n1 * log(t) * x #B (Index + Heap + @@) * (rows after oldest match), assuming no cover index. say (s + h + r + @@) * n2 #C ordered-bitmap-scan (Gin + Bitmap) * NumMatches + (Index * (rows after oldest match)) (g + b) * n1 + s * n2 + t * h > If there are many > non-matching rows then it can be expensive finding 10 hits. This problem is still there, the difference is that the cost per non-matching row is reduced. Let's think in one dimension taking, if n2 is small, #B < #A < #C. If n2 is too large #C > #B > #A. So at both ends your method is worse than what is already there. So when is it promising? When the term h * n1 dominates in the bitmap scan, and h * n2 dominates in the index scan. If the documents are toasted, and we have mostly appended documents the scan will be mostly sequential, but backwards. You have to consider the selectivity of the tsvector filter, to the speedup potential (I would start with cost(IndexOnlyScan) / cost(IndexScan)). For your example, when you are accessing the most recent examples if the collection has mostly appends, that is similar to a sequential scan but backwards. Tomas, do we currently combine I/O when the index fetches pages backwards p, p-1, p-2, ... instead of p, p+1, p+2, ... If the bitmap is sparse enough the bitmap index scan will behave as a random access pattern. So let's assume everything to be random from now on. For the sake of example, Let's assume that the cost of an index only scan is 100x lower than fetchin= g a tsvector from a row. And let's assume that the matching documents are distributed uniformly among the rest of the table. The proposed approach could be up to 100x faster (when all documents match) But if only 0.1% matches, then just the index scan would be 10x slower. Because you can't jump to the next match without checking all the matches in between. Regards, Alexandre On Mon, Mar 16, 2026 at 9:24=E2=80=AFPM Tomas Vondra wrot= e: > Hello Laurence, > > I think pgsql-hackers would be a better list to discuss this. > pgsql-general is a user list, while pgsql-hakers is about development. > So I'm CCing that list, and let's continue the discussion there (you may > need to subscribe to that list, if you're not a subscriber already). > > FWIW we're in the final push for PG19 features, so people are focusing o > that for the next ~month. It may take time to get feedback. > > On 3/13/26 13:08, Laurence Newman wrote: > > Hello, > > > > I've been thinking about a scan strategy that is sort of in between a > > bitmap scan and an index scan, and I wanted to check whether this has > > been explored before much or if there's a > > reason it wouldn't work well in practice. > > > > I'm not aware of any formal implementation proposal, but that does not > mean it couldn't be a good idea (at least for some use cases). > > > Example scenario where it could be useful: you need the first N rows > > ordered by a B-tree column, filtered by a condition on another column > > with a GIN index, e.g. full text search. Today I believe Postgres has > > two options that use indexes: > > > > 1. Bitmap scan on the filter index =E2=86=92 heap fetch all matches= =E2=86=92 sort =E2=86=92 > > limit. Can be expensive > > when there are many matches but you only want a few results returned. > > 2. Index scan on the B-tree =E2=86=92 recheck the filter condition = against > > the heap for each > > row. Can be expensive when the match rate is low and many non-matching > > rows are found before finding N hits. > > > > There's also the possibility to create a an index with all the columns > (aka covering index), in which case we don't need to actually visit the > heap in most cases thanks to IOS. > > But that is imperfect for various reasons - sometimes the index can't > have all the columns, and then we don't check any filters. That could be > improved without inventing a new "hybrid" scan. > > But even then there are cases where doing what you describe could be > interesting. For example, it'd allow combining indexes with different > index AMs (e.g. btree + gist). Which covering indexes can't do easily. > > Something similar applies to vector search, where I'm not sure covering > indexes are viable. Maybe allowing to pass a bitmap to the index scan > would be helpful. > > > The idea for the new scan strategy is: build the TID bitmap from the > > filter index first (same as a bitmap scan would), then walk the B-tree > > in order, but instead of going to the heap to check the > > filter condition, check each TID against the bitmap. Matching TIDs get > > returned > > directly in order. > > > > It's hard to give clear opinion without looking at this much deeper, but > it seems interesting and possibly worth exploring. > > > I built a small proof of concept as a pgrx extension against PG 18 to > > try and test this: > > https://github.com/lauri3new/ordered-bitmap-scan > lauri3new/ordered-bitmap-scan>. The benchmarks there show it using fewe= r > > buffer hits than either a bitmap scan or index scan for the example dat= a > > distribution - which is quite skewed to show the benefit. > > > > I think it'd be helpful to include at least some benchmark numbers, so > that people can get an idea without having to build/run. > > > I want to caveat that I am a web backend developer and not a Postgres > > expert by any means, so there's probably good reasons this isn't a thin= g > > (planner cost or benefit is too small/niche). I would appreciate though > > any pointers to prior discussion or feedback on whether this is worth > > looking into further or not. > > > > I'm not aware of any prior discussions or reasons why this would not > work (or be competitive with other solutions). I think there's > definitely a lot of open questions. It's not just about the execution, > but also about generating the interesting paths (with indexes used as > inputs for other indexes) and costing. > > > regards > > -- > Tomas Vondra > > > > --0000000000005d6065064d74e0af Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
That is a sound idea.
There are some middle= ground that this would cover some middle ground.

= Will we rewrite postgres in rust? No more need for resource owner tracking!=

> It's hard to give clear opinion without = looking at this much deeper, but
> it seems interesting and possibly = worth exploring.

I have a lower reputation at stak= e, so I can write some opinions :)

Laurence,
=
It might be obvious for some, but I will write it either way:
This could be implemented as an execution node instead of an index=C2=A0<= /div>
access method. The next step would be to update the planner to
choose the new node when it is advantageous=C2=A0doing so.

My suggestion is to have a GUC parameter so that you can e= asily
enable or disable it, then you can run benchmarks and see h= ow much
it helps. If it hurts you blame (or fix) the planner. Hav= ing one more
tool can't make the chosen=C2=A0query execution = worse. And the planning
time difference should be negligible.

As said in the repository README.md

> Bitmap scan on the GIN index =E2=80=94 finds all matching rows= and then
> heap-fetch every match, sort them by created_at, a= nd return the top=C2=A0
> 10. If there are many matching rows = then this can be expensive.

> B-tree index scan on cr= eated_at =E2=80=94 walks rows in order and checks=C2=A0
> each= one against the text search condition. If there are many=C2=A0
&= gt; non-matching rows then it can be expensive finding 10 hits.
<= div>
The proposed alternative is to access two indices before= checking the
table data. So the savings must come from either, (= 1) sorting cost,
very small for top 10, (2) wasted table heap pag= e access.

#A
GIN -> bitmap -> heap= .
(GIN=C2=A0+ Bitmap=C2=A0+ Heap) * NumMatches=C2=A0+ TopNSort
say (g=C2=A0+ b=C2=A0+ h) * n1=C2=A0+ n1 * log(t) * x
#B
(Index=C2=A0+ Heap + @@) * (rows after oldest matc= h), assuming no cover index.
say (s=C2=A0+ h + r=C2=A0+ @@) * n2<= /div>

#C
ordered-bitmap-scan
(Gin=C2= =A0+ Bitmap) * NumMatches=C2=A0+ (Index * (rows after oldest match))
<= div>(g=C2=A0+ b) * n1=C2=A0+ s * n2=C2=A0+ t * h

<= div>> If there are many=C2=A0
> non-matching rows then it c= an be expensive finding 10 hits.

This proble= m is still there, the difference is that the cost per non-matching
row is reduced.=C2=A0

Let's think in one dim= ension taking, if n2 is small, #B < #A < #C. If n2 is too large
=
#C > #B > #A. So at both ends your method is worse than what is = already there.

So when is it promising?
=

When the term h * n1 dominates in the bitmap = scan, and h * n2 dominates=C2=A0
in the index scan. If the docume= nts are toasted, and we have mostly=C2=A0
appended documents the = scan will be mostly sequential, but backwards.

You= have to consider the selectivity of the=C2=A0tsvector filter,
to= the speedup potential (I would start with cost(IndexOnlyScan) / cost(Index= Scan)).

For your example, when you are accessing t= he most recent examples if=C2=A0
the collection has mostly append= s, that is similar to a sequential scan
but backwards.
=

Tomas,
do we currently combine I/O when the i= ndex fetches pages backwards
p, p-1, p-2, ... instead of p, p+1, = p+2, ...

If the bitmap is sparse enough the bitmap= index scan will behave as
a random access pattern. So let's = assume everything to be random
from now on.

<= div>For the sake of example,=C2=A0
Let's assume that the cost= of an index only scan is 100x lower than fetching
a tsvector fro= m a row.
And let's assume that the matching documents are dis= tributed uniformly
among the rest of the table.

The proposed approach could be up to 100x faster (when all document= s match)
But if only 0.1% matches, then just the index scan would= be 10x slower.
Because you can't jump to the next match with= out checking all the matches
in between.


Regards,
Alexandre


<= /div>




On Mon, Mar 16, 2026 at 9:24=E2=80=AFPM Tomas= Vondra <tomas@vondra.me> wrot= e:
Hello Laurenc= e,

I think pgsql-hackers would be a better list to discuss this.
pgsql-general is a user list, while pgsql-hakers is about development.
So I'm CCing that list, and let's continue the discussion there (yo= u may
need to subscribe to that list, if you're not a subscriber already).
FWIW we're in the final push for PG19 features, so people are focusing = o
that for the next ~month. It may take time to get feedback.

On 3/13/26 13:08, Laurence Newman wrote:
> Hello,
>
> I've been thinking about a scan strategy that is sort of in betwee= n a
> bitmap scan and an=C2=A0 index=C2=A0scan, and I wanted to check whethe= r this has
> been explored before much or if there's a
> reason it wouldn't work well in practice.
>

I'm not aware of any formal implementation proposal, but that does not<= br> mean it couldn't be a good idea (at least for some use cases).

> Example scenario where it=C2=A0could be useful: you need the first N r= ows
> ordered by a B-tree column, filtered by a condition on another column<= br> > with a GIN index, e.g. full text search. Today I believe Postgres has<= br> > two options that use indexes:
>
> =C2=A0 =C2=A0 1. Bitmap scan on the filter index =E2=86=92 heap fetch = all matches =E2=86=92 sort =E2=86=92
> limit. Can be expensive
> =C2=A0when there are many matches but you only want a few results retu= rned.
> =C2=A0 =C2=A0 2. Index scan on the B-tree =E2=86=92 recheck the filter= condition against
> the heap for each
> =C2=A0row. Can be expensive when the match rate is low and many non-ma= tching
> rows are found before=C2=A0finding N hits.
>

There's also the possibility to create a an index with all the columns<= br> (aka covering index), in which case we don't need to actually visit the=
heap in most cases thanks to IOS.

But that is imperfect for various reasons - sometimes the index can't have all the columns, and then we don't check any filters. That could b= e
improved without inventing a new "hybrid" scan.

But even then there are cases where doing what you describe could be
interesting. For example, it'd allow combining indexes with different index AMs (e.g. btree + gist). Which covering indexes can't do easily.<= br>
Something similar applies to vector search, where I'm not sure covering=
indexes are viable. Maybe allowing to pass a bitmap to the index scan
would be helpful.

> The idea for the new scan strategy is: build the TID bitmap from the > filter index first (same as a bitmap scan would), then walk the B-tree=
> in order, but instead of going to the heap to check the
> filter condition, check each TID against the bitmap. Matching TIDs get=
> returned
> directly in order.
>

It's hard to give clear opinion without looking at this much deeper, bu= t
it seems interesting and possibly worth exploring.

> I built a small proof of concept as a pgrx extension against PG 18 to<= br> > try and test this:
> https://github.com/lauri3new/ordered-bitmap-sca= n <https://github.com/
> lauri3new/ordered-bitmap-scan>. The benchmarks there show it using = fewer
> buffer hits than either a bitmap scan or index scan for the example da= ta
> distribution - which is quite skewed to show the benefit.
>

I think it'd be helpful to include at least some benchmark numbers, so<= br> that people can get an idea without having to build/run.

> I want to caveat that I am a web backend developer and not a Postgres<= br> > expert by any means, so there's=C2=A0probably good reasons this=C2= =A0isn't a thing
> (planner cost or benefit is too small/niche). I would appreciate thoug= h
> any pointers to prior discussion or feedback on whether this is worth<= br> > looking into further or not.
>

I'm not aware of any prior discussions or reasons why this would not work (or be competitive with other solutions). I think there's
definitely a lot of open questions. It's not just about the execution,<= br> but also about generating the interesting paths (with indexes used as
inputs for other indexes) and costing.


regards

--
Tomas Vondra



--0000000000005d6065064d74e0af--