public inbox for [email protected]  
help / color / mirror / Atom feed
From: Chao Li <[email protected]>
To: David Rowley <[email protected]>
Cc: PostgreSQL Developers <[email protected]>
Subject: Re: Add bms_offset_members() function for bitshifting Bitmapsets
Date: Tue, 14 Apr 2026 16:46:11 +0800
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAApHDvq=eEdw2Qp+aSzSOtTSe+h0fnVQ55CcTNqBkLDYiRZmxw@mail.gmail.com>
References: <CAApHDvq=eEdw2Qp+aSzSOtTSe+h0fnVQ55CcTNqBkLDYiRZmxw@mail.gmail.com>



> On Apr 13, 2026, at 12:35, David Rowley <[email protected]> wrote:
> 
> (v20 material)
> 
> While working on some new code which required offsetting the members
> of a Bitmapset, I decided to go and write a function to do this rather
> than copy the various other places where we manually construct a new
> set with a bms_next_member() -> bms_add_member() loop. The new use
> case I have is from pulling varattnos from a scan's targetlist, which
> there could be several hundred Vars in. I considered this might be
> noticeably expensive.
> 
> The current manual way we have of doing this is a bit laborious since
> we're only doing 1 member per bms_next_member() loop, and also, if the
> set has multiple words, we may end up doing several repallocs to
> expand the set, perhaps as little as 1 word at a time. That's not to
> mention the rather repetitive code that we have to do this in various
> places that might be nice to consolidate.
> 
> I've attached a patch which adds bms_offset_members(), which does
> bitshifting to move the members up or down by the given offset. While
> working on this I made a few choices which might be worth a revisit:
> 
> 1) The function modifies the given set in-place rather than making a new set.
> 2) The function will ERROR if any member would go above INT_MAX. These
> would be inaccessible, and that seems weird/wrong.
> 3) When offsetting by a negative value, members that would go below
> zero fall out of the set silently.
> 
> For #1; my original use-case that made me write this didn't need a
> copy, so I wrote the function to modify the set in-place. After
> hunting down and replacing the relevant existing bms_next_member()
> loops with the new function, I noticed all these seem to need a copy.
> Currently, I have coded the patch to do
> bms_offset_members(bms_copy(set), ...), but that's a little
> inefficient as it *could* result in a palloc for the copy then a
> repalloc in the offset. If bms_offset_members() just created a new
> set, then it could palloc() a set to the exact required size.  The
> counterargument to that is that I don't really want to copy the set
> for my intended use case. I thought about two versions, but thought
> that might be overkill. There could be a boolean parameter to define
> that, but we don't do that anywhere else in bitmapset.c, or we could
> avoid a copy-paste of the code with an always-inlined helper function.
> I couldn't decide, so left it as-is.
> 
> For #2, I could have equally made these fall off the top of the set,
> but I thought we might want to know about it in the unlikely event
> that this ever happens.
> 
> #3 We commonly want to get rid of system columns from a
> pull_varattnos() set which is offset by
> FirstLowInvalidHeapAttributeNumber, so those disappearing silently is
> what most use cases seem to want. I expect that's not for revisiting,
> but I listed this one anyway as it flies in the face of #2.
> 
> Patch attached. Comments welcome.
> 
> David
> <v1-0001-Introduce-bms_offset_members-function.patch>

I basically agree with design choices 1/2/3. And the implementation of v1 overall looks good to me.

The only issue I found is this overflow check:
```
+	/* Handle a positive offset (bitshift left) */
+	if (offset > 0)
+	{
+		/*
+		 * An unlikely scenario, but check we're not going to create a member
+		 * greater than PG_INT32_MAX.
+		 */
+		if (((uint64) new_nwords - 1) * BITS_PER_BITMAPWORD + high_bit + offset_bits > PG_INT32_MAX)
+			elog(ERROR, "bitmapset overflow");
```

This overflow check seems wrong. Because when high_bit + offset_bits > BITS_PER_BITMAPWORD, new_nwords has been increased by 1, so there high_bit + offset_bits are double counted.

To verify that, I added a new test:
```
-- 2147483583 is PG_INT32_MAX - 64, so offsetting by 1 should succeed,
-- but offsetting it by 65 should fail with overflow error
SELECT test_random_offset_operations_check_highest(2147483583, 1) AS result;
SELECT test_random_offset_operations_check_highest(2147483583, 65) AS result;
```

With v1, test_random_offset_operations_check_highest(2147483583, 1) reports an overflow error, but it should not.

Please see the attached diff for the test I added. In that diff, I also included a fix, and with that fix, the tests pass.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/






Attachments:

  [application/octet-stream] nocfbot_chao_test.diff (4.5K, 2-nocfbot_chao_test.diff)
  download | inline diff:
diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 48514a3b781..50201780204 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -1010,6 +1010,7 @@ bms_offset_members(Bitmapset *a, int offset)
 	int			new_nwords;
 	int			old_nwords;
 	int32		high_bit;
+	int64		old_highest;
 
 	Assert(bms_is_valid_set(a));
 
@@ -1022,6 +1023,7 @@ bms_offset_members(Bitmapset *a, int offset)
 	offset_bits = BITNUM(offset);
 	new_nwords = a->nwords + offset_words;
 	high_bit = bmw_leftmost_one_pos(a->words[a->nwords - 1]);
+	old_highest = ((int64) a->nwords - 1) * BITS_PER_BITMAPWORD + high_bit;
 
 	/* Adjust the number of words in the new set to how many we'll need */
 	new_nwords += (offset_bits + high_bit) >= BITS_PER_BITMAPWORD;
@@ -1041,7 +1043,8 @@ bms_offset_members(Bitmapset *a, int offset)
 		 * An unlikely scenario, but check we're not going to create a member
 		 * greater than PG_INT32_MAX.
 		 */
-		if (((uint64) new_nwords - 1) * BITS_PER_BITMAPWORD + high_bit + offset_bits > PG_INT32_MAX)
+		/*if (((uint64) new_nwords - 1) * BITS_PER_BITMAPWORD + high_bit + offset_bits > PG_INT32_MAX)*/
+		if (old_highest + offset > PG_INT32_MAX)
 			elog(ERROR, "bitmapset overflow");
 
 		/*
diff --git a/src/test/modules/test_bitmapset/expected/test_bitmapset.out b/src/test/modules/test_bitmapset/expected/test_bitmapset.out
index 3a6f08427cd..984174cfaf7 100644
--- a/src/test/modules/test_bitmapset/expected/test_bitmapset.out
+++ b/src/test/modules/test_bitmapset/expected/test_bitmapset.out
@@ -1613,4 +1613,14 @@ SELECT test_random_offset_operations(-1, 10000, 1024, 0) AS result;
   10000
 (1 row)
 
+-- 2147483583 is PG_INT32_MAX - 64, so offsetting by 1 should succeed,
+-- but offsetting it by 65 should fail with overflow error
+SELECT test_random_offset_operations_check_highest(2147483583, 1) AS result;
+   result   
+------------
+ 2147483584
+(1 row)
+
+SELECT test_random_offset_operations_check_highest(2147483583, 65) AS result;
+ERROR:  bitmapset overflow
 DROP EXTENSION test_bitmapset;
diff --git a/src/test/modules/test_bitmapset/sql/test_bitmapset.sql b/src/test/modules/test_bitmapset/sql/test_bitmapset.sql
index 5c5cfe3fc15..072d264ad9a 100644
--- a/src/test/modules/test_bitmapset/sql/test_bitmapset.sql
+++ b/src/test/modules/test_bitmapset/sql/test_bitmapset.sql
@@ -416,4 +416,9 @@ SELECT test_random_operations(-1, 10000, 81920, 0) > 0 AS result;
 -- perform some random test on bms_offset_members()
 SELECT test_random_offset_operations(-1, 10000, 1024, 0) AS result;
 
+-- 2147483583 is PG_INT32_MAX - 64, so offsetting by 1 should succeed,
+-- but offsetting it by 65 should fail with overflow error
+SELECT test_random_offset_operations_check_highest(2147483583, 1) AS result;
+SELECT test_random_offset_operations_check_highest(2147483583, 65) AS result;
+
 DROP EXTENSION test_bitmapset;
diff --git a/src/test/modules/test_bitmapset/test_bitmapset--1.0.sql b/src/test/modules/test_bitmapset/test_bitmapset--1.0.sql
index ea12ec40057..88bf6719aa6 100644
--- a/src/test/modules/test_bitmapset/test_bitmapset--1.0.sql
+++ b/src/test/modules/test_bitmapset/test_bitmapset--1.0.sql
@@ -145,4 +145,8 @@ CREATE FUNCTION test_random_offset_operations(integer, integer, integer, integer
 RETURNS integer STRICT
 AS 'MODULE_PATHNAME' LANGUAGE C;
 
+CREATE FUNCTION test_random_offset_operations_check_highest(integer, integer)
+RETURNS integer STRICT
+AS 'MODULE_PATHNAME' LANGUAGE C;
+
 COMMENT ON EXTENSION test_bitmapset IS 'Test code for Bitmapset';
diff --git a/src/test/modules/test_bitmapset/test_bitmapset.c b/src/test/modules/test_bitmapset/test_bitmapset.c
index 1fbf9ffc200..59da3fb78ca 100644
--- a/src/test/modules/test_bitmapset/test_bitmapset.c
+++ b/src/test/modules/test_bitmapset/test_bitmapset.c
@@ -67,6 +67,7 @@ PG_FUNCTION_INFO_V1(test_bitmap_match);
 /* Test utility functions */
 PG_FUNCTION_INFO_V1(test_random_operations);
 PG_FUNCTION_INFO_V1(test_random_offset_operations);
+PG_FUNCTION_INFO_V1(test_random_offset_operations_check_highest);
 
 /* Convenient macros to test results */
 #define EXPECT_TRUE(expr)	\
@@ -827,3 +828,20 @@ test_random_offset_operations(PG_FUNCTION_ARGS)
 
 	PG_RETURN_INT32(num_ops);
 }
+
+Datum
+test_random_offset_operations_check_highest(PG_FUNCTION_ARGS)
+{
+	int highest_member = 0;
+	int offset = 0;
+	Bitmapset  *bms;
+
+	highest_member = PG_GETARG_INT32(0);
+	offset = PG_GETARG_INT32(1);
+
+	bms = bms_add_member(NULL, highest_member);
+	bms = bms_offset_members(bms, offset);
+	highest_member = bms_next_member(bms, -1);
+
+	PG_RETURN_INT32(highest_member);
+}
\ No newline at end of file


view thread (15+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Add bms_offset_members() function for bitshifting Bitmapsets
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox