Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w944H-001A1I-1t for pgsql-bugs@arkaria.postgresql.org; Sat, 04 Apr 2026 16:41:53 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w944G-00HMKn-0E for pgsql-bugs@arkaria.postgresql.org; Sat, 04 Apr 2026 16:41:52 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w944F-00HMKf-2b for pgsql-bugs@lists.postgresql.org; Sat, 04 Apr 2026 16:41:52 +0000 Received: from sss.pgh.pa.us ([68.162.161.243]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.98.2) (envelope-from ) id 1w944D-00000000aWW-1yFj for pgsql-bugs@lists.postgresql.org; Sat, 04 Apr 2026 16:41:51 +0000 Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1]) by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 634GffKg4102597; Sat, 4 Apr 2026 12:41:41 -0400 From: Tom Lane To: Dmytro Astapov cc: pgsql-bugs@lists.postgresql.org Subject: Re: array_agg(anyarray) silently produces corrupt results with parallel workers when inputs mix NULL and non-NULL array elements In-reply-to: References: Comments: In-reply-to Dmytro Astapov message dated "Fri, 27 Mar 2026 12:29:12 -0000" MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0" Content-ID: <4102525.1775320863.0@sss.pgh.pa.us> Date: Sat, 04 Apr 2026 12:41:41 -0400 Message-ID: <4102596.1775320901@sss.pgh.pa.us> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk ------- =_aaaaaaaaaa0 Content-Type: text/plain; charset="us-ascii" Content-ID: <4102525.1775320863.1@sss.pgh.pa.us> Dmytro Astapov writes: > array_agg(ARRAY[...]) produces silently corrupted 2-D arrays when the query > uses parallel partial aggregation and the input arrays contain a mix of > NULL and non-NULL elements. NULL values appear at the wrong positions in > the output, and non-NULL values disappear or shift. Right you are. > I think that the following two changes to array_agg_array_combine() in > array_userfuncs.c fix the issue: > 1. Change the condition guarding the null bitmap block from > "if (state2->nullbitmap)" to > "if (state1->nullbitmap || state2->nullbitmap)". > 2. Change the bitmap reallocation size from > "state1->aitems + state2->aitems" to > "Max(state1->aitems + state2->aitems, newnitems)" > to ensure the bitmap is always large enough. This seems basically right, but I think we could simplify the code some more: AFAICS the required bitmap size is newnitems, full stop. The initial-setup path is confused about that too, allocating newnitems+1 which is pointless. It also troubled me that there's no checks for integer overflow when calculating the new sizes. I believe that the pg_nextpower2_32 bits are okay even with large inputs (we'll end in palloc rejecting the request size if there's an overflow), but if reqsize or newnitems overflows it could be bad. So I end with the attached revised patch, where I also made one or two cosmetic adjustments like putting the type-comparison checks next to the dimension comparisons. Look good to you? regards, tom lane ------- =_aaaaaaaaaa0 Content-Type: text/x-diff; name="v2-fix_array_agg_parallel_nullbitmap.patch"; charset="us-ascii" Content-ID: <4102525.1775320863.2@sss.pgh.pa.us> Content-Description: v2-fix_array_agg_parallel_nullbitmap.patch Content-Transfer-Encoding: quoted-printable diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/a= dt/array_userfuncs.c index 50c81a088df..9d7f67cc188 100644 --- a/src/backend/utils/adt/array_userfuncs.c +++ b/src/backend/utils/adt/array_userfuncs.c @@ -24,6 +24,7 @@ #include "utils/builtins.h" #include "utils/datum.h" #include "utils/lsyscache.h" +#include "utils/memutils.h" #include "utils/tuplesort.h" #include "utils/typcache.h" = @@ -1033,10 +1034,11 @@ array_agg_array_combine(PG_FUNCTION_ARGS) } = /* We only need to combine the two states if state2 has any items */ - else if (state2->nitems > 0) + if (state2->nitems > 0) { MemoryContext oldContext; - int reqsize =3D state1->nbytes + state2->nbytes; + int reqsize; + int newnitems; int i; = /* @@ -1059,6 +1061,17 @@ array_agg_array_combine(PG_FUNCTION_ARGS) errmsg("cannot accumulate arrays of different dimensionality"))); } = + /* Types should match already. */ + Assert(state1->array_type =3D=3D state2->array_type); + Assert(state1->element_type =3D=3D state2->element_type); + + /* Calculate new sizes, guarding against overflow. */ + if (pg_add_s32_overflow(state1->nbytes, state2->nbytes, &reqsize) || + pg_add_s32_overflow(state1->nitems, state2->nitems, &newnitems)) + ereport(ERROR, + (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED), + errmsg("array size exceeds the maximum allowed (%zu)", + MaxArraySize))); = oldContext =3D MemoryContextSwitchTo(state1->mcontext); = @@ -1073,17 +1086,16 @@ array_agg_array_combine(PG_FUNCTION_ARGS) state1->data =3D (char *) repalloc(state1->data, state1->abytes); } = - if (state2->nullbitmap) + /* Combine the null bitmaps, if present. */ + if (state1->nullbitmap || state2->nullbitmap) { - int newnitems =3D state1->nitems + state2->nitems; - if (state1->nullbitmap =3D=3D NULL) { /* * First input with nulls; we must retrospectively handle any * previous inputs by marking all their items non-null. */ - state1->aitems =3D pg_nextpower2_32(Max(256, newnitems + 1)); + state1->aitems =3D pg_nextpower2_32(Max(256, newnitems)); state1->nullbitmap =3D (uint8 *) palloc((state1->aitems + 7) / 8); array_bitmap_copy(state1->nullbitmap, 0, NULL, 0, @@ -1091,17 +1103,17 @@ array_agg_array_combine(PG_FUNCTION_ARGS) } else if (newnitems > state1->aitems) { - int newaitems =3D state1->aitems + state2->aitems; - - state1->aitems =3D pg_nextpower2_32(newaitems); + state1->aitems =3D pg_nextpower2_32(newnitems); state1->nullbitmap =3D (uint8 *) repalloc(state1->nullbitmap, (state1->aitems + 7) / 8); } + /* This will do the right thing if state2->nullbitmap is NULL: */ array_bitmap_copy(state1->nullbitmap, state1->nitems, state2->nullbitmap, 0, state2->nitems); } = + /* Finally, combine the data and adjust sizes. */ memcpy(state1->data + state1->nbytes, state2->data, state2->nbytes); state1->nbytes +=3D state2->nbytes; state1->nitems +=3D state2->nitems; @@ -1109,9 +1121,6 @@ array_agg_array_combine(PG_FUNCTION_ARGS) state1->dims[0] +=3D state2->dims[0]; /* remaining dims already match, per test above */ = - Assert(state1->array_type =3D=3D state2->array_type); - Assert(state1->element_type =3D=3D state2->element_type); - MemoryContextSwitchTo(oldContext); } = ------- =_aaaaaaaaaa0--