Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w4arT-002LWd-1u for pgsql-hackers@arkaria.postgresql.org; Mon, 23 Mar 2026 08:42:11 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w4arR-00GTRe-35 for pgsql-hackers@arkaria.postgresql.org; Mon, 23 Mar 2026 08:42:10 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w4arR-00GTRT-1N for pgsql-hackers@lists.postgresql.org; Mon, 23 Mar 2026 08:42:10 +0000 Received: from mail-ed1-x529.google.com ([2a00:1450:4864:20::529]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w4arP-00000000avb-21RW for pgsql-hackers@lists.postgresql.org; Mon, 23 Mar 2026 08:42:08 +0000 Received: by mail-ed1-x529.google.com with SMTP id 4fb4d7f45d1cf-661b16ac011so67642a12.2 for ; Mon, 23 Mar 2026 01:42:07 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1774255326; cv=none; d=google.com; s=arc-20240605; b=UeJVs5LUMtixyCt6/Re3E7nn6630D1D0gubR7vAJBBw6+ZjTPx5Rf+jrUonmfG8RHT aBuynEDNxQF7BZR+WgG0Q9PDqzHpU0MjwzBlWcl570E9JiotUTYqp6+gLJN4CqsbgIs0 v4x+0r2BkUx9pE9Ni2FWU8PQeJb2969/sSnxg1FUdFaQdr966RHQPAKCDd7YsZ/FBy01 cKVzUQySkySGOHkQAX8iYxYHcoZ6OoUUuWcr9v4Mb5PZNPl1Y5i6penohLuKQWnk4qrk 6E/dSfQZ3pYS+BfrPWODgOxjdAHcnc7zWHmUJa3O32H9N8Ju5gXF0wNLAoZrciaGwuS8 XRgA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=f87BgBHpJg5KUrVQgAFl55U5xEbGCPM3Eg+mnTPI7RY=; fh=API7pHFvHz//VzzmsjMiZjVU2agzigd09gVcY4RidXg=; b=Coxhmx3ZWhOdljwAolaP+AyNHi6aeYCVznQFa7RXv0OzT2XjHDtrOyQXgV+izCxdYj wsSU35Dtrfwm7vlibZiiYdi/7TXEahySTLWHoB3s2RYnebUauIk1VJa1gXxk6c9DWhs8 beQ0mBuc2LtKG3MeKvlB1IGRSnLpLSHk31xHuWY5bDswTvN4BmHFI1mv9i2vOLymJ//A JHHw3gKfqdNBIpV8jr5DXHsA/OKwb6QQFQC5TW/QHXE8kyUQZCVC7TfRsMN5edc5M1KD xDRsC9renxxrQEB1oG+HUiTwK5lt75Fi/C1Dm0Qzr2MROrO37HbkxFNZuhk+7CrrQKdb tcVQ==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1774255326; x=1774860126; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=f87BgBHpJg5KUrVQgAFl55U5xEbGCPM3Eg+mnTPI7RY=; b=km0hmddWzpvClBVIZ6IcDByOUjOrLvPTpV4Pwi9M8eFsBk/8rVSomot5XO9GdB+qzk dOhCW/Kj+1iXYTF5A50k1vqR1ynVOi6P+hpHzHj0mdSvRwHPgcIplaP+SJFaLLGQ2n3j miwrSBS3HbsTTthTFLqcp3j3KCZkB4HpN1cnVTzUIio2BrStwJlSHd391C/PlmyjETt2 /egZPp8T7P1DA8RjLr6ll9cqOXasJWRSfDHOOtsmkg3hon96/OJMM9yARoErNlIfuiJF ul6pnKPnja4gwbNYrSjA9KAG2nhwSdY5ZWZlJLibZc2BJgmPW5VWjyg8J7kv0l3qvxkY XUNw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1774255326; x=1774860126; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=f87BgBHpJg5KUrVQgAFl55U5xEbGCPM3Eg+mnTPI7RY=; b=Dgg+e/NCHzZDn+MT6be6nBnJIPZG/Xuj1xnqwBt42GRxMvgHvBUUAdvj5tmavZPlUg ZDY3lRMa2r3QB8LF0Twy8sLVZsqwojzeV4f6F4M2vmZFi5aM/QGzqhSHul3oR+aMmBi9 bYMhMTv/JjriGgtT4qSVr7YlTl/y8JoF79Q+leY6lzJe7DzgtlxNbW5xI9UGFJ2/MQPa SknJ7UvcznZKiMp6wN66/Ddh237NndYs8SJIG1ye2Zz1901asAriPuACI8LlCiIMN4al CJg4+JV8WnBz+8p5bKhnmVKsvgYuNErRfOJ0LQvucN8dvny/eoSN/qXMDn+ubjfTtXWb Swxg== X-Gm-Message-State: AOJu0YxI5UcW3Fk/g6JxNl0oNk4SyQDu0kHFez0S3CXnb3yaEyU0GIQo 2jfaGMx71Ru+OOQ7Cq2rmtRTGQ78HfTPqiJ6r58OeOy6O8QXY8Gp+YWngha74QINC+dIkGXbLGN yGyE34LfxihhJAhRO8vd6odUT3d2z+QIpaGBD X-Gm-Gg: ATEYQzwkioKubDBGCgYhrojQ88RcH8x9doljXRW1Bt5sgUJpVnYPF8VJrJAv9GNRhHa XL3V2XVHL1+42pwgn3in4NBRRi+/A7BQNCVFFZDiaA4mOTt11aZcPw6UD3ztmp+Z1PPT1986Dxr 7+VhLH2Kcd2Pl60vT08W54LlhX4LF+ifonqAwnjgeiF7P/vFzd3v1fn19TFUArCAyO+w1rssVue +ig1EN9xjYuu7qZGN4I3yhg0VXnzlptVnSVHSidCsI3FJHZk1V8Ic60Jgv0bHRgVfDqr3HOTFun 2ZXmlEWb1ArDUQvJAr3aSjX/lelKy/f3fLTj68KUQUcmlBH1hK1ucz3Tzkpl47Yx8ayo84gMAOT CQIVB9aEF X-Received: by 2002:a17:906:7d4:b0:b98:24d7:1a2b with SMTP id a640c23a62f3a-b982f3a2b05mr558137266b.39.1774255325662; Mon, 23 Mar 2026 01:42:05 -0700 (PDT) MIME-Version: 1.0 References: <0B38B9DC-A53C-437C-B9BD-807F30A032BA@gmail.com> <424A3799-D7FD-4691-BD10-6CBDACB14938@gmail.com> <43825F3D-6156-4030-B5E1-9A0841969022@gmail.com> <260E4616-43B7-4C24-BF53-352DC8DEB7AA@gmail.com> In-Reply-To: From: Xuneng Zhou Date: Mon, 23 Mar 2026 16:41:53 +0800 X-Gm-Features: AQROBzBEnASsIo1V52isTc08YPVTrKIX8x3dgU5NnP4J9GTrVemPie8rz49MQDI Message-ID: Subject: Re: tablecmds: fix bug where index rebuild loses replica identity on partitions To: Chao Li Cc: Postgres hackers Content-Type: multipart/alternative; boundary="0000000000006d5111064dad0237" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000006d5111064dad0237 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, On Mon, Mar 23, 2026 at 3:57=E2=80=AFPM Chao Li wr= ote: > > > > > On Mar 21, 2026, at 18:29, Xuneng Zhou wrote: > > > > On Thu, Mar 19, 2026 at 1:07=E2=80=AFPM Chao Li wrote: > >> > >> > >> > >>> On Feb 26, 2026, at 14:59, Chao Li wrote: > >>> > >>> > >>> > >>>> On Jan 28, 2026, at 10:49, Chao Li wrote: > >>>> > >>>> > >>>> > >>>>> On Jan 27, 2026, at 16:30, Chao Li wrote: > >>>>> > >>>>> > >>>>> > >>>>>> On Jan 27, 2026, at 15:59, Chao Li wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>>> On Jan 27, 2026, at 15:39, Michael Paquier wrote: > >>>>>>> > >>>>>>> On Tue, Jan 27, 2026 at 01:13:32PM +0800, Chao Li wrote: > >>>>>>>> I found this bug while working on a related patch [1]. > >>>>>>>> > >>>>>>>> When ALTER TABLE ... ALTER COLUMN TYPE causes an index rebuild, and > >>>>>>>> that index is used as REPLICA IDENTITY on a partitioned table, the > >>>>>>>> replica identity marking on partitions can be silently lost after the > >>>>>>>> rebuild. > >>>>>>> > >>>>>>> I am slightly confused by the tests included in the proposed patch. > >>>>>>> On HEAD, if I undo the proposed changes of tablecmds.c, the tests > >>>>>>> pass. If I run the tests of the patch with the changes of > >>>>>>> tablecmds.c, the tests also pass. > >>>>>> > >>>>>> Oops, that isn=E2=80=99t supposed to be so. I=E2=80=99ll check the= test. > >>>>>> > >>>>> > >>>>> Okay, I see the problem is here: > >>>>> ``` > >>>>> +CREATE UNIQUE INDEX test_replica_identity_partitioned_pkey ON test_replica_identity_partitioned (id); > >>>>> ``` > >>>>> > >>>>> I missed to add column =E2=80=9Cval=E2=80=9D into the index, so tha= t alter type of val didn=E2=80=99t cause index rebuild. > >>>>> > >>>>> Ideally, it=E2=80=99s better to also verify that index OIDs should = have changed before and after alter column type, but I haven=E2=80=99t figured o= ut how to do so. Do you have an idea? > >>>> > >>>> I just updated the test to store index OIDs before and after rebuild into 2 temp tables, so that we can compare the OIDs to verify rebuild happens and replica identity preserved. > >>>> > >>>> I tried to port the test to master branch, and the test failed. From the test diff file, we can see replica identity lost on 3 leaf partitions: > >>>> ``` > >>>> @@ -360,9 +360,9 @@ > >>>> ORDER BY b.index_name; > >>>> index_name | rebuilt | ri_lost > >>>> ---------------------------------------------------+---------+--------- > >>>> - test_replica_identity_partitioned_p1_id_val_idx | t | f > >>>> - test_replica_identity_partitioned_p2_1_id_val_idx | t | f > >>>> - test_replica_identity_partitioned_p2_2_id_val_idx | t | f > >>>> + test_replica_identity_partitioned_p1_id_val_idx | t | t > >>>> + test_replica_identity_partitioned_p2_1_id_val_idx | t | t > >>>> + test_replica_identity_partitioned_p2_2_id_val_idx | t | t > >>>> test_replica_identity_partitioned_p2_id_val_idx | t | f > >>>> test_replica_identity_partitioned_pkey | t | f > >>>> (5 rows) > >>>> ``` > >>>> > >>>> With this patch, the test passes and all replica identity are preserved. > >>>> > >>>> PFA v3: > >>>> * Enhanced the test. > >>>> * A small change in find_partition_replica_identity_indexes(): if we will not update a partition, then unlock it. > >>>> > >>>> Best regards, > >>>> -- > >>>> Chao Li (Evan) > >>>> HighGo Software Co., Ltd. > >>>> https://www.highgo.com/ > >>>> > >>>> > >>>> > >>>> > >>>> > >>> > >>> The CF asked for a rebase, thus rebased as v4. > >>> > > > > > > Hi, I reproduced this with the test case, and the patch appears > > to resolve it. > > > > Some comments on v5: > > Thanks a lot for your review. > > > > > -- Whether it makes sense to use a single list of pair structs instead > > of two parallel OID lists (replicaIdentityIndexOids + > > replicaIdentityTableOids) to avoid accidental desync. > > I don=E2=80=99t think that helps much. The current code of rebuilding ind= ex uses two lists changedIndexOids and changedIndexDefs. So, this patch matches the pattern of the existing code. > > > > > -- It would be better to make lock handling in > > find_partition_replica_identity_indexes() consistent > > (relation_open(..., NoLock) if child is already locked, and avoid > > mixed relation_close(..., lockmode)/NoLock behavior). > > That=E2=80=99s because if we are going to update a partition, then we nee= d to hold the lock on the partition. There is one locking cleanup in find_partition_replica_identity_indexes()= . find_inheritance_children(relId, lockmode) already acquires lockmode on every partition it returns, so I think the later relation_open() should use NoLock, not lockmode. For the same reason, all relation_close() calls in this function should use NoLock as well. Today the code does: partRel =3D relation_open(partRelOid, lockmode); ... relation_close(partRel, lockmode); That does not cause a correctness issue, because the lock manager reference-counts same-transaction acquisitions, so the lock remains held either way. But it is misleading: it suggests that relation_open() is where the partition lock is taken, and that the early relation_close(..., lockmode) is intentionally releasing it. Neither is actually true here, because the lock was already acquired by find_inheritance_children(). So I think this should be adjusted to: partRel =3D relation_open(partRelOid, NoLock); and all close sites in this function should be: relation_close(partRel, NoLock); The comment on the early-close path should also be updated, since it is not really unlocking the partition. Something like "No matching partition index; just close the relcache entry" would match the actual behavior better. > > > > -- Some typos in comments/tests (partion/parition). > > > > Fixed. > > PFA v6: fixed a typo in comment. > > Best regards, > -- > Chao Li (Evan) > HighGo Software Co., Ltd. > https://www.highgo.com/ > > > > --=20 Best, Xuneng --0000000000006d5111064dad0237 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

On Mon, Mar 2= 3, 2026 at 3:57=E2=80=AFPM Chao Li <li.evan.chao@gmail.com> wrote:
>
&g= t;
>
> > On Mar 21, 2026, at 18:29, Xuneng Zhou <xunengzhou@gmail.com= > wrote:
> >
> > On Thu, Mar 19, 2026 at 1:07=E2=80=AF= PM Chao Li <= li.evan.chao@gmail.com> wrote:
> >>
> >>
= > >>
> >>> On Feb 26, 2026, at 14:59, Chao Li <<= a href=3D"mailto:li.evan.chao@gmail.com" target=3D"_blank">li.evan.chao@gma= il.com> wrote:
> >>>
> >>>
> >= ;>>
> >>>> On Jan 28, 2026, at 10:49, Chao Li <<= a href=3D"mailto:li.evan.chao@gmail.com" target=3D"_blank">li.evan.chao@gma= il.com> wrote:
> >>>>
> >>>>
= > >>>>
> >>>>> On Jan 27, 2026, at 16:3= 0, Chao Li <= li.evan.chao@gmail.com> wrote:
> >>>>>
> = >>>>>
> >>>>>
> >>>>&= gt;> On Jan 27, 2026, at 15:59, Chao Li <li.evan.chao@gmail.com> wrote:
&g= t; >>>>>>
> >>>>>>
> >&g= t;>>>>
> >>>>>>> On Jan 27, 2026, at= 15:39, Michael Paquier <michael@paquier.xyz> wrote:
> >>>>>= >>
> >>>>>>> On Tue, Jan 27, 2026 at 01:13= :32PM +0800, Chao Li wrote:
> >>>>>>>> I foun= d this bug while working on a related patch [1].
> >>>>&g= t;>>>
> >>>>>>>> When ALTER TABLE ..= . ALTER COLUMN TYPE causes an index rebuild, and
> >>>>&g= t;>>> that index is used as REPLICA IDENTITY on a partitioned tabl= e, the
> >>>>>>>> replica identity marking on= partitions can be silently lost after the
> >>>>>>= >> rebuild.
> >>>>>>>
> >>>= >>>> I am slightly confused by the tests included in the propos= ed patch.
> >>>>>>> On HEAD, if I undo the propo= sed changes of tablecmds.c, the tests
> >>>>>>> = pass.=C2=A0 If I run the tests of the patch with the changes of
> >= ;>>>>>> tablecmds.c, the tests also pass.
> >>= ;>>>>
> >>>>>> Oops, that isn=E2=80=99t= supposed to be so. I=E2=80=99ll check the test.
> >>>>&g= t;>
> >>>>>
> >>>>> Okay, I se= e the problem is here:
> >>>>> ```
> >>>= ;>> +CREATE UNIQUE INDEX test_replica_identity_partitioned_pkey ON te= st_replica_identity_partitioned (id);
> >>>>> ```
&= gt; >>>>>
> >>>>> I missed to add colum= n =E2=80=9Cval=E2=80=9D into the index, so that alter type of val didn=E2= =80=99t cause index rebuild.
> >>>>>
> >>&= gt;>> Ideally, it=E2=80=99s better to also verify that index OIDs sho= uld have changed before and after alter column type, but I haven=E2=80=99t = figured out how to do so. Do you have an idea?
> >>>>
= > >>>> I just updated the test to store index OIDs before an= d after rebuild into 2 temp tables, so that we can compare the OIDs to veri= fy rebuild happens and replica identity preserved.
> >>>>=
> >>>> I tried to port the test to master branch, and th= e test failed. From the test diff file, we can see replica identity lost on= 3 leaf partitions:
> >>>> ```
> >>>> @= @ -360,9 +360,9 @@
> >>>> ORDER BY b.index_name;
> = >>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 index_name =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 = =C2=A0 =C2=A0 | rebuilt | ri_lost
> >>>> ----------------= -----------------------------------+---------+---------
> >>>= ;> - test_replica_identity_partitioned_p1_id_val_idx =C2=A0 | t =C2=A0 = =C2=A0 =C2=A0 | f
> >>>> - test_replica_identity_partitio= ned_p2_1_id_val_idx | t =C2=A0 =C2=A0 =C2=A0 | f
> >>>> -= test_replica_identity_partitioned_p2_2_id_val_idx | t =C2=A0 =C2=A0 =C2=A0= | f
> >>>> + test_replica_identity_partitioned_p1_id_val= _idx =C2=A0 | t =C2=A0 =C2=A0 =C2=A0 | t
> >>>> + test_re= plica_identity_partitioned_p2_1_id_val_idx | t =C2=A0 =C2=A0 =C2=A0 | t
= > >>>> + test_replica_identity_partitioned_p2_2_id_val_idx |= t =C2=A0 =C2=A0 =C2=A0 | t
> >>>> test_replica_identity_= partitioned_p2_id_val_idx =C2=A0 | t =C2=A0 =C2=A0 =C2=A0 | f
> >&= gt;>> test_replica_identity_partitioned_pkey =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0| t =C2=A0 =C2=A0 =C2=A0 | f
> >>>> (5 r= ows)
> >>>> ```
> >>>>
> >>= >> With this patch, the test passes and all replica identity are pres= erved.
> >>>>
> >>>> PFA v3:
> &g= t;>>> * Enhanced the test.
> >>>> * A small chan= ge in find_partition_replica_identity_indexes(): if we will not update a pa= rtition, then unlock it.
> >>>>
> >>>> = Best regards,
> >>>> --
> >>>> Chao Li = (Evan)
> >>>> HighGo Software Co., Ltd.
> >>&= gt;> https://www.h= ighgo.com/
> >>>>
> >>>>
> &g= t;>>>
> >>>>
> >>>> <v3-000= 1-tablecmds-fix-bug-where-index-rebuild-loses-repli.patch>
> >&= gt;>
> >>> The CF asked for a rebase, thus rebased as v4.=
> >>>
> >
> >
> > Hi, =C2=A0I re= produced this with the test case, and the patch appears
> > to res= olve it.
> >
> > Some comments on v5:
>
> Tha= nks a lot for your review.
>
> >
> > -- Whether it = makes sense to use a single list of pair structs instead
> > of tw= o parallel OID lists (replicaIdentityIndexOids +
> > replicaIdenti= tyTableOids) to avoid accidental desync.
>
> I don=E2=80=99t th= ink that helps much. The current code of rebuilding index uses two lists ch= angedIndexOids and changedIndexDefs. So, this patch matches the pattern of = the existing code.
>
> >
> > -- It would be better = to make lock handling in
> > find_partition_replica_identity_index= es() consistent
> > (relation_open(..., NoLock) if child is alread= y locked, and avoid
> > mixed relation_close(..., lockmode)/NoLock= behavior).
>
> That=E2=80=99s because if we are going to updat= e a partition, then we need to hold the lock on the partition.

=C2= =A0 There is one locking cleanup in find_partition_replica_identity_indexes= ().

=C2=A0 find_inheritance_children(relId, lockmode) already acquir= es lockmode on
=C2=A0 every partition it returns, so I think the later r= elation_open() should use
=C2=A0 NoLock, not lockmode. For the same reas= on, all relation_close() calls in
=C2=A0 this function should use NoLock= as well.

=C2=A0 Today the code does:

=C2=A0 partRel =3D rela= tion_open(partRelOid, lockmode);
=C2=A0 ...
=C2=A0 relation_close(par= tRel, lockmode);

=C2=A0 That does not cause a correctness issue, bec= ause the lock manager
=C2=A0 reference-counts same-transaction acquisiti= ons, so the lock remains held
=C2=A0 either way. But it is misleading: i= t suggests that relation_open() is where
=C2=A0 the partition lock is ta= ken, and that the early relation_close(..., lockmode)
=C2=A0 is intentio= nally releasing it. Neither is actually true here, because the lock
=C2= =A0 was already acquired by find_inheritance_children().

=C2=A0 So I= think this should be adjusted to:

=C2=A0 partRel =3D relation_open(= partRelOid, NoLock);

=C2=A0 and all close sites in this function sho= uld be:

=C2=A0 relation_close(partRel, NoLock);

=C2=A0 The co= mment on the early-close path should also be updated, since it is not
= =C2=A0 really unlocking the partition. Something like "No matching par= tition index;
=C2=A0 just close the relcache entry"=C2=A0would matc= h the actual behavior better.

> >
> > -- Some typos i= n comments/tests (partion/parition).
> >
>
> Fixed.>
> PFA v6: fixed a typo in comment.
>
> Best regards= ,
> --
> Chao Li (Evan)
> HighGo Software Co., Ltd.
&g= t; https://www.highgo= .com/
>
>
>
>


--
Best,
Xuneng=
--0000000000006d5111064dad0237--