Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wBMdX-000ylP-2H for pgsql-hackers@arkaria.postgresql.org; Sat, 11 Apr 2026 00:55:48 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wBMdU-00Eza3-2Z for pgsql-hackers@arkaria.postgresql.org; Sat, 11 Apr 2026 00:55:45 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wBMdU-00EzZu-15 for pgsql-hackers@lists.postgresql.org; Sat, 11 Apr 2026 00:55:45 +0000 Received: from mail-yx1-xb12b.google.com ([2607:f8b0:4864:20::b12b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wBMdS-00000000RXg-2wKe for pgsql-hackers@lists.postgresql.org; Sat, 11 Apr 2026 00:55:44 +0000 Received: by mail-yx1-xb12b.google.com with SMTP id 956f58d0204a3-6501c4857b2so2542715d50.3 for ; Fri, 10 Apr 2026 17:55:42 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1775868940; cv=none; d=google.com; s=arc-20240605; b=RZAjGgkWoEQsAvz1Qbvxb8YUCaXbMWGXOBpyZ+3RNlWlbJnY4vGcL/Yup5WxQkYOga KJauLJAIXsI/+CEwAMSUZv3oJ8lcxpQHdEO0sQ0ICRepfcjyp7fHvQjRtZTDM2kzF8op 0GDG51uhsYw5N9nLF+hbhkQiA0w2Ecp5005z6MW5hhw5p+omtjV7x07+t1/2phxZ8FJE AlWsIZzNP4g+vr7lzI/IpVTLhEWOLg8uAsFFkaNgztndss8GCoKILdMJSVdXG25+Ck+N Wa2bvmpWmU3Xocj6zE+E76bb7VBaewiNyvUI3WZl6NLf/OglIB5v2FIqH3vLjivCch0U WcNA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:dkim-signature; bh=WW+TNUOdaYKszstqNEPOIKuneonA4tkB4lUpPrFtULo=; fh=y6lKquhcm5NrRy7yNmyWLegA9XrpFL/l4pSJn9ZGrOw=; b=H0WAUfDEmGC/2HKjFIz+6Vy6XhMmn7kEoxsrXlnatzBoqU0rZgbhKFrv645fHvxzD/ o/zHtKLewLQqzyeee49zaiX26so9Yq0mp7D7y0pWYj8/GRVoL0vwuzmQfpH0W497VNt8 Tv88iPQRed3asAxMva0U5grpywcy/FyeKcymWH7SXDfLGk0jdGa7526qb/7m3m7a6Tca J4b3b4NJl9uaK7tJaydh0cmdtkV6qMs5bYEwUHAQ26WrLN1tx6TpH9e+fILBXdDrWGox ETRJTVJR9Iqxg7NOYhkyNQ5OIIqWzMYOLa5uPe/ha94f+L9Ws5WXUk7XRazRKwEWyOl5 s5GQ==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1775868940; x=1776473740; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=WW+TNUOdaYKszstqNEPOIKuneonA4tkB4lUpPrFtULo=; b=lOUm7Jsz0G0Uko7GNSyqy3JWm96BiU88ouOEXPsGwCKgX8BCVoaMIjml7OG7dMV5Kw nIOqM8rGoInINtu5/VH9XpVt9I4qFBMUrmPqPf2Ha/qSgO4DsSQzjvhOTuKNaM+8VvOL HnHsLDt0MxUpdPQb0/dstZLv3fU+cfEree68aLy3nW2457GQSM7SO5VyO0OnjUYhAEJI 1L8Nlgr7/G1ieHFlUtcVN15yqgiTkiyV8QiIjo3cNVNM4pC15FotxxDTUPPnerGhrjId +3Wced60q5q8w8e8zDOgPNNYpYrE40hvn8uVDW1oZsu/uSxlkkXR6u38vjf3pVzNwKxf /5Vw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1775868940; x=1776473740; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=WW+TNUOdaYKszstqNEPOIKuneonA4tkB4lUpPrFtULo=; b=o9nyCl3gPBFk7RUi+V7E1ME8Km52/fN8h/urR5T+TxDE25LrPvf7SWSUoGEhW5hnms HourgErS7VwixnSOcBjxGXxdM/9AafWTSr141vTSG9b4v5uEE9ztEf9nWubu5lSW+i5z F/mlu2fyeNWs/Ivj4+FGbfwKYdZ+Gn9Ai0Gcg1hjmM4CcRVmZmbN+QFrR6g/Jd1Snz32 blSASHRMFpUOyMCs5nH5wOoDRUxLRbx5tzDvunhwtlwxFZFFwOC8ZkKIxvO0jBPTlSXh HOAHioWIYTg5ZwVaT5US9h0svuImxL0NByG/71go6u8o5jsIGQj2i4OPoelLDMQmN+nQ OKDA== X-Gm-Message-State: AOJu0YxTV7ho4w9pH7ZGMbffP0GpLoGqvM0koDCWIAiSqR6N/2YDpiBL NwAFDqiYq8sOghDRAXKz/unMdeU8KuIoitvA+xKKz5mcpl+uzakqV5LUss6sgPqfiX/ntZ8dSK2 1o3ji11H5k5jLQQ0ZSeFGMlm2qbZmX1k= X-Gm-Gg: AeBDiespNs3P7mA4w6O0CrYqSslRo9gon1AcgG8Y6cvwdbFJ0Gg9J0lI4rAFltDF0bX MN2YEIPizLfTy+EnXPL0CERJo/Xb27MXtzx9aPpm174cLS6wCrKdzr9xIkXBIG09b3Sy/Rog6TP i8wVtTRx0R2qSzpExcz7Jn6r3jtaQSI8IhSeCxhNUrNUUpWLsOzlIj+MvrBfFffkhvcZBB9zHGX sK3e+YZJ+s9euUbn3z4mQzN3UjJJ8RMVHxPmTwh/IfhkK/eLPxID65GeWn+DGz4eeoZdaKPE1q0 c58Tyhc+Gf7H0slfaw7iumTYcSPeosD6jVLCRafP9cd5dHjRJ8s= X-Received: by 2002:a53:ad12:0:b0:64c:9ec3:d71a with SMTP id 956f58d0204a3-65198b82bc8mr3866901d50.48.1775868940290; Fri, 10 Apr 2026 17:55:40 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Haibo Yan Date: Fri, 10 Apr 2026 17:55:29 -0700 X-Gm-Features: AQROBzDreux3TsQJyfdcwN2bCJY49NOkjgCz9yESyC3v5LDegnsvsqxkk9le-eg Message-ID: Subject: Re: [PATCH] Fix: Partitioned parent index remains invalid after child indexes are repaired To: Mohamed ALi Cc: pgsql-hackers@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000005a8e0c064f24b556" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000005a8e0c064f24b556 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Wed, Apr 8, 2026 at 11:17=E2=80=AFPM Mohamed ALi wr= ote: > Hi hackers, > > A partitioned (parent) index in PostgreSQL can become permanently > stuck with `indisvalid =3D false` even after all of its child partition > indexes have been repaired and are valid. There is no built-in > mechanism to re-validate the parent index after a child is fixed via > `REINDEX`. This affects all currently supported PostgreSQL versions > (13 through 18) > The root cause is that `validatePartitionedIndex()` =E2=80=94 the only > function that can mark a partitioned index as valid is never called > after `REINDEX` operations, and is skipped when re-running `ALTER > INDEX ATTACH PARTITION` on an already-attached index. > > How the Bug Manifests > > Typical Scenario : > 1. A partitioned table has multiple partitions. > 2. The user creates indexes on partitions concurrently. One fails (due > to deadlock, cancellation, timeout, etc.), leaving an invalid > partition index. > 3. A parent index is created (or the invalid index is attached to an > existing parent). The parent is correctly marked `indisvalid =3D false` > because at least one child is invalid. > 4. The user fixes the broken child index with `REINDEX INDEX CONCURRENTLY= `. > 5. The child index becomes valid (`indisvalid =3D true`). > 6. The parent index remains `indisvalid =3D false` permanently. No SQL > command can fix it. > > Reproduction steps: > > ```sql > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > -- SETUP: Partitioned table with two partitions and sample data > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > DROP TABLE IF EXISTS orders; > CREATE TABLE orders ( > id serial, > order_date date NOT NULL, > amount numeric > ) PARTITION BY RANGE (order_date); > CREATE TABLE orders_2023 PARTITION OF orders > FOR VALUES FROM ('2023-01-01') TO ('2024-01-01'); > CREATE TABLE orders_2024 PARTITION OF orders > FOR VALUES FROM ('2024-01-01') TO ('2025-01-01'); > INSERT INTO orders (order_date, amount) > SELECT d, random() * 1000 > FROM generate_series('2023-01-01'::date, '2023-12-31'::date, '1 day') d; > INSERT INTO orders (order_date, amount) > SELECT d, random() * 1000 > FROM generate_series('2024-01-01'::date, '2024-12-31'::date, '1 day') d; > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > -- STEP 1: Create parent index with ONLY (starts as invalid) > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > CREATE INDEX orders_amount_idx ON ONLY orders (amount); > -- Verify: parent index is invalid (no children attached yet) > SELECT c.relname, i.indisvalid > FROM pg_class c > JOIN pg_index i ON c.oid =3D i.indexrelid > WHERE c.relname LIKE 'orders%idx%' > ORDER BY c.relname; > -- Expected: > -- orders_amount_idx | f > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > -- STEP 2: Create valid index on first partition > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > CREATE INDEX CONCURRENTLY orders_2023_amount_idx ON orders_2023 (amount); > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > -- STEP 3: Create an INVALID index on second partition > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > -- In a separate session, hold a lock: > BEGIN; LOCK TABLE orders_2024 IN SHARE MODE; > -- Then in the main session: > SET statement_timeout =3D '1ms'; > CREATE INDEX CONCURRENTLY orders_2024_amount_idx ON orders_2024 (amount); > RESET statement_timeout; > -- it will fail/timeout, leaving an invalid index. > -- Verify state: > SELECT c.relname, i.indisvalid > FROM pg_class c > JOIN pg_index i ON c.oid =3D i.indexrelid > WHERE c.relname LIKE 'orders%idx%' > ORDER BY c.relname; > -- Expected: > -- orders_2023_amount_idx | t (valid) > -- orders_2024_amount_idx | f (invalid) > -- orders_amount_idx | f (invalid, created with ONLY) > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > -- STEP 4: Attach both partition indexes to the parent > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > -- Attach the invalid one first > ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2024_amount_idx; > -- Succeeds. Parent stays invalid (correct =E2=80=94 child is invalid). > -- Attach the valid one > ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2023_amount_idx; > -- Succeeds. Parent still invalid (correct =E2=80=94 one child still inva= lid). > -- Verify attachment and validity: > SELECT c.relname, i.indisvalid, > pg_get_indexdef(i.indexrelid) AS indexdef > FROM pg_class c > JOIN pg_index i ON c.oid =3D i.indexrelid > WHERE c.relname LIKE 'orders%amount%' > ORDER BY c.relname; > -- Expected: > -- orders_2023_amount_idx | t > -- orders_2024_amount_idx | f > -- orders_amount_idx | f > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > -- STEP 5: Fix the invalid child index via REINDEX > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > REINDEX INDEX CONCURRENTLY orders_2024_amount_idx; > -- Verify: child is now valid > SELECT c.relname, i.indisvalid > FROM pg_class c > JOIN pg_index i ON c.oid =3D i.indexrelid > WHERE c.relname LIKE 'orders%amount%' > ORDER BY c.relname; > -- ACTUAL (buggy) result: > -- orders_2023_amount_idx | t (valid) > -- orders_2024_amount_idx | t (valid =E2=80=94 fixed by REINDEX) > -- orders_amount_idx | f (STILL INVALID =E2=80=94 this is the bug= !) > -- > -- EXPECTED result (if bug were fixed): > -- orders_2023_amount_idx | t > -- orders_2024_amount_idx | t > -- orders_amount_idx | t (should be valid now) > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > -- STEP 6: Demonstrate that re-running ATTACH does not help > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2024_amount_idx; > -- Returns "ALTER INDEX" (succeeds silently, does nothing) > SELECT c.relname, i.indisvalid > FROM pg_class c > JOIN pg_index i ON c.oid =3D i.indexrelid > WHERE c.relname LIKE 'orders%amount%' > ORDER BY c.relname; > -- Parent is STILL invalid. The "silently do nothing" path > -- skips validatePartitionedIndex() entirely. > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > -- CLEANUP > -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D > DROP TABLE orders; > ``` > > > Root Cause Analysis: > > Where `validatePartitionedIndex()` Is Called > > The function is called in exactly these code paths: > 1. During `ALTER INDEX ... ATTACH PARTITION` =E2=80=94 inside > `ATExecAttachPartitionIdx()` > 2. During `ALTER TABLE ... ATTACH PARTITION` =E2=80=94 via > `AttachPartitionEnsureIndexes()` > 3. During `CREATE INDEX` on partitioned tables =E2=80=94 via `DefineIndex= ()` > It is NOT called: > - After `REINDEX` of a partitioned index > - During any maintenance operation > - As any periodic validation check > > Bug 1: REINDEX Does Not Validate Parent > > > When `reindex_index()` in `src/backend/catalog/index.c` marks a > partition index as valid (setting `indisvalid =3D true`), it does not > check whether the parent partitioned index should also become valid. > The function simply updates the child's `pg_index` entry and returns. > > Bug 2: Re-running ATTACH Skips Validation > > > In `ATExecAttachPartitionIdx()` (tablecmds.c, around line 21923 in PG > 16 / line ~22900 in HEAD): > > https://github.com/postgres/postgres/blob/master/src/backend/commands/tab= lecmds.c#L21923 > > ```c > /* Silently do nothing if already in the right state */ > currParent =3D partIdx->rd_rel->relispartition ? > get_partition_parent(partIdxId, false) : InvalidOid; > if (currParent !=3D RelationGetRelid(parentIdx)) > { > // ... all validation checks and attachment logic ... > validatePartitionedIndex(parentIdx, parentTbl); // ONLY called here > } > // If already attached, entire block is skipped =E2=80=94 no validation! > ``` > > When the child is already attached (`currParent =3D=3D parentIdx`), the > condition is false, the entire if-block is skipped, and > `validatePartitionedIndex()` is never called. The comment "Silently do > nothing if already in the right state" is misleading "already > attached" does not mean "parent validity is correct." > > Proposed Fixes: > > Fix 1 : Always Validate Parent Index in ALTER INDEX ATTACH PARTITION > > Patch File : 0001-Always-validate-parent-index-in-ALTER-INDEX-ATTACH.patc= h > > Move the validatePartitionedIndex() call outside the if-block so it runs > unconditionally =E2=80=94 both when a new attachment is made and when the > partition is > already attached. This provides a user-accessible recovery path: after > fixing a > child index with REINDEX, re-running ALTER INDEX ATTACH PARTITION trigger= s > parent validation. > > When the partition is already attached, a NOTICE is emitted: > > NOTICE: partition index "child_idx" is already attached to > "parent_idx", validating parent index > > > This follows PostgreSQL's existing convention of using NOTICE for > informational messages about no-op or reduced-scope operations (e.g., > DROP TABLE IF EXISTS, CREATE INDEX IF NOT EXISTS). It tells the user: > > 1- Nothing went wrong > 2- The index was already attached (so they know the state) > 3- Validation still happened (so they know the fix path works) > > > Fix 2: Validate Parent Partitioned Index After REINDEX of Child > > Patch File : 0001-Validate-parent-partitioned-index-after-REINDEX.patch > > Same underlying bug but this patch addresses it from the > REINDEX side. When a partition index is repaired via REINDEX or > REINDEX CONCURRENTLY, the parent partitioned index remains permanently > stuck with indisvalid =3D false even though all children are now valid. > > This is because validatePartitionedIndex() =E2=80=94 the only function th= at can > mark a partitioned index as valid is never called from any REINDEX code > path. > > > validatePartitionedIndex() is only called during: > > 1- ALTER INDEX ... ATTACH PARTITION (tablecmds.c) > 2- ALTER TABLE ... ATTACH PARTITION (tablecmds.c) > 3- CREATE INDEX on partitioned tables (indexcmds.c) > > It is NOT called after: > > 1- REINDEX INDEX (regular) =E2=80=94 handled by reindex_index() in index.= c > 2- REINDEX INDEX CONCURRENTLY =E2=80=94 handled by ReindexRelationConcurr= ently() > > in indexcmds.c, which uses index_concurrently_swap() in index.c > > Three changes are made: > > 1. Make validatePartitionedIndex() public > The function was static in tablecmds.c. It is now exported via > tablecmds.h so it can be called from index.c and indexcmds.c. > > Files changed: > > src/backend/commands/tablecmds.c =E2=80=94 remove static, update comment > src/include/commands/tablecmds.h =E2=80=94 add extern declaration > > 2. Call from reindex_index() (regular REINDEX) > After reindex_index() marks a partition index as valid (indisvalid =3D tr= ue), > check if the index is a partition (iRel->rd_rel->relispartition) and if s= o, > look up the parent and call validatePartitionedIndex(). > > A CommandCounterIncrement() is required before the call so that the child= 's > updated indisvalid is visible to the syscache lookup that > validatePartitionedIndex() performs internally. > > File changed: src/backend/catalog/index.c > > 3. Call from ReindexRelationConcurrently() (REINDEX CONCURRENTLY) > REINDEX CONCURRENTLY uses a completely different code path: it creates a > new > index, builds it concurrently, then swaps it with the old one via > index_concurrently_swap(). The new index inherits the old index's partiti= on > status during the swap. > > After the swap and the existing CommandCounterIncrement() (which makes th= e > swap visible), check if the new index is a partition and call > validatePartitionedIndex() on the parent. > > File changed: src/backend/commands/indexcmds.c > > Multi-level Hierarchy Support > validatePartitionedIndex() already handles multi-level partition > hierarchies. > When it marks a mid-level parent valid, it checks if that parent is itsel= f > a > partition and recursively validates the grandparent. No additional > recursion > logic is needed in the REINDEX patches. > > > Thanks, > Mohamed Ali > Senior DBE > AWS RDS > Hi, Mohamed Thanks for working on this. I went through the problem statement and the two proposed fixes. I agree that the underlying issue looks real: after repairing an invalid child index with REINDEX, the parent partitioned index can remain stuck in an invalid state because validatePartitionedIndex() is not reached from the relevant REINDEX paths. The analysis around ATExecAttachPartitionIdx() also looks correct: when the child index is already attached, the current code takes the no-op path and skips the validation call entirely. Overall, I think this is worth fixing, but I do not view the two patches equally. I think patch 2 is the real fix. The state transition that matters here is that a child index goes from invalid to valid, and the natural place to trigger parent revalidation is where that transition actually happens, namely in REINDEX. By contrast, patch 1 feels more like a secondary hardening/workaround path: it makes ALTER INDEX ... ATTACH PARTITION retry parent validation even in the already-attached case, but that is not really where the underlying state change happens. My main comments are below. 1. Patch 2 should be treated as the primary fix This seems like the correct place to repair the catalog state. If REINDEX repairs a partition index that was previously invalid, and that index is attached to a partitioned parent index, then rechecking the parent with validatePartitionedIndex() is a reasonable and direct solution. I also think it is good that the patch covers both the regular REINDEX path and the REINDEX CONCURRENTLY path. Those are distinct enough that both need explicit attention, and the extra CommandCounterIncrement() before revalidation also seems necessary. So at a high level, this patch makes sense to me. 2. I am less convinced by patch 1 in its current form The main issue here is not correctness, but design and placement. Once the child is already attached, ALTER INDEX ... ATTACH PARTITION is conceptually supposed to be a no-op. With this patch, it becomes a generic =E2=80=9Cretry parent validation=E2=80=9D hook. That means users can run an= attach command that does not change attachment state at all, yet still triggers a full parent validation attempt. That is especially questionable if there are still other invalid child indexes elsewhere under the same parent. In that case, each already-attached ATTACH command will do the validation work again, but it is already known in advance that the parent still cannot become valid. So this is not =E2=80=9Creinvalidating=E2=80=9D anything, but it is repeated c= hecking with no state change, which feels misplaced on a no-op path. Because of that, I do not think patch 1 should be the main bug fix. At most, I would see it as an optional hardening patch. 3. The NOTICE added by patch 1 does not seem like a good fit The existing code explicitly says =E2=80=9CSilently do nothing if already i= n the right state=E2=80=9D. Changing that into a NOTICE every time we hit the already-attached case seems noisy to me. If the community decides that the validation call should stay in this path, I would still suggest dropping the NOTICE. The command is syntactically an ATTACH command, not a repair command, and emitting a message for an otherwise no-op case does not feel very PostgreSQL-like. 4. Please double-check coverage of all REINDEX entry points I agree with the direction in patch 2, but I think reviewers will want confidence that all relevant REINDEX flows are covered consistently. For example, it would be good to confirm that the fix behaves correctly not just for REINDEX INDEX, but also for the broader forms that eventually reach the same logic, such as REINDEX TABLE, and that there is no alternate path where a repaired child index can still leave the parent stale. 5. Multi-level partition trees need explicit testing One especially important point here is recursion. If a repaired child index causes its immediate parent partitioned index to become valid, and that parent is itself a child of another partitioned index, we need to be sure that validity propagates all the way up as intended. The current reasoning suggests that validatePartitionedIndex() already handles that, but this is important enough that it should be demonstrated with a regression test, not just assumed. Minor comments: - In patch 1, I would avoid turning the already-attached path into a behavioral special case unless there is a strong reason to keep it. It would be cleaner if the fix relied primarily on the actual state-changin= g paths. - If patch 1 remains, I would remove the NOTICE. - The commit message for patch 2 should clearly explain why REINDEX is the right place to do this, rather than making ATTACH PARTITION the repair mechanism. - It may also help to mention explicitly whether the extra validation call is only intended for indexes that were previously invalid and have just been repaired, since that is an important part of why the patch is reasonably narrow. I do not think the current patches are complete without regression coverage. At minimum, I think the following tests should be added: 1. Regular REINDEX case Create a partitioned table with a parent index left invalid initially, then attach child indexes such that one child index is invalid. Repair t= hat child with plain REINDEX INDEX, and verify that the child becomes valid and the parent also becomes valid. 2. REINDEX CONCURRENTLY case Same setup, but use REINDEX INDEX CONCURRENTLY. This should be tested separately because the code path is different. 3. Multi-level partition hierarchy Use at least a three-level hierarchy and verify that repairing a leaf-level invalid child index can cause validity to propagate upward through intermediate partitioned indexes. 4. Negative case Repair one child index while another child index under the same parent remains invalid, and verify that the parent remains invalid. 5. Already-attached ATTACH PARTITION case Only if patch 1 is kept: exercise ALTER INDEX ... ATTACH PARTITION on a child index that is already attached, and verify both behaviors: - parent becomes valid if all children are now valid - parent stays invalid if some other child is still invalid My overall view is: - The bug itself looks real. - The diagnosis looks sound. - Patch 2 is the right direction and should be the primary fix. - Patch 1 is much less compelling as written, and I would not take it as the main solution. - The series needs regression tests before it is ready. Regards, Haibo --0000000000005a8e0c064f24b556 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Wed, Apr 8, 2026 at 11:17=E2=80=AFPM M= ohamed ALi <moali.pg@gmail.com= > wrote:
Hi hackers,

A partitioned (parent) index in PostgreSQL can become permanently
stuck with `indisvalid =3D false` even after all of its child partition
indexes have been repaired and are valid. There is no built-in
mechanism to re-validate the parent index after a child is fixed via
`REINDEX`. This affects all currently supported PostgreSQL versions
(13 through 18)
The root cause is that `validatePartitionedIndex()` =E2=80=94 the only
function that can mark a partitioned index as valid is never called
after `REINDEX` operations, and is skipped when re-running `ALTER
INDEX ATTACH PARTITION` on an already-attached index.

How the Bug Manifests

Typical Scenario :
1. A partitioned table has multiple partitions.
2. The user creates indexes on partitions concurrently. One fails (due
to deadlock, cancellation, timeout, etc.), leaving an invalid
partition index.
3. A parent index is created (or the invalid index is attached to an
existing parent). The parent is correctly marked `indisvalid =3D false`
because at least one child is invalid.
4. The user fixes the broken child index with `REINDEX INDEX CONCURRENTLY`.=
5. The child index becomes valid (`indisvalid =3D true`).
6. The parent index remains `indisvalid =3D false` permanently. No SQL
command can fix it.

Reproduction steps:

```sql
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-- SETUP: Partitioned table with two partitions and sample data
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
DROP TABLE IF EXISTS orders;
CREATE TABLE orders (
=C2=A0 =C2=A0 id serial,
=C2=A0 =C2=A0 order_date date NOT NULL,
=C2=A0 =C2=A0 amount numeric
) PARTITION BY RANGE (order_date);
CREATE TABLE orders_2023 PARTITION OF orders
=C2=A0 =C2=A0 FOR VALUES FROM ('2023-01-01') TO ('2024-01-01= 9;);
CREATE TABLE orders_2024 PARTITION OF orders
=C2=A0 =C2=A0 FOR VALUES FROM ('2024-01-01') TO ('2025-01-01= 9;);
INSERT INTO orders (order_date, amount)
SELECT d, random() * 1000
FROM generate_series('2023-01-01'::date, '2023-12-31'::date= , '1 day') d;
INSERT INTO orders (order_date, amount)
SELECT d, random() * 1000
FROM generate_series('2024-01-01'::date, '2024-12-31'::date= , '1 day') d;
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-- STEP 1: Create parent index with ONLY (starts as invalid)
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
CREATE INDEX orders_amount_idx ON ONLY orders (amount);
-- Verify: parent index is invalid (no children attached yet)
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid =3D i.indexrelid
WHERE c.relname LIKE 'orders%idx%'
ORDER BY c.relname;
-- Expected:
--=C2=A0 orders_amount_idx | f
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-- STEP 2: Create valid index on first partition
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
CREATE INDEX CONCURRENTLY orders_2023_amount_idx ON orders_2023 (amount); -- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-- STEP 3: Create an INVALID index on second partition
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-- In a separate session, hold a lock:
BEGIN; LOCK TABLE orders_2024 IN SHARE MODE;
-- Then in the main session:
SET statement_timeout =3D '1ms';
CREATE INDEX CONCURRENTLY orders_2024_amount_idx ON orders_2024 (amount); RESET statement_timeout;
-- it will fail/timeout, leaving an invalid index.
-- Verify state:
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid =3D i.indexrelid
WHERE c.relname LIKE 'orders%idx%'
ORDER BY c.relname;
-- Expected:
--=C2=A0 orders_2023_amount_idx | t=C2=A0 =C2=A0(valid)
--=C2=A0 orders_2024_amount_idx | f=C2=A0 =C2=A0(invalid)
--=C2=A0 orders_amount_idx=C2=A0 =C2=A0 =C2=A0 | f=C2=A0 =C2=A0(invalid, cr= eated with ONLY)
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-- STEP 4: Attach both partition indexes to the parent
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-- Attach the invalid one first
ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2024_amount_idx;
-- Succeeds. Parent stays invalid (correct =E2=80=94 child is invalid).
-- Attach the valid one
ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2023_amount_idx;
-- Succeeds. Parent still invalid (correct =E2=80=94 one child still invali= d).
-- Verify attachment and validity:
SELECT c.relname, i.indisvalid,
=C2=A0 =C2=A0 =C2=A0 =C2=A0pg_get_indexdef(i.indexrelid) AS indexdef
FROM pg_class c
JOIN pg_index i ON c.oid =3D i.indexrelid
WHERE c.relname LIKE 'orders%amount%'
ORDER BY c.relname;
-- Expected:
--=C2=A0 orders_2023_amount_idx | t
--=C2=A0 orders_2024_amount_idx | f
--=C2=A0 orders_amount_idx=C2=A0 =C2=A0 =C2=A0 | f
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-- STEP 5: Fix the invalid child index via REINDEX
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
REINDEX INDEX CONCURRENTLY orders_2024_amount_idx;
-- Verify: child is now valid
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid =3D i.indexrelid
WHERE c.relname LIKE 'orders%amount%'
ORDER BY c.relname;
-- ACTUAL (buggy) result:
--=C2=A0 orders_2023_amount_idx | t=C2=A0 =C2=A0(valid)
--=C2=A0 orders_2024_amount_idx | t=C2=A0 =C2=A0(valid =E2=80=94 fixed by R= EINDEX)
--=C2=A0 orders_amount_idx=C2=A0 =C2=A0 =C2=A0 | f=C2=A0 =C2=A0(STILL INVAL= ID =E2=80=94 this is the bug!)
--
-- EXPECTED result (if bug were fixed):
--=C2=A0 orders_2023_amount_idx | t
--=C2=A0 orders_2024_amount_idx | t
--=C2=A0 orders_amount_idx=C2=A0 =C2=A0 =C2=A0 | t=C2=A0 =C2=A0(should be v= alid now)
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-- STEP 6: Demonstrate that re-running ATTACH does not help
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
ALTER INDEX orders_amount_idx ATTACH PARTITION orders_2024_amount_idx;
-- Returns "ALTER INDEX" (succeeds silently, does nothing)
SELECT c.relname, i.indisvalid
FROM pg_class c
JOIN pg_index i ON c.oid =3D i.indexrelid
WHERE c.relname LIKE 'orders%amount%'
ORDER BY c.relname;
-- Parent is STILL invalid. The "silently do nothing" path
-- skips validatePartitionedIndex() entirely.
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
-- CLEANUP
-- =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D
DROP TABLE orders;
```


Root Cause Analysis:

Where `validatePartitionedIndex()` Is Called

The function is called in exactly these code paths:
1. During `ALTER INDEX ... ATTACH PARTITION` =E2=80=94 inside
`ATExecAttachPartitionIdx()`
2. During `ALTER TABLE ... ATTACH PARTITION` =E2=80=94 via
`AttachPartitionEnsureIndexes()`
3. During `CREATE INDEX` on partitioned tables =E2=80=94 via `DefineIndex()= `
It is NOT called:
- After `REINDEX` of a partitioned index
- During any maintenance operation
- As any periodic validation check

Bug 1: REINDEX Does Not Validate Parent


When `reindex_index()` in `src/backend/catalog/index.c` marks a
partition index as valid (setting `indisvalid =3D true`), it does not
check whether the parent partitioned index should also become valid.
The function simply updates the child's `pg_index` entry and returns.
Bug 2: Re-running ATTACH Skips Validation


In `ATExecAttachPartitionIdx()` (tablecmds.c, around line 21923 in PG
16 / line ~22900 in HEAD):
https://gith= ub.com/postgres/postgres/blob/master/src/backend/commands/tablecmds.c#L2192= 3

```c
/* Silently do nothing if already in the right state */
currParent =3D partIdx->rd_rel->relispartition ?
=C2=A0 =C2=A0 get_partition_parent(partIdxId, false) : InvalidOid;
if (currParent !=3D RelationGetRelid(parentIdx))
{
=C2=A0 =C2=A0 // ... all validation checks and attachment logic ...
=C2=A0 =C2=A0 validatePartitionedIndex(parentIdx, parentTbl);=C2=A0 // ONLY= called here
}
// If already attached, entire block is skipped =E2=80=94 no validation! ```

When the child is already attached (`currParent =3D=3D parentIdx`), the
condition is false, the entire if-block is skipped, and
`validatePartitionedIndex()` is never called. The comment "Silently do=
nothing if already in the right state" is misleading=C2=A0 "alrea= dy
attached" does not mean "parent validity is correct."

Proposed Fixes:

Fix 1 : Always Validate Parent Index in ALTER INDEX ATTACH PARTITION

Patch File : 0001-Always-validate-parent-index-in-ALTER-INDEX-ATTACH.patch<= br>
Move the validatePartitionedIndex() call outside the if-block so it runs unconditionally =E2=80=94 both when a new attachment is made and when the p= artition is
already attached. This provides a user-accessible recovery path: after fixi= ng a
child index with REINDEX, re-running ALTER INDEX ATTACH PARTITION triggers<= br> parent validation.

When the partition is already attached, a NOTICE is emitted:

NOTICE:=C2=A0 partition index "child_idx" is already attached to<= br> "parent_idx", validating parent index


This follows PostgreSQL's existing convention of using NOTICE for
informational messages about no-op or reduced-scope operations (e.g.,
DROP TABLE IF EXISTS, CREATE INDEX IF NOT EXISTS). It tells the user:

1- Nothing went wrong
2- The index was already attached (so they know the state)
3-=C2=A0 Validation still happened (so they know the fix path works)


Fix 2: Validate Parent Partitioned Index After REINDEX of Child

Patch File : 0001-Validate-parent-partitioned-index-after-REINDEX.patch

Same underlying bug but this patch addresses it from the
REINDEX side. When a partition index is repaired via REINDEX or
REINDEX CONCURRENTLY, the parent partitioned index remains permanently
stuck with indisvalid =3D false even though all children are now valid.

This is because validatePartitionedIndex() =E2=80=94 the only function that= can
mark a partitioned index as valid is never called from any REINDEX code
path.


validatePartitionedIndex() is only called during:

1- ALTER INDEX ... ATTACH PARTITION (tablecmds.c)
2- ALTER TABLE ... ATTACH PARTITION (tablecmds.c)
3- CREATE INDEX on partitioned tables (indexcmds.c)

It is NOT called after:

1- REINDEX INDEX (regular) =E2=80=94 handled by reindex_index() in index.c<= br> 2- REINDEX INDEX CONCURRENTLY =E2=80=94 handled by ReindexRelationConcurren= tly()

in indexcmds.c, which uses index_concurrently_swap() in index.c

Three changes are made:

1. Make validatePartitionedIndex() public
The function was static in tablecmds.c. It is now exported via
tablecmds.h so it can be called from index.c and indexcmds.c.

Files changed:

src/backend/commands/tablecmds.c =E2=80=94 remove static, update comment src/include/commands/tablecmds.h =E2=80=94 add extern declaration

2. Call from reindex_index() (regular REINDEX)
After reindex_index() marks a partition index as valid (indisvalid =3D true= ),
check if the index is a partition (iRel->rd_rel->relispartition) and = if so,
look up the parent and call validatePartitionedIndex().

A CommandCounterIncrement() is required before the call so that the child&#= 39;s
updated indisvalid is visible to the syscache lookup that
validatePartitionedIndex() performs internally.

File changed: src/backend/catalog/index.c

3. Call from ReindexRelationConcurrently() (REINDEX CONCURRENTLY)
REINDEX CONCURRENTLY uses a completely different code path: it creates a ne= w
index, builds it concurrently, then swaps it with the old one via
index_concurrently_swap(). The new index inherits the old index's parti= tion
status during the swap.

After the swap and the existing CommandCounterIncrement() (which makes the<= br> swap visible), check if the new index is a partition and call
validatePartitionedIndex() on the parent.

File changed: src/backend/commands/indexcmds.c

Multi-level Hierarchy Support
validatePartitionedIndex() already handles multi-level partition hierarchie= s.
When it marks a mid-level parent valid, it checks if that parent is itself = a
partition and recursively validates the grandparent. No additional recursio= n
logic is needed in the REINDEX patches.


Thanks,
Mohamed Ali
Senior DBE
AWS RDS

Hi, Mohamed

Thanks for working on this. I went through the proble= m statement and the two proposed fixes. I agree that the underlying issue l= ooks real: after repairing an invalid child index with REINDEX, the parent = partitioned index can remain stuck in an invalid state because validatePartitionedIndex() is not reached from the rel= evant REINDEX paths. The analysis around ATExecAtt= achPartitionIdx() also looks correct: when the child index is alread= y attached, the current code takes the no-op path and skips the validation = call entirely.

Overall, I think this is worth fixing, but I do not v= iew the two patches equally.

I think patch 2 is the real fix. The state transition= that matters here is that a child index goes from invalid to valid, and th= e natural place to trigger parent revalidation is where that transition act= ually happens, namely in REINDEX. By contrast, patch 1 feels more like a se= condary hardening/workaround path: it makes ALTER = INDEX ... ATTACH PARTITION retry parent validation even in the alrea= dy-attached case, but that is not really where the underlying state change = happens.

My main comments are below.

1. Patch 2 should= be treated as the primary fix

This seems like the correct place to repair the catal= og state. If REINDEX repairs a partition index that was previously invalid,= and that index is attached to a partitioned parent index, then rechecking = the parent with validatePartitionedIndex() = is a reasonable and direct solution.

I also think it is good that the patch covers both th= e regular REINDEX path and the REINDEX CONCURRENTL= Y path. Those are distinct enough that both need explicit attention,= and the extra CommandCounterIncrement() be= fore revalidation also seems necessary.

So at a high level, this patch makes sense to me.

2. I am less conv= inced by patch 1 in its current form

The main issue here is not correctness, but design an= d placement.

Once the child is already attached, ALTER INDEX ... ATTACH PARTITION is conceptually supposed to= be a no-op. With this patch, it becomes a generic =E2=80=9Cretry parent va= lidation=E2=80=9D hook. That means users can run an attach command that doe= s not change attachment state at all, yet still triggers a full parent vali= dation attempt.

That is especially questionable if there are still ot= her invalid child indexes elsewhere under the same parent. In that case, ea= ch already-attached ATTACH command will do the validation work again, but i= t is already known in advance that the parent still cannot become valid. So= this is not =E2=80=9Creinvalidating=E2=80=9D anything, but it is repeated = checking with no state change, which feels misplaced on a no-op path.

Because of that, I do not think patch 1 should be the= main bug fix. At most, I would see it as an optional hardening patch.

3. The NOTICE add= ed by patch 1 does not seem like a good fit

The existing code explicitly says =E2=80=9CSilently d= o nothing if already in the right state=E2=80=9D. Changing that into a NOTI= CE every time we hit the already-attached case seems noisy to me.

If the community decides that the validation call sho= uld stay in this path, I would still suggest dropping the NOTICE. The comma= nd is syntactically an ATTACH command, not a repair command, and emitting a= message for an otherwise no-op case does not feel very PostgreSQL-like.

4. Please double-= check coverage of all REINDEX entry points

I agree with the direction in patch 2, but I think re= viewers will want confidence that all relevant REINDEX flows are covered co= nsistently.

For example, it would be good to confirm that the fix= behaves correctly not just for REINDEX INDEX, but also for the broader forms that eventually reach the same logic, su= ch as REINDEX TABLE, and that there is no a= lternate path where a repaired child index can still leave the parent stale= .

5. Multi-level pa= rtition trees need explicit testing

One especially imp= ortant point here is recursion. If a repaired child index causes its immedi= ate parent partitioned index to become valid, and that parent is itself a c= hild of another partitioned index, we need to be sure that validity propaga= tes all the way up as intended.

The current reasoning suggests that validatePartitionedIndex() already handles that, but this is= important enough that it should be demonstrated with a regression test, no= t just assumed.

Minor comments:

  • In patch 1, I would avoid turning the already-attache= d path into a behavioral special case unless there is a strong reason to ke= ep it. It would be cleaner if the fix relied primarily on the actual state-= changing paths.

  • If patch 1 remains, I would remove the NOTICE.

  • The commit message for patch 2 should clearly explain= why REINDEX is the right place to do this, rather than making ATTACH PARTI= TION the repair mechanism.

  • It may also help to mention explicitly whether the ex= tra validation call is only intended for indexes that were previously inval= id and have just been repaired, since that is an important part of why the = patch is reasonably narrow.

I do not think the current patches are complete witho= ut regression coverage. At minimum, I think the following tests should be a= dded:

    <= li>

    Regular REINDEX case

    Create a partitioned table with a parent index left i= nvalid initially, then attach child indexes such that one child index is in= valid. Repair that child with plain REINDEX INDEX<= /span>, and verify that the child becomes valid and the parent also becomes= valid.

  1. REINDEX CONCURRENTLY case

    Same setup, but use REINDEX = INDEX CONCURRENTLY. This should be tested separately because the cod= e path is different.

  2. Multi-level partition hierarchy

    Use at least a three-level hierarchy and verify that = repairing a leaf-level invalid child index can cause validity to propagate = upward through intermediate partitioned indexes.

  3. Negative case

    Repair one child index while another child index unde= r the same parent remains invalid, and verify that the parent remains inval= id.

  4. Already-attached ATTACH PARTITION case

    Only if patch 1 is kept: exercise ALTER INDEX ... ATTACH PARTITION on a child index that is alre= ady attached, and verify both behaviors:

  • parent becomes valid if all children are now valid

  • parent stays invalid if some other child is still inv= alid

My overall view is:

  • The bug i= tself looks real.
  • The diagnosis looks sound.
  • Patch 2 is the= right direction and should be the primary fix.
  • Patch 1 is much les= s compelling as written, and I would not take it as the main solution.
  • =
  • The series needs regression tests before it is ready.

Regards,

Haibo=C2=A0
--0000000000005a8e0c064f24b556--