Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vQP5m-006ncN-2w for pgsql-hackers@arkaria.postgresql.org; Tue, 02 Dec 2025 12:02:51 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vQP5l-007Y92-1B for pgsql-hackers@arkaria.postgresql.org; Tue, 02 Dec 2025 12:02:49 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vQP5k-007Y8u-2z for pgsql-hackers@lists.postgresql.org; Tue, 02 Dec 2025 12:02:49 +0000 Received: from mail-lj1-x232.google.com ([2a00:1450:4864:20::232]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vQP5i-002jzT-1M for pgsql-hackers@postgresql.org; Tue, 02 Dec 2025 12:02:48 +0000 Received: by mail-lj1-x232.google.com with SMTP id 38308e7fff4ca-37b95f87d64so46433551fa.2 for ; Tue, 02 Dec 2025 04:02:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1764676965; x=1765281765; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=VwxLExduLWWN/Ie+B/7HaiPKkqQc8jm8bPFOAGzS6hM=; b=OUxCbJ+pKkSe2iJ5oQ+eg/IphXBa+93Q7zOTvStB8WAOOz0MF2YWGD1uQemz2ERYBp mPecVC+UJeV5dFls/JJJpYczp/UcxQ1waJlmRKNE0zOOkHjqFvL+RkPiePD12di36uc+ qBmB6gG2YEGlwjQ7fn6cXlVbzVLvamu5FPg/QYGjf6H4jbAiue7lqYUx+NN+e8f+6j0j vX9SbzT7D1tShKwg6yfC0TZuKje/urKmfdaA91pH1yGBT+PDsy8HBP9t4Zu5tzj90dKt w8aIIhUeyg3jKOCfpcfuSf7HINonBQClmevmaEyR8dNPI9aH4XO80WRopTbmt0jN2X/m SInw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1764676965; x=1765281765; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=VwxLExduLWWN/Ie+B/7HaiPKkqQc8jm8bPFOAGzS6hM=; b=J/LvoK5G3ZWmdbiMd68P3X9pu3oBIo1iA/9zNd47zsQhq1DsKIBW/ivRgVzYzEVCv7 v+IDXAXilro9KECnzrMxoSYJkGp1T7BWmvxdaXJ29Auy0CSvHQeNcN2Hkt13xNIuqQ3J zYYynLVLXkUe5ztDnPkfPXgCoyrDlAD5gth25z6ALpPHhWdNfkITKsNGkuizi/sHSfhB 9sS2oa2JSSyPkm3+MKSMR5reiCj0mueG00QkM8uhp8ucKngvRJGkoIiot7KCA55wbgTO jqk6lF03n4GHeVlZQ9SKIBtbpi3EFNXNQer3cu/ckEE/Hdn3uUbP5qgDAal5rAmGMOoZ dkCQ== X-Forwarded-Encrypted: i=1; AJvYcCWm9HYgBSRy1/XsdyqhM0fvHHjpYidUV7mgyuaNGnKrbjmgaOjrzzYvuKCVTRlyjts4Y8boi19bzfuXyXsA@postgresql.org X-Gm-Message-State: AOJu0Yyy0gWua8AadIHr8tH3iFWihtQihsOCpCakahCLtF/2eNihhPo1 krEUJZZykjJUguHZ9PCGJeBgVRnKLbWwHRJb72HJgOTWh3J2/rF/nTId3nEODLfIyeBGOdhQtcR w+ggXGI2yoIHL64szVB7pDjNE+N1qRJs= X-Gm-Gg: ASbGnctYrBcRcvSXZdZRM+FCyVpraLS7/KtwzlI4eRNVCo9jBH93sJe7LstNXmXzaVE XWn2wO0I5u61VMsh7PRUJbpPARJ1Yhc3wnmOGV/c5k7wRtFqSPWtiODo7nOWNQEEwQAhwiyX/rA LBYJWZ1L8aUWsybhhEdk0G7RRpd3/VCwx8yruyErgtYWOmkOq9ZLuzTaDfMNSebfNobcKwdOPvv 8jS94ji1ccomV+qAlzlThk2f97S3+1vZihGaaNVjVWnX0ct7/bFZNV4qRDMQ7bmlVPQyU+uZenv W1dLIUSqbCMAefkidTsnZTITT/0ICexwhy0ZdZ4f6W8nfoIvmJ3otn+Fk4GNmdhQTRBd X-Google-Smtp-Source: AGHT+IGVv0F5NsMOv0zcpxK8bFG6ynpHh5+FJbHWYR9ngIsd9NZiEkzWXQWmKjlv83rN4wAUYzMsp5hUBcHzALknDHc= X-Received: by 2002:a2e:a00e:0:10b0:37a:5cb7:968f with SMTP id 38308e7fff4ca-37d07893bbfmr74202681fa.29.1764676965057; Tue, 02 Dec 2025 04:02:45 -0800 (PST) MIME-Version: 1.0 References: <202505181556.3n6oiowvntyr@alvherre.pgsql> In-Reply-To: From: Matthias van de Meent Date: Tue, 2 Dec 2025 13:02:32 +0100 X-Gm-Features: AWmQ_bnQj1e_va2szb8kov_Xopn4gOu1ja2yVAefvNsyB18ZtNy31PEUJnj9f5Q Message-ID: Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements To: Hannu Krosing Cc: Mihail Nikalayeu , Sergey Sargsyan , =?UTF-8?Q?=C3=81lvaro_Herrera?= , Andres Freund , Michael Paquier , PostgreSQL Hackers , Andrey Borodin , Melanie Plageman Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, 28 Nov 2025 at 20:08, Hannu Krosing wrote: > > On Fri, Nov 28, 2025 at 7:31=E2=80=AFPM Matthias van de Meent > wrote: > > > > On Fri, 28 Nov 2025 at 18:58, Hannu Krosing wrote: > > > > > > On Fri, Nov 28, 2025 at 5:58=E2=80=AFPM Matthias van de Meent > > > wrote: > > > > > > > ... > > > > I'm a bit worried, though, that LR may lose updates due to commit > > > > order differences between WAL and PGPROC. I don't know how that's > > > > handled in logical decoding, and can't find much literature about i= t > > > > in the repo either. > > > > > > Now the reference to logical decoding made me think that maybe to rea= l > > > fix for CIC would be to leverage logical decoding for the 2nd pass of > > > CIC and not wore about in-page visibilities at all. > > > > -1: Requiring the logical decoding system just to reindex an index > > without O(tablesize) lock time adds too much overhead, and removes > > features we currently have (CIC on unlogged tables). wal_level=3Dlogica= l > > *must not* be required for these tasks if we can at all avoid it. > > I'm also not sure whether logical decoding gets access to the HOT > > information of the updated tuples involved, and therefore whether the > > index build can determine whether it must or can't insert the tuple. > > There are more and more cases (not just CIC here) where using logical > decoding would be the most efficient solution, so why not instead > start improving it instead of complicating the system in various > places? Because Logical Replication implies Replication, which in turn implies (more) WAL generation. And if an unlogged table still generates WAL in DML, then it's not really an unlogged table, in which case we've broken a promise to the user [see: CREATE TABLE's UNLOGGED description]. Adding features to WAL which replicas can't (mustn't!) do anything with is always going to be bloat in my view. I also don't know how you measure efficiency, but I don't consider LR to be particularly efficient in any metric, apart from maybe "wasting DBA time with abandoned slots". LR parses WAL, which is a conveyor belt with _all_ changes, and given that WAL has no real upper boundary on how large it can grow, LR would have to touch an unbounded amount of data to get only the changes it needs. We already have ways to get those changes without parsing an unbounded amount of data, so why not use that instead? > We could even start selectively logging UNLOGGED and TEMP tables when > we start CIC if CIC has enough upsides. Which is why I hate this idea. There can't be enough upsides to counteract the enormous downside of increasing the size of the data we need to ship to replicas when the replicas can't ever use that data. Replicas were able to use the added data of LR before 17 when they were promoted, so it wasn't terrible to include more data in the WAL, but what's proposed here is to add data that literally nobody on the replica can use; wasting WAL storage and replication bandwidth. Lastly, LR requires replication slots, which are very expensive to maintain. Currently, you can do CIC/RIC with any number of backends you want up to max_backends, but this doesn't work if you'd want to use LR, as you'd now need to have max_replication_slots proportional to max_connections. Again, -1 on LR for UNLOGGED/TEMP tables. Or LR in general when the user explicitly asked for `wal_level NOT IN ('logical')` > > I don't think logical decoding is sufficient, because we don't know > > which tuples were already inserted into the index by their own > > backends, so we don't know which tuples' index entries we must skip. > > The premise of pass2 in CIC is that we collect all the rows that were > inserted after CIC started for which we are not 100% sure that they > are inserted in the index. We can only be sure they are inserted for > transactions started after pass1 completed and the index became > visible and available for inserts. I'm not sure this is true; wouldn't it be possible for a transaction to start before the index became visible, but because of READ COMMITTED get access to the index after one statement? I.e. two statements that straddle the index becoming visible? That way, a transaction could start to see the index after it first modified some tuples; creating a hybrid visibility state. > And we do not care about hot update chains dusing normal CREATE INDEX > or first pass of CIC - we just index what is visible NOW wit no regard > of weather the tuple is at the end of HOT update chain. We do care about HOT update chains, because the TID of the HOT root is indexed, and not necessarily the TID of the scanned tuple. > > PS. I think the same should be true for REPACK CONCURRENTLY, but > > that's a new command with yet-to-be-determined semantics, unlike CIC > > which has been part of PG for 6 years. > > CIC has been around way longer, since 8.2 released in 2006, so more > like 20 years :) Ah, so RIC wasn't introduced together with CIC? TIL. Kind regards, Matthias van de Meent Databricks (https://www.databricks.com)