public inbox for [email protected]  
help / color / mirror / Atom feed
From: Mihail Nikalayeu <[email protected]>
To: Hannu Krosing <[email protected]>
Cc: Matthias van de Meent <[email protected]>
Cc: Sergey Sargsyan <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Michael Paquier <[email protected]>
Cc: PostgreSQL Hackers <[email protected]>
Cc: Andrey Borodin <[email protected]>
Cc: Melanie Plageman <[email protected]>
Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Date: Fri, 28 Nov 2025 19:40:45 +0100
Message-ID: <CADzfLwX7u5P=Utt4kj-KOjgC_8fhsinpmLtadqkHrohE5Y5H+Q@mail.gmail.com> (raw)
In-Reply-To: <CAMT0RQSbFJCpetFy22=O=gKR2ZfH=tMTQeCM743T4o3rMjaeTQ@mail.gmail.com>
References: <CADzfLwW9QczZW-E=McxcjUv0e5VMDctQNETbgao0K-SimVhFPA@mail.gmail.com>
	<[email protected]>
	<CADzfLwXKtriMnfCNVGNH2ahwXaByjo-QOMWiDTU-9WZqh+zQ5g@mail.gmail.com>
	<CADzfLwW5bDWSxjHK7mqX8Lewki3+5FBydBC+nVcxg4xMGKscyw@mail.gmail.com>
	<CAMAof6-4xaV3QE2ErYJaJhu6qjFn99sWyo_HQeBhHikZM3GexA@mail.gmail.com>
	<CADzfLwXocKhpW3eFP1oScz+m+1XJ3bpi9QmVpoqC9RX9oyX=UA@mail.gmail.com>
	<CAMAof695VA+mbVRhWCTus=E0WnsMAQyqXxfOTohbcb7VUHSP4g@mail.gmail.com>
	<CAMAof69JSL8MYWG2qRScs3RQDpfcyZT_wFwW4SoAvftW+K_p1g@mail.gmail.com>
	<CADzfLwVMtwjHh8KY9kP=_vcYPqHs=JDzuexO4RFQ2fM8VoqovA@mail.gmail.com>
	<CAMAof68L0GO0F0bwuXtLZAjh9k_Hj+o0-8mqfO6iEQyXr4PuVA@mail.gmail.com>
	<CADzfLwUrodAcOggK+3j3LbPLaSXemgHxa-n=LhZTwRAsaakL2g@mail.gmail.com>
	<CAMAof691D4O=3QTuPwJXBYxYpG6s3A=tVhL9vN=T3eeRTMnaig@mail.gmail.com>
	<CADzfLwVT3Y14g6Maz2y92sP2L7rPvpznt+MHM++xiy-U3XMLZQ@mail.gmail.com>
	<CADzfLwXQe9XfQfJs3W-DCPqeqG4rq-6FoYUpGbbpgjcT1Eotpg@mail.gmail.com>
	<CAMAof68kNgwWdkhmZd1ysfyU3PF66Wz+UaUr9g-LJg-_0xBV_Q@mail.gmail.com>
	<CADzfLwUtLqYrupZp4QQuWwv4W_LgYWBRStybvQ+S0SZiHrp62A@mail.gmail.com>
	<CADzfLwVYUBb8cUVQ_1mzVzNMyJH84VZKFCRyATvBZKbLW377CA@mail.gmail.com>
	<CADzfLwWbV1i7+cP_Hqr3qgQnBXkAqgrCQxd5PFzqp2AOTK=40w@mail.gmail.com>
	<CADzfLwXJc0jdDDS43-Fj0gKmwX-FURS3eY7MyLQ89qDPA6T5Ug@mail.gmail.com>
	<CADzfLwVaV15R2rUNZmKqLKweiN3SnUBg=6_qGE_ERb7cdQUD8g@mail.gmail.com>
	<CAEze2WgBffcC_SKGLmVxW8uRTEsrwWOHDQujN6zyxy1tSYLJ=Q@mail.gmail.com>
	<CADzfLwVon8ESWOkg+8KU0F9=Hg7QKriNVX-hqcm-v-XZmHkzig@mail.gmail.com>
	<CAEze2WiXYx1LKr=9d7PLsZOYrGytY9AN__tFFw4p_Ysgm1-e5g@mail.gmail.com>
	<CADzfLwUKXcXKZgX+e8ACsOXe_CgtWmNJY_6dyn8EO0AXYOn2pA@mail.gmail.com>
	<CAEze2WiiR2PeXg_vaURjjiiwvjQ=Um8wxWi1BcVS0BGyxiD2gQ@mail.gmail.com>
	<CAMT0RQQP9JiGqqB+pVBzPT7unG1BMBuLj=kGPk4BeS3g6VyT1A@mail.gmail.com>
	<CAMT0RQSbFJCpetFy22=O=gKR2ZfH=tMTQeCM743T4o3rMjaeTQ@mail.gmail.com>

Hello!

On Fri, Nov 28, 2025 at 7:05 PM Hannu Krosing <[email protected]> wrote:
> 1. While the first pass of CIC is collecting the visible tuple for
> index the logical decoding collector also collects any new tuples
> added after the CIC start.
> 2. When the first pass collection finishes, it also gets the indexes
> collected so far by the logical decoding collectoir and adds them to
> the first set before the sorting and creating the index.
>
> 3. once the initial index is created, the CIC just gets whatever else
> was collected after 2. and adds these to the index

It feels very similar to the approach with STIR (upper in that thread)
- instead of doing the second scan - just collect all the new-coming
TIDs in short-term-index-replacement access method.

I think STIR lightweight AM (contains just TID) is a better option
here than logical replication due several reason (Mathias already
mentioned some of them).

Anyway, it looks like things\threads became a little bit mixed-up,
I'll try to structure it a little bit.

For CIC/RC approach with resetting snapshot during heap scan - it is
enough to achieve vacuum-friendly state in phase 1.
For phase 2 (validation) - we need an additional thing - something to
collect incoming tuples (STIR index AM is proposed). In that case we
achieve vacuum-friendly for both phases + single heap scan.

STIR at the same time may be used as just way to make CIC faster
(single scan) - without any improvements related to VACUUM.

You may check [0] for links.

Another topic is REPACK CONCURRENTLY, which itself leaves in [1]. It
is already based on LR.
I was talking about a way to use the same tech (reset snapshot during
the scan) for REPACK also, leveraging the already introduced LR
decoding part.

Mikhail.

[0]: https://www.postgresql.org/message-id/flat/CADzfLwWkYi3r-CD_Bbkg-Mx0qxMBzZZFQTL2ud7yHH2KDb1hdw%40mai...
[1]: https://www.postgresql.org/message-id/flat/202507262156.sb455angijk6%40alvherre.pgsql





view thread (64+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
  In-Reply-To: <CADzfLwX7u5P=Utt4kj-KOjgC_8fhsinpmLtadqkHrohE5Y5H+Q@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox