Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uRGLD-009Iwm-U1 for pgsql-hackers@arkaria.postgresql.org; Mon, 16 Jun 2025 20:22:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uRGLB-003Kgv-VQ for pgsql-hackers@arkaria.postgresql.org; Mon, 16 Jun 2025 20:22:02 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uRGLB-003Kgn-Cr for pgsql-hackers@lists.postgresql.org; Mon, 16 Jun 2025 20:22:02 +0000 Received: from mail-ua1-x936.google.com ([2607:f8b0:4864:20::936]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uRGL9-002NIS-35 for pgsql-hackers@postgresql.org; Mon, 16 Jun 2025 20:22:00 +0000 Received: by mail-ua1-x936.google.com with SMTP id a1e0cc1a2514c-87ec9a4c86cso1237268241.1 for ; Mon, 16 Jun 2025 13:21:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1750105319; x=1750710119; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=/kSlKPT7lHWzle24+YHDUvm2aolEedU1MdHLKByzVw0=; b=P0XDwF5txF1YhYxd9kyikE1NoTu8G/X6LF16d0CXGmU4sEMbL3mZz0b6Bnh+QHxmHH TQJ/8xJ2JDcAmaLuV16otxrSQhe99B/lhc2r9tpAXBUIcq2IMCkP4SHGowsc9g/B1EaR GBABz57+kld5iDENnUcZq9QBZKRE6IIG/FFnAO9YD+p0RMj7OPAXya89LsCWcdkuh1kp cXnm6gMKLIUFHVS7Az3yaZvY6AgO7zXJQ1pN3UHL80gOc1CcbZutVE1yR7Jb83jpRr0R W2wEOSmj8gU0kmpBSuRw9j+PfChW29ASA49bdRnKEUHfGeRLUm65WpPEMJ+IKQySQUTJ VBQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1750105319; x=1750710119; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=/kSlKPT7lHWzle24+YHDUvm2aolEedU1MdHLKByzVw0=; b=ZWHeJ+avVZCLVOA5BfFdAVDFHSMkvIJc/Nx0yZEGrcSV7OGn/qdzQwDi0rcpYO3yw1 pc2WJRABCe+NXwTnNHiZUZ25p+pEHr+hqnMz0iqPNxqlfmK2151BzZ9j7KgeZ2RpsQNo ZuYiH3xq88dwTuoYUO36Ys/YKTMBsJCl7BLjgA1ExUNr+3WxGWTZlz9P5AnTDhr0Jbpe iAI+0lv6pA3h2QwaccPKN+elr6xR/to8+XCi0M44Rkhi0Dl9E1IhqGClOqRVrIDiHBrX 3JWOYJcNCCwmBlFX3wbNZucoNNjvPkj0/DhTLDYE4W8aBxMR47MbTarAAGi8nrGybEjX MsAg== X-Forwarded-Encrypted: i=1; AJvYcCWQYZtoIbbW4n9ua7ACJxBnnoezYLN5W6aYVS58KRe2OztDYI0y2EfrKra9gvby6YbHfxoP3S+CRcoCog9D@postgresql.org X-Gm-Message-State: AOJu0YwmmpyQOZYlaH7yzHKaw46C1/CA6+kI5X0sWOyL1apxMAR7aPZC fTodtg+8uKf/tly3tzPPttqRR+MsaATJ/gm/mczj6vduKOMUOx/3oam8PO75hxFHyWyVqSt6j4f TRi9/iWiQbF4LqoEAUf/4GT2YCRT4ZPg= X-Gm-Gg: ASbGncvF9j0tWkv3AprRJ42dbqCNUTwOqk0JFonohr/kMJOOCjHiv4JGYiiQ+dvSMVE XFfYKFzrQYL4clxpwzgcf4K2uJkZCE5G0bz1/LA4HIuAX1T2sdK0BmKJKwNvBFnDaEN2mhsDzK6 CNZDyMj+D7tKjl/RmPyDrtA7Tjqk4kvJwkc/Z+bvo+vZ/rDQ== X-Google-Smtp-Source: AGHT+IEOkXLi/5UNJ4nB4Tp4cXw4S9ZWdYvGEvin72wM50e+uabSuaoWYEQeC6QNb5RZL2xq3YvSNJO9y7FS2QdP+wc= X-Received: by 2002:a05:6102:4bc6:b0:4e7:dbeb:f3d0 with SMTP id ada2fe7eead31-4e7f629e242mr6980554137.4.1750105318977; Mon, 16 Jun 2025 13:21:58 -0700 (PDT) MIME-Version: 1.0 References: <202505181556.3n6oiowvntyr@alvherre.pgsql> In-Reply-To: From: Sergey Sargsyan Date: Mon, 16 Jun 2025 23:21:47 +0300 X-Gm-Features: AX0GCFs51xtO5loBNoNDDkaQ2hCI3Ija8jWOqayqD13Nc4yxh1xtWhIyztDmTjY Message-ID: Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements To: Mihail Nikalayeu Cc: =?UTF-8?Q?=C3=81lvaro_Herrera?= , Andres Freund , Michael Paquier , PostgreSQL Hackers , Andrey Borodin , Melanie Plageman , Matthias van de Meent Content-Type: multipart/alternative; boundary="000000000000db74740637b62529" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000db74740637b62529 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thank you for the information. Tomorrow, I will also run a few tests to measure the time required to collect tids from the index; however, since I do not work with vanilla postgres, the results may vary. If the results indicate that this procedure is time-consuming, I maybe will develop an additional patch specifically for b-tree indexes, as they are the default and most commonly used type. Best regards, Sergey On Mon, Jun 16, 2025, 11:01=E2=80=AFPM Mihail Nikalayeu wrote: > Hello, Sergey! > > > I think it's to avoid duplicate errors when adding tuples from STIP to > the main index, > > but couldn't we just suppress that error during validation and skip the > new tuple insertion if it already exists? > > In some cases, it is not possible: > =E2=80=93 Some index types (GiST, GIN, BRIN) do not provide an easy way t= o > detect such duplicates. > =E2=80=93 When we are building a unique index, we cannot simply skip > duplicates, because doing so would also skip the rows that should > prevent the unique index from being created (unless we add extra logic > for B-tree indexes to compare TIDs as well). > > > The main index may get huge after building, and iterating over it in a > single thread and then sorting tids can be time consuming. > My tests indicate that the overhead is minor compared with the time > spent scanning the heap and building the index itself. > > > At least I guess one can skip it when STIP is empty. > Yes, that=E2=80=99s a good idea; I=E2=80=99ll add it later. > > > p.s. I noticed that `stip.c` has a lot of functions that don't follow > the Postgres coding style of return type on separate line. > Hmm... I=E2=80=99ll fix that as well. > > Best regards, > Mikhail. > --000000000000db74740637b62529 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thank you for the information. Tomorrow, I will als= o run a few tests to measure the time required to collect tids from the ind= ex; however, since I do not work with vanilla postgres, the results may var= y.

If = the results indicate that this procedure is time-consuming, I maybe will de= velop an additional patch specifically for b-tree indexes, as they are the = default and most commonly used type.

Best regards,=C2=A0
Sergey

On Mon, Jun 16, 2025, 11:01=E2=80=AFPM Mihail Nikalayeu <mihailnikalayeu@gmail.com>= ; wrote:
Hello, Sergey!

> I think it's to avoid duplicate errors when adding tuples from STI= P to the main index,
> but couldn't we just suppress that error during validation and ski= p the new tuple insertion if it already exists?

In some cases, it is not possible:
=E2=80=93 Some index types (GiST, GIN, BRIN) do not provide an easy way to<= br> detect such duplicates.
=E2=80=93 When we are building a unique index, we cannot simply skip
duplicates, because doing so would also skip the rows that should
prevent the unique index from being created (unless we add extra logic
for B-tree indexes to compare TIDs as well).

> The main index may get huge after building, and iterating over it in a= single thread and then sorting tids can be time consuming.
My tests indicate that the overhead is minor compared with the time
spent scanning the heap and building the index itself.

> At least I guess one can skip it when STIP is empty.
Yes, that=E2=80=99s a good idea; I=E2=80=99ll add it later.

> p.s. I noticed that `stip.c` has a lot of functions that don't fol= low the Postgres coding style of return type on separate line.
Hmm... I=E2=80=99ll fix that as well.

Best regards,
Mikhail.
--000000000000db74740637b62529--