Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vOv8Y-00CqLy-1s for pgsql-hackers@arkaria.postgresql.org; Fri, 28 Nov 2025 09:51:34 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vOv8W-00AcZl-1U for pgsql-hackers@arkaria.postgresql.org; Fri, 28 Nov 2025 09:51:32 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vOv8W-00AcZd-0P for pgsql-hackers@lists.postgresql.org; Fri, 28 Nov 2025 09:51:32 +0000 Received: from mail-lj1-x22f.google.com ([2a00:1450:4864:20::22f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vOv8U-001wkC-0K for pgsql-hackers@postgresql.org; Fri, 28 Nov 2025 09:51:32 +0000 Received: by mail-lj1-x22f.google.com with SMTP id 38308e7fff4ca-37d275cb96cso11558321fa.2 for ; Fri, 28 Nov 2025 01:51:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1764323488; x=1764928288; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=WkvV4Mu+6/6wCtE+ekaIlRzlNznKn883wrT7wLHz6ak=; b=hBIuAfg1iYuF/I7FEXDLkCgUyqds4ynXmGkRU6jlOR35SvHcG8E0LI/tKCyDA9QJ2M 9Sq/HHQ4RevDrL+IGSyp+iPV8zfx3gzV53boF0JVzSgp2hZLYEM8vn1KUL+PFvt2E3/Y Ep5SKII/9KukRVlUDkA2iATWiuqIrIJpqoz2dnzET+qQvmIWxwQECViQXBjKEmEGqzsH CNp+AmQpiqCS1AWFd1HO8TF+P+FXJoiZ40HzazDHhZDvCjgj9paOisyo3o4IMu/7lWeg DsuYlws7lfJHt6TZ1f1T1oj4aqW+cpqQHji+r7ToNZRe0SoFB9RT1Cc8EfvZl+e/e0IQ tZvQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1764323488; x=1764928288; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=WkvV4Mu+6/6wCtE+ekaIlRzlNznKn883wrT7wLHz6ak=; b=MrgMz2Magj0Tm2M4lTp+kFAprRzQ6O5vssHF/76+fZKbgXa53KJKSYkDhH3ovGdbAE JPT2UiUYWXcRUFASow+KS4HUp9UcJQgQGUwXFwbplNq6Wswtwe8Ly4vC4IMoXWatJQ8T 6pwQMJnYwYzKJ/2Lx5B5K8bTAqzHk0uEpZb9j/Gk2GdxaTffEt7/MLk1rbF8MH0LB7ry +MNujtVixfyFk6ePZJ1hATCtkbBhQr3zaWeS0kE4H12j8HmMdaGhvmKEFLV/6hc7CShX SUTz+X0ol/wAXiS7M/KBJjO+qCx+il9mldguv6GGfXlaKCFIdiuLiPZKmt4DmWL47zpg CwOw== X-Forwarded-Encrypted: i=1; AJvYcCV5SxQ76SroQZ3rEgbiC9nuCZPB11Pe87cLd0R3T9warkTshDD42Hy8JEKNyJJ/aNr/y+nfx1wEFbBLlYOH@postgresql.org X-Gm-Message-State: AOJu0Ywy0ztI61T+9IkINC3HL37q2xKowC9f2fqOV80GygOtshNnAqYe Wbl68PaE4fnFZ/Nlaie8M7J07aQegObbuOnB0X96QhY379E15Ai/1XqDBV0nLUUtyUIDPf7/c4a Hnvq9IsYU1LI3GatpfIy5HjxiFoweILPxF+aV X-Gm-Gg: ASbGncuO9XCGoD5FH8ySg822cPs9DdZLc40cClW80IdXdalhHPuDd1nF6oXCjbqbXMl 9KBJJ9y7v098z0IQQslz42xYBYVfdyPF+4PELLJ2OTacgXsr7zA5Y2v65Ar6jv4Rpl68i/pJXEZ wKdTevJKhffbN0P6Bc9zK+fuyQcqvSa56EpT5GkJWdzJh3sbl8lRJOXLQBnAVJAfJgoVN9MOgGp 2TsIybBF0AB0YhJDRQD0wV/UmPJlioBdk5Gy0S0NmPt8I3eI8CdPbcj4TU+UPS+87XztdL2DoDM nXUj0dVpFWxQNLSl/NEkl2mhGiJ1b1Jc24dqSUBxgnfl2Q7IqlQRSN+op3RorJKajITP X-Google-Smtp-Source: AGHT+IEBGVbYyVHI1MIhCOXYdOTlV9rwdzRSWIxvPseMtmbx8HDPNNdDeWtbF/nNdKv6kFxnncmMOiFBIJLHcuiMXZE= X-Received: by 2002:a05:651c:324f:b0:372:8cc8:fc3d with SMTP id 38308e7fff4ca-37cd9153526mr83104961fa.1.1764323488090; Fri, 28 Nov 2025 01:51:28 -0800 (PST) MIME-Version: 1.0 References: <17483.1764262585@localhost> <8837.1764320724@localhost> In-Reply-To: <8837.1764320724@localhost> From: Matthias van de Meent Date: Fri, 28 Nov 2025 10:51:16 +0100 X-Gm-Features: AWmQ_bnP_IbEBohtQcowfpzvYsm8buxfIZ7Af_xudL6v8_PNq3EWGN4kjyzhHG8 Message-ID: Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements To: Antonin Houska Cc: Michail Nikolaev , PostgreSQL Hackers , Alvaro Herrera Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, 28 Nov 2025 at 10:05, Antonin Houska wrote: > Matthias van de Meent wrote: > > 3. Having the index marked indisready before it contains any data is > > going to slow down the indexing process significantly: > > a. The main index build now must go through shared memory and buffer > > locking, instead of being able to use backend-local memory > > b. The tuple-wise insertion path (IndexAmRoutine->aminsert) can have a > > significantly higher overhead than the bulk insertion logic in > > ambuild(); in metrics of WAL, pages accessed (IO), and CPU cycles > > spent. > > > > So, I don't think moving away from ambuild() as basis for initially > > building the index this is such a great idea. > > > > (However, I do think that having an _option_ to build the index using > > ambuildempty()+aminsert() instead of ambuild() might be useful, if > > only to more easily compare "natural grown" indexes vs freshly built > > ones, but that's completely orthogonal to CIC snapshotting > > improvements.) > > The retail insertions are not something this proposal depends on. I think it'd > be possible to build a separate index locally and then "merge" it with the > regular one. I just tried to propose a solution that does not need snapshots. I'm not sure we can generalize indexes to the point where merging two built indexes is always both possible and efficient. For example, the ANN indexes of pgvector (both HNSW and IVF) could possibly have merge operations between indexes of the same type and schema, but it would require a lot of effort on the side of the AM to support merging; there is no trivial merge operation that also retains the quality of the index without going through the aminsert() path. Conversely, the current approach to CIC doesn't require additional work on the index AM's side, and that's a huge enabler for every kind of index. Kind regards, Matthias van de Meent Databricks (https://www.databricks.com)