Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rckMY-001qj6-UY for pgsql-hackers@arkaria.postgresql.org; Wed, 21 Feb 2024 11:02:07 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rckMW-004AZv-DY for pgsql-hackers@arkaria.postgresql.org; Wed, 21 Feb 2024 11:02:04 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rckMW-004AZn-38 for pgsql-hackers@lists.postgresql.org; Wed, 21 Feb 2024 11:02:04 +0000 Received: from mail-lj1-x22f.google.com ([2a00:1450:4864:20::22f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rckMP-000FnS-TQ for pgsql-hackers@postgresql.org; Wed, 21 Feb 2024 11:02:03 +0000 Received: by mail-lj1-x22f.google.com with SMTP id 38308e7fff4ca-2d243797703so32428611fa.3 for ; Wed, 21 Feb 2024 03:01:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1708513317; x=1709118117; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=sR0PwwrUAiSuYEl94TKpw8hrqabqSlH6P72Qc1fBflY=; b=dh3XNj19mAjKMlIMBGwB6VNz40Q74OIrALcH1CmVxB5BHZHBu2InHSeGKMwpc8JIfQ WwufZhzGXP0gan1zBVFWOmuYPIPJ1uO2HRrvx6dCcEdmNEl+VpXJ3PaaEpHUb28GIiCd zDVLqOb3F7S92nkgPv2pzuZ6dSzijoY2JDCgkUl69lQXld1q7gDzXBOH2QE/GD52lSA9 n04L4q57kQpeR4ud4ePhx3pG5Jsgep1dqWHQCPMgcKe5kwNHDNHm3CilUsILmA2FrNmT slTO4k/eQfmZ05eQ6UJp/KRkRTLYTzg9tPtlNgtjNUJhSdNrTiD0AumYqqzBWk6drcVW MB4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708513317; x=1709118117; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=sR0PwwrUAiSuYEl94TKpw8hrqabqSlH6P72Qc1fBflY=; b=w46XDbdQMdf+a+O7NsJq61cyZtW/bD8bh8kL8eqwg7fvNfzixcthnujzi2RYyiHZik 2/zd6W5UiQqsQKBCg/PrOLd5uqHIlbJ5x0pU4HTqn/PbpNQw/5UZcgzSQen+O3KBkTIQ 4KZ+ECP4BInOBodm+NriXoF9gxZaoc+hX8sw2/0r0YbaeuLpz0sBNk48Q5wdgsWcuYII e2n1OdEwwQUVl+dR1J4EsFn6B3cjyU3L8TZatVkWgESUeDNpV2eP/GtcvcuCxKrR5ECN s4hRcBREdblzCE93wXPkDkf0eE+cUPjojsOF3DDfZrerGCM8HhP4MKPkQ987Rr7AFZ1h 43Tw== X-Forwarded-Encrypted: i=1; AJvYcCXps2dQi4+eGVA8G4Btqr3UbPO9r4B5vY/5+bGnprD/BakxqTCXASLRquWogfdYEPEkJGY+o2wG60062i/ZHoJgsKloK5jjAvSkaK/C X-Gm-Message-State: AOJu0Yx0neT298ndvQ0ouNxIbZNC1M2jjNV7SraFoPLQYLfTxR8EogUQ dkFIEtRJfGpM1YfICoMnxeM5c2CV95crQ7eY19+KMkx5speaqdSoQKpL1RaDyKbEr853M3K0KqW 7IplGXSKFqwfW8ZjdXBAQxIWaxe4= X-Google-Smtp-Source: AGHT+IGoNb4sccCI3E/wL2SJ8pjhNP1gsSMroyqDzs9GAdptTv6zK3S5uKZogTcdwvo2sD62wTcd4q3L0xjSDSdhzkI= X-Received: by 2002:a2e:99c7:0:b0:2d2:3b9e:dcff with SMTP id l7-20020a2e99c7000000b002d23b9edcffmr5268175ljj.46.1708513317197; Wed, 21 Feb 2024 03:01:57 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Matthias van de Meent Date: Wed, 21 Feb 2024 12:01:45 +0100 Message-ID: Subject: Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements To: Michail Nikolaev Cc: Melanie Plageman , PostgreSQL Hackers , Alvaro Herrera Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed, 21 Feb 2024 at 00:33, Michail Nikolaev wrote: > > Hello! > > I think the best way for this to work would be an index method that > > exclusively stores TIDs, and of which we can quickly determine new > > tuples, too. I was thinking about something like GIN's format, but > > using (generation number, tid) instead of ([colno, colvalue], tid) as > > key data for the internal trees, and would be unlogged (because the > > data wouldn't have to survive a crash) > > Yeah, this seems to be a reasonable approach, but there are some > doubts related to it - it needs new index type as well as unlogged > indexes to be introduced - this may make the patch too invasive to be > merged. I suppose so, though persistence is usually just to keep things correct in case of crashes, and this "index" is only there to support processes that don't expect to survive crashes. > Also, some way to remove the index from the catalog in case of > a crash may be required. That's less of an issue though, we already accept that a crash during CIC/RIC leaves unusable indexes around, so "needs more cleanup" is not exactly a blocker. > A few more thoughts: > * it is possible to go without generation number - we may provide a > way to do some kind of fast index lookup (by TID) directly during the > second table scan phase. While possible, I don't think this would be more performant than the combination approach, at the cost of potentially much more random IO when the table is aggressively being updated. > * one more option is to maintain a Tuplesorts (instead of an index) > with TIDs as changelog and merge with index snapshot after taking a > new visibility snapshot. But it is not clear how to share the same > Tuplesort with multiple inserting backends. Tuplesort requires the leader process to wait for concurrent backends to finish their sort before it can start consuming their runs. This would make it a very bad alternative to the "changelog index" as the CIC process would require on-demand actions from concurrent backends (flush of sort state). I'm not convinced that's somehow easier. > * crazy idea - what is about to do the scan in the index we are > building? We have tuple, so, we have all the data indexed in the > index. We may try to do an index scan using that data to get all > tuples and find the one with our TID :) We can't rely on that, because we have no guarantee we can find the tuple quickly enough. Equality-based indexing is very much optional, and so are TID-based checks (outside the current vacuum-related APIs), so finding one TID can (and probably will) take O(indexsize) when the tuple is not in the index, which is one reason for ambulkdelete() to exist. Kind regards, Matthias van de Meent