Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1thXR1-003fde-4B for pgsql-hackers@arkaria.postgresql.org; Mon, 10 Feb 2025 17:19:03 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1thXQ1-00357n-I3 for pgsql-hackers@arkaria.postgresql.org; Mon, 10 Feb 2025 17:18:01 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1thXQ1-00357e-6w for pgsql-hackers@lists.postgresql.org; Mon, 10 Feb 2025 17:18:01 +0000 Received: from mail-lj1-x22e.google.com ([2a00:1450:4864:20::22e]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1thXPx-0004WX-2q for pgsql-hackers@postgresql.org; Mon, 10 Feb 2025 17:18:00 +0000 Received: by mail-lj1-x22e.google.com with SMTP id 38308e7fff4ca-30227c56b11so49602661fa.3 for ; Mon, 10 Feb 2025 09:17:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1739207877; x=1739812677; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=bR2Ywcdxpdg07ew7AI+Y3KFnGY7Dq5QTg016VXsBGPo=; b=LqgGyQFFWxRE+oqHvdEqyj1iJoZBJCA3pcr1Hrghs9nUzJUIdAFeDxWMhsJdLar6oz b7dcaMkyy5erGdUiQOkI2duXQ13uG8bod9ZccPb7rFnL3L2Bztkw1nxOBWdBD42bs9t3 gr+QM2SBzz0DQ35y3zwgAyhPA55PrxUUfCVz08yo5n44Dgz3v1kJYdaX7ydERnY5/I7/ jJObH8o/kol5WJ2zwxE0vfZmNbLOqSVbjM8vBnogR0oit1meOe7mI2bZUqGsqPyti+Ht huSeZ/g+VceeaGja4tcQ92/q8pN/PJmJCPUeTfTwVXZkUb/nxVrGjxrmeZ5+Ks+TUns9 qbRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739207877; x=1739812677; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=bR2Ywcdxpdg07ew7AI+Y3KFnGY7Dq5QTg016VXsBGPo=; b=KorUBnrSdH2Zq4k0jMJNQEgUleUESRpTCl2XWFjbyI6rZYhmtoue/L7qK7bRFcKcmP uW9A735Qsw67fXttI6u7DDFY1ZWAvtV+ClH1lWDqoV/waiXwL2C4tLHe5TUfmzoodBFe 8ZH1MdH5q1T2n/RxgB4NdPRqu1Hq8QeGUp6SedQH9h9a8fdtv6cVPYoG8Ad5+i9YFrjF zIKCAFJPYOg3RSKuctyqMuTYpNE2ralLJhsuB+6Kl1vqjkLcTAAZXmmHevo0AkuT0fej QLfLgTJGfbfYEQp7mvXHg9GIskUg0nh706kBZIWy2rYpTOpmGtcT7JyB349bZzaZuRjQ /s2g== X-Gm-Message-State: AOJu0YwtC1uhn0Yp0wRo444yaqeecJ6/n+qzVdPwsg54v/qgll4HUKrn zcDWNOhDUCUAya0i+gP8mHmCGTzxdiuqtUc3M06QSItdCeBVE43tRXS6R3rPJvcPGA/uVpic6U3 smpV7vkot3tWLVre9V4/fDnzyV/G7pg5I X-Gm-Gg: ASbGncuop2GNxXgCCV1xA0AaYg1VRnjti2TMW9TO6f0oN7l3SX7J71TNO1GLSBHW2cj 2Mbg2czsKRxtwrgEJULsv910o6zndbFkBtUVxf1b9fOysTOaGckMXS6zIssPrOo/8oxH3Ymwv X-Google-Smtp-Source: AGHT+IFlQTCGPOHMjwo4GV9mijvDEZTiZJGRyNhQKl0LK+T2GGaQ6gH61eRh9k4VAK3lmsiBStJoaCeF/6/EmeCmHBY= X-Received: by 2002:a05:651c:4ca:b0:308:860e:d4d3 with SMTP id 38308e7fff4ca-308f8373fe5mr2377481fa.22.1739207876561; Mon, 10 Feb 2025 09:17:56 -0800 (PST) MIME-Version: 1.0 References: <78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com> In-Reply-To: <78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com> From: Matthias van de Meent Date: Mon, 10 Feb 2025 18:17:42 +0100 X-Gm-Features: AWEUYZnVGKkcVDuGeUF4CII-QBYxe5h7GIxXtly5VuF7KqcNKTPkdRa8i2MXx1I Message-ID: Subject: Re: Expanding HOT updates for expression and partial indexes To: "Burd, Greg" Cc: "pgsql-hackers@postgresql.org" Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, 6 Feb 2025 at 23:24, Burd, Greg wrote: > > Attached find a patch that expands the cases where heap-only tuple (HOT) = updates are possible without changing the basic semantics of HOT. This is a= ccomplished by examining expression indexes for changes to determine if ind= exes require updating or not. A similar approach is taken for partial index= es, the predicate is evaluated and, in some cases, HOT updates are allowed.= Even with this patch if any index is changed, all indexes are updated. Onl= y in cases where none are modified will this patch allow the HOT path. So, effectively this disables the amsummarizing-based optimizations of https://postgr.es/c/19d8e2308 ? That sounds like a bad degradation in behaviour. > I=E2=80=99m also aware of PHOT [4] and WARM [5] which allow for updating = some, but not all indexes while remaining on the HOT update path, this patc= h does not attempt to accomplish that. > > [...] This opens the door to future improvements by providing a way to pa= ss a bitmap of modified indexes along to be addressed by something similar = to the PHOT/WARM logic. I have serious doubts about the viability of any proposal working to implement PHOT/WARM in PostgreSQL, as they seem to have an inherent nature of fundamentally breaking the TID lifecycle: We won't be able to clean up dead-to-everyone TIDs that were PHOT-updated, because some index Y may still rely on it, and we can't remove the TID from that same index Y because there is still a live PHOT/WARM tuple later in the chain whose values for that index haven't changed since that dead-to-everyone tuple, and thus this PHOT/WARM tuple is the one pointed to by that index. For HOT, this isn't much of an issue, because there is just one TID that's impacted (and it only occupies a single LP slot, with LP_REDIRECT). However, with PHOT/WARM, you'd relatively easily be able to fill a page with TIDs (or even full tuples) you can't clean up with VACUUM until the moment a the PHOT/WARM/HOT chain is broken (due to UPDATE leaving the page or the final entry getting DELETE-d). Unless we are somehow are able to replace the TIDs in indexes from "intermediate dead PHOT" to "base TID"/"latest TID" (either of which is probably also problematic for indexes that expect a TID to appear exactly once in the index at any point in time) I don't think the system is viable if we maintain only a single data structure to contain all dead TIDs. If we had a datastore for dead items per index, that'd be more likely to work, but it also would significantly increase the memory overhead of vacuuming tables. > I have a few concerns with the patch, things I=E2=80=99d greatly apprecia= te your thoughts on: > > First, I pass an EState along the update path to enable running the check= s in heapam, this works but leaves me feeling as if I violated separation o= f concerns. If there is a better way to do this let me know or if you think= the cost of creating one in the execIndexing.c ExecIndexesRequiringUpdates= () is okay that=E2=80=99s another possibility. I think that doesn't have to be bad. > Third, there is overhead to this patch, it is no longer a single simple b= itmap test to choose HOT or not in heap_update(). Why can't it mostly be that simple in simple cases? I mean, it's clear that "updated indexed column's value =3D=3D non-HOT update". And that to determine whether an updated *projected* column's value (i.e., expression index column's value) was actually updated we need to calculate the previous and current index value, thus execute the projection twice. But why would we have significant additional overhead if there are no expression indexes, or when we can know by bitmap overlap that the only interesting cases are summarizing indexes? I would've implemented this with (1) two new bitmaps, one each for normal and summarizing indexes, each containing which columns are exclusively used in expression indexes (and which should thus be used to trigger the (comparatively) expensive recalculation). Then, I'd maintain a (cached) list of unique projections/expressions found in indexes, so that 30 indexes on e.g. ((mycolumn::jsonb)->>'metadata') only extend to 1 check for differences, rather than 30. The "new" output of these expression evaluations would be stored to be used later as index datums, reducing the number of per-expression evaluations down to 2 at most, rather than 2+1 when the index needs an insertion but the expression itself wasn't updated. So, it'd be something like (pseudocode): if (bms_overlap(updated_columns, hotblocking)) /* if columns only indexed through expressions were updated, do expensive stuff. Otherwise, it's a normal non-HOT update. */ if (bms_subset_compare(updated_columns, hot_expression_columns) in (BMS_EQUAL, BMS_SUBSET1)) expensive check for expression changes + populate index column data else normal_update else if (bms_overlap(updated_columns, summarizing)) /* same as above for hotblocking, but now summarizing */ if (bms_subset_compare(updated_columns, sum_expression_columns) in (BMS_EQUAL, BMS_SUBSET1)) expensive check for summarized expression changes + populate summarized index column data else summarizing_update else hot_update Note that it is relatively expensive to do check whether any one index needs to be updated. It's generally cheaper to do all those checks at once, where possible; using one or 2 more bitmaps would be sufficient. Also note that this approach doesn't update specific summarizing indexes, just all of them or none. I think that "update only summarizing indexes that were updated" should be a separate patch from "check if indexed expressions' values changed", potentially in the patchset, but not as part of the main bulk. > Fourth, I=E2=80=99d like to know which version the community prefers (v3 = or v4). I think v4 moves the code in a direction that is cleaner overall, = but you may disagree. I realize that the way I use the modified_indexes bi= tmapset is a tad overloaded (NULL means all indexes should be updated, othe= rwise only update the indexes in the set which may be all/some/none of the = indexes) and that may violate the principal of least surprise but I feel th= at it is better than the TU_UpdateIndexes enum in the code today. I would be hesitant to let table AMs decide which indexes to update at that precision. Note that this API would allow the AM to update only (say) the PK index and no other indexes, which is not allowed to happen if index consistentcy is required (which it is). ----->8----- Do you have any documentation on the approaches used, and the specific differences between v3 and v4? I don't see much of that in your initial mail, and the patches themselves also don't show much of that in their details. I'd like at least some documentation of the new behaviour in src/backend/access/heap/README.HOT at some point before this got marked as RFC in the commitfest app, though preferably sooner rather than later. Kind regards, Matthias van de Meent Neon (https://neon.tech)