Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1thxeq-009BjS-O4 for pgsql-hackers@arkaria.postgresql.org; Tue, 11 Feb 2025 21:19:04 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1thxeo-000eUS-OB for pgsql-hackers@arkaria.postgresql.org; Tue, 11 Feb 2025 21:19:03 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1thxeo-000eUH-D6 for pgsql-hackers@lists.postgresql.org; Tue, 11 Feb 2025 21:19:03 +0000 Received: from mail-lj1-x233.google.com ([2a00:1450:4864:20::233]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1thxem-000JxR-2W for pgsql-hackers@postgresql.org; Tue, 11 Feb 2025 21:19:02 +0000 Received: by mail-lj1-x233.google.com with SMTP id 38308e7fff4ca-308f4436cb1so1869371fa.1 for ; Tue, 11 Feb 2025 13:19:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1739308740; x=1739913540; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=p+2uP40ReRSU+1zWSs4Rtm1/K68AXSvMU9nbCA4eXfc=; b=Y+VyjweYz0icMjp+EUnz6Nj8FYaGmAX6FBcSHqSg0v5/JJXM+z/eDHlxWtelAl3Q0w hxCutE7EI89ZiUlUyUVugxa5/po0rpfjiT6PVpwkjSMyhFJvwDMTSLN7Td9r2U7wztH0 WhKIDwKIcqgw0iuiSkvyl3qTAwJl3h2HlQ1AOVceA3HjqxSaMG3Ga9tIMjeDe+/vunbu wnxqKE4oy1ka7jmiAej8kTTCwcKfcdB6rJOx8ArCsn2yTQNQv+aWvqNvlzlrceJ7c2Ck ddoS1kwK4UN/MxVv0yUUebAi/pLEq+yz9pjAo5Vu7rkLHOmjRFUGVyKWSdcI3iGr+dYa Qr/A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1739308740; x=1739913540; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=p+2uP40ReRSU+1zWSs4Rtm1/K68AXSvMU9nbCA4eXfc=; b=A1Eb2Khki9WGF7lJi8MKp9/Qpam2OGOdo1at2Q8ZaHn2WWpkSBq6LDVygFSfiQ7vnx rx3MSocLXJmYSrtgNern6MdwI3F37Km7p8sZbokdei2ibk3b4h9NksO0RbS/ujoImQwy GNteXj1/yNHvCtTBs9+Zk1PYfQTqI0oHbY++GuKktW0wMnh14c5doOgQJwpGZQ2FOPBj 4gdkik1pLnAfEKN4imuMxftabWd06GG5emol6pvJyMti0FFzkMEdyEfBAyt1KLzWk3gN 92jCfwPXSnD4NJ0Ic6Ts+X359uytnNOeCj/wdExIGh3fap2ndUw9IRG++4LP1Cd1zEj0 InFw== X-Gm-Message-State: AOJu0YzWtGKzT44eB/xE8tXKAW1kTubk9E7m4orXSjXbGn66ls1c0wOI TXKNughanQgOd/T8O/ZA7DKj6QT/VIMzqj+rSHk6wD3sV/VWeKngjO2bRb7iWyFbVMQVourfUwY Ir+SbegyoeUwKPREk0+dFhydaOIM= X-Gm-Gg: ASbGncs5t6DfaJQ6mChRjXBmTVnLmJRPd6YsKK88WBCAbwHlIhNDtjpzayYOG6lzvnb 7f9Zf+Hkp3Bf0D+Bj1jfM5tVUwnNFG6DqzyJTxEb4ZDkNIctIZgRshp6eULNGpZ7Wc10PVWm3 X-Google-Smtp-Source: AGHT+IEsSnkfkJDR1VZKzUSpRyGmraLn2lLAeX0gQ5DqYVbIQLIW/w7BBvlWtV5Oj+0ew6in7VB/Z5uD38Ym+tKup9c= X-Received: by 2002:a05:651c:895:b0:308:dd96:765f with SMTP id 38308e7fff4ca-30904563abemr2228851fa.1.1739308739357; Tue, 11 Feb 2025 13:18:59 -0800 (PST) MIME-Version: 1.0 References: <78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com> In-Reply-To: From: Matthias van de Meent Date: Tue, 11 Feb 2025 22:18:45 +0100 X-Gm-Features: AWEUYZlcdIJrDaPOuTbF7fUzNu_LZRA3ezE3xXAulu4qINgXK04gPyDcgaSynSs Message-ID: Subject: Re: Expanding HOT updates for expression and partial indexes To: "Burd, Greg" Cc: "pgsql-hackers@postgresql.org" Content-Type: multipart/mixed; boundary="000000000000908ead062de45f42" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000908ead062de45f42 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, 10 Feb 2025 at 20:11, Burd, Greg wrote: > > On Feb 10, 2025, at 12:17=E2=80=AFPM, Matthias van de Meent wrote: > > > >> > >> I have a few concerns with the patch, things I=E2=80=99d greatly appre= ciate your thoughts on: > >> > >> First, I pass an EState along the update path to enable running the ch= ecks in heapam, this works but leaves me feeling as if I violated separatio= n of concerns. If there is a better way to do this let me know or if you th= ink the cost of creating one in the execIndexing.c ExecIndexesRequiringUpda= tes() is okay that=E2=80=99s another possibility. > > > > I think that doesn't have to be bad. > > Meaning that the approach I=E2=80=99ve taken is okay with you? If you mean "passing EState down through the AM so we can check if we really need HOT updates", then Yes, that's OK. I don't think the basic idea (executing the projections to check for differences) is bad per se, however I do think we may need more design work on the exact shape of how ExecIndexesRequiringUpdates() receives its information (which includes the EState). E.g. we could pass an opaquely typed pointer that's passed from the executor state through the table_update method into this ExecIndexesRequiringUpdates(). That opaque struct would then contain the important information for index update state checking, so that the AM can't realistically break things without bypassing the separation of concerns, and doesn't have to know about any executor nodes. >>> Third, there is overhead to this patch, it is no longer a single simple= bitmap test to choose HOT or not in heap_update(). >> >> Why can't it mostly be that simple in simple cases? > > It can remain that simple in the cases you mention. In relcache the hot = blocking attributes are nearly the same, only the summarizing attributes ar= e removed. The first test then in heap_update() is for overlap with the mo= dified set. When there is none, the update will proceed on the HOT path. > > The presence of a summarizing index is determined in ExecIndexesRequiring= Updates() in execIndexing.c, so a slightly longer code path but not much ne= w overhead. Yes, but that's only determined at an index-by-index level, rather than determined all at once, and that's real bad when you have hundreds of indexes to go through (however unlikely it might be, I've heard of cases where there are 1000s of indexes on one table). So, I prefer limiting any O(n_indexes) operations to only the most critical cases. > > I mean, it's clear that "updated indexed column's value =3D=3D non-HOT > > update". And that to determine whether an updated *projected* column's > > value (i.e., expression index column's value) was actually updated we > > need to calculate the previous and current index value, thus execute > > the projection twice. But why would we have significant additional > > overhead if there are no expression indexes, or when we can know by > > bitmap overlap that the only interesting cases are summarizing > > indexes? > > You=E2=80=99re right, there=E2=80=99s not a lot of new overhead in that c= ase except what happens in ExecIndexesRequiringUpdates() to scan over the l= ist of IndexInfo. It is really only when there are many expressions/predic= ates requiring examination that there is any significant cost to this appro= ach AFAICT (but if you see something please point it out). See the attached approach. Evaluation of the expressions only has to happen if there are any HOT-blocking attributes which are exclusively hot-blockingly indexed through expressions, so if the updated attribute numbers are a subset of hotblockingexprattrs. (substitute hotblocking with summarizing for the summarizing approach) > > I would've implemented this with (1) two new bitmaps, one each for > > normal and summarizing indexes, each containing which columns are > > exclusively used in expression indexes (and which should thus be used > > to trigger the (comparatively) expensive recalculation). > > That was one where I started, over time that became harder to work as the= bitmaps contain the union of index attributes for the table not per-column= . I think it's fairly easy to create, though. > Now there is one bitmap to cover the broadest case and then a function to= find the modified set of indexes where each is examined against bitmaps th= at contain only attributes specific to the index in question. This helped = in cases where there were both expression and non-expression indexes on the= same attribute. Fair, but do we care about one expression index on (attr1->>'data')'s value *not* changing when an index on (attr1) exists and attr1 has changed? That index on att1 would block HOT updates regardless of the (lack of) changes to the (att1->>'data') index, so doing those expensive calculations seems quite wasteful. So, in my opinion, we should also keep track of those attributes only included in expressions of indexes, and that's fairly easy: see attached prototype.diff.txt (might need some work, the patch was drafted on v16's codebase, but the idea is clear). The resulting %exprattrs bitmap contains attributes that are used only in expressions of those index types. > > The "new" output of these expression > > evaluations would be stored to be used later as index datums, reducing > > the number of per-expression evaluations down to 2 at most, rather > > than 2+1 when the index needs an insertion but the expression itself > > wasn't updated. > > Not reforming the new index tuples is also an interesting optimization. = I wonder how that can be passed from within heapam=E2=80=99s call into a fu= nction in execIndexing up into nodeModifiyTable and back down into execInde= xing and on to the index access method? I=E2=80=99ll have to think about t= hat, ideas welcome. Note that index tuple forming happens only in the index AM, it's the Datum construction (i.e. projection from attributes/tuple to indexed value) that I'd like to deduplicate. Though looking at the current code, I don't think it's reasonable to have that as a requirement for this work. It'd be a nice-to-have for sure, but not as requirement. > > Do you have any documentation on the approaches used, and the specific > > differences between v3 and v4? I don't see much of that in your > > initial mail, and the patches themselves also don't show much of that > > in their details. I'd like at least some documentation of the new > > behaviour in src/backend/access/heap/README.HOT at some point before > > this got marked as RFC in the commitfest app, though preferably sooner > > rather than later. > > Good point, I should have updated README.HOT with the initial patchset. = I=E2=80=99ll jump on that and update ASAP. Thanks in advance. Kind regards, Matthias van de Meent Neon (https://neon.tech) --000000000000908ead062de45f42 Content-Type: text/plain; charset="US-ASCII"; name="prototype.diff.txt" Content-Disposition: attachment; filename="prototype.diff.txt" Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_m70xowjw0 ZGlmZiAtLWdpdCBhL3NyYy9iYWNrZW5kL3V0aWxzL2NhY2hlL3JlbGNhY2hlLmMgYi9zcmMvYmFj a2VuZC91dGlscy9jYWNoZS9yZWxjYWNoZS5jCmluZGV4IDVhNTc3MDJlOTE0Li5iZjQxZTlmNTNk MyAxMDA2NDQKLS0tIGEvc3JjL2JhY2tlbmQvdXRpbHMvY2FjaGUvcmVsY2FjaGUuYworKysgYi9z cmMvYmFja2VuZC91dGlscy9jYWNoZS9yZWxjYWNoZS5jCkBAIC01MTgxLDggKzUxODEsMTAgQEAg UmVsYXRpb25HZXRJbmRleEF0dHJCaXRtYXAoUmVsYXRpb24gcmVsYXRpb24sIEluZGV4QXR0ckJp dG1hcEtpbmQgYXR0cktpbmQpCiAJQml0bWFwc2V0ICAqdWluZGV4YXR0cnM7CS8qIGNvbHVtbnMg aW4gdW5pcXVlIGluZGV4ZXMgKi8KIAlCaXRtYXBzZXQgICpwa2luZGV4YXR0cnM7CS8qIGNvbHVt bnMgaW4gdGhlIHByaW1hcnkgaW5kZXggKi8KIAlCaXRtYXBzZXQgICppZGluZGV4YXR0cnM7CS8q IGNvbHVtbnMgaW4gdGhlIHJlcGxpY2EgaWRlbnRpdHkgKi8KLQlCaXRtYXBzZXQgICpob3RibG9j a2luZ2F0dHJzOwkvKiBjb2x1bW5zIHdpdGggSE9UIGJsb2NraW5nIGluZGV4ZXMgKi8KLQlCaXRt YXBzZXQgICpzdW1tYXJpemVkYXR0cnM7CS8qIGNvbHVtbnMgd2l0aCBzdW1tYXJpemluZyBpbmRl eGVzICovCisJQml0bWFwc2V0ICAqaG90YmxvY2tpbmdhdHRyczsJCS8qIGNvbHVtbnMgd2l0aCBI T1QgYmxvY2tpbmcgaW5kZXhlcyAqLworCUJpdG1hcHNldCAgKmhvdGJsb2NraW5nZXhwcmF0dHJz OwkvKiBhcyBhYm92ZSwgYnV0IG9ubHkgdGhvc2UgaW4gZXhwcmVzc2lvbnMgKi8KKwlCaXRtYXBz ZXQgICpzdW1tYXJpemVkYXR0cnM7CQkvKiBjb2x1bW5zIHdpdGggc3VtbWFyaXppbmcgaW5kZXhl cyAqLworCUJpdG1hcHNldCAgKnN1bW1hcml6ZWRleHByYXR0cnM7CS8qIGFzIGFib3ZlLCBidXQg b25seSB0aG9zZSBpbiBleHByZXNzaW9ucyAqLwogCUxpc3QJICAgKmluZGV4b2lkbGlzdDsKIAlM aXN0CSAgICpuZXdpbmRleG9pZGxpc3Q7CiAJT2lkCQkJcmVscGtpbmRleDsKQEAgLTUyNDgsNyAr NTI1MCw5IEBAIHJlc3RhcnQ6CiAJcGtpbmRleGF0dHJzID0gTlVMTDsKIAlpZGluZGV4YXR0cnMg PSBOVUxMOwogCWhvdGJsb2NraW5nYXR0cnMgPSBOVUxMOworCWhvdGJsb2NraW5nZXhwcmF0dHJz ID0gTlVMTDsKIAlzdW1tYXJpemVkYXR0cnMgPSBOVUxMOworCXN1bW1hcml6ZWRleHByYXR0cnMg PSBOVUxMOwogCWZvcmVhY2gobCwgaW5kZXhvaWRsaXN0KQogCXsKIAkJT2lkCQkJaW5kZXhPaWQg PSBsZmlyc3Rfb2lkKGwpOwpAQCAtNTI2Miw2ICs1MjY2LDcgQEAgcmVzdGFydDoKIAkJYm9vbAkJ aXNQSzsJCS8qIHByaW1hcnkga2V5ICovCiAJCWJvb2wJCWlzSURLZXk7CS8qIHJlcGxpY2EgaWRl bnRpdHkgaW5kZXggKi8KIAkJQml0bWFwc2V0ICoqYXR0cnM7CisJCUJpdG1hcHNldCAqKmV4cHJh dHRyczsKIAogCQlpbmRleERlc2MgPSBpbmRleF9vcGVuKGluZGV4T2lkLCBBY2Nlc3NTaGFyZUxv Y2spOwogCkBAIC01MzA1LDkgKzUzMTAsMTUgQEAgcmVzdGFydDoKIAkJICogZGVjaWRlIHdoaWNo IGJpdG1hcCB3ZSdsbCB1cGRhdGUgaW4gdGhlIGZvbGxvd2luZyBsb29wLgogCQkgKi8KIAkJaWYg KGluZGV4RGVzYy0+cmRfaW5kYW0tPmFtc3VtbWFyaXppbmcpCisJCXsKIAkJCWF0dHJzID0gJnN1 bW1hcml6ZWRhdHRyczsKKwkJCWV4cHJhdHRycyA9ICZzdW1tYXJpemVkZXhwcmF0dHJzOworCQl9 CiAJCWVsc2UKKwkJewogCQkJYXR0cnMgPSAmaG90YmxvY2tpbmdhdHRyczsKKwkJCWV4cHJhdHRy cyA9ICZob3RibG9ja2luZ2V4cHJhdHRyczsKKwkJfQogCiAJCS8qIENvbGxlY3Qgc2ltcGxlIGF0 dHJpYnV0ZSByZWZlcmVuY2VzICovCiAJCWZvciAoaSA9IDA7IGkgPCBpbmRleERlc2MtPnJkX2lu ZGV4LT5pbmRuYXR0czsgaSsrKQpAQCAtNTM0OCwxMCArNTM1OSwxMCBAQCByZXN0YXJ0OgogCQl9 CiAKIAkJLyogQ29sbGVjdCBhbGwgYXR0cmlidXRlcyB1c2VkIGluIGV4cHJlc3Npb25zLCB0b28g Ki8KLQkJcHVsbF92YXJhdHRub3MoaW5kZXhFeHByZXNzaW9ucywgMSwgYXR0cnMpOworCQlwdWxs X3ZhcmF0dG5vcyhpbmRleEV4cHJlc3Npb25zLCAxLCBleHByYXR0cnMpOwogCiAJCS8qIENvbGxl Y3QgYWxsIGF0dHJpYnV0ZXMgaW4gdGhlIGluZGV4IHByZWRpY2F0ZSwgdG9vICovCi0JCXB1bGxf dmFyYXR0bm9zKGluZGV4UHJlZGljYXRlLCAxLCBhdHRycyk7CisJCXB1bGxfdmFyYXR0bm9zKGlu ZGV4UHJlZGljYXRlLCAxLCBleHByYXR0cnMpOwogCiAJCWluZGV4X2Nsb3NlKGluZGV4RGVzYywg QWNjZXNzU2hhcmVMb2NrKTsKIAl9CkBAIC01MzgwLDExICs1MzkxLDI1IEBAIHJlc3RhcnQ6CiAJ CWJtc19mcmVlKHBraW5kZXhhdHRycyk7CiAJCWJtc19mcmVlKGlkaW5kZXhhdHRycyk7CiAJCWJt c19mcmVlKGhvdGJsb2NraW5nYXR0cnMpOworCQlibXNfZnJlZShob3RibG9ja2luZ2V4cHJhdHRy cyk7CiAJCWJtc19mcmVlKHN1bW1hcml6ZWRhdHRycyk7CisJCWJtc19mcmVlKHN1bW1hcml6ZWRl eHByYXR0cnMpOwogCiAJCWdvdG8gcmVzdGFydDsKIAl9CiAKKwkvKiB7ZXhwcmVzc2lvbi1vbmx5 IGNvbHVtbnN9ID0ge2V4cHJlc3Npb24gY29sdW1uc30gLSB7ZGlyZWN0IGNvbHVtbnN9ICovCisJ aG90YmxvY2tpbmdleHByYXR0cnMgPSBibXNfZGVsX21lbWJlcnMoaG90YmxvY2tpbmdleHByYXR0 cnMsCisJCQkJCQkJCQkJICAgaG90YmxvY2tpbmdhdHRycyk7CisJLyoge2FsbCBjb2x1bW5zfSA9 IHtkaXJlY3QgY29sdW1uc30gKyB7ZXhwcmVzc2lvbi1vbmx5IGNvbHVtbnN9ICovCisJaG90Ymxv Y2tpbmdhdHRycyA9IGJtc19hZGRfbWVtYmVycyhob3RibG9ja2luZ2F0dHJzLAorCQkJCQkJCQkJ ICAgaG90YmxvY2tpbmdleHByYXR0cnMpOworCisJc3VtbWFyaXplZGV4cHJhdHRycyA9IGJtc19k ZWxfbWVtYmVycyhzdW1tYXJpemVkZXhwcmF0dHJzLAorCQkJCQkJCQkJCSAgc3VtbWFyaXplZGF0 dHJzKTsKKwlzdW1tYXJpemVkYXR0cnMgPSBibXNfYWRkX21lbWJlcnMoc3VtbWFyaXplZGF0dHJz LAorCQkJCQkJCQkJICBzdW1tYXJpemVkZXhwcmF0dHJzKTsKKwogCS8qIERvbid0IGxlYWsgdGhl IG9sZCB2YWx1ZXMgb2YgdGhlc2UgYml0bWFwcywgaWYgYW55ICovCiAJcmVsYXRpb24tPnJkX2F0 dHJzdmFsaWQgPSBmYWxzZTsKIAlibXNfZnJlZShyZWxhdGlvbi0+cmRfa2V5YXR0cik7Cg== --000000000000908ead062de45f42--