Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uqqAB-00Dw99-Eu for pgsql-hackers@arkaria.postgresql.org; Tue, 26 Aug 2025 09:40:25 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uqqAA-004ETd-UT for pgsql-hackers@arkaria.postgresql.org; Tue, 26 Aug 2025 09:40:23 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uqqAA-004ETU-46 for pgsql-hackers@lists.postgresql.org; Tue, 26 Aug 2025 09:40:23 +0000 Received: from forwardcorp1d.mail.yandex.net ([178.154.239.200]) by makus.postgresql.org with smtp (Exim 4.96) (envelope-from ) id 1uqqA6-001ojX-0n for pgsql-hackers@postgresql.org; Tue, 26 Aug 2025 09:40:21 +0000 Received: from mail-nwsmtp-smtp-corp-main-56.klg.yp-c.yandex.net (mail-nwsmtp-smtp-corp-main-56.klg.yp-c.yandex.net [IPv6:2a02:6b8:c42:cf2d:0:640:140f:0]) by forwardcorp1d.mail.yandex.net (Yandex) with ESMTPS id 98427806A8; Tue, 26 Aug 2025 12:40:14 +0300 (MSK) Received: from smtpclient.apple (unknown [2a02:6bf:8080:2d::1:29]) by mail-nwsmtp-smtp-corp-main-56.klg.yp-c.yandex.net (smtpcorp/Yandex) with ESMTPSA id DeUtnT0FxSw0-91C4uGa0; Tue, 26 Aug 2025 12:40:13 +0300 X-Yandex-Fwd: 1 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=yandex-team.ru; s=default; t=1756201214; bh=aB7wltJwSjGEE9jp4Hj0tdSUbkdpLx+llm0SZ/QRLwc=; h=To:Cc:Date:Message-Id:From:Subject; b=fd73CvNhiZkn91OsHc9166J5pKBHlfJ8+GsVWzdXEl33O6wORUhb/pmPs+D5a5q3Z aJUl/lYKixnAmQSMEtUzxVegL1Bki86sgXKTWG8InFvMEAP6GWQfABCGnUT6tLH/Wo SOzNufyHuQaegv/8ha7fTxjwsGbuJoE95IMkJ3fw= Authentication-Results: mail-nwsmtp-smtp-corp-main-56.klg.yp-c.yandex.net; dkim=pass header.i=@yandex-team.ru From: Andrey Borodin Content-Type: multipart/mixed; boundary="Apple-Mail=_0B27333F-B829-459D-8A6C-156FEFF0AF59" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.600.51.1.1\)) Subject: [WiP] B-tree page merge during vacuum to reduce index bloat Message-Id: <11A59C0C-A8C8-4642-8493-292D5DF8311D@yandex-team.ru> Date: Tue, 26 Aug 2025 14:40:03 +0500 Cc: Kirk Wolak , Nikolay Samokhvalov To: pgsql-hackers X-Mailer: Apple Mail (2.3826.600.51.1.1) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Apple-Mail=_0B27333F-B829-459D-8A6C-156FEFF0AF59 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii Hi hackers, Together with Kirk and Nik we spent several online hacking sessions to = tackle index bloat problems [0,1,2]. As a result we concluded that = B-tree index page merge should help to keep an index from pressuring = shared_buffers. We are proposing a feature to automatically merge nearly-empty B-tree = leaf pages during VACUUM operations to reduce index bloat. This = addresses a common issue where deleted tuples leave behind sparsely = populated pages that traditional page deletion cannot handle because = they're not completely empty. *** Implementation Overview: The patch adds a new `mergefactor` reloption (default 5%, range 0-50%) = that defines when a page becomes a candidate for merging. During vacuum, = when a page exceeds this threshold (e.g., 95% empty with default = settings), we attempt to move the remaining tuples to the right sibling = page and then delete the source page using existing page deletion = mechanisms. Key changes: - New `mergefactor` reloption for configurable merge thresholds =20 - Detection logic in `btvacuumpage()` to identify merge candidates - Tuple movement implementation in `_bt_unlink_halfdead_page()` - WAL logging enhancements to handle cross-page dependencies during = replay The last part needs further improvements (it's simply REGBUF_FORCE_IMAGE = for now), but I want to start a discussion and ask for known problems of = the approach. *** Correctness: The implementation reuses existing locking, critical sections and WAL = logging infrastructure. To handle cross-page dependencies during WAL = replay, when tuples are merged, the right sibling buffer is registered = with `REGBUF_FORCE_IMAGE`, this is a temporary workaround. *** Current Status & Questions: The patch successfully reduces index bloat and handles basic scenarios, = but we've identified some areas that need community input: 1. Backward Scan Correctness: The primary concern is ensuring backward = scans work correctly when pages are being merged concurrently. Since we = maintain the same locking protocol as existing page deletion, I believe = this should be safe, but would appreciate expert review of the approach. 2. Performance Impact: The additional checks during vacuum have minimal = overhead, but broader testing would be valuable. Worst case would be the = index with leaf pattern (5%,96%,5%,96%,5%,96%...). We will attempt to = merge it every time spending time on acquiring locks. 3. WAL Consistency: There are still some edge cases with WAL consistency = checking that need refinement. I think I can handle it, just need to = spend enough time on debugging real redo instead of imaging right page. *** Usage: CREATE INDEX ON table (col) WITH (mergefactor=3D10); -- 10% threshold I don't know if it would be a good idea to enable mergefactor for = existing indexes. The feature is particularly beneficial for workloads with high = update/delete rates that create sparse index pages without triggering = complete page deletion. I'm attaching the patch for review and would welcome feedback on the = approach, especially regarding backward scan safety and any other = correctness concerns we may have missed. Thank you! Best regards, Andrey, Nik, Kirk. [0] https://www.youtube.com/watch?v=3D3MleDtXZUlM [1] https://www.youtube.com/watch?v=3DIb3SXSFt8mE [2] https://www.youtube.com/watch?v=3DD1PEdDcvZTw --Apple-Mail=_0B27333F-B829-459D-8A6C-156FEFF0AF59 Content-Disposition: attachment; filename=v1-0001-btree-Add-page-merge-during-vacuum-to-reduce-inde.patch Content-Type: application/octet-stream; x-unix-mode=0644; name="v1-0001-btree-Add-page-merge-during-vacuum-to-reduce-inde.patch" Content-Transfer-Encoding: quoted-printable =46rom=20b297dc807dd2b59a86734d8fb98f03429c9113cc=20Mon=20Sep=2017=20= 00:00:00=202001=0AFrom:=20Andrey=20Borodin=20=0ADate:=20= Wed,=2020=20Aug=202025=2023:24:36=20+0500=0ASubject:=20[PATCH=20v1]=20= btree:=20Add=20page=20merge=20during=20vacuum=20to=20reduce=20index=20= bloat=0A=0AImplement=20automatic=20merging=20of=20nearly-empty=20B-tree=20= leaf=20pages=20during=20vacuum.=0AWhen=20a=20page=20exceeds=20the=20= mergefactor=20threshold=20(default=205%=20=3D=2095%=20empty),=20move=0A= remaining=20tuples=20to=20the=20right=20sibling=20and=20delete=20the=20= source=20page,=20reducing=0Aindex=20bloat.=0A=0AChanges:=0A-=20Add=20= mergefactor=20reloption=20(0-50%,=20default=205%)=20for=20configurable=20= merge=20threshold=0A-=20Detect=20mergefactor-empty=20pages=20in=20= btvacuumpage()=20for=20merge=20consideration=0A-=20Add=20merge=20logic=20= in=20_bt_unlink_halfdead_page()=20with=20tuple=20movement=0A-=20Perform=20= feasibility=20checks=20for=20right=20sibling=20space=20and=20level=0A-=20= Handle=20WAL=20replay=20by=20reading=20tuples=20from=20target=20page=20= before=20reinit=0A-=20Skip=20merge=20when=20vacuum=20deletions=20are=20= pending=20to=20avoid=20WAL=20conflicts=0A-=20Update=20assertions=20to=20= allow=20non-empty=20pages=20for=20merge=20scenarios=0A=0AThe=20= implementation=20maintains=20crash=20safety=20through=20existing=20= critical=20sections=0Aand=20WAL=20logging,=20preserves=20B-tree=20sort=20= order,=20and=20coordinates=20with=20vacuum=0Aoperations=20to=20prevent=20= offset=20conflicts=20during=20standby=20replay.=0A=0AUsage:=20CREATE=20= INDEX=20ON=20table=20(col)=20WITH=20(mergefactor=3D10);=0A---=0A=20= src/backend/access/common/reloptions.c=20|=20=2010=20+=0A=20= src/backend/access/nbtree/nbtpage.c=20=20=20=20|=20275=20= +++++++++++++++++++++++--=0A=20src/backend/access/nbtree/nbtree.c=20=20=20= =20=20|=20=2030=20++-=0A=20src/backend/access/nbtree/nbtutils.c=20=20=20= |=20=20=203=20+-=0A=20src/backend/access/nbtree/nbtxlog.c=20=20=20=20|=20= =2082=20+++++++-=0A=20src/include/access/nbtree.h=20=20=20=20=20=20=20=20= =20=20=20=20|=20=20=208=20+=0A=20src/include/access/nbtxlog.h=20=20=20=20= =20=20=20=20=20=20=20|=20=20=204=20+-=0A=207=20files=20changed,=20390=20= insertions(+),=2022=20deletions(-)=0A=0Adiff=20--git=20= a/src/backend/access/common/reloptions.c=20= b/src/backend/access/common/reloptions.c=0Aindex=20= 0af3fea68fa..b16fd32dc9b=20100644=0A---=20= a/src/backend/access/common/reloptions.c=0A+++=20= b/src/backend/access/common/reloptions.c=0A@@=20-192,6=20+192,16=20@@=20= static=20relopt_int=20intRelOpts[]=20=3D=0A=20=09=09},=0A=20=09=09= BTREE_DEFAULT_FILLFACTOR,=20BTREE_MIN_FILLFACTOR,=20100=0A=20=09},=0A+=09= {=0A+=09=09{=0A+=09=09=09"mergefactor",=0A+=09=09=09"Minimum=20= percentage=20of=20free=20space=20to=20trigger=20btree=20page=20merge=20= during=20vacuum",=0A+=09=09=09RELOPT_KIND_BTREE,=0A+=09=09=09= ShareUpdateExclusiveLock=09/*=20since=20it=20applies=20only=20to=20= vacuum=0A+=09=09=09=09=09=09=09=09=09=09=20*=20operations=20*/=0A+=09=09= },=0A+=09=09BTREE_DEFAULT_MERGEFACTOR,=200,=2050=0A+=09},=0A=20=09{=0A=20= =09=09{=0A=20=09=09=09"fillfactor",=0Adiff=20--git=20= a/src/backend/access/nbtree/nbtpage.c=20= b/src/backend/access/nbtree/nbtpage.c=0Aindex=20c79dd38ee18..792bc52558b=20= 100644=0A---=20a/src/backend/access/nbtree/nbtpage.c=0A+++=20= b/src/backend/access/nbtree/nbtpage.c=0A@@=20-1805,6=20+1805,9=20@@=20= _bt_pagedel(Relation=20rel,=20Buffer=20leafbuf,=20BTVacState=20*vstate)=0A= =20=09bool=09=09rightsib_empty;=0A=20=09Page=09=09page;=0A=20=09= BTPageOpaque=20opaque;=0A+=09OffsetNumber=20maxoff;=0A+=09OffsetNumber=20= minoff;=0A+=09bool=09=09page_has_tuples;=0A=20=0A=20=09/*=0A=20=09=20*=20= Save=20original=20leafbuf=20block=20number=20from=20caller.=20=20Only=20= deleted=20blocks=0A@@=20-1876,8=20+1879,12=20@@=20_bt_pagedel(Relation=20= rel,=20Buffer=20leafbuf,=20BTVacState=20*vstate)=0A=20=0A=20=09=09/*=0A=20= =09=09=20*=20We=20can=20never=20delete=20rightmost=20pages=20nor=20root=20= pages.=20=20While=20at=20it,=0A-=09=09=20*=20check=20that=20page=20is=20= empty,=20since=20it's=20possible=20that=20the=20leafbuf=20page=0A-=09=09=20= *=20was=20empty=20a=20moment=20ago,=20but=20has=20since=20had=20some=20= inserts.=0A+=09=09=20*=20check=20that=20page=20is=20empty=20or=20nearly=20= empty,=20since=20it's=0A+=09=09=20*=20possible=20that=20the=20leafbuf=20= page=20was=20empty=20a=20moment=20ago,=20but=20has=20since=0A+=09=09=20*=20= had=20some=20inserts.=0A+=09=09=20*=0A+=09=09=20*=20For=20pages=20that=20= have=20tuples,=20attempt=20to=20move=20them=20to=20the=20right=20sibling=0A= +=09=09=20*=20if=20there's=20enough=20space.=20This=20enables=20merging=20= of=20nearly-empty=20pages.=0A=20=09=09=20*=0A=20=09=09=20*=20To=20keep=20= the=20algorithm=20simple,=20we=20also=20never=20delete=20an=20= incompletely=0A=20=09=09=20*=20split=20page=20(they=20should=20be=20rare=20= enough=20that=20this=20doesn't=20make=20any=0A@@=20-1893,9=20+1900,11=20= @@=20_bt_pagedel(Relation=20rel,=20Buffer=20leafbuf,=20BTVacState=20= *vstate)=0A=20=09=09=20*=20we=20know=20we=20stepped=20right=20from=20a=20= page=20that=20passed=20these=20tests,=20so=0A=20=09=09=20*=20it's=20OK.=0A= =20=09=09=20*/=0A-=09=09if=20(P_RIGHTMOST(opaque)=20||=20= P_ISROOT(opaque)=20||=0A-=09=09=09P_FIRSTDATAKEY(opaque)=20<=3D=20= PageGetMaxOffsetNumber(page)=20||=0A-=09=09=09= P_INCOMPLETE_SPLIT(opaque))=0A+=09=09maxoff=20=3D=20= PageGetMaxOffsetNumber(page);=0A+=09=09minoff=20=3D=20= P_FIRSTDATAKEY(opaque);=0A+=09=09page_has_tuples=20=3D=20(minoff=20<=3D=20= maxoff);=0A+=0A+=09=09if=20(P_RIGHTMOST(opaque)=20||=20P_ISROOT(opaque)=20= ||=20P_INCOMPLETE_SPLIT(opaque))=0A=20=09=09{=0A=20=09=09=09/*=20Should=20= never=20fail=20to=20delete=20a=20half-dead=20page=20*/=0A=20=09=09=09= Assert(!P_ISHALFDEAD(opaque));=0A@@=20-1904,6=20+1913,88=20@@=20= _bt_pagedel(Relation=20rel,=20Buffer=20leafbuf,=20BTVacState=20*vstate)=0A= =20=09=09=09return;=0A=20=09=09}=0A=20=0A+=09=09/*=0A+=09=09=20*=20If=20= page=20has=20tuples,=20check=20if=20they=20can=20be=20merged=20with=20= right=20sibling.=0A+=09=09=20*=20Actual=20merge=20will=20be=20done=20= later=20in=20_bt_unlink_halfdead_page()=20within=0A+=09=09=20*=20the=20= same=20WAL=20record=20as=20the=20page=20deletion=20for=20atomicity.=0A+=09= =09=20*/=0A+=09=09if=20(page_has_tuples)=0A+=09=09{=0A+=09=09=09= BlockNumber=20right_sibling=20=3D=20opaque->btpo_next;=0A+=09=09=09= Buffer=09=09rbuf;=0A+=09=09=09Page=09=09rpage;=0A+=09=09=09BTPageOpaque=20= ropaque;=0A+=09=09=09Size=09=09space_needed=20=3D=200;=0A+=09=09=09= OffsetNumber=20i;=0A+=09=09=09bool=09=09merge_feasible=20=3D=20false;=0A= +=0A+=09=09=09/*=20Calculate=20total=20space=20needed=20for=20all=20= tuples=20*/=0A+=09=09=09for=20(i=20=3D=20minoff;=20i=20<=3D=20maxoff;=20= i++)=0A+=09=09=09{=0A+=09=09=09=09ItemId=09=09itemid=20=3D=20= PageGetItemId(page,=20i);=0A+=09=09=09=09space_needed=20+=3D=20= ItemIdGetLength(itemid)=20+=20sizeof(ItemIdData);=0A+=09=09=09}=0A+=0A+=09= =09=09/*=20Check=20if=20merge=20with=20right=20sibling=20is=20feasible=20= */=0A+=09=09=09if=20(right_sibling=20!=3D=20P_NONE)=0A+=09=09=09{=0A+=09=09= =09=09rbuf=20=3D=20_bt_getbuf(rel,=20right_sibling,=20BT_READ);=0A+=09=09= =09=09rpage=20=3D=20BufferGetPage(rbuf);=0A+=09=09=09=09ropaque=20=3D=20= BTPageGetOpaque(rpage);=0A+=0A+=09=09=09=09/*=20Verify=20right=20sibling=20= is=20a=20leaf=20page=20at=20same=20level=20*/=0A+=09=09=09=09if=20= (P_ISLEAF(ropaque)=20&&=20ropaque->btpo_level=20=3D=3D=20= opaque->btpo_level=20&&=0A+=09=09=09=09=09!P_ISDELETED(ropaque)=20&&=20= !P_ISHALFDEAD(ropaque))=0A+=09=09=09=09{=0A+=09=09=09=09=09Size=20= freespace=20=3D=20PageGetFreeSpace(rpage);=0A+=0A+=09=09=09=09=09if=20= (freespace=20>=3D=20space_needed)=0A+=09=09=09=09=09{=0A+=09=09=09=09=09=09= merge_feasible=20=3D=20true;=0A+=09=09=09=09=09=09elog(DEBUG1,=20"Page=20= merge=20feasible=20for=20index=20\"%s\":=20can=20move=20%d=20tuples=20= from=20page=20%u=20to=20page=20%u=20(need=20%zu=20bytes,=20have=20%zu=20= bytes=20free)",=0A+=09=09=09=09=09=09=09=20RelationGetRelationName(rel),=20= (int)(maxoff=20-=20minoff=20+=201),=0A+=09=09=09=09=09=09=09=20= BufferGetBlockNumber(leafbuf),=20right_sibling,=20space_needed,=20= freespace);=0A+=09=09=09=09=09}=0A+=09=09=09=09=09else=0A+=09=09=09=09=09= {=0A+=09=09=09=09=09=09elog(DEBUG1,=20"Page=20merge=20not=20feasible=20= for=20index=20\"%s\":=20right=20sibling=20page=20%u=20has=20insufficient=20= space=20(need=20%zu=20bytes,=20have=20%zu=20bytes=20free)",=0A+=09=09=09=09= =09=09=09=20RelationGetRelationName(rel),=20right_sibling,=20= space_needed,=20freespace);=0A+=09=09=09=09=09}=0A+=09=09=09=09}=0A+=09=09= =09=09else=0A+=09=09=09=09{=0A+=09=09=09=09=09elog(DEBUG1,=20"Page=20= merge=20not=20feasible=20for=20index=20\"%s\":=20right=20sibling=20page=20= %u=20is=20not=20suitable=20(is_leaf=3D%d,=20deleted=3D%d,=20halfdead=3D%d,= =20level=3D%d=20vs=20%d)",=0A+=09=09=09=09=09=09=20= RelationGetRelationName(rel),=20right_sibling,=20P_ISLEAF(ropaque),=0A+=09= =09=09=09=09=09=20P_ISDELETED(ropaque),=20P_ISHALFDEAD(ropaque),=20= ropaque->btpo_level,=20opaque->btpo_level);=0A+=09=09=09=09}=0A+=0A+=09=09= =09=09_bt_relbuf(rel,=20rbuf);=0A+=09=09=09}=0A+=09=09=09else=0A+=09=09=09= {=0A+=09=09=09=09elog(DEBUG1,=20"Page=20merge=20not=20feasible=20for=20= index=20\"%s\":=20page=20%u=20has=20no=20right=20sibling",=0A+=09=09=09=09= =09=20RelationGetRelationName(rel),=20BufferGetBlockNumber(leafbuf));=0A= +=09=09=09}=0A+=0A+=09=09=09/*=20If=20merge=20is=20not=20feasible,=20= abort=20the=20page=20deletion=20*/=0A+=09=09=09if=20(!merge_feasible)=0A= +=09=09=09{=0A+=09=09=09=09_bt_relbuf(rel,=20leafbuf);=0A+=09=09=09=09= return;=0A+=09=09=09}=0A+=0A+=09=09=09/*=0A+=09=09=09=20*=20Merge=20is=20= feasible.=20Mark=20page=20as=20candidates=20for=20merge=20and=20proceed=0A= +=09=09=09=20*=20with=20deletion.=20The=20actual=20merge=20will=20happen=20= in=20_bt_unlink_halfdead_page().=0A+=09=09=09=20*/=0A+=09=09}=0A+=0A+=09=09= /*=0A+=09=09=20*=20Page=20might=20not=20be=20empty=20if=20we're=20doing=20= a=20merge,=20but=20we've=20validated=0A+=09=09=20*=20that=20merge=20is=20= feasible.=20Proceed=20with=20deletion=20-=20the=20actual=20merge=20will=0A= +=09=09=20*=20happen=20in=20_bt_unlink_halfdead_page().=0A+=09=09=20*/=0A= +=0A=20=09=09/*=0A=20=09=09=20*=20First,=20remove=20downlink=20pointing=20= to=20the=20page=20(or=20a=20parent=20of=20the=0A=20=09=09=20*=20page,=20= if=20we=20are=20going=20to=20delete=20a=20taller=20subtree),=20and=20= mark=20the=0A@@=20-2104,9=20+2195,21=20@@=20= _bt_mark_page_halfdead(Relation=20rel,=20Relation=20heaprel,=20Buffer=20= leafbuf,=0A=20=09page=20=3D=20BufferGetPage(leafbuf);=0A=20=09opaque=20=3D= =20BTPageGetOpaque(page);=0A=20=0A+=09/*=0A+=09=20*=20Assert=20page=20is=20= suitable=20for=20deletion.=20Page=20must=20be=20a=20leaf,=20not=20root,=0A= +=09=20*=20not=20rightmost,=20and=20not=20ignored.=20For=20traditional=20= deletion,=20page=20must=20be=0A+=09=20*=20empty.=20For=20merge=20= scenarios=20(nearly=20empty=20pages),=20page=20may=20have=20tuples.=0A+=09= =20*/=0A=20=09Assert(!P_RIGHTMOST(opaque)=20&&=20!P_ISROOT(opaque)=20&&=0A= -=09=09=20=20=20P_ISLEAF(opaque)=20&&=20!P_IGNORE(opaque)=20&&=0A-=09=09=20= =20=20P_FIRSTDATAKEY(opaque)=20>=20PageGetMaxOffsetNumber(page));=0A+=09=09= =20=20=20P_ISLEAF(opaque)=20&&=20!P_IGNORE(opaque));=0A+=0A+=09/*=0A+=09=20= *=20Page=20should=20be=20either=20empty=20(traditional=20deletion)=20or=20= nearly=20empty=0A+=09=20*=20with=20tuples=20that=20can=20be=20merged=20= (new=20merge=20feature).=0A+=09=20*/=0A+=09Assert(P_FIRSTDATAKEY(opaque)=20= >=20PageGetMaxOffsetNumber(page)=20||=0A+=09=09=20=20=20= (P_FIRSTDATAKEY(opaque)=20<=3D=20PageGetMaxOffsetNumber(page)=20&&=0A+=09= =09=09PageGetFreeSpace(page)=20>=3D=20((BLCKSZ=20-=20= SizeOfPageHeaderData)=20*=20BTGetMergeFactor(rel))=20/=20100));=0A=20=09= Assert(heaprel=20!=3D=20NULL);=0A=20=0A=20=09/*=0A@@=20-2231,7=20= +2334,13=20@@=20_bt_mark_page_halfdead(Relation=20rel,=20Relation=20= heaprel,=20Buffer=20leafbuf,=0A=20=09opaque=20=3D=20= BTPageGetOpaque(page);=0A=20=09opaque->btpo_flags=20|=3D=20= BTP_HALF_DEAD;=0A=20=0A-=09Assert(PageGetMaxOffsetNumber(page)=20=3D=3D=20= P_HIKEY);=0A+=09/*=0A+=09=20*=20Page=20should=20only=20have=20high=20key=20= in=20traditional=20deletion=20scenario.=0A+=09=20*=20In=20merge=20= scenarios,=20page=20may=20have=20data=20tuples=20that=20will=20be=20= moved=20later.=0A+=09=20*/=0A+=09Assert(PageGetMaxOffsetNumber(page)=20= =3D=3D=20P_HIKEY=20||=0A+=09=09=20=20=20(PageGetMaxOffsetNumber(page)=20= >=20P_HIKEY=20&&=0A+=09=09=09PageGetFreeSpace(page)=20>=3D=20((BLCKSZ=20= -=20SizeOfPageHeaderData)=20*=20BTGetMergeFactor(rel))=20/=20100));=0A=20= =09MemSet(&trunctuple,=200,=20sizeof(IndexTupleData));=0A=20=09= trunctuple.t_info=20=3D=20sizeof(IndexTupleData);=0A=20=09if=20= (topparent=20!=3D=20leafblkno)=0A@@=20-2333,9=20+2442,19=20@@=20= _bt_unlink_halfdead_page(Relation=20rel,=20Buffer=20leafbuf,=20= BlockNumber=20scanblkno,=0A=20=09FullTransactionId=20safexid;=0A=20=09= bool=09=09rightsib_is_rightmost;=0A=20=09uint32=09=09targetlevel;=0A+=09= IndexTuple=09tuple;=0A+=09Size=09=09tupsz;=0A+=09int=09=09=09idx;=0A=20=09= IndexTuple=09leafhikey;=0A=20=09BlockNumber=20leaftopparent;=0A=20=0A+=09= /*=20Variables=20for=20page=20merge=20functionality=20*/=0A+=09= IndexTuple=09*merge_tuples=20=3D=20NULL;=0A+=09Size=09=09*merge_tupsz=20= =3D=20NULL;=0A+=09OffsetNumber=20*merge_deletable=20=3D=20NULL;=0A+=09= int=09=09=09merge_ntuples=20=3D=200;=0A+=09bool=09=09do_merge=20=3D=20= false;=0A+=0A=20=09page=20=3D=20BufferGetPage(leafbuf);=0A=20=09opaque=20= =3D=20BTPageGetOpaque(page);=0A=20=0A@@=20-2480,11=20+2599,24=20@@=20= _bt_unlink_halfdead_page(Relation=20rel,=20Buffer=20leafbuf,=20= BlockNumber=20scanblkno,=0A=20=0A=20=09if=20(target=20=3D=3D=20= leafblkno)=0A=20=09{=0A-=09=09if=20(P_FIRSTDATAKEY(opaque)=20<=3D=20= PageGetMaxOffsetNumber(page)=20||=0A-=09=09=09!P_ISLEAF(opaque)=20||=20= !P_ISHALFDEAD(opaque))=0A+=09=09/*=0A+=09=09=20*=20Validate=20leaf=20= page=20status.=20Page=20must=20be=20leaf=20and=20half-dead.=0A+=09=09=20= *=20For=20traditional=20deletion,=20page=20must=20be=20empty.=20For=20= merge=20scenarios,=0A+=09=09=20*=20page=20may=20have=20data=20tuples=20= that=20will=20be=20moved=20to=20right=20sibling.=0A+=09=09=20*/=0A+=09=09= if=20(!P_ISLEAF(opaque)=20||=20!P_ISHALFDEAD(opaque))=0A=20=09=09=09= elog(ERROR,=20"target=20leaf=20page=20changed=20status=20unexpectedly=20= in=20block=20%u=20of=20index=20\"%s\"",=0A=20=09=09=09=09=20target,=20= RelationGetRelationName(rel));=0A=20=0A+=09=09/*=20Check=20if=20page=20= is=20empty=20(traditional=20deletion)=20or=20suitable=20for=20merge=20*/=0A= +=09=09if=20(P_FIRSTDATAKEY(opaque)=20<=3D=20= PageGetMaxOffsetNumber(page))=0A+=09=09{=0A+=09=09=09/*=20Page=20has=20= tuples=20-=20validate=20it's=20suitable=20for=20merge=20*/=0A+=09=09=09= if=20(PageGetFreeSpace(page)=20<=20((BLCKSZ=20-=20SizeOfPageHeaderData)=20= *=20BTGetMergeFactor(rel))=20/=20100)=0A+=09=09=09=09elog(ERROR,=20= "target=20leaf=20page=20changed=20status=20unexpectedly=20in=20block=20= %u=20of=20index=20\"%s\"",=0A+=09=09=09=09=09=20target,=20= RelationGetRelationName(rel));=0A+=09=09}=0A+=0A=20=09=09/*=20Leaf=20= page=20is=20also=20target=20page:=20don't=20set=20leaftopparent=20*/=0A=20= =09=09leaftopparent=20=3D=20InvalidBlockNumber;=0A=20=09}=0A@@=20-2591,6=20= +2723,71=20@@=20_bt_unlink_halfdead_page(Relation=20rel,=20Buffer=20= leafbuf,=20BlockNumber=20scanblkno,=0A=20=09=20*=20Here=20we=20begin=20= doing=20the=20deletion.=0A=20=09=20*/=0A=20=0A+=09/*=0A+=09=20*=20= Prepare=20merge=20data=20before=20critical=20section=20if=20needed.=0A+=09= =20*=20We'll=20use=20local=20variables=20and=20perform=20the=20merge=20= within=20the=20critical=20section.=0A+=09=20*/=0A+=0A+=09if=20(target=20= =3D=3D=20leafblkno)=0A+=09{=0A+=09=09Page=09=09leafpage=20=3D=20= BufferGetPage(leafbuf);=0A+=09=09BTPageOpaque=20leafopaque=20=3D=20= BTPageGetOpaque(leafpage);=0A+=09=09OffsetNumber=20minoff=20=3D=20= P_FIRSTDATAKEY(leafopaque);=0A+=09=09OffsetNumber=20maxoff=20=3D=20= PageGetMaxOffsetNumber(leafpage);=0A+=0A+=09=09/*=20Check=20if=20leaf=20= page=20has=20tuples=20that=20need=20to=20be=20merged=20*/=0A+=09=09if=20= (minoff=20<=3D=20maxoff=20&&=20P_ISLEAF(leafopaque)=20&&=20= P_ISHALFDEAD(leafopaque))=0A+=09=09{=0A+=09=09=09Page=09=09rightpage=20=3D= =20BufferGetPage(rbuf);=0A+=09=09=09BTPageOpaque=20rightopaque=20=3D=20= BTPageGetOpaque(rightpage);=0A+=09=09=09Size=09=09space_needed=20=3D=20= 0;=0A+=09=09=09OffsetNumber=20i;=0A+=0A+=09=09=09merge_ntuples=20=3D=20= maxoff=20-=20minoff=20+=201;=0A+=0A+=09=09=09/*=20Calculate=20total=20= space=20needed=20for=20all=20tuples=20*/=0A+=09=09=09for=20(i=20=3D=20= minoff;=20i=20<=3D=20maxoff;=20i++)=0A+=09=09=09{=0A+=09=09=09=09itemid=20= =3D=20PageGetItemId(leafpage,=20i);=0A+=09=09=09=09space_needed=20+=3D=20= ItemIdGetLength(itemid)=20+=20sizeof(ItemIdData);=0A+=09=09=09}=0A+=0A+=09= =09=09/*=20Verify=20that=20right=20sibling=20has=20enough=20space=20= (should=20have=20been=20checked=20earlier)=20*/=0A+=09=09=09if=20= (P_ISLEAF(rightopaque)=20&&=20rightopaque->btpo_level=20=3D=3D=20= leafopaque->btpo_level=20&&=0A+=09=09=09=09!P_ISDELETED(rightopaque)=20= &&=20!P_ISHALFDEAD(rightopaque)=20&&=0A+=09=09=09=09= PageGetFreeSpace(rightpage)=20>=3D=20space_needed)=0A+=09=09=09{=0A+=09=09= =09=09elog(DEBUG1,=20"Performing=20page=20merge=20in=20index=20\"%s\":=20= moving=20%d=20tuples=20from=20page=20%u=20to=20page=20%u",=0A+=09=09=09=09= =09=20RelationGetRelationName(rel),=20merge_ntuples,=20leafblkno,=20= rightsib);=0A+=0A+=09=09=09=09/*=20Pre-allocate=20and=20copy=20all=20= tuples=20outside=20critical=20section=20*/=0A+=09=09=09=09merge_tuples=20= =3D=20palloc(sizeof(IndexTuple)=20*=20merge_ntuples);=0A+=09=09=09=09= merge_tupsz=20=3D=20palloc(sizeof(Size)=20*=20merge_ntuples);=0A+=09=09=09= =09merge_deletable=20=3D=20palloc(sizeof(OffsetNumber)=20*=20= merge_ntuples);=0A+=0A+=09=09=09=09for=20(i=20=3D=20minoff;=20i=20<=3D=20= maxoff;=20i++)=0A+=09=09=09=09{=0A+=09=09=09=09=09itemid=20=3D=20= PageGetItemId(leafpage,=20i);=0A+=09=09=09=09=09tuple=20=3D=20= (IndexTuple)=20PageGetItem(leafpage,=20itemid);=0A+=09=09=09=09=09tupsz=20= =3D=20ItemIdGetLength(itemid);=0A+=09=09=09=09=09idx=20=3D=20i=20-=20= minoff;=0A+=0A+=09=09=09=09=09merge_tuples[idx]=20=3D=20palloc(tupsz);=0A= +=09=09=09=09=09memcpy(merge_tuples[idx],=20tuple,=20tupsz);=0A+=09=09=09= =09=09merge_tupsz[idx]=20=3D=20tupsz;=0A+=09=09=09=09=09= merge_deletable[idx]=20=3D=20i;=0A+=09=09=09=09}=0A+=0A+=09=09=09=09= do_merge=20=3D=20true;=0A+=09=09=09}=0A+=09=09=09else=0A+=09=09=09{=0A+=09= =09=09=09elog(DEBUG1,=20"Cannot=20perform=20page=20merge=20in=20index=20= \"%s\":=20right=20sibling=20conditions=20not=20met",=0A+=09=09=09=09=09=20= RelationGetRelationName(rel));=0A+=09=09=09}=0A+=09=09}=0A+=09}=0A+=0A=20= =09/*=20No=20ereport(ERROR)=20until=20changes=20are=20logged=20*/=0A=20=09= START_CRIT_SECTION();=0A=20=0A@@=20-2611,6=20+2808,38=20@@=20= _bt_unlink_halfdead_page(Relation=20rel,=20Buffer=20leafbuf,=20= BlockNumber=20scanblkno,=0A=20=09Assert(opaque->btpo_prev=20=3D=3D=20= target);=0A=20=09opaque->btpo_prev=20=3D=20leftsib;=0A=20=0A+=09/*=0A+=09= =20*=20Perform=20page=20merge=20if=20needed.=20Move=20tuples=20from=20= the=20leaf=20page=20to=20the=20right=0A+=09=20*=20sibling=20before=20= marking=20the=20leaf=20page=20as=20deleted.=20This=20must=20be=20done=20= within=0A+=09=20*=20the=20critical=20section=20to=20ensure=20atomicity=20= with=20the=20page=20deletion.=0A+=09=20*/=0A+=09if=20(do_merge)=0A+=09{=0A= +=09=09Page=09=09leafpage=20=3D=20BufferGetPage(leafbuf);=0A+=09=09Page=09= =09rightpage=20=3D=20BufferGetPage(rbuf);=0A+=09=09BTPageOpaque=20= rightopaque=20=3D=20BTPageGetOpaque(rightpage);=0A+=09=09OffsetNumber=20= insert_at=20=3D=20P_FIRSTDATAKEY(rightopaque);=0A+=09=09int=09=09=09i;=0A= +=0A+=09=09/*=20Move=20all=20saved=20tuples=20to=20right=20sibling=20*/=0A= +=09=09for=20(i=20=3D=200;=20i=20<=20merge_ntuples;=20i++)=0A+=09=09{=0A= +=09=09=09if=20(PageAddItem(rightpage,=20(Item)=20merge_tuples[i],=20= merge_tupsz[i],=0A+=09=09=09=09=09=09=09insert_at,=20false,=20false)=20= =3D=3D=20InvalidOffsetNumber)=0A+=09=09=09{=0A+=09=09=09=09elog(PANIC,=20= "failed=20to=20move=20tuple=20during=20page=20merge=20in=20index=20= \"%s\"=20-=20space=20should=20have=20been=20checked",=0A+=09=09=09=09=09=20= RelationGetRelationName(rel));=0A+=09=09=09}=0A+=09=09=09insert_at++;=0A= +=09=09}=0A+=0A+=09=09/*=20Clear=20all=20tuples=20from=20the=20leaf=20= page=20using=20pre-allocated=20array=20*/=0A+=09=09= PageIndexMultiDelete(leafpage,=20merge_deletable,=20merge_ntuples);=0A+=0A= +=09=09elog(DEBUG1,=20"Page=20merge=20completed=20in=20index=20\"%s\":=20= moved=20%d=20tuples=20from=20page=20%u=20to=20page=20%u",=0A+=09=09=09=20= RelationGetRelationName(rel),=20merge_ntuples,=20leafblkno,=20rightsib);=0A= +=09}=0A+=0A=20=09/*=0A=20=09=20*=20If=20we=20deleted=20a=20parent=20of=20= the=20targeted=20leaf=20page,=20instead=20of=20the=20leaf=0A=20=09=20*=20= itself,=20update=20the=20leaf=20to=20point=20to=20the=20next=20remaining=20= child=20in=20the=0A@@=20-2676,10=20+2905,15=20@@=20= _bt_unlink_halfdead_page(Relation=20rel,=20Buffer=20leafbuf,=20= BlockNumber=20scanblkno,=0A=20=0A=20=09=09XLogBeginInsert();=0A=20=0A-=09= =09XLogRegisterBuffer(0,=20buf,=20REGBUF_WILL_INIT);=0A+=09=09/*=20= Always=20use=20REGBUF_STANDARD=20for=20target=20page=20to=20allow=20= reading=20during=20replay=20*/=0A+=09=09XLogRegisterBuffer(0,=20buf,=20= REGBUF_STANDARD);=0A=20=09=09if=20(BufferIsValid(lbuf))=0A=20=09=09=09= XLogRegisterBuffer(1,=20lbuf,=20REGBUF_STANDARD);=0A-=09=09= XLogRegisterBuffer(2,=20rbuf,=20REGBUF_STANDARD);=0A+=09=09/*=20Always=20= use=20REGBUF_FORCE_IMAGE=20for=20right=20sibling=20if=20we're=20merging=20= tuples=20*/=0A+=09=09if=20(merge_ntuples=20>=200)=0A+=09=09=09= XLogRegisterBuffer(2,=20rbuf,=20REGBUF_FORCE_IMAGE);=0A+=09=09else=0A+=09= =09=09XLogRegisterBuffer(2,=20rbuf,=20REGBUF_STANDARD);=0A=20=09=09if=20= (target=20!=3D=20leafblkno)=0A=20=09=09=09XLogRegisterBuffer(3,=20= leafbuf,=20REGBUF_WILL_INIT);=0A=20=0A@@=20-2694,8=20+2928,12=20@@=20= _bt_unlink_halfdead_page(Relation=20rel,=20Buffer=20leafbuf,=20= BlockNumber=20scanblkno,=0A=20=09=09xlrec.leafrightsib=20=3D=20= leafrightsib;=0A=20=09=09xlrec.leaftopparent=20=3D=20leaftopparent;=0A=20= =0A+=09=09/*=20Note:=20merge=20information=20is=20inferred=20during=20= WAL=20replay=20by=20checking=20target=20page=20content=20*/=0A+=0A=20=09=09= XLogRegisterData(&xlrec,=20SizeOfBtreeUnlinkPage);=0A=20=0A+=09=09/*=20= Note:=20merged=20tuple=20data=20is=20read=20directly=20from=20target=20= page=20during=20WAL=20replay=20*/=0A+=0A=20=09=09if=20= (BufferIsValid(metabuf))=0A=20=09=09{=0A=20=09=09=09= XLogRegisterBuffer(4,=20metabuf,=20REGBUF_WILL_INIT=20|=20= REGBUF_STANDARD);=0A@@=20-2739,6=20+2977,19=20@@=20= _bt_unlink_halfdead_page(Relation=20rel,=20Buffer=20leafbuf,=20= BlockNumber=20scanblkno,=0A=20=0A=20=09END_CRIT_SECTION();=0A=20=0A+=09= /*=20Clean=20up=20merge-related=20memory=20allocations=20*/=0A+=09if=20= (do_merge=20&&=20merge_tuples=20!=3D=20NULL)=0A+=09{=0A+=09=09for=20(int=20= i=20=3D=200;=20i=20<=20merge_ntuples;=20i++)=0A+=09=09{=0A+=09=09=09if=20= (merge_tuples[i]=20!=3D=20NULL)=0A+=09=09=09=09pfree(merge_tuples[i]);=0A= +=09=09}=0A+=09=09pfree(merge_tuples);=0A+=09=09pfree(merge_tupsz);=0A+=09= =09pfree(merge_deletable);=0A+=09}=0A+=0A=20=09/*=20release=20metapage=20= */=0A=20=09if=20(BufferIsValid(metabuf))=0A=20=09=09_bt_relbuf(rel,=20= metabuf);=0Adiff=20--git=20a/src/backend/access/nbtree/nbtree.c=20= b/src/backend/access/nbtree/nbtree.c=0Aindex=20fdff960c130..281467ff18f=20= 100644=0A---=20a/src/backend/access/nbtree/nbtree.c=0A+++=20= b/src/backend/access/nbtree/nbtree.c=0A@@=20-1620,6=20+1620,9=20@@=20= backtrack:=0A=20=09=09=20*=20since=20that=20would=20require=20teaching=20= _bt_pagedel()=20about=20backtracking=0A=20=09=09=20*=20(doesn't=20seem=20= worth=20adding=20more=20complexity=20to=20deal=20with=20that).=0A=20=09=09= =20*=0A+=09=09=20*=20We=20also=20attempt=20page=20deletion=20if=20the=20= page=20is=20mostly=20empty=20(by=20bytes).=0A+=09=09=20*=20This=20= enables=20merging=20of=20nearly-empty=20pages=20to=20reduce=20bloat.=0A+=09= =09=20*=0A=20=09=09=20*=20We=20don't=20count=20the=20number=20of=20live=20= TIDs=20during=20cleanup-only=20calls=20to=0A=20=09=09=20*=20btvacuumscan=20= (i.e.=20when=20callback=20is=20not=20set).=20=20We=20count=20the=20= number=0A=20=09=09=20*=20of=20index=20tuples=20directly=20instead.=20=20= This=20avoids=20the=20expense=20of=0A@@=20-1630,12=20+1633,35=20@@=20= backtrack:=0A=20=09=09=20*/=0A=20=09=09if=20(minoff=20>=20maxoff)=0A=20=09= =09=09attempt_pagedel=20=3D=20(blkno=20=3D=3D=20scanblkno);=0A-=09=09= else=20if=20(callback)=0A+=09=09else=20if=20(blkno=20=3D=3D=20scanblkno)=0A= +=09=09{=0A+=09=09=09/*=20Check=20if=20page=20meets=20merge=20threshold=20= by=20space=20usage=20*/=0A+=09=09=09Size=09=09freespace=20=3D=20= PageGetFreeSpace(page);=0A+=09=09=09Size=09=09pagesize=20=3D=20BLCKSZ=20= -=20SizeOfPageHeaderData;=0A+=09=09=09int=09=09=09mergefactor=20=3D=20= BTGetMergeFactor(rel);=0A+=0A+=09=09=09=09=09/*=0A+=09=09=20*=20Only=20= attempt=20page=20merge=20if=20there=20were=20no=20vacuum=20deletions=0A+=09= =09=20*=20on=20this=20page.=20If=20there=20were=20deletions,=20the=20= vacuum=20WAL=20record=0A+=09=09=20*=20was=20already=20written=20with=20= specific=20offset=20numbers=20that=20would=0A+=09=09=20*=20become=20= invalid=20if=20we=20merge=20tuples=20to=20another=20page.=0A+=09=09=20*/=0A= +=09=09if=20(freespace=20>=3D=20(pagesize=20*=20mergefactor)=20/=20100=20= &&=20ndeletable=20=3D=3D=200=20&&=20nupdatable=20=3D=3D=200)=0A+=09=09=09= attempt_pagedel=20=3D=20true;=0A+=09=09}=0A+=0A+=09=09if=20(callback)=0A=20= =09=09=09stats->num_index_tuples=20+=3D=20nhtidslive;=0A=20=09=09else=0A=20= =09=09=09stats->num_index_tuples=20+=3D=20maxoff=20-=20minoff=20+=201;=0A= =20=0A-=09=09Assert(!attempt_pagedel=20||=20nhtidslive=20=3D=3D=200);=0A= +=09=09/*=0A+=09=09=20*=20Assert=20that=20either=20we're=20not=20= attempting=20page=20deletion,=20or=20if=20we=20are,=0A+=09=09=20*=20it's=20= either=20because=20the=20page=20is=20empty=20(nhtidslive=20=3D=3D=200)=20= or=20because=0A+=09=09=20*=20we're=20attempting=20a=20merge=20of=20a=20= mostly=20empty=20page=20with=20tuples.=0A+=09=09=20*/=0A+=09=09= Assert(!attempt_pagedel=20||=20nhtidslive=20=3D=3D=200=20||=0A+=09=09=09=20= =20=20(blkno=20=3D=3D=20scanblkno=20&&=20PageGetFreeSpace(page)=20>=3D=20= ((BLCKSZ=20-=20SizeOfPageHeaderData)=20*=20BTGetMergeFactor(rel))=20/=20= 100));=0A=20=09}=0A=20=0A=20=09if=20(attempt_pagedel)=0Adiff=20--git=20= a/src/backend/access/nbtree/nbtutils.c=20= b/src/backend/access/nbtree/nbtutils.c=0Aindex=20= 9aed207995f..30b986f35be=20100644=0A---=20= a/src/backend/access/nbtree/nbtutils.c=0A+++=20= b/src/backend/access/nbtree/nbtutils.c=0A@@=20-3649,7=20+3649,8=20@@=20= btoptions(Datum=20reloptions,=20bool=20validate)=0A=20=09=09= {"vacuum_cleanup_index_scale_factor",=20RELOPT_TYPE_REAL,=0A=20=09=09= offsetof(BTOptions,=20vacuum_cleanup_index_scale_factor)},=0A=20=09=09= {"deduplicate_items",=20RELOPT_TYPE_BOOL,=0A-=09=09offsetof(BTOptions,=20= deduplicate_items)}=0A+=09=09offsetof(BTOptions,=20deduplicate_items)},=0A= +=09=09{"mergefactor",=20RELOPT_TYPE_INT,=20offsetof(BTOptions,=20= mergefactor)}=0A=20=09};=0A=20=0A=20=09return=20(bytea=20*)=20= build_reloptions(reloptions,=20validate,=0Adiff=20--git=20= a/src/backend/access/nbtree/nbtxlog.c=20= b/src/backend/access/nbtree/nbtxlog.c=0Aindex=20d31dd56732d..7ee05d122f8=20= 100644=0A---=20a/src/backend/access/nbtree/nbtxlog.c=0A+++=20= b/src/backend/access/nbtree/nbtxlog.c=0A@@=20-813,6=20+813,8=20@@=20= btree_xlog_unlink_page(uint8=20info,=20XLogReaderState=20*record)=0A=20=09= Buffer=09=09rightbuf;=0A=20=09Page=09=09page;=0A=20=09BTPageOpaque=20= pageop;=0A+=09IndexTuple=20*merge_tuples;=0A+=09uint16=09=09= merge_ntuples;=0A=20=0A=20=09leftsib=20=3D=20xlrec->leftsib;=0A=20=09= rightsib=20=3D=20xlrec->rightsib;=0A@@=20-823,6=20+825,10=20@@=20= btree_xlog_unlink_page(uint8=20info,=20XLogReaderState=20*record)=0A=20=09= /*=20No=20leaftopparent=20for=20level=200=20(leaf=20page)=20or=20level=20= 1=20target=20*/=0A=20=09Assert(!BlockNumberIsValid(xlrec->leaftopparent)=20= ||=20level=20>=201);=0A=20=0A+=09/*=20Initialize=20merge=20variables=20= */=0A+=09merge_tuples=20=3D=20NULL;=0A+=09merge_ntuples=20=3D=200;=0A+=0A= =20=09/*=0A=20=09=20*=20In=20normal=20operation,=20we=20would=20lock=20= all=20the=20pages=20this=20WAL=20record=0A=20=09=20*=20touches=20before=20= changing=20any=20of=20them.=20=20In=20WAL=20replay,=20we=20at=20least=20= lock=0A@@=20-847,12=20+853,57=20@@=20btree_xlog_unlink_page(uint8=20= info,=20XLogReaderState=20*record)=0A=20=09else=0A=20=09=09leftbuf=20=3D=20= InvalidBuffer;=0A=20=0A-=09/*=20Rewrite=20target=20page=20as=20empty=20= deleted=20page=20*/=0A-=09target=20=3D=20XLogInitBufferForRedo(record,=20= 0);=0A-=09page=20=3D=20(Page)=20BufferGetPage(target);=0A+=09/*=0A+=09=20= *=20Handle=20target=20page.=20For=20page=20merges,=20we=20first=20read=20= existing=20content=0A+=09=20*=20to=20extract=20tuples,=20then=20= reinitialize.=20For=20simple=20deletions,=20we=20can=0A+=09=20*=20= initialize=20directly.=0A+=09=20*/=0A+=09if=20= (XLogReadBufferForRedo(record,=200,=20&target)=20=3D=3D=20= BLK_NEEDS_REDO)=0A+=09{=0A+=09=09page=20=3D=20(Page)=20= BufferGetPage(target);=0A+=09=09pageop=20=3D=20BTPageGetOpaque(page);=0A=20= =0A-=09_bt_pageinit(page,=20BufferGetPageSize(target));=0A-=09pageop=20=3D= =20BTPageGetOpaque(page);=0A+=09=09/*=20Check=20if=20this=20is=20a=20= leaf=20page=20merge=20case=20*/=0A+=09=09if=20(isleaf=20&&=20rightsib=20= !=3D=20P_NONE=20&&=20!P_RIGHTMOST(pageop))=0A+=09=09{=0A+=09=09=09= OffsetNumber=20maxoff=20=3D=20PageGetMaxOffsetNumber(page);=0A+=09=09=09= OffsetNumber=20minoff=20=3D=20P_FIRSTDATAKEY(pageop);=0A+=0A+=09=09=09/*=20= If=20page=20has=20tuples,=20save=20them=20for=20merging=20*/=0A+=09=09=09= if=20(minoff=20<=3D=20maxoff)=0A+=09=09=09{=0A+=09=09=09=09OffsetNumber=20= offnum;=0A+=09=09=09=09uint16=09=09i=20=3D=200;=0A+=0A+=09=09=09=09= merge_ntuples=20=3D=20maxoff=20-=20minoff=20+=201;=0A+=09=09=09=09= merge_tuples=20=3D=20(IndexTuple=20*)=20palloc(merge_ntuples=20*=20= sizeof(IndexTuple));=0A+=0A+=09=09=09=09/*=20Copy=20all=20tuples=20from=20= the=20page=20*/=0A+=09=09=09=09for=20(offnum=20=3D=20minoff;=20offnum=20= <=3D=20maxoff;=20offnum++)=0A+=09=09=09=09{=0A+=09=09=09=09=09ItemId=09=09= itemid=20=3D=20PageGetItemId(page,=20offnum);=0A+=09=09=09=09=09= IndexTuple=09tuple=20=3D=20(IndexTuple)=20PageGetItem(page,=20itemid);=0A= +=09=09=09=09=09Size=09=09tupsz=20=3D=20IndexTupleSize(tuple);=0A+=0A+=09= =09=09=09=09merge_tuples[i]=20=3D=20(IndexTuple)=20palloc(tupsz);=0A+=09=09= =09=09=09memcpy(merge_tuples[i],=20tuple,=20tupsz);=0A+=09=09=09=09=09= i++;=0A+=09=09=09=09}=0A+=09=09=09}=0A+=09=09}=0A+=0A+=09=09/*=20Now=20= reinitialize=20target=20page=20as=20empty=20deleted=20page=20*/=0A+=09=09= _bt_pageinit(page,=20BufferGetPageSize(target));=0A+=09=09pageop=20=3D=20= BTPageGetOpaque(page);=0A+=09}=0A+=09else=0A+=09{=0A+=09=09/*=20Page=20= doesn't=20need=20redo,=20but=20we=20still=20need=20to=20get=20the=20= buffer=20*/=0A+=09=09target=20=3D=20XLogInitBufferForRedo(record,=200);=0A= +=09=09page=20=3D=20(Page)=20BufferGetPage(target);=0A+=09=09= _bt_pageinit(page,=20BufferGetPageSize(target));=0A+=09=09pageop=20=3D=20= BTPageGetOpaque(page);=0A+=09}=0A=20=0A=20=09pageop->btpo_prev=20=3D=20= leftsib;=0A=20=09pageop->btpo_next=20=3D=20rightsib;=0A@@=20-865,17=20= +916,36=20@@=20btree_xlog_unlink_page(uint8=20info,=20XLogReaderState=20= *record)=0A=20=09PageSetLSN(page,=20lsn);=0A=20=09= MarkBufferDirty(target);=0A=20=0A-=09/*=20Fix=20left-link=20of=20right=20= sibling=20*/=0A+=09/*=20Fix=20left-link=20of=20right=20sibling=20and=20= replay=20merged=20tuples=20if=20any=20*/=0A=20=09if=20= (XLogReadBufferForRedo(record,=202,=20&rightbuf)=20=3D=3D=20= BLK_NEEDS_REDO)=0A=20=09{=0A=20=09=09page=20=3D=20(Page)=20= BufferGetPage(rightbuf);=0A=20=09=09pageop=20=3D=20= BTPageGetOpaque(page);=0A=20=09=09pageop->btpo_prev=20=3D=20leftsib;=0A=20= =0A+=09=09/*=0A+=09=09=20*=20Note:=20If=20tuples=20were=20merged=20= during=20the=20original=20operation,=20the=20right=0A+=09=09=20*=20= sibling=20buffer=20was=20registered=20with=20REGBUF_FORCE_IMAGE,=20so=20= the=20page=0A+=09=09=20*=20is=20automatically=20restored=20to=20its=20= final=20post-merge=20state.=20No=20explicit=0A+=09=09=20*=20tuple=20= insertion=20is=20needed=20during=20replay.=0A+=09=09=20*=0A+=09=09=20*=20= For=20non-merge=20operations,=20we=20only=20need=20to=20update=20the=20= left-link.=0A+=09=09=20*/=0A+=0A=20=09=09PageSetLSN(page,=20lsn);=0A=20=09= =09MarkBufferDirty(rightbuf);=0A=20=09}=0A=20=0A+=09/*=20Clean=20up=20= merge=20tuples=20*/=0A+=09if=20(merge_tuples=20!=3D=20NULL)=0A+=09{=0A+=09= =09uint16=09=09i;=0A+=0A+=09=09for=20(i=20=3D=200;=20i=20<=20= merge_ntuples;=20i++)=0A+=09=09=09pfree(merge_tuples[i]);=0A+=09=09= pfree(merge_tuples);=0A+=09}=0A+=0A=20=09/*=20Release=20siblings=20*/=0A=20= =09if=20(BufferIsValid(leftbuf))=0A=20=09=09= UnlockReleaseBuffer(leftbuf);=0Adiff=20--git=20= a/src/include/access/nbtree.h=20b/src/include/access/nbtree.h=0Aindex=20= e709d2e0afe..6fd1bcd142d=20100644=0A---=20a/src/include/access/nbtree.h=0A= +++=20b/src/include/access/nbtree.h=0A@@=20-201,6=20+201,7=20@@=20= typedef=20struct=20BTMetaPageData=0A=20#define=20= BTREE_DEFAULT_FILLFACTOR=0990=0A=20#define=20BTREE_NONLEAF_FILLFACTOR=09= 70=0A=20#define=20BTREE_SINGLEVAL_FILLFACTOR=0996=0A+#define=20= BTREE_DEFAULT_MERGEFACTOR=095=0A=20=0A=20/*=0A=20=20*=09In=20general,=20= the=20btree=20code=20tries=20to=20localize=20its=20knowledge=20about=0A= @@=20-1153,6=20+1154,7=20@@=20typedef=20struct=20BTOptions=0A=20=09int=09= =09=09fillfactor;=09=09/*=20page=20fill=20factor=20in=20percent=20= (0..100)=20*/=0A=20=09float8=09=09vacuum_cleanup_index_scale_factor;=09= /*=20deprecated=20*/=0A=20=09bool=09=09deduplicate_items;=09/*=20Try=20= to=20deduplicate=20items?=20*/=0A+=09int=09=09=09mergefactor;=09/*=20= page=20merge=20factor=20in=20percent=20(0..100)=20*/=0A=20}=20BTOptions;=0A= =20=0A=20#define=20BTGetFillFactor(relation)=20\=0A@@=20-1168,6=20= +1170,12=20@@=20typedef=20struct=20BTOptions=0A=20=09=09=09=09=20= relation->rd_rel->relam=20=3D=3D=20BTREE_AM_OID),=20\=0A=20=09= ((relation)->rd_options=20?=20\=0A=20=09=20((BTOptions=20*)=20= (relation)->rd_options)->deduplicate_items=20:=20true))=0A+#define=20= BTGetMergeFactor(relation)=20\=0A+=09= (AssertMacro(relation->rd_rel->relkind=20=3D=3D=20RELKIND_INDEX=20&&=20\=0A= +=09=09=09=09=20relation->rd_rel->relam=20=3D=3D=20BTREE_AM_OID),=20\=0A= +=09=20(relation)->rd_options=20?=20\=0A+=09=20((BTOptions=20*)=20= (relation)->rd_options)->mergefactor=20:=20\=0A+=09=20= BTREE_DEFAULT_MERGEFACTOR)=0A=20=0A=20/*=0A=20=20*=20Constant=20= definition=20for=20progress=20reporting.=20=20Phase=20numbers=20must=20= match=0Adiff=20--git=20a/src/include/access/nbtxlog.h=20= b/src/include/access/nbtxlog.h=0Aindex=20fbbe115c771..5ba5eb69b1d=20= 100644=0A---=20a/src/include/access/nbtxlog.h=0A+++=20= b/src/include/access/nbtxlog.h=0A@@=20-325,7=20+325,9=20@@=20typedef=20= struct=20xl_btree_unlink_page=0A=20=09BlockNumber=20leafrightsib;=0A=20=09= BlockNumber=20leaftopparent;=09/*=20next=20child=20down=20in=20the=20= subtree=20*/=0A=20=0A-=09/*=20xl_btree_metadata=20FOLLOWS=20IF=20= XLOG_BTREE_UNLINK_PAGE_META=20*/=0A+=09/*=0A+=09=20*=20xl_btree_metadata=20= FOLLOWS=20IF=20XLOG_BTREE_UNLINK_PAGE_META=0A+=09=20*/=0A=20}=20= xl_btree_unlink_page;=0A=20=0A=20#define=20SizeOfBtreeUnlinkPage=09= (offsetof(xl_btree_unlink_page,=20leaftopparent)=20+=20= sizeof(BlockNumber))=0A--=20=0A2.39.5=20(Apple=20Git-154)=0A=0A= --Apple-Mail=_0B27333F-B829-459D-8A6C-156FEFF0AF59--