Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vvwcd-00BftW-1S for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Feb 2026 12:07:07 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vvwcc-0035ju-0t for pgsql-hackers@arkaria.postgresql.org; Fri, 27 Feb 2026 12:07:06 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vvwcb-0035jl-3A for pgsql-hackers@lists.postgresql.org; Fri, 27 Feb 2026 12:07:05 +0000 Received: from mail-lj1-x236.google.com ([2a00:1450:4864:20::236]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vvwcY-00000001Z2y-3RLu for pgsql-hackers@postgresql.org; Fri, 27 Feb 2026 12:07:05 +0000 Received: by mail-lj1-x236.google.com with SMTP id 38308e7fff4ca-389e87e3452so22574171fa.3 for ; Fri, 27 Feb 2026 04:07:03 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1772194022; cv=none; d=google.com; s=arc-20240605; b=S08vP/J27GyNxy090T12lB0yh3/ByF5l1rDbTM3dwYqLe1NeOExcGy66exmfecKp7b Rz3igyheJ4VHNQ8eKeBJKNn0BeAPQCxniIN58eFwWMS3hDgBSMyrTzWDocRNz3jut6+h bKaPGkI41VCdKzQrWmt1jn4ziahtSAxAtNi1AM3ivkg19JSttwwN57Zv2Ww1bSsi3cIY cYlHST3lihNS0nu6nAxiMe0uAW3Ry8renk7dZCFn7IcMudspaP7aWK4mNm+Ijz/f9nPo JfxYvielH0mKscVl/55MsoxGQ2TCblnrq9wHpyp6f7gtuh/JY0kgZAb2ex1XeVi3eMuX uxtA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=SLTNf0SS7o1z27S8kOScpx5OJfo8nq7gTQ5t9jusAGI=; fh=Bgkqap4D3qJN5MCgBrJh2im88qErDxDVZSIt/8ae/4w=; b=hb1lIwddHXwiG03uhxlY0hu8ChHml5g2Pjy26Q5Z2HCRc0/dEOmsZy5FO51pgAsDuy rJOLKpZ5IFljxxkevMwxmuruRt56xVGqg8xTaXdfSqtjAQiMGT91E7ctMmKDB84p/LGU VE7DVpAz4D05N/Rr/j9VJ/sdDxeNatwenXNBEh2nFH6w0TMKk2Ve7FXs+5qh/ET2Ku/S MZ9rxaAike3eWPo6e2N2daapBkq/5SL5/FCaK4pDna6hb5pAnsRbRTykLcxoo5Oy3x2g pf77/cIH3uuKiL+0Y3SKWurm/OsKYazj5qirMD0tihxgdmncHOFwlpmmS2433uODJiw7 Kr/w==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1772194022; x=1772798822; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=SLTNf0SS7o1z27S8kOScpx5OJfo8nq7gTQ5t9jusAGI=; b=ROycdhWp5lWjM1/bGFZ4AR9x1LIJNoA6+rFPF9wtOFmX8S6z3oZjKdPTd0FekYLYyE yOLmETTP1UmKM5Q37/Cr4XtlTlIlyTz4JNnbLZLOZ2IwyD/uL+MICJ3Odpnfqmln/pDx WwPUR47YEshAAPO9LQFq5Yg0nappwJkTl4D0par7vGFTCu/GGIRjzysm7fbrc1TdSRd2 EJHA+XP9Zk+hU+K8GDhhC3n9pvP7AkQ1XQINw4lqC3PHaT3POrPD80pendLUwCXMBbuf Vj5YB2ansk/2+uXFKqlCs25nj5QyPkzR02LDkCW12RnGmHXfmMW5vt04KEgf4erRJhCi 6SHQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772194022; x=1772798822; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=SLTNf0SS7o1z27S8kOScpx5OJfo8nq7gTQ5t9jusAGI=; b=sty0trIRgIPvKJptL7KkDtPm2YSwgRo/sq47LCX4744VBlLciwcklnaikPAKbCU12r N3aKcF721FWaPNKbQNaTMptlPJiHu9c1FbseZYEQHEm34xYUcRJ8p/YZyHuvlZy7MoAe 6+I+Fq/HBKJQc5MHjweefKj2yuHc7hnREvVqRnhEdjm5ZUjtD5GRstx0eSYdGLnmMkfj WBgZBEpAt9SHA1mGFUXqvUsl3x78nUajAOWME8Acl8avb6XsRJCFtTDx3Izx5cyAE39k FxXLZgeqxDmYEA2q/VWBNEGZRvOHwkOfVP0fCQJ0iKQ9ZgVki8jNlrUUg4Vk9M+YNYXn NgXA== X-Forwarded-Encrypted: i=1; AJvYcCWB9MADR1NwKvuxeMoTchGjsFFfQB26+7KU8wMDqtIFqMbGJr4XrFSL5vKU45zcuf5yff8cSraSvLVFwdGn@postgresql.org X-Gm-Message-State: AOJu0YzTTb/ZvrcZXmQVl12+8/3FLORTgiRfMj6rDQbUbh6sSzBVOyG7 7Y8HViObw6DxRuT905ioE92n9pSDSWrRiuLFw44y73NB9B42Hzy9jvvza7wl++t7cMtqcbjLla2 2XNl5ELwKPTjIFNYpTkmS7OuNbCAgzDo= X-Gm-Gg: ATEYQzyM9Su1qTeDn3dtFV+EGIS4jUdnLyLfm6SGeJHcDbFXqBClCoLWkkJt22HT1xl okffOXX1A9oSsmu69B280m2cBm0MAAEcBoM5wp2hOOM6zGla9XlXEYCMvPU5v8v12QVXHXrS/13 A7bBUCnlt+rkwoEWn+sLR0j2LDEhtlxK8nykijvGltPFLbTZbP0/BIu/Ttqtmr/aPQKGRQJ6d6e a8HyQXWDu4s1fG0jPaBahjmESB9k0DZm2r+VXDoS4mX3AivQp8Y8AkuycaP3ej0T+o68HoqvHnJ Z0+xzgt2vF9NBEpcCBP5Ea0qQvF5IIwBZIvU3p/3B9+qfynhpMiRrPdfsHQrOsTyubIp4S5kT0E E9iC0 X-Received: by 2002:a2e:be9b:0:b0:389:fcc6:4923 with SMTP id 38308e7fff4ca-389ff36c676mr18998681fa.36.1772194022099; Fri, 27 Feb 2026 04:07:02 -0800 (PST) MIME-Version: 1.0 References: <11A59C0C-A8C8-4642-8493-292D5DF8311D@yandex-team.ru> In-Reply-To: From: Matthias van de Meent Date: Fri, 27 Feb 2026 13:06:50 +0100 X-Gm-Features: AaiRm51HyyXdDXjyaDw1vpdO2f92Dr_qb-Clagtm7F_0PtFCkdZ2nnLA6Bkj_m4 Message-ID: Subject: Re: [WiP] B-tree page merge during vacuum to reduce index bloat To: Madhav Madhusoodanan Cc: Kirk Wolak , Andrey Borodin , pgsql-hackers , Nikolay Samokhvalov Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, 26 Feb 2026 at 22:03, Madhav Madhusoodanan wrote: > > On Tue, Aug 26, 2025 at 2:11=E2=80=AFPM Kirk Wolak wro= te: > > I do have a question, one of the IDEAS we discussed was to ADD a new pa= ge that combined the 2 pages. > > Would the flow then be as follows? Please correct me if I'm wrong: > Start: Parent page P, with adjacent child pages A -> B -> C -> D. > Pages B and C are sparse enough and are about to be merged. > 1: Acquire lock on pages B and C > 2: Create a new page N, which copies the tuples in pages B and C > 3: Acquire lock on parent page P, update the separator keys in P, > release lock on P > 4: Update pointers such that pages link like so: A -> N -> D > 5: Release lock on pages B and C That is one part of the picture (the merging part), but it's missing a lot of details: - How do concurrent workloads (index scans, backwards index scans, index insertions) detect and recover from concurrent merges when they step through pages? - Do you have a theory or proof of correctness for the above? - Can this scheme be implemented without adding significant overhead to current workloads that don't benefit significantly from the new feature? One example for a problem with the given flow: where (and how) do you update the sibling pointers in pages A (to N from B) and D (to N from C)? You haven't explicitly locked pages A and D by the time you get to step 4, even though locking pages is critical to guarantee correct and safe operations. Simply locking them in step 4 isn't sufficient, as another process may have locked page A in preparation for a split, which would then be waiting to get a lock on page B. If you've already locked page B before locking page A, you'll have a deadlock. Additionally, how does this work when you find out in step 3 that your pages B and C each are linked to from a different parent page? You'd have to update both parent pages, but that would also require locking more than just the one parent page per level that currently gets locked during page deletion. Kind regards, Matthias van de Meent Databricks (https://www.databricks.com)