Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vzveS-001RxW-1J for pgsql-hackers@arkaria.postgresql.org; Tue, 10 Mar 2026 11:53:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vzveQ-002ssC-1r for pgsql-hackers@arkaria.postgresql.org; Tue, 10 Mar 2026 11:53:27 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vzveQ-002ss4-0y for pgsql-hackers@lists.postgresql.org; Tue, 10 Mar 2026 11:53:26 +0000 Received: from mail-lf1-x131.google.com ([2a00:1450:4864:20::131]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vzveO-00000001xiH-1KTE for pgsql-hackers@postgresql.org; Tue, 10 Mar 2026 11:53:26 +0000 Received: by mail-lf1-x131.google.com with SMTP id 2adb3069b0e04-5a140efd2d5so4367113e87.2 for ; Tue, 10 Mar 2026 04:53:24 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1773143603; cv=none; d=google.com; s=arc-20240605; b=BxaaGe4a574Xvh+9hikoOqFwTL6YZbagPBob1qPsmIngPYbJ0t6WnL7Qre0L1rDwHf YSSnAdbBMhKAL9Aqil9Aw4sXUsbEWXidiwdl5Q9sG4i7alUqr5kF95e46UlgbV6dSe4J Tv0MphpaMOM13ocaeKEp/+AoaU7qJqyyH3Wld3mVUea2VcqUBZYkfak0morP8MZr0Mir al4QL/N/q1os4rR1rrAsnmV3y3qy6cWmXVSymYU0KOFrtZz4zx6V7tGveS3A1Cl7i5eI TlXM3UpkNKV9sACZ9+QPqO8xrvXpAJQ2NQeYp6talOoAPSLP6X0koxzp9j9MpAOONNpc 2trA== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=WLFrmgt3INQIAnAw0uBpi+e3S/P7gL+Quhjjyby8t6c=; fh=O9BjM5LIV8TsvZG17PQ7xovNhLVVj8SyCfERkNAhq38=; b=ek3Pwli7PhxqQrS7vm7fPTotub0yAAHXD2RVH5wS8fn8PA7ABZZ51mEWhLiexOA4qJ 5wVU/8KCXE1fGndJjGO1jS3yO2ArcCYpyccEHqW+x6SKzgzIIL+rOxIbr6My4QdngGZ0 E8CiDEVvgYTlKIzwvj4TS3x68ml0AUtmML8d9YHIsV2K5dptLxmilF9K8G5VC8R/T/CG B2+Lsjxbd9DhM3l1ATDvuwCNhiXn7rIG3mo86s/8JAG9hdTWK1gerz/XkwO1Vn8vHBU9 pODzff3ODoKOawuVAfAk0h5IBkBq0fsjujwnjRzKjTyLhhBAXOCNgSefn5qtjYcM0/pM 1hPQ==; darn=postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773143603; x=1773748403; darn=postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=WLFrmgt3INQIAnAw0uBpi+e3S/P7gL+Quhjjyby8t6c=; b=YZKJCwDOLxScmtibwmOKVe7mM+cyi1oxbv1EZP2wt++b/qw7vCpus1gI39NOd9Rfq7 3WYmLuNlkXYidM9kVqY1kFF2FGmGDR4/eR6UZCqUM/at3aPspFw2khUzz9DZ6Vt9ZTtx 4RwxXZDwflCWekxCTATqIIJx1Ie0m921mh8j4wB2KLxKXoWJ/v0FDWgcuWd7FEpyUdV8 WuI2HN0zNFJJE+lz0DCDlnmnqe+rrhwooWgjGCya2i0Csqr5ntYtvHNJJSqICd6FodTb umnQEbUf2h21CximJnmPxV9AwDgA0azGZXhAyrDvE72s2DyccD/eLa6oZ7hsS/O8KF2j bmVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1773143603; x=1773748403; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=WLFrmgt3INQIAnAw0uBpi+e3S/P7gL+Quhjjyby8t6c=; b=PqAI86uM3fqPVgXELj7u3bcYygTKnx89UlC9ApCL1GdpyMola6279pOI10cySnjX2+ K3XSgdl9yzSmBEdrC0pa2JPGEyTigGsykAA50zHVVsUGRkYRx7V0THX4ykgQrESrucVe mj6pPXX/tKU1ZJqg65WcAo9PLag6fVJcSHiiwHbgG197BAiPB0/v7edQdsrm8uw5vxGV eTYmbwe3jZ8aoYLpqOyWLXeab08TWMJpJySJ23wV5NRD5KZALa2rFFrY6QoEMbf135SA JARXLD0A2rup5fnvZG//qGLi7Gkj5qLodUEK+W3K4yD8lHVuvBjh+SNh/+jqq9xETyJM 50jA== X-Forwarded-Encrypted: i=1; AJvYcCVWiv+qDQ3rwpUl74l44vu/zR02k8+dOjK234Ukc3yPERduFzgq05ABGMwOz7Zb+Bwk/nSmK/4pKwghB4r5@postgresql.org X-Gm-Message-State: AOJu0Ywa6V4R/cxPJ/5FAJzFNY7BxrZCreosrJxdFVBdPnsvMUYl/Xe7 C5z+jgywNzC2gUrQRo+qf3XQSEITlG2NohF3PTZs9svfoYwkB5o1O4VWAcItwPwRknTGkjuTu9k RExjQGqtQJ/0m54+9jvax3km5y94Edz0= X-Gm-Gg: ATEYQzxLIUDKCHQr8rveaCiGgrDWH3AQ1gmlnTFD5Anfp154Ij4gpxYfIamOyGx2cD9 F0voWzWOtgBME5cmQ0n6tPEd0J62ySiiOWP/Iqk8DThm8Bm0GmGS+VcRHaGtWQLgqqbju2DzfEJ OdukhDKXSSmujYECIWExFtb3vAGSNIoG0/QO1hqVfYDaiANyPQeoJy5llw2xmZBawQfAGvpjR1R 1qsX35jCdo2+FRJfna1WlAkilVhvMHEbFRBzuESGmATqUwK/oTdcIRpX2W2PChRO6wlLcd4cgi8 NA/APmAnlGUSRPQVBVa6TsLaymJFRqqj9TdPqLc5CFh1rgB5W75kKEg44NDWDA5M2/BJ6NWa3eW 4/jVPR6X6xXsDnOA= X-Received: by 2002:a05:6512:61c3:20b0:59e:5c8f:a5 with SMTP id 2adb3069b0e04-5a13cabf330mr3543385e87.4.1773143603141; Tue, 10 Mar 2026 04:53:23 -0700 (PDT) MIME-Version: 1.0 References: <11A59C0C-A8C8-4642-8493-292D5DF8311D@yandex-team.ru> In-Reply-To: From: Matthias van de Meent Date: Tue, 10 Mar 2026 12:53:10 +0100 X-Gm-Features: AaiRm51O60eHYlIorqMh3VdGzHJryDnuJsVI8wWFtVghQn1p0msB_ABldpQWQbA Message-ID: Subject: Re: [WiP] B-tree page merge during vacuum to reduce index bloat To: Madhav Madhusoodanan Cc: Kirk Wolak , Andrey Borodin , pgsql-hackers , Nikolay Samokhvalov Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, 10 Mar 2026 at 06:09, Madhav Madhusoodanan wrote: > > On Fri, Mar 6, 2026 at 2:15=E2=80=AFAM Madhav Madhusoodanan > wrote: > > > > Some parts of the merge flow that I am coming up with are as follows > > (assuming tuples from index page B are migrated rightwards into C): > > > > 1. Leaving B's tuples as it is even after merge, to remove the > > possible risk of scans "skipping over" tuples. Essentially, the tuples > > then would be "copied" into C. > > 2. Marking pages B and C with flags similar to INCOMPLETE_SPLIT (say, > > MERGE_SRC and MERGE_DEST respectively) before the actual merge > > process, then marking the pages with another flag upon completion > > (MERGE_COMPLETE) so that other processes can handle transient merge > > states. > > 3. For example, scans that reach page B post-merge (MERGE_SRC + > > MERGE_COMPLETE) would be made to skip to the page to its right. > > 4. Updating VACUUM to handle post-merge cleanup (to remove pages such a= s B). > > > > I was going through the source code to understand whether the > aforementioned direction of changes would be reasonable. > > I was observing `BTPageOpaqueData.btpo_flags` [0] which is a uint16, > but only 9 bits are used. > > Would using a couple bits of the same for this purpose be reasonable? > Or are they being reserved for future functionality? They're exclusively for btree code's use; extensions (*) must not add to (or change the meaning of) those bits, lest they create a forward incompatibility with core PostgreSQL btree code in newer major versions; it would cause corrupted binary upgraded databases. But patches against core btree code can use those bits, because forward compatibility is less of an issue there - we don't really support binary upgrades manually patched systems, especially if they have incompatible on-disk data. (*): I'm skeptical about whether you could make btree scans handle concurrently merged pages, when that merging is implemented as extension and the btree code doesn't know about merges. Kind regards, Matthias van de Meent Databricks (https://www.databricks.com)