From: Thomas Munro
Date: Wed, 13 Aug 2025 18:01:54 +1200
Subject: Re: BUG #19006: Assert(BufferIsPinned) in BufferGetBlockNumber() is triggered for forwarded buffer
To: Xuneng Zhou
Cc: exclusion@gmail.com, pgsql-bugs@lists.postgresql.org, Michael Paquier, Tom Lane, nathandbossart@gmail.com
References: <19006-80fcaaf69000377e@postgresql.org>

On Wed, Aug 13, 2025 at 5:29 PM Xuneng Zhou wrote:
> > 1. The leading block is BM_VALID, so we choose to give it to you
> > immediately and not look further (we could look further and return
> > more than one consecutive BM_VALID block at a time, but this isn't
> > implemented)
>
> I am curious about why this isn't implemented. It looks helpful.
> Is there any blocking issue or trade-off for not doing so?

stream->distance tends to be 1 for well cached data (it grows when IO
is needed and it is worth looking far ahead, and shrinks when data is
found in cache).  So we currently never have a distance > 1 for well
cached data, and we wouldn't benefit much.  That may not always be
true: if the buffer mapping table is changed to a tree data structure
(like the way kernels manage virtual memory pages associated with a
file or other VM object), then we might have an incentive to look up
lots of cached blocks at the same time, and then we might want to
change the distance tuning algorithm, and then we might want
StartReadBuffers() to be able to return more than one cached block at
a time.  We've focused mainly on I/O so far, and just tried not to
make the well cached cases worse.

> The format of this part is not aligned well in gmail, so I copy it into vs code.
> Is this layout right?

Yes.

> I found second illustration somewhat hard to follow, especially
> the "do nothing" trick and the movement of next_buffer_index in the second queue.
> Maybe I need to read the corresponding code.

Suppose you have pending_read_blocknum = 100, pending_read_nblocks =
5, next_buffer_index = 200.  Now you decide to start that read,
because the next block the caller wants can't be combined, so you call
StartReadBuffers(blocknum = 100, *nblocks = 5, buffers =
&buffers[200]).  StartReadBuffers() pins 5 buffers and starts an IO,
but finds a reason to split it at size 2.
It sets *nblocks to 2, and returns.  Now read_stream.c adds 2 to
pending_read_blocknum and subtracts 2 from pending_read_nblocks, so
that they represent the continuation of the previous read.  It also
adds 2 to next_buffer_index with modulo arithmetic (so it wraps around
the queue), because that is the location of the buffers for the next
read, and sets forwarded_buffers to 3, because there are 3 buffers
already pinned (in that patch ^ it is renamed to
pending_read_npinned).  The three buffers themselves are already in
the queue at the right position for StartReadBuffers() to receive them
and know that it doesn't have to pin them again.

What I was referring to with "doing nothing" is the way that if you
keep sliding the StartReadBuffers() call along the queue, the extra
buffers it spits out become its input next time, without having to be
copied anywhere.  In that picture there are two forwarded buffers,
labeled 9 and A.
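
To make the arithmetic in that example concrete, here is a rough
standalone sketch of the bookkeeping.  It is not the code in
read_stream.c: the PendingRead struct, the pending_read_advance()
function and the queue_size field are hypothetical names made up for
illustration, and it assumes the StartReadBuffers()-like call pinned
every buffer it was asked for before deciding to split the read.

```c
#include <assert.h>

/*
 * Hypothetical, simplified stand-in for the stream's pending-read state.
 * Field names loosely follow the ones used in the explanation above.
 */
typedef struct PendingRead
{
    int blocknum;           /* first block of the pending read */
    int nblocks;            /* blocks still to be read */
    int next_buffer_index;  /* queue slot where the next read's buffers start */
    int forwarded_buffers;  /* buffers already pinned by an earlier call */
    int queue_size;         /* slots in the circular buffer queue (assumed) */
} PendingRead;

/*
 * Update the state after a call that pinned all p->nblocks buffers but
 * only started I/O for the first "started" of them (a split read).
 */
static void
pending_read_advance(PendingRead *p, int started)
{
    int pinned = p->nblocks;

    assert(started >= 1 && started <= pinned);

    /* The remainder becomes the continuation of the previous read. */
    p->blocknum += started;
    p->nblocks -= started;

    /* Slide along the queue, wrapping around at the end. */
    p->next_buffer_index = (p->next_buffer_index + started) % p->queue_size;

    /*
     * The leftover pinned buffers already sit at the new
     * next_buffer_index, so nothing is copied; we only count them.
     */
    p->forwarded_buffers = pinned - started;
}
```

Starting from blocknum = 100, nblocks = 5, next_buffer_index = 200 and
calling pending_read_advance() with started = 2 leaves blocknum = 102,
nblocks = 3, next_buffer_index = 202 and forwarded_buffers = 3, which
matches the numbers in the walkthrough.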