From: Thomas Munro
Date: Wed, 12 Nov 2025 23:47:16 +1300
Subject: Re: Trying out read streams in pgvector (an extension)
To: Nazir Bilal Yavuz
Cc: Melanie Plageman, "Jonathan S. Katz", pgsql-hackers

On Wed, Nov 12, 2025 at 9:04 PM Nazir Bilal Yavuz wrote:
> On Wed, 12 Nov 2025 at 07:12, Thomas Munro wrote:
> 0002:
>
> +    /* End-of-stream. */
> +    buf = read_stream_next_buffer(stream, NULL);
> +    Assert(buf == InvalidBuffer);
> +    buf = read_stream_next_buffer(stream, NULL);
> +    Assert(buf == InvalidBuffer);
>
> I noticed there are two 'read_stream_next_buffer()' and
> 'InvalidBuffer' checks. Does having both provide any additional
> validation? I tried removing one of them, and the test still passed.

I wanted to demonstrate that this is a state that the stream is stuck
in until you call _resume().  I suppose an alternative design would be
that _next_buffer() returns InvalidBuffer only once (= the block
number callback returns InvalidBlock once) and then automatically
resumes (= it restores the distance) and then you can call
read_stream_next_buffer() again (= the block number callback will be
called again to fill the stream up with new buffers before waiting for
the first one to be ready to give to you if it isn't already).  That
would have the advantage of not requiring a new function at all and
make the patch even shorter, but I don't know, I guess I thought that
would be a bit more fragile in some way, less explicit.  Hmm, would it
actually be better if it worked like that?

> Also, there is one thing I wanted to clarify about the
> 'read_stream_resume()'. If 'read_stream_next_buffer()' returns an
> 'InvalidBuffer', then we can use 'read_stream_resume()' alone because
> we know that we already consumed all buffers in the stream. For the
> rest, we need to use 'read_stream_resume()' together with the
> 'read_stream_reset()', right?

For the rest, there would be no need to call read_stream_resume().
To recap the uses of read_stream_reset(): the original purpose was to
release any buffers (pins) that the stream is holding internally
because of look-ahead, and put it back to its original state, ready to
be reused.  It is called (1) by read_stream_end() as an implementation
detail (e.g. if a LIMIT or anything else except ERROR/FATAL ends your
query early, we need to unpin buffers queued in the stream before we
pfree it), (2) explicitly by rescans, (3) in hypothetical code I
thought about that would want to stream blocks speculatively and then
change its mind after predicting incorrectly (I had a few patches like
that, abandoned for now), and then (4) in this case, by places that
temporarily ran out of block numbers, but will have some more again
soon and want to resume the stream.

It was already debatable whether heuristic state like look-ahead
distance should survive across rescans, or in other words, whether the
expected I/O requirements of the previous scan are a useful prediction
of the requirements of the next scan, but the answer is clearer in
case (4), hence the desire to find a way to separate that use case
from the others.
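
To make the distinction concrete, here is a minimal toy model of the
behaviour described above, not the PostgreSQL implementation.  Every
name in it (ToyStream, ToySource, toy_stream_next(), toy_stream_resume(),
toy_stream_reset(), TOY_INVALID) is hypothetical, and the ramp-up rule
(double the distance on every consumed block) is a simplifying
assumption.  It only sketches three things: the end-of-stream state is
sticky, a resume leaves that state while keeping the learned look-ahead
distance, and a reset drops queued blocks and forgets the distance.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; not PostgreSQL's definitions. */
#define TOY_INVALID (-1)        /* plays both "no block" and "no buffer" */
#define TOY_MAX_DISTANCE 16

typedef int (*toy_block_cb) (void *arg);

typedef struct ToyStream
{
    toy_block_cb cb;            /* returns next block number or TOY_INVALID */
    void       *cb_arg;
    int         queue[TOY_MAX_DISTANCE];    /* looked-ahead blocks (FIFO) */
    int         head;
    int         count;
    int         distance;       /* look-ahead distance; 0 means "ended" */
    int         resume_distance;    /* distance saved when callback ran dry */
} ToyStream;

/* Hypothetical block source: hands out next..limit-1, then runs dry. */
typedef struct ToySource
{
    int         next;
    int         limit;
    int         calls;          /* how often the callback was invoked */
} ToySource;

static int
toy_source_cb(void *arg)
{
    ToySource  *src = arg;

    src->calls++;
    return src->next < src->limit ? src->next++ : TOY_INVALID;
}

static void
toy_stream_init(ToyStream *s, toy_block_cb cb, void *cb_arg)
{
    s->cb = cb;
    s->cb_arg = cb_arg;
    s->head = s->count = 0;
    s->distance = s->resume_distance = 1;
}

/* Fill the queue up to the current distance; end if the callback runs dry. */
static void
toy_look_ahead(ToyStream *s)
{
    while (s->count < s->distance)
    {
        int         blk = s->cb(s->cb_arg);

        if (blk == TOY_INVALID)
        {
            s->resume_distance = s->distance;   /* remember the heuristic */
            s->distance = 0;    /* sticky end-of-stream */
            break;
        }
        s->queue[(s->head + s->count) % TOY_MAX_DISTANCE] = blk;
        s->count++;
    }
}

static int
toy_stream_next(ToyStream *s)
{
    int         blk;

    if (s->count == 0)
        toy_look_ahead(s);      /* no-op while distance is 0 */
    if (s->count == 0)
        return TOY_INVALID;
    blk = s->queue[s->head];
    s->head = (s->head + 1) % TOY_MAX_DISTANCE;
    s->count--;
    /* Simplified ramp-up: double the distance on every consumed block. */
    if (s->distance > 0 && s->distance < TOY_MAX_DISTANCE)
        s->distance *= 2;
    toy_look_ahead(s);
    return blk;
}

/* Case (4): leave the ended state, keeping the learned distance. */
static void
toy_stream_resume(ToyStream *s)
{
    if (s->distance == 0)
        s->distance = s->resume_distance;
}

/* Cases (1)-(3): drop queued blocks and forget the heuristic state. */
static void
toy_stream_reset(ToyStream *s)
{
    s->head = s->count = 0;
    s->distance = s->resume_distance = 1;
}
```

In this model, once toy_stream_next() has returned TOY_INVALID it
keeps doing so without calling the callback again, even if the source
has gained new blocks in the meantime; only toy_stream_resume() makes
it look ahead again, starting from the saved distance rather than
from 1.  The alternative design floated above would amount to folding
the resume step into the first TOY_INVALID return.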