From: Melanie Plageman <[email protected]>
To: Nazir Bilal Yavuz <[email protected]>
Cc: Thomas Munro <[email protected]>
Cc: Jonathan S. Katz <[email protected]>
Cc: pgsql-hackers <[email protected]>
Subject: Re: Trying out read streams in pgvector (an extension)
Date: Thu, 20 Nov 2025 10:27:59 -0500
Message-ID: <CAAKRu_Zwj83zCJhahhMO578-+JdfTbqMV_ktxr-XjiE8BHLo9g@mail.gmail.com>
In-Reply-To: <CAN55FZ0tgjF1beJSRXw3rgkbzwPZ7ngChJkPZm9aJkPuaF=dmg@mail.gmail.com>
References: <CA+hUKGJ_7NKd46nx1wbyXWriuZSNzsTfm+rhEuvU6nxZi3-KVw@mail.gmail.com>
	<[email protected]>
	<CA+hUKG+x2BcqWzBC77cN0ewhzMF0kYhC6c4G_T2gJLPbqYQ6Ow@mail.gmail.com>
	<CA+hUKGL-3mBtkA9RTbLFHuSS5cviuv0ko7nBhCg9KM7Q-GSEkw@mail.gmail.com>
	<CAAKRu_ZVxzwRRbxedgb_LtkFaGf78XAbTO9uExvadV2DzaE=Jg@mail.gmail.com>
	<CA+hUKG+zLmkD9zus=JOjjC+j5p9R1+CSXNZgd5=exZ01ZTaKoA@mail.gmail.com>
	<CA+hUKGJx6FNqzsxfSOGH0nJZJq1MBc+t7NBKtAmy6zj4HD86tA@mail.gmail.com>
	<CAN55FZ16TEhgYbK=qSEbkO8utz+u232NksCEmJMC1G4iZvnbvA@mail.gmail.com>
	<CA+hUKGL7-Dx8KiUo=G91Y5tfFpwDUFFQJ6=9D8Gr1n=DZxGh+w@mail.gmail.com>
	<CAAKRu_ZGhnWZXOyEyZ2r47g-F7U8asMRA6U8YZw3h=2rR=m_hQ@mail.gmail.com>
	<CAN55FZ0tgjF1beJSRXw3rgkbzwPZ7ngChJkPZm9aJkPuaF=dmg@mail.gmail.com>

On Wed, Nov 19, 2025 at 2:28 AM Nazir Bilal Yavuz <[email protected]> wrote:
>
> > To make sure 1) distance isn't reset to a resume_distance from
> > read_stream_begin_relation() and 2) unexpected buffers aren't returned
> > from the read stream, we could error out in read_stream_resume() if
> > pinned_buffers > 0. And in read_stream_reset(), we would save distance
> > in resume_distance before clearing distance. That would allow calling
> > read_stream_resume() either if you called read_stream_reset() or if
> > you exhausted the stream yourself. See rough attached patch for a
> > sketch of this.
>
> This looks correct to me. What do you think about using an assert
> instead of erroring out?

I'm not totally opposed to this. My rationale for making it an error
is that the developer could have test cases where all the buffers are
consumed, but the code is written such that this won't always happen.
Then if a real production query doesn't consume all the buffers, it
could return wrong results (I think). That will mean the user can't
complete their query until the extension author releases a new version
of their code. But I'm not sure what the right answer is here.
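
To make the trade-off concrete, here is a rough toy model of the behavior
discussed above. It is only a sketch and not PostgreSQL code: ToyStream and
the toy_stream_*() functions are hypothetical stand-ins, the field names
merely mirror the ones mentioned in this thread, and returning false stands
in for raising an error. The guard in toy_stream_reset() is an assumption of
the toy, not something taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a read stream. This is NOT PostgreSQL's ReadStream. */
typedef struct ToyStream
{
    int distance;        /* look-ahead distance; 0 means look-ahead stopped */
    int resume_distance; /* distance to restore when the stream is resumed */
    int pinned_buffers;  /* buffers read ahead but not yet consumed */
} ToyStream;

/* The block-number callback reported end of stream: stop looking ahead. */
static void
toy_stream_end(ToyStream *s)
{
    s->resume_distance = s->distance;
    s->distance = 0;
}

/* Hand one already-pinned buffer to the caller. */
static void
toy_stream_consume(ToyStream *s)
{
    assert(s->pinned_buffers > 0);
    s->pinned_buffers--;
}

/* Reset: save the distance before clearing it, then drop pinned buffers. */
static void
toy_stream_reset(ToyStream *s)
{
    /*
     * Assumption of this toy: if look-ahead had already stopped, keep the
     * previously saved value rather than overwriting it with 0.
     */
    if (s->distance > 0)
        s->resume_distance = s->distance;
    s->distance = 0;
    s->pinned_buffers = 0;
}

/*
 * Resume: refuse while unconsumed buffers remain. Returning false models
 * the point where the proposal would raise an error.
 */
static bool
toy_stream_resume(ToyStream *s)
{
    if (s->pinned_buffers > 0)
        return false;
    s->distance = s->resume_distance;
    return true;
}
```

In this model, a caller that ends the stream with two buffers still pinned
gets false from toy_stream_resume() until it either consumes both buffers or
calls toy_stream_reset(). If the check were only an assertion, a build
without assertion checking would skip it and resume with the leftover
buffers still queued, which is the wrong-results case I'm worried about.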

- Melanie




