From: Melanie Plageman
Date: Tue, 18 Nov 2025 16:17:28 -0500
Subject: Re: Trying out read streams in pgvector (an extension)
To: Thomas Munro
Cc: Nazir Bilal Yavuz, "Jonathan S. Katz", pgsql-hackers

On Wed, Nov 12, 2025 at 5:47 AM Thomas Munro wrote:
>
> I suppose an alternative design would be that _next_buffer() returns
> InvalidBuffer only once (= the block number callback returns
> InvalidBlock once) and then automatically resumes (= it restores the
> distance) and then you can call read_stream_next_buffer() again (= the
> block number callback will be called again to fill the stream up with
> new buffers before waiting for the first one to be ready to give to
> you if it isn't already).  That would have the advantage of not
> requiring a new function at all and make the patch even shorter, but I
> don't know, I guess I thought that would be a bit more fragile in some
> way, less explicit.  Hmm, would it actually be better if it worked
> like that?

We discussed off-list and decided that changing existing functionality
in an unexpected way is undesirable. So it is better that we stick with
adding read_stream_resume().

However, in talking about read_stream_resume() further, Thomas and I
also thought of potential issues with it:

If read_stream_resume() is called before the read stream user callback
has ever returned InvalidBlockNumber,

1) The value of resume_distance will be the original value of distance
from read_stream_begin_relation(). You don't want to reset the distance
to that value.

2) There may be in-flight or completed buffers that have yet to be
yielded which will be returned the next time read_stream_next_buffer()
is invoked. If the user resets the state the callback is using to
return blocks and expects the next invocation of
read_stream_next_buffer() to return buffers with those blocks, they
will be disappointed.

If we try to address this by requiring that stream->distance is 0 when
read_stream_resume() is called, that won't work because while it is set
to 0 when the callback returns InvalidBlockNumber, there may still be
unreturned buffers in the stream. If the user wants to use
read_stream_reset() to exhaust the stream before calling
read_stream_resume(), read_stream_reset() sets stream->distance to 1 at
the end, so read_stream_resume() couldn't detect if reset() was
correctly called first or if the distance is > 0 because the stream is
still in progress.

To make sure 1) distance isn't reset to a resume_distance from
read_stream_begin_relation() and 2) unexpected buffers aren't returned
from the read stream, we could error out in read_stream_resume() if
pinned_buffers > 0. And in read_stream_reset(), we would save distance
in resume_distance before clearing distance. That would allow calling
read_stream_resume() either if you called read_stream_reset() or if you
exhausted the stream yourself. See rough attached patch for a sketch of
this.

It would be nicer if we could error out if read_stream_next_buffer()
didn't return InvalidBuffer, but we can't do that if we want to allow
calling read_stream_reset() followed by read_stream_resume() because
distance won't be 0.

I tried this with a modified pgvector with an hnsw read stream user and
it seemed to work correctly.
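
For readers who want to poke at the rules described above without
building PostgreSQL, here is a minimal standalone model of them. It is
only a sketch under assumptions: ToyStream and the toy_* functions are
hypothetical names invented for illustration, they track just the three
fields discussed here, and toy_resume() returns false where the real
function would raise an error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for ReadStream; only the fields discussed above. */
typedef struct ToyStream
{
	int16_t		pinned_buffers;		/* buffers not yet handed back */
	int16_t		distance;			/* look-ahead distance, 0 = stopped */
	int16_t		resume_distance;	/* distance to restore on resume */
} ToyStream;

/* Start a stream; resume_distance begins as a copy of the initial distance. */
static void
toy_begin(ToyStream *s, int16_t initial_distance)
{
	assert(initial_distance > 0);
	s->pinned_buffers = 0;
	s->distance = initial_distance;
	s->resume_distance = initial_distance;
}

/* The callback reported end-of-stream: remember distance, stop look-ahead. */
static void
toy_end_of_stream(ToyStream *s)
{
	s->resume_distance = s->distance;
	s->distance = 0;
}

/* Model of reset: save a live distance, drop all buffers, leave distance 1. */
static void
toy_reset(ToyStream *s)
{
	if (s->distance > 0)
		s->resume_distance = s->distance;
	s->distance = 0;
	s->pinned_buffers = 0;		/* the real code releases each buffer */
	s->distance = 1;			/* reset ends with distance set to 1 */
}

/* Model of resume: refuse while buffers are outstanding. */
static bool
toy_resume(ToyStream *s)
{
	if (s->pinned_buffers > 0)
		return false;			/* the real function would error out here */
	s->distance = s->resume_distance;
	return true;
}
```

In this model, calling toy_end_of_stream() while two buffers are still
pinned makes toy_resume() refuse until pinned_buffers drops to 0, after
which the saved distance is restored. Calling toy_reset() on a stream
in progress leaves distance at 1 but still lets toy_resume() restore
the distance that was in effect before the reset.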

- Melanie

Attachment: 0001-resume.patch

From cf41f56ba0d8c7f3bf92242020f742f201cf08d6 Mon Sep 17 00:00:00 2001
From: Melanie Plageman <melanieplageman@gmail.com>
Date: Tue, 18 Nov 2025 14:24:12 -0500
Subject: [PATCH] resume

---
 src/backend/storage/aio/read_stream.c | 19 +++++++++++++++++++
 src/include/storage/read_stream.h     |  1 +
 2 files changed, 20 insertions(+)

diff --git a/src/backend/storage/aio/read_stream.c b/src/backend/storage/aio/read_stream.c
index 031fde9f4cb..75bf92dc683 100644
--- a/src/backend/storage/aio/read_stream.c
+++ b/src/backend/storage/aio/read_stream.c
@@ -100,6 +100,7 @@ struct ReadStream
 	int16		pinned_buffers;
 	int16		distance;
 	int16		initialized_buffers;
+	int16		resume_distance;
 	int			read_buffers_flags;
 	bool		sync_mode;		/* using io_method=sync */
 	bool		batch_mode;		/* READ_STREAM_USE_BATCHING */
@@ -464,6 +465,7 @@ read_stream_look_ahead(ReadStream *stream)
 		if (blocknum == InvalidBlockNumber)
 		{
 			/* End of stream. */
+			stream->resume_distance = stream->distance;
 			stream->distance = 0;
 			break;
 		}
@@ -711,6 +713,7 @@ read_stream_begin_impl(int flags,
 		stream->distance = Min(max_pinned_buffers, stream->io_combine_limit);
 	else
 		stream->distance = 1;
+	stream->resume_distance = stream->distance;
 
 	/*
 	 * Since we always access the same relation, we can initialize parts of
@@ -862,6 +865,7 @@ read_stream_next_buffer(ReadStream *stream, void **per_buffer_data)
 		else
 		{
 			/* No more blocks, end of stream. */
+			stream->resume_distance = stream->distance;
 			stream->distance = 0;
 			stream->oldest_buffer_index = stream->next_buffer_index;
 			stream->pinned_buffers = 0;
@@ -1034,6 +1038,19 @@ read_stream_next_block(ReadStream *stream, BufferAccessStrategy *strategy)
 	return read_stream_get_block(stream, NULL);
 }
 
+/*
+ * Resume looking ahead after the block number callback reported end-of-stream.
+ * This is useful for streams of self-referential blocks, after a buffer needed
+ * to be consumed and examined to find more block numbers.
+ */
+void
+read_stream_resume(ReadStream *stream)
+{
+	if (stream->pinned_buffers > 0)
+		elog(ERROR, "read stream must be exhausted before resuming");
+	stream->distance = stream->resume_distance;
+}
+
 /*
  * Reset a read stream by releasing any queued up buffers, allowing the stream
  * to be used again for different blocks.  This can be used to clear an
@@ -1047,6 +1064,8 @@ read_stream_reset(ReadStream *stream)
 	Buffer		buffer;
 
 	/* Stop looking ahead. */
+	if (stream->distance > 0)
+		stream->resume_distance = stream->distance;
 	stream->distance = 0;
 
 	/* Forget buffered block number and fast path state. */
diff --git a/src/include/storage/read_stream.h b/src/include/storage/read_stream.h
index 9b0d65161d0..e29ac50fc9e 100644
--- a/src/include/storage/read_stream.h
+++ b/src/include/storage/read_stream.h
@@ -99,6 +99,7 @@ extern ReadStream *read_stream_begin_smgr_relation(int flags,
 												   ReadStreamBlockNumberCB callback,
 												   void *callback_private_data,
 												   size_t per_buffer_data_size);
+extern void read_stream_resume(ReadStream *stream);
 extern void read_stream_reset(ReadStream *stream);
 extern void read_stream_end(ReadStream *stream);
 
-- 
2.47.3