From: Thomas Munro
Date: Wed, 12 Nov 2025 10:21:50 +1300
Subject: Re: Trying out read streams in pgvector (an extension)
To: "Jonathan S. Katz"
Cc: pgsql-hackers

On Fri, Sep 6, 2024 at 4:28 PM Thomas Munro wrote:
> Here's a new version with a TODO tidied up.
> I also understood that we
> need to tweak the read_stream_reset() function, so that it doesn't
> forget its current readahead distance when it hops between HNSW nodes
> (which is something that comes up in several other potential use
> cases including another one I am working on in core).  Without this
> patch for PostgreSQL, it reads 1, 2, 4, 7 blocks (= 16 in total)
> before it has to take a break to hop to a new page, and then it starts
> again at 1.  Oops.  With this patch, it is less forgetful, and reaches
> the full possible I/O concurrency of 16 (or whatever the minimum of
> HNSW's m parameter and effective_io_concurrency is for you).

I heard that the pgvector project is now trying to do this for real,
and (surprise!) running into this problem.  It causes streamified HNSW
search to regress in performance on some queries when the overheads of
streaming are not outweighed by the (bogusly constrained) gains in
concurrency.  We just don't generate enough concurrency to win.

I probably should have been more opinionated and just committed a
version of that distance-reset policy change, but I guess at the time
I wrote the above, streaming and AIO were a little too abstract to
attract reviews relating to hypothetical external projects.  We
definitely want to fix that for v19, because it also affects the
streamified index scan project and doubtless many other things.  I
wrote about that with patches[1] and will start a new thread soon with
a new collection of rebased heuristics improvements.

But for now, to fix pgvector's woes, I wonder if it might make sense
to call this a bug in v18, and back-patch the tiniest possible change.
Something like what I posted[2] in this thread almost two years ago.
I don't think it really affects any core code: we use
read_stream_reset() only in very minimal ways there (I could
elaborate), and it's quite arguable that the existing policy is wrong
for them too, but we'd need to confirm that and perhaps think about
other extensions that might be using it.

Better ideas?

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGL6hCd40Dh1AcFcoiw5zJXK2T1dRKO3oe8RkPExqA5zoQ%40mail.gmail.com#181a22a8be99ff561b7beae44986870c
[2] https://www.postgresql.org/message-id/flat/CA%2BhUKG%2Bx2BcqWzBC77cN0ewhzMF0kYhC6c4G_T2gJLPbqYQ6Ow%40mail.gmail.com#9aa6012713b473611ae46d8e2032586f
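
To make the distance-reset problem described in the quoted text a bit
more concrete, here is a toy C model.  It is only a sketch and not the
read_stream.c implementation: ToyStream, toy_stream_reset(),
toy_stream_read() and simulate() are hypothetical names, and the simple
doubling ramp is an assumption made for illustration, so its batch
sizes do not match the 1, 2, 4, 7 sequence quoted above.

```c
#include <assert.h>

/*
 * Toy model only; all names here are hypothetical and the ramp-up rule
 * (double the look-ahead distance after every batch, up to a cap) is an
 * assumed simplification of the real heuristics.
 */
typedef struct ToyStream
{
    int distance;       /* current look-ahead distance, in blocks */
    int max_distance;   /* cap, standing in for the I/O concurrency limit */
    int keep_on_reset;  /* nonzero: reset keeps the distance it reached */
} ToyStream;

/* Reset between graph nodes; optionally forget the distance reached. */
static void
toy_stream_reset(ToyStream *s)
{
    if (!s->keep_on_reset)
        s->distance = 1;
}

/*
 * "Read" nblocks blocks in batches of at most the current distance.
 * Returns the number of batches (each one models a wait for I/O) and
 * raises *peak to the largest batch issued so far.
 */
static int
toy_stream_read(ToyStream *s, int nblocks, int *peak)
{
    int batches = 0;

    while (nblocks > 0)
    {
        int batch = s->distance < nblocks ? s->distance : nblocks;

        if (batch > *peak)
            *peak = batch;
        nblocks -= batch;
        batches++;

        /* Ramp up: double the distance, clamped to the cap. */
        s->distance *= 2;
        if (s->distance > s->max_distance)
            s->distance = s->max_distance;
    }
    return batches;
}

/*
 * Visit `nodes` graph nodes, reading `neighbors` blocks for each and
 * resetting the stream after every node.  Returns the total number of
 * batches and stores the peak batch size in *peak.
 */
static int
simulate(int keep_on_reset, int nodes, int neighbors, int max_distance,
         int *peak)
{
    ToyStream s = {1, max_distance, keep_on_reset};
    int total = 0;

    assert(max_distance >= 1);
    *peak = 0;
    for (int i = 0; i < nodes; i++)
    {
        total += toy_stream_read(&s, neighbors, peak);
        toy_stream_reset(&s);
    }
    return total;
}
```

In this toy model, simulate(0, 4, 16, 16, &peak) needs 20 batches and
never issues more than 8 blocks at once, because every node restarts
the ramp at 1.  simulate(1, 4, 16, 16, &peak) needs 8 batches and
reaches a peak of 16, since only the first node pays for the ramp-up.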