Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ulyVh-0083i3-Qg for pgsql-hackers@arkaria.postgresql.org; Tue, 12 Aug 2025 23:34:29 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ulyVg-00Ai4Q-AD for pgsql-hackers@arkaria.postgresql.org; Tue, 12 Aug 2025 23:34:28 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ulyVf-00Ai4I-QG for pgsql-hackers@lists.postgresql.org; Tue, 12 Aug 2025 23:34:28 +0000 Received: from mail-wm1-x329.google.com ([2a00:1450:4864:20::329]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1ulyVd-000GdM-1W for pgsql-hackers@lists.postgresql.org; Tue, 12 Aug 2025 23:34:27 +0000 Received: by mail-wm1-x329.google.com with SMTP id 5b1f17b1804b1-459d7726ee6so31236155e9.2 for ; Tue, 12 Aug 2025 16:34:25 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1755041663; x=1755646463; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=KxNvQSUSYN1jCf6B05Pf8UhlTcIJ+oE/97kfRJ8h4TU=; b=VKGpGQzPbAWMQb/qrcw9gaPsoTbLgkR5yfhQY4NPpbCOxg5YSak8+1jsvn07AsQ3Y2 P/mecJwejxFqlfM+lIdWpRREN7xRrZDlCeI7osAmu1lnuQXBIhUIBJzBMBA2RrrLCzM0 4JhRPIyYqV7yJ1BlSYVUztimCmfleXD2tmfKzMMX5IglX59dTclbp4DRGGxpkyOXThL0 C/va/EIq2RFhgxL0c8uxyrbWwDb9Llq9ETsIW9m/t2lnpCML5D8BCSJzGRGojoqeva6z Z3muwhYwaFkwygG1ou1WDikMchmDVBXXufjzL6ZtvBRr6oc0Ycf7XP9ksixb9CQ01ETs TSPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1755041663; x=1755646463; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=KxNvQSUSYN1jCf6B05Pf8UhlTcIJ+oE/97kfRJ8h4TU=; b=ib0M71NHUQ2amD/JuHvbI98XvI499xOafWSMmfX1GXVnfPUB1T61vkdNzsXfcmOVDr S4Od6atfil2B44E+y2gUGsB2ObZIItoUHZYlimME7JudQThLeWxdxp95GROIEY3me1d3 lhOtowic3pz3bgNtWXzHJlHiYd3Goi1mDDBYsSj3LB26saJHz9zqLEiJVf7yww2zRbCt TIQ4H0lw+/h/xE6IZ4r7zfgLXFKo/PgEEfLMf3CsJsfGAxOOPnrjOyidUKpjL+LO7ZrT P48xvCIIaJAybsvSoJIBUYyL9nsWBj0nxcQegDqDkvBqBd5QiXKqXVbHXQNuSKm5/4+T gs7g== X-Forwarded-Encrypted: i=1; AJvYcCVwiEYGe6dK5xGPoceQblHCOSjbx3FgdkTD6+h/ITEoLtgWN2O2wp2GG4h+Cy3uPgg1H46rJmO7Rx/LeHoP@lists.postgresql.org X-Gm-Message-State: AOJu0YyiNmMVOmtbrRTSogDbFhC6YRu4AKCNagllRlurF7azF4Q5+SfF 0HGNp1+5tHqNrFpB4ArRErgoTcplo/Vf7WBHquS54R38lJ4wvC6Qggnj8PepvUrBm18r9209gez 0aYtEs1x+XcDXIWglekiCVRYLxPyI/ScCFAqN2FvUgw== X-Gm-Gg: ASbGnctt9F1nJJ+gQM22bxwgacFh/xEL2vYwe1uVDaEP1i1OXhvxUDUlUZVbcUyR5Rd /ZEiR45tjjiJFPx6gtQuIOqQWgNJQ5pbeUeJ2GA6XkTLhGbrKe0XReItOV6PbI8qmbJeKsquB4J cNRH/+D/LoejRjKnQUAITvAxXc7puGbYUeQt2j1HbZhvf5PtDq1TTSZnVdXoz3pMk4WxHFNJfyo izVzN0= X-Google-Smtp-Source: AGHT+IHRF9Hv0LKR30VmFr9LuzhuL+Pd78C7NS9qmholMuNTPXApboCwlkKhmEhaKUHKzFnmQz+y7pj4RxR/ifPZK0E= X-Received: by 2002:a05:600c:1c92:b0:459:d709:e59f with SMTP id 5b1f17b1804b1-45a1650c4bcmr7580595e9.0.1755041663193; Tue, 12 Aug 2025 16:34:23 -0700 (PDT) MIME-Version: 1.0 References: <8ed1d326-5c6e-476e-b3fd-30d3da210546@vondra.me> <306fc8c0-c882-4602-86f5-a106b9ace603@vondra.me> <152ea782-5bd4-4435-b021-0ab2da61e63d@vondra.me> <7c2f6350-6fca-4e39-b0a8-8ac735f5d58a@vondra.me> <2e63cadd-2a03-46b1-866e-7ea5d3ffd37f@vondra.me> <51b5f71b-5f19-4453-91ff-2b9f2a840c58@vondra.me> In-Reply-To: From: Peter Geoghegan Date: Tue, 12 Aug 2025 19:33:57 -0400 X-Gm-Features: Ac12FXz4ZNoKb49nhf8C284tYb9TDOsm0UBNbyxokAgTXajFiJ7hcdC3MTiswps Message-ID: Subject: Re: index prefetching To: Tomas Vondra Cc: Thomas Munro , Andres Freund , Nazir Bilal Yavuz , Robert Haas , Melanie Plageman , PostgreSQL Hackers , Georgios , Konstantin Knizhnik , Dilip Kumar Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Aug 12, 2025 at 7:10=E2=80=AFPM Tomas Vondra wrot= e: > Actually, this might be a consequence of how backwards scans work (at > least in btree). I logged the block in index_scan_stream_read_next, and > this is what I see in the forward scan (at the beginning): Just to be clear: you did disable deduplication and then reindex, right? You're accounting for the known issue with posting list TIDs returning TIDs in the wrong order, relative to the scan direction (when the scan direction is backwards)? It won't be necessary to do this once I commit my patch that fixes the issue directly, on the nbtree side, but for now deduplication messes things up here. And so for now you have to work around it. > But with the backwards scan we apparently scan the values backwards, but > then the blocks for each value are accessed in forward direction. So we > do a couple blocks "forward" and then jump to the preceding value - but > that's a couple blocks *back*. And that breaks the lastBlock check. I don't think that this should be happening. The read stream ought to be seeing blocks in exactly the same order as everything else. > I believe this applies both to master and the prefetching, except that > master doesn't have read stream - so it only does sync I/O. In what sense is it an issue on master? On master, we simply access the TIDs in whatever order amgettuple returns TIDs in. That should always be scan order/index key space order, where heap TID counts as a tie-breaker/affects the key space in the presence of duplicates (at least once that issue with posting lists is fixed, or once deduplication has been disabled in a way that leaves no posting list TIDs around via a reindex). It is certainly not surprising that master does poorly on backwards scans. And it isn't all that surprising that master does worse on backwards scans when direct I/O is in use (per the explanation Andres offered just now). But master should nevertheless always read the TIDs in whatever order it gets them from amgettuple in. It sounds like amgetbatch doesn't really behave analogously to master here, at least with backwards scans. It sounds like you're saying that we *won't* feed TIDs heap block numbers to the read stream in exactly scan order (when we happen to be scanning backwards) -- which seems wrong to me. As you pointed out, a forwards scan of a DESC column index should feed heap blocks to the read stream in a way that is very similar to an equivalent backwards scan of a similar ASC column on the same table. There might be some very minor differences, due to differences in the precise leaf page boundaries among each of the indexes. But that should hardly be noticeable at all. > Could that hide the extra buffer accesses, somehow? I think that you meant to ask about *missing* buffer hits with the patch, for the forwards scan. That doesn't agree with the backwards scan with the patch, nor does it agree with master (with either the forwards or backwards scan). Note that the heap accesses themselves appear to have sane/consistent numbers, since we always see "read=3D49933" as expected for those, for all 4 query executions that I showed. The "missing buffer hits" issue seems like an issue with the instrumentation itself. Possibly one that is totally unrelated to everything else we're discussing. -- Peter Geoghegan