Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umbGd-000epQ-6s for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Aug 2025 16:57:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1umbFb-008KuZ-G2 for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Aug 2025 16:56:27 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1umbFb-008KuQ-2w for pgsql-hackers@lists.postgresql.org; Thu, 14 Aug 2025 16:56:27 +0000 Received: from mail-qt1-x831.google.com ([2607:f8b0:4864:20::831]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1umbFY-000ZvI-0h for pgsql-hackers@lists.postgresql.org; Thu, 14 Aug 2025 16:56:26 +0000 Received: by mail-qt1-x831.google.com with SMTP id d75a77b69052e-4b109c482c8so20551311cf.3 for ; Thu, 14 Aug 2025 09:56:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bowt-ie.20230601.gappssmtp.com; s=20230601; t=1755190583; x=1755795383; darn=lists.postgresql.org; h=in-reply-to:references:to:from:subject:cc:message-id:date :content-transfer-encoding:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=71MdkLG7HQOyLzi0Hn/9KPnl5iZEQDsVMpVeiEWEgqM=; b=qH38poRYDqfs8Jl7WYQ8PzXBJz9RcEzij+2NIl6RQzjjYRWfxWFz80gt+fncgi7Oz9 sxssCC8/aE0E8Por+jJrvgGVD9VWp9quu8U9mhOVD98AAWvomsHynI3Dfky0vweq+wCk cSqFV/1Vd80YsGo6e4WYxPeMdfE9aJIfkMS6o0ZB6yWIqJXQlnXJx4DRVGV8V6oksfRS Mchq03M9xIEnEUpTqEbWOsj2dhphACD9zxhQJ+MTahw5x5Nyy7LP3RybjnbOUyqYykRs Et28qBFMELgZ2j/1CPPaFmLHKnDbF5EocHRNzQF8GCpXjv8VgiHdnZqF9xxFk1cwfmlm aG3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1755190583; x=1755795383; h=in-reply-to:references:to:from:subject:cc:message-id:date :content-transfer-encoding:mime-version:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=71MdkLG7HQOyLzi0Hn/9KPnl5iZEQDsVMpVeiEWEgqM=; b=VHyLqOhRQezAFoiG49ytfEOiUfmE+QwQhAOMMsVYRGOAPHVDhANBOHFBctzN92Ts5S J2o+x5YFcoGNJRSh6ABi4LxsgoSPe+wdG8YBIVPLiWFgLd4h8e8aQkocL9DlFKrQtfYH dekrnZ83GYgfi32QoUdyka+ccvvT9mKVGhBNAcDxLofVIiKeFYGdPjewZ4DwNVFsJtpM wbb5VCTfWlxgAKIAMP1PUrz87M6ZLEeIoFQxYJSXDriGJHtdbfEn//X5WqMNWuNuAfeb GMo51wNgRxd/03sMVbHpzLd4K/U+mC7A6J26OGrSMVuLUHBE+K23pjxYwcU+pcv2IEkp HX7w== X-Forwarded-Encrypted: i=1; AJvYcCWZV9WUpKPSfcktWD7f+/74t8zsq6l2yh3mQkljs75h4OYIna8fgOJm3z2KijSPQz/9TGnW/ldS8vR+dQ8t@lists.postgresql.org X-Gm-Message-State: AOJu0YzETiNfibJvMav4UNvrvKJ4MsFatND7mUNtPQkLm+57NtqiOvFs 5xlGFs6oGRSZvxp7V2ZCdVvXLJnDkIa/9Rc7xHAqi4O3uj4Jpj7Q0UJOWMiFnZNnYUM= X-Gm-Gg: ASbGncuNJS+HtLQI9eEOJI5JFbIQLfki+7E0Pt4Smb5rZ4dkF77yeHL7KKrIMRXrN5Q P9/GkbqzDnUQlkurxfEcvu7k6Ng/hreo6h+fEucospLybQkczfzYYz20XQl6sdJ9Tck1rhEOXwf gR9nPHhiMKiZZHWDH+f2CLtkIcVajqRc2F7uSAFpP6zlVvSFl01x0C+zHWPdn2MqkZcowSaaTM2 fvaoIPzToTreAGbzmTtBkz3BpR1r+WMUfKGcLIET0LZpFstXPkBFL1/TraYPnVv+YbuhHnZcPba 14EGv9ePka+QpbNOLEclkhi9djnVThLq3x5/MeJj7RpoYG74vfqWG4vUFUtlmIP49EGqZZXAVLu tErp6zxgTxUuKy6W8W5YKT6RH4/CnKVENhFxsheepwiZwxirO74vyMS2E X-Google-Smtp-Source: AGHT+IEgxh3YRKP+PYZCvQTXLKXe/ZMbyyZfJIqjENGrv8NdrLuXZ/gO0WFhdV8gek6jAVsd1WsN7Q== X-Received: by 2002:a05:620a:4492:b0:7e8:38ca:c499 with SMTP id af79cd13be357-7e87047618amr555389485a.25.1755190583420; Thu, 14 Aug 2025 09:56:23 -0700 (PDT) Received: from localhost (syn-024-090-110-103.res.spectrum.com. [24.90.110.103]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7e820f9d201sm1303526785a.42.2025.08.14.09.56.22 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 14 Aug 2025 09:56:22 -0700 (PDT) Mime-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=UTF-8 Date: Thu, 14 Aug 2025 12:56:21 -0400 Message-Id: Cc: "Thomas Munro" , "Nazir Bilal Yavuz" , "Robert Haas" , "Melanie Plageman" , "PostgreSQL Hackers" , "Georgios" , "Konstantin Knizhnik" , "Dilip Kumar" Subject: Re: index prefetching From: "Peter Geoghegan" To: "Tomas Vondra" , "Andres Freund" X-Mailer: aerc 0.20.0 References: <51b5f71b-5f19-4453-91ff-2b9f2a840c58@vondra.me> <6cb6109d-71d6-490c-8056-d8885081b008@vondra.me> <7c308720-5cd4-4508-8cc0-6e55a6352731@vondra.me> In-Reply-To: <7c308720-5cd4-4508-8cc0-6e55a6352731@vondra.me> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Wed Aug 13, 2025 at 8:59 PM EDT, Tomas Vondra wrote: > On 8/14/25 01:50, Peter Geoghegan wrote: >> I first made the order of the table random, except among groups of index= tuples >> that have exactly the same value. Those will still point to the same 1 o= r 2 heap >> blocks in virtually all cases, so we have "heap clustering without any h= eap >> correlation" in the newly rewritten table. To set things up this way, I= first >> made another index, and then clustered the table using that new index: > Interesting. It's really surprising random I/O beats the sequential. It should be noted that the effect seems to be limited to io_method=3Dio_ur= ing. I find that with io_method=3Dworker, the execution time of the original "sequential heap access" backwards scan is very similar to the execution ti= me of the variant with the index that exhibits "heap clustering without any he= ap correlation" (the variant where individual heap blocks appear in random ord= er). Benchmark that includes both io_uring and worker =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D I performed the usual procedure of prewarming the index and evicting the he= ap relation, and then actually running the relevant query through EXPLAIN ANALYZE. Direct I/O was used throughout. io_method=3Dworker ---------------- Original backwards scan: 1498.024 ms (shared read=3D48.080) "No heap correlation" backwards scan: 1483.348 ms (shared read=3D22.036) Original forwards scan: 656.884 ms (shared read=3D19.904) "No heap correlation" forwards scan: 578.076 ms (shared read=3D10.159) io_method=3Dio_uring ------------------ Original backwards scan: 1052.807 ms (shared read=3D187.876) "No heap correlation" backwards scan: 649.473 ms (shared read=3D365.802) Original forwards scan: 593.126 ms (shared read=3D55.837) "No heap correlation" forwards scan: 429.888 ms (shared read=3D188.619) Summary ------- As of this morning, io_method=3Dio_uring also shows that the forwards scan = is faster with random heap accesses than without (not just the backwards scan)= . I double-checked, to make sure that the effect was real; it seems to be. I'm aware that some of these numbers (those for the original/sequential forward scan case) don't match what I reported on Tuesday. I believe that this is due to changes I made to my SSD's readahead using blockdev, though it's possible that there's some other explanation. (In case it matters, I'm running Debian unstable with liburing2 "2.9-1".) The important point remains: at least with io_uring, the backwards scan que= ry is much faster with random I/O than it is with descending sequential I/O. I= t might make sense if they were at least at parity, but clearly they're not. --=20 Peter Geoghegan