From: Nazir Bilal Yavuz
Date: Thu, 2 Apr 2026 15:30:26 +0300
Subject: Re: AIO / read stream heuristics adjustments for index prefetching
To: Andres Freund
Cc: Melanie Plageman, pgsql-hackers@postgresql.org, Thomas Munro, Peter Geoghegan, Tomas Vondra

Hi,

On Thu, 2 Apr 2026 at 02:20, Andres Freund wrote:
>
> I've pushed what was 0001 and 0002. Will push the former 0003 shortly.

0001 LGTM.

0002: read_stream: Prevent distance from decaying too quickly

+    /*
+     * As we needed IO, prevent distance from being reduced within our
+     * maximum look-ahead window. This avoids having distance collapse too
+     * quickly in workloads where most of the required blocks are cached,
+     * but where the remaining IOs are a sufficient enough factor to cause
+     * a substantial slowdown if executed synchronously.
+     *
+     * There are valid arguments for preventing decay for max_ios or for
+     * max_pinned_buffers. But the argument for max_pinned_buffers seems
+     * clearer - if we can't see any misses within the maximum look-ahead
+     * distance, we can't do any useful read-ahead.
+     */
+    stream->distance_decay_holdoff = stream->max_pinned_buffers;

That is already committed but I have a question.
Did you think about setting stream->distance_decay_holdoff to the current
stream->distance? This would also fix the 'miss followed by a hit' case (it
won't fix a double miss followed by a hit, though).

0003: aio: io_uring: Trigger async processing for large IOs

There is a small optimization opportunity: we don't need to calculate
io_size for DIO, since pgaio_uring_should_use_async() will always return
false in that case. Do you think it is worth implementing this? Other than
that, LGTM.

0004: read_stream: Only increase distance when waiting for IO

LGTM.

0005: WIP: read_stream: Move logic about IO combining & issuing to helpers

    /* never pin more buffers than allowed */
    if (stream->pinned_buffers + stream->pending_read_nblocks >= stream->max_pinned_buffers)
        return false;

    /*
     * Don't start more read-ahead if that'd put us over the distance limit
     * for doing read-ahead.
     */
    if (stream->pinned_buffers + stream->pending_read_nblocks >= stream->distance)
        return false;

AFAIK, stream->distance can't be higher than stream->max_pinned_buffers
[1], so I think we can remove the first if block. If we are not sure about
[1], then since stream->max_pinned_buffers will most likely be higher than
stream->distance, it would make sense to swap the order of these checks.

Aha, I understood this change after looking at 0006. It is a nitpick, but the
'if (stream->pinned_buffers + stream->pending_read_nblocks >=
stream->max_pinned_buffers)' block could be added in 0006 instead.

Other than these, LGTM.

0006: WIP: read stream: Split decision about look ahead for AIO and combining

I liked the idea of being more aggressive about IO combining. What is the
reason for gradually increasing combine_distance? Is it to avoid doing
unnecessary IOs at the start?

+    /*
+     * XXX: Should we actually reduce this at any time other than
+     * a reset? For now we have to, as this is also a condition
+     * for re-enabling fast_path.
+     */
+    if (stream->combine_distance > 1)
+        stream->combine_distance--;

I don't think we need to reduce this other than on reset.

+    /*
+     * Allow looking further ahead if we have an the process of building a
+     * larger IO, the IO is not yet big enough and we don't yet have IO in
+     * flight. Note that this is allowed even if we are reaching the
+     * read-ahead limit (but not the buffer pin limit).
+     *
+     * This is important for cases where either effective_io_concurrency is
+     * low or we never need to wait for IO and thus are not increasing the
+     * distance. Without this we would end up with lots of small IOs.
+     */
+    if (stream->pending_read_nblocks > 0 &&
+        stream->pinned_buffers == 0 &&
+        stream->pending_read_nblocks < stream->combine_distance)
+        return true;

Do we need to check 'stream->pinned_buffers == 0' here? I think we can
continue to look ahead even when there are pinned buffers, as we already know
that 'stream->pinned_buffers + stream->pending_read_nblocks <
stream->max_pinned_buffers'.

+    /* same if capped not by io_combine_limit but combine_distance */
+    if (stream->combine_distance > 0 &&
+        pending_read_nblocks >= stream->combine_distance)
+        return true;

I think we don't need to check 'stream->combine_distance > 0'; it shouldn't
be less than or equal to zero when 'stream->readahead_distance' is not zero,
right?

+    {
+        stream->readahead_distance = Min(max_pinned_buffers, stream->io_combine_limit);
+        stream->combine_distance = stream->io_combine_limit;
+    }

I think stream->combine_distance should be equal to
stream->readahead_distance, as we don't want to surpass max_pinned_buffers.

0007: Hacky implementation of making read_stream_reset()/end() not wait for IO

Read stream parts LGTM. I don't have a comment on the FIXME part, though.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
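
To make the last 0006 point above concrete, here is a minimal sketch that compares the quoted initialization with the suggested one. SketchStream and both sketch_init_* helpers are hypothetical names made up for illustration; they are not the real ReadStream struct or code from the patch, and only the fields named in the quoted hunk are modelled.

```c
#include <assert.h>

#define Min(a, b) ((a) < (b) ? (a) : (b))

/*
 * Hypothetical stand-in for the fields discussed above. This is NOT the
 * real ReadStream struct, only the members named in the quoted 0006 hunk.
 */
typedef struct SketchStream
{
	int			io_combine_limit;
	int			readahead_distance;
	int			combine_distance;
} SketchStream;

/* Initialization as quoted from the 0006 hunk. */
static void
sketch_init_quoted(SketchStream *stream, int max_pinned_buffers)
{
	assert(max_pinned_buffers > 0 && stream->io_combine_limit > 0);

	stream->readahead_distance = Min(max_pinned_buffers, stream->io_combine_limit);
	/* Not clamped: can end up larger than max_pinned_buffers. */
	stream->combine_distance = stream->io_combine_limit;
}

/* Variant suggested in the review: reuse the already clamped value. */
static void
sketch_init_suggested(SketchStream *stream, int max_pinned_buffers)
{
	assert(max_pinned_buffers > 0 && stream->io_combine_limit > 0);

	stream->readahead_distance = Min(max_pinned_buffers, stream->io_combine_limit);
	/* Clamped together with readahead_distance. */
	stream->combine_distance = stream->readahead_distance;
}
```

Assuming max_pinned_buffers = 4 and io_combine_limit = 16, the quoted form leaves combine_distance at 16 while the suggested form gives 4. When io_combine_limit does not exceed max_pinned_buffers, the two forms agree.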