Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rGKr6-00EmIV-GJ for pgsql-hackers@arkaria.postgresql.org; Thu, 21 Dec 2023 15:21:00 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rGKr4-00ARtB-Kq for pgsql-hackers@arkaria.postgresql.org; Thu, 21 Dec 2023 15:20:58 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rGKr4-00ARoA-5N for pgsql-hackers@lists.postgresql.org; Thu, 21 Dec 2023 15:20:58 +0000 Received: from mail-ej1-x62a.google.com ([2a00:1450:4864:20::62a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rGKqu-00BWEp-7J for pgsql-hackers@lists.postgresql.org; Thu, 21 Dec 2023 15:20:54 +0000 Received: by mail-ej1-x62a.google.com with SMTP id a640c23a62f3a-a2331caab22so100323166b.3 for ; Thu, 21 Dec 2023 07:20:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=enterprisedb.com; s=google; t=1703172046; x=1703776846; darn=lists.postgresql.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=y+rbVTkpuafD9i507K7eF0pZNE6QAzIfju6RGDmpUbk=; b=dZJWGIPshCODoWD9sAKJ12wMey3T5h6qVzSH0zb9fDk1IiOPQlZGGfM+vmsH0p/IGx SX1yostrcM8HmRPZsi1nCQTvcomTLOGekvUNYhZIbersNRC4CncA0FwnpaiiaoafHpla F7fzvWFWgPi/hIk6W6aRiG+idub5E0eKoYPVWi2XuVD2A3saP2qghBSOSQT4LsLuf2q6 +1t49XOHapLL0Qmqb1Xv2sEvv4PNSEsGAw3vQaRcvsZweZWh79Gh299IKeyk43zv++ow HhBTylSc7jFaLHqF4z9wB0YrV8FOr+p5ASbiCZBqhLCzKgVydigQXEvHANKa01qTr1Iu rIaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1703172046; x=1703776846; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=y+rbVTkpuafD9i507K7eF0pZNE6QAzIfju6RGDmpUbk=; b=q9B6brXnKa+RnKXhIkI90SVhNqWRwaFsoRJhE4uR10MJfntVO4IrKDuo4QqpqWe4ZA k88RVKUX2XIwo8QRjJndm+sIguZ8gTl0N1vpgj++4+SX2NMEMN8ebnsdqJUfzL6tFSCE Tnn38iwHncDyxePJxdj0weN8a8kHUhgQrmI+wbuqZ7ANi7LcbZ1Fz777E86M1Zxu0/bE zeA97DYbx3N5yrDtogRdT5b1GvZBnWR9DnOIczOxnPeBayON5yDok8vSO2HRg+YGIjGL CZPBkkiLXFVb7tJDqQ8775GoLCA353sQBVlZ1Ikvy1V8DWkp2ZUyVSpAmfAvrI/OGxN0 ztKA== X-Gm-Message-State: AOJu0Yz1EVt0j6FjwwFe9tNlAADkPJdhJlbXKosDRZMJglxuiuOJ2vXK 4OB+8uB/zt8ZmVlWhLVDGLqWtA== X-Google-Smtp-Source: AGHT+IFFwMoeCa5HkBtJtl18NLBY57mbSg8r9XhGe2+WSkpO1aBTY7wwbRxXuLEOGEFTWAMjZqHzYg== X-Received: by 2002:a17:906:20c9:b0:a23:686d:1a51 with SMTP id c9-20020a17090620c900b00a23686d1a51mr2871301ejc.149.1703172046566; Thu, 21 Dec 2023 07:20:46 -0800 (PST) Received: from [10.137.0.17] (srv1.mobissw.com. [89.235.0.226]) by smtp.gmail.com with ESMTPSA id ge12-20020a170907908c00b00a26ac403032sm409107ejb.65.2023.12.21.07.20.46 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 21 Dec 2023 07:20:46 -0800 (PST) Message-ID: Date: Thu, 21 Dec 2023 16:20:45 +0100 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.15.1 Subject: Re: index prefetching Content-Language: en-US To: Andres Freund Cc: Robert Haas , PostgreSQL Hackers , Georgios References: <8c86c3a6-074e-6c88-3e7e-9452b6a37b9b@enterprisedb.com> <3cd40425-965a-5ce1-1af3-d51971c44b93@enterprisedb.com> <8ec36f51-b863-60e3-20e2-b9c981c5ce5e@enterprisedb.com> <06bb7d02-2c44-3062-731e-a735ba13da7e@enterprisedb.com> <367160ea-b1ed-4481-e804-bca509128878@enterprisedb.com> <280dc83c-a16f-4424-1319-95e7e3f798bd@enterprisedb.com> <98ba4b25-fae8-c1f4-1597-8093375a1986@enterprisedb.com> <20231221134314.wf2rs62d37u62j7t@alap3.anarazel.de> From: Tomas Vondra In-Reply-To: <20231221134314.wf2rs62d37u62j7t@alap3.anarazel.de> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 12/21/23 14:43, Andres Freund wrote: > Hi, > > On 2023-12-21 13:30:42 +0100, Tomas Vondra wrote: >> You're right a lot of this is a guesswork. I don't think we can do much >> better, because it depends on stuff that's out of our control - each OS >> may do things differently, or perhaps it's just configured differently. >> >> But I don't think this is really a serious issue - all the read-ahead >> implementations need to work about the same, because they are meant to >> work in a transparent way. >> >> So it's about deciding at which point we think this is a sequential >> pattern. Yes, the OS may use a slightly different threshold, but the >> exact value does not really matter - in the worst case we prefetch a >> couple more/fewer blocks. >> >> The OS read-ahead can't really prefetch anything except sequential >> cases, so the whole question is "When does the access pattern get >> sequential enough?". I don't think there's a perfect answer, and I don't >> think we need a perfect one - we just need to be reasonably close. > > For the streaming read interface (initially backed by fadvise, to then be > replaced by AIO) we found that it's clearly necessary to avoid fadvises in > cases of actual sequential IO - the overhead otherwise leads to easily > reproducible regressions. So I don't think we have much choice. > Yeah, the regression are pretty easy to demonstrate. In fact, I didn't have such detection in the first patch, but after the first round of benchmarks it became obvious it's needed. > >> Also, while I don't want to lazily dismiss valid cases that might be >> affected by this, I think that sequential access for index paths is not >> that common (with the exception of clustered indexes). > > I think sequential access is common in other cases as well. There's lots of > indexes where heap tids are almost perfectly correlated with index entries, > consider insert only insert-only tables and serial PKs or inserted_at > timestamp columns. Even leaving those aside, for indexes with many entries > for the same key, we sort by tid these days, which will also result in > "runs" of sequential access. > True. I should have thought about those cases. > >> Obviously, the latter case has much more severe impact, but it depends >> on the exact workload / access pattern etc. The only "perfect" solution >> would be to actually check the page cache, but well - that seems to be >> fairly expensive. > >> What I was envisioning was something self-tuning, based on the I/O we >> may do later. If the prefetcher decides to prefetch something, but finds >> it's already in cache, we'd increase the distance, to remember more >> blocks. Likewise, if a block is not prefetched but then requires I/O >> later, decrease the distance. That'd make it adaptive, but I don't think >> we actually have the info about I/O. > > How would the prefetcher know that hte data wasn't in cache? > I don't think there's a good way to do that, unfortunately, or at least I'm not aware of it. That's what I meant by "we don't have the info" at the end. Which is why I haven't tried implementing it. The only "solution" I could come up with was some sort of "timing" for the I/O requests and deducing what was cached. Not great, of course. > >> Alternatively, I was thinking about moving the prefetches into a >> separate worker process (or multiple workers), so we'd just queue the >> request and all the overhead would be done by the worker. The main >> problem is the overhead of calling posix_fadvise() for blocks that are >> already in memory, and this would just move it to a separate backend. I >> wonder if that might even make the custom cache unnecessary / optional. > > The AIO patchset provides this. > OK, I guess it's time for me to take a look at the patch again. > >> AFAICS this seems similar to some of the AIO patch, I wonder what that >> plans to do. I need to check. > > Yes, most of this exists there. The difference that with the AIO you don't > need to prefetch, as you can just initiate the IO for real, and wait for it to > complete. > Right, although the line where things stop being "prefetch" and becomes "async" seems a bit unclear to me / perhaps more a point of view. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company