Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rSKoE-00D3Ky-3Y for pgsql-hackers@arkaria.postgresql.org; Tue, 23 Jan 2024 17:43:38 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rSKoC-00Elta-Oz for pgsql-hackers@arkaria.postgresql.org; Tue, 23 Jan 2024 17:43:36 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rSKoC-00EltS-Ad for pgsql-hackers@lists.postgresql.org; Tue, 23 Jan 2024 17:43:36 +0000 Received: from mail-ej1-x62d.google.com ([2a00:1450:4864:20::62d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rSKo4-002zoL-2u for pgsql-hackers@lists.postgresql.org; Tue, 23 Jan 2024 17:43:34 +0000 Received: by mail-ej1-x62d.google.com with SMTP id a640c23a62f3a-a31047165a2so14073166b.2 for ; Tue, 23 Jan 2024 09:43:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=enterprisedb.com; s=google; t=1706031806; x=1706636606; darn=lists.postgresql.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=d9s+FME6SGkQ+g813SQNV6RRZiHCfY7buuFLcCkLFsQ=; b=g84s0KNpd3OfyhcxOr/zmNqqkCU/6jRPcRylo6qoO0+Kq62fUCv/XuwO911UFx69Sd 9npriQwtRSb9Z17g6ddDilEJcaCL3e2uehoiAbez3GI7j2nXQCgLje0CW/etPA4A1CZL nqE7hoGYyYJAKCVw/6+pH8kZ8XN066OkuMm2X5OqO/67P5D8QereWWKpuq7BKLd6LmMW SeQpmOl1qkhte9ikPbamiwdwNYdmXE0wC/N62YdH73W08rqxqbfFe/KyRx0HG7xpT7XX ApgRz2SyRz4fKTjxcFGLUZShOqTEUny73pPLFWgnYWRybous9hyDKoKVIBoE0z5PJmUO vbWg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706031806; x=1706636606; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=d9s+FME6SGkQ+g813SQNV6RRZiHCfY7buuFLcCkLFsQ=; b=XklN9bg4Ehfb/Z+O/fOHIfyoBHlAdeTU/qbs3g5x+5VtTCj5Cu/cx2XwV9kOQAjOo5 qnJ4emX97XZnADTVux9KQh7qAc3HrdR3QyKkNkUwL/BWuCegXKMY/vxdlGMsWla77N7a AYqhd2CXJkJ3Rel/Hqh53zpDDEFFMI8eMAtbWLnJGT+Usi7wuU2uLfe/eA9aycXGRYmU jjJ/xE2p5DyJbwjN6UcVHIrKLWPZDSsHf/ba79IYkg/OGIcrXhYX7TAwamsUELZDLEB1 OPo09GDhmDJ3Ek1iqaHEgWQ8ZLDpls0VMBRNFarBau2zpILr52Lw2ct9qYTCwt5EtcXc k4kQ== X-Gm-Message-State: AOJu0YxZSSH1rsO0A9E8Ag2LB34aJ6iZW+bVJVsUF/A90/QAvoG7OEfd 9R0sDBz5qa8SXE++/HZFg6KgaeYdQOfkuYoPYd4Z6x6F2hs2m1vdRhmpthFvQA== X-Google-Smtp-Source: AGHT+IGA2v+89jyF0X8D3u3SxFX0j0vKJKl1dm++x0e97zdt1uMZEVyd/rlSAO0QE/J1SEebkRxdKw== X-Received: by 2002:a17:906:646:b0:a2e:d32b:6fc7 with SMTP id t6-20020a170906064600b00a2ed32b6fc7mr117745ejb.119.1706031806358; Tue, 23 Jan 2024 09:43:26 -0800 (PST) Received: from [10.137.0.18] (ip-86-49-229-30.bb.vodafone.cz. [86.49.229.30]) by smtp.gmail.com with ESMTPSA id i3-20020a1709064fc300b00a2e33a6cbecsm9600700ejw.18.2024.01.23.09.43.25 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 23 Jan 2024 09:43:26 -0800 (PST) Message-ID: <56176b8d-956c-487e-ab09-310db4581c07@enterprisedb.com> Date: Tue, 23 Jan 2024 18:43:25 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: index prefetching To: Melanie Plageman Cc: Robert Haas , Andres Freund , PostgreSQL Hackers , Georgios References: <8ec36f51-b863-60e3-20e2-b9c981c5ce5e@enterprisedb.com> <06bb7d02-2c44-3062-731e-a735ba13da7e@enterprisedb.com> <367160ea-b1ed-4481-e804-bca509128878@enterprisedb.com> <280dc83c-a16f-4424-1319-95e7e3f798bd@enterprisedb.com> <98ba4b25-fae8-c1f4-1597-8093375a1986@enterprisedb.com> <20231221134314.wf2rs62d37u62j7t@alap3.anarazel.de> <20231221154352.ijtg6wloa3nowivh@alap3.anarazel.de> <482ec3ff-52ad-415d-96fd-f3832a894023@enterprisedb.com> Content-Language: en-US From: Tomas Vondra In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 1/19/24 22:43, Melanie Plageman wrote: > On Fri, Jan 12, 2024 at 11:42 AM Tomas Vondra > wrote: >> >> On 1/9/24 21:31, Robert Haas wrote: >>> On Thu, Jan 4, 2024 at 9:55 AM Tomas Vondra >>> wrote: >>>> Here's a somewhat reworked version of the patch. My initial goal was to >>>> see if it could adopt the StreamingRead API proposed in [1], but that >>>> turned out to be less straight-forward than I hoped, for two reasons: >>> >>> I guess we need Thomas or Andres or maybe Melanie to comment on this. >>> >> >> Yeah. Or maybe Thomas if he has thoughts on how to combine this with the >> streaming I/O stuff. > > I've been studying your patch with the intent of finding a way to > change it and or the streaming read API to work together. I've > attached a very rough sketch of how I think it could work. > Thanks. > We fill a queue with blocks from TIDs that we fetched from the index. > The queue is saved in a scan descriptor that is made available to the > streaming read callback. Once the queue is full, we invoke the table > AM specific index_fetch_tuple() function which calls > pg_streaming_read_buffer_get_next(). When the streaming read API > invokes the callback we registered, it simply dequeues a block number > for prefetching. So in a way there are two queues in IndexFetchTableData. One (blk_queue) is being filled from IndexNext, and then the queue in StreamingRead. > The only change to the streaming read API is that now, even if the > callback returns InvalidBlockNumber, we may not be finished, so make > it resumable. > Hmm, not sure when can the callback return InvalidBlockNumber before reaching the end. Perhaps for the first index_fetch_heap call? Any reason not to fill the blk_queue before calling index_fetch_heap? > Structurally, this changes the timing of when the heap blocks are > prefetched. Your code would get a tid from the index and then prefetch > the heap block -- doing this until it filled a queue that had the > actual tids saved in it. With my approach and the streaming read API, > you fetch tids from the index until you've filled up a queue of block > numbers. Then the streaming read API will prefetch those heap blocks. > And is that a good/desirable change? I'm not saying it's not, but maybe we should not be filling either queue in one go - we don't want to overload the prefetching. > I didn't actually implement the block queue -- I just saved a single > block number and pretended it was a block queue. I was imagining we > replace this with something like your IndexPrefetch->blockItems -- > which has light deduplication. We'd probably have to flesh it out more > than that. > I don't understand how this passes the TID to the index_fetch_heap. Isn't it working only by accident, due to blk_queue only having a single entry? Shouldn't the first queue (blk_queue) store TIDs instead? > There are also table AM layering violations in my sketch which would > have to be worked out (not to mention some resource leakage I didn't > bother investigating [which causes it to fail tests]). > > 0001 is all of Thomas' streaming read API code that isn't yet in > master and 0002 is my rough sketch of index prefetching using the > streaming read API > > There are also numerous optimizations that your index prefetching > patch set does that would need to be added in some way. I haven't > thought much about it yet. I wanted to see what you thought of this > approach first. Basically, is it workable? > It seems workable, yes. I'm not sure it's much simpler than my patch (considering a lot of the code is in the optimizations, which are missing from this patch). I think the question is where should the optimizations happen. I suppose some of them might/should happen in the StreamingRead API itself - like the detection of sequential patterns, recently prefetched blocks, ... But I'm not sure what to do about optimizations that are more specific to the access path. Consider for example the index-only scans. We don't want to prefetch all the pages, we need to inspect the VM and prefetch just the not-all-visible ones. And then pass the info to the index scan, so that it does not need to check the VM again. It's not clear to me how to do this with this approach. The main -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company