Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1scsdu-00B8eW-NZ for pgsql-hackers@arkaria.postgresql.org; Sat, 10 Aug 2024 20:24:51 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1scsdt-00AOzW-DY for pgsql-hackers@arkaria.postgresql.org; Sat, 10 Aug 2024 20:24:49 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1scsds-00AOzO-VT for pgsql-hackers@lists.postgresql.org; Sat, 10 Aug 2024 20:24:49 +0000 Received: from mail.postgrespro.ru ([93.174.131.139]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1scsdq-004B7Q-A7 for pgsql-hackers@postgresql.org; Sat, 10 Aug 2024 20:24:48 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=postgrespro.ru; s=mx2023; t=1723321484; bh=yd6YgCpTQUNRcLDCLg1nK65mYfoP4IA3gqYxYHZKr6w=; h=Message-ID:Date:User-Agent:Subject:To:References:From:In-Reply-To: From; b=IvutCHEr5lDoWBqJGq326TcKvxPr3H2Rp9kDBV0RVvhlaO1yiOK5NueEKVIQNyig8 VP4pku1LK/hT8iooes6AXYZLSRzYyuTu7f+rJ8DpizPY2ACLg8Wjhaa7NsP8EqTFzI FeeW2JsAEO6lw2y3g4Q3pvq5jnFcNBR50PdeiY/F9nFGtKPG4LxU1lm4yQaikCLGrp OWCNXeAsLJ37HOHohLxILBNpM5VP5VB5PseywGdxf8ShVewKb49bHTDvede76N1Mjj catRU4Au1snhr1ONDG30z/2qVqOCiQtSPh7nfuQ1YqxxzEibXM3OpKnfF4B4mklMyP HOw9gHtu7u8cA== Received: from [172.20.10.4] (unknown [83.220.238.78]) (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (Client did not present a certificate) (Authenticated sender: a.rybakina@postgrespro.ru) by mail.postgrespro.ru (Postfix/587) with ESMTPSA id 6447A60A07; Sat, 10 Aug 2024 23:24:44 +0300 (MSK) Message-ID: <764dd8b8-6374-4f5a-aac7-d8e3f6ebe5fd@postgrespro.ru> Date: Sat, 10 Aug 2024 23:24:43 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Asynchronous MergeAppend To: Alexander Pyhalov , Pgsql Hackers References: <59be194c5a409fb9fc9f2031581b8a44@postgrespro.ru> Content-Language: en-US From: Alena Rybakina In-Reply-To: <59be194c5a409fb9fc9f2031581b8a44@postgrespro.ru> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-KSMG-AntiPhishing: NotDetected, bases: 2024/08/10 19:49:00 X-KSMG-AntiSpam-Interceptor-Info: not scanned X-KSMG-AntiSpam-Status: not scanned, disabled by settings X-KSMG-AntiVirus: Kaspersky Secure Mail Gateway, version 2.1.0.7854, bases: 2024/08/10 16:57:00 #26271445 X-KSMG-AntiVirus-Status: NotDetected, skipped X-KSMG-LinksScanning: not scanned, disabled by settings X-KSMG-Message-Action: skipped X-KSMG-Rule-ID: 1 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi! Thank you for your work on this subject! I think this is a very useful optimization) While looking through your code, I noticed some points that I think should be taken into account. Firstly, I noticed only two tests to verify the functionality of this function and I think that this is not enough. Are you thinking about adding some tests with queries involving, for example, join connections with different tables and unusual operators? In addition, I have a question about testing your feature on a benchmark. Are you going to do this? On 17.07.2024 16:24, Alexander Pyhalov wrote: > Hello. > > I'd like to make MergeAppend node Async-capable like Append node. > Nowadays when planner chooses MergeAppend plan, asynchronous execution > is not possible. With attached patches you can see plans like > > EXPLAIN (VERBOSE, COSTS OFF) > SELECT * FROM async_pt WHERE b % 100 = 0 ORDER BY b, a; >                                                           QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------ > >  Merge Append >    Sort Key: async_pt.b, async_pt.a >    ->  Async Foreign Scan on public.async_p1 async_pt_1 >          Output: async_pt_1.a, async_pt_1.b, async_pt_1.c >          Remote SQL: SELECT a, b, c FROM public.base_tbl1 WHERE (((b % > 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST >    ->  Async Foreign Scan on public.async_p2 async_pt_2 >          Output: async_pt_2.a, async_pt_2.b, async_pt_2.c >          Remote SQL: SELECT a, b, c FROM public.base_tbl2 WHERE (((b % > 100) = 0)) ORDER BY b ASC NULLS LAST, a ASC NULLS LAST > > This can be quite profitable (in our test cases you can gain up to two > times better speed with MergeAppend async execution on remote servers). > > Code for asynchronous execution in Merge Append was mostly borrowed > from Append node. > > What significantly differs - in ExecMergeAppendAsyncGetNext() you must > return tuple from the specified slot. > Subplan number determines tuple slot where data should be retrieved > to. When subplan is ready to provide some data, > it's cached in ms_asyncresults. When we get tuple for subplan, > specified in ExecMergeAppendAsyncGetNext(), > ExecMergeAppendAsyncRequest() returns true and loop in > ExecMergeAppendAsyncGetNext() ends. We can fetch data for > subplans which either don't have cached result ready or have already > returned them to the upper node. This > flag is stored in ms_has_asyncresults. As we can get data for some > subplan either earlier or after loop in ExecMergeAppendAsyncRequest(), > we check this flag twice in this function. > Unlike ExecAppendAsyncEventWait(), it seems > ExecMergeAppendAsyncEventWait() doesn't need a timeout - as there's no > need to get result > from synchronous subplan if a tuple form async one was explicitly > requested. > > Also we had to fix postgres_fdw to avoid directly looking at Append > fields. Perhaps, accesors to Append fields look strange, but allows > to avoid some code duplication. I suppose, duplication could be even > less if we reworked async Append implementation, but so far I haven't > tried to do this to avoid big diff from master. > > Also mark_async_capable() believes that path corresponds to plan. This > can be not true when create_[merge_]append_plan() inserts sort node. > In this case mark_async_capable() can treat Sort plan node as some > other and crash, so there's a small fix for this. I think you should add this explanation to the commit message because without it it's hard to understand the full picture of how your code works. -- Regards, Alena Rybakina Postgres Professional: http://www.postgrespro.com The Russian Postgres Company