Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rbTDP-00FZej-Ay for pgsql-hackers@arkaria.postgresql.org; Sat, 17 Feb 2024 22:31:23 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rbTDM-008OY3-IR for pgsql-hackers@arkaria.postgresql.org; Sat, 17 Feb 2024 22:31:20 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rbTDM-008OXv-6d for pgsql-hackers@lists.postgresql.org; Sat, 17 Feb 2024 22:31:20 +0000 Received: from mail-lj1-x230.google.com ([2a00:1450:4864:20::230]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rbTDF-007jSa-AI for pgsql-hackers@lists.postgresql.org; Sat, 17 Feb 2024 22:31:18 +0000 Received: by mail-lj1-x230.google.com with SMTP id 38308e7fff4ca-2d09cf00214so25902631fa.0 for ; Sat, 17 Feb 2024 14:31:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=enterprisedb.com; s=google; t=1708209071; x=1708813871; darn=lists.postgresql.org; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=kdfFVERNLLnfpzGheuFfuFGdkbB4yrhLd7eLo4E2NgA=; b=RpxMuJQpKIZzgIv2sqX+YqZmhq7uhCaCmOpEPPoUltkyCm9rhUbsKMRE7HBU8q4qdo 8rDmgnSla3hFAnjz0EE6JfyXRFMXSPaSQ7eIX17wSOVnQkf/xmBtC4F14pb6PCOmdaje imne1PYDpN3ctzioZUjHFVMiA3ihDWRMItkqYTVFTiXYhMvft5WERrRGrpjcsZ7E0zGf zKN+PONoTMImlqzKJMyUNUyyyeWxRa3OUgzr7eZ0/t1w3Zq3cDFEMdlaoJzh7rdc/Gu8 0gwqtmddudrqYN+lHmaLDRFCpVK7ilrVK6zSVMUCmEwTFPzcZwHSBVgJ3+LvYDtgiEU4 K9Mw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1708209071; x=1708813871; h=content-transfer-encoding:in-reply-to:from:references:cc:to :content-language:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=kdfFVERNLLnfpzGheuFfuFGdkbB4yrhLd7eLo4E2NgA=; b=lPFghfpQWGiav7hfs+rVScXF4GviYzpEmkqKKHU/VF9uQRHlWNZep14iUXFBvzpO/Q Cle3Avku6kA89/OsE9WrnoCMGiwdID0PkT4VbAoMT1dlOcYAftm6SBVThaFNhMb1HnGj caPy4sOoPSKF7zR0WYjh7k2u8gWDjQi7s+xc44AJXjCY/Sa2vI9tpuxwSo2bEkrmhp8t U5aWC0F8RxSDO/mCngvyYdQGwxamkTbGhhX5TPw8E+BdEeuwD6+t1i6zqsAnJeznb9g4 NcnzC7AwpaD+Ozc1CbZNBngfAxwVeOccuxvLAbPOAAilZXaUSdhO4OJHOUNxjSUcCfhu CdHg== X-Gm-Message-State: AOJu0YznsrXSLJjOFc1dae3mgBkoG5kzpxHg2TeefXTOA/JtSGJ9NGxk VYkLMI/udIvQ6dToAUbOapiAYlp1XRFsexQSory89c/fIECS3Prv7e/wU1GJMNMcjdy1VIPsU0s = X-Google-Smtp-Source: AGHT+IFDes2gfCi5MX6s62iFEefjGRBZsLCKDvyWahCoV8tJnVswv7HkUMctSZwL5X8/SgwNuWYBWQ== X-Received: by 2002:a2e:b1d2:0:b0:2d0:b02d:f6f1 with SMTP id e18-20020a2eb1d2000000b002d0b02df6f1mr5125473lja.43.1708209071028; Sat, 17 Feb 2024 14:31:11 -0800 (PST) Received: from [10.137.0.18] (ip-86-49-229-30.bb.vodafone.cz. [86.49.229.30]) by smtp.gmail.com with ESMTPSA id f9-20020a05600c44c900b00411e3cc0e0asm6265052wmo.44.2024.02.17.14.31.10 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 17 Feb 2024 14:31:10 -0800 (PST) Message-ID: Date: Sat, 17 Feb 2024 23:31:09 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE Content-Language: en-US To: Dilip Kumar , David Geier Cc: PostgreSQL Developers References: From: Tomas Vondra In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi David, Do you plan to work continue working on this patch? I did take a look, and on the whole it looks reasonable - it modifies the right places etc. I think there are two things that may need an improvement: 1) Storing variable-length data in ParallelBitmapHeapState I agree with Robert the snapshot_and_stats name is not great. I see Dmitry mentioned phs_snapshot_off as used by ParallelTableScanDescData - the reasons are somewhat different (phs_snapshot_off exists because we don't know which exact struct will be allocated), while here we simply need to allocate two variable-length pieces of memory. But it seems like it would work nicely for this. That is, we could try adding an offset for each of those pieces of memory: - snapshot_off - stats_off I don't like the GetSharedSnapshotData name very much, it seems very close to GetSnapshotData - quite confusing, I think. Dmitry also suggested we might add a separate piece of shared memory. I don't quite see how that would work for ParallelBitmapHeapState, but I doubt it'd be simpler than having two offsets. I don't think the extra complexity (paid by everyone) would be worth it just to make EXPLAIN ANALYZE work. 2) Leader vs. worker counters It seems to me this does nothing to add the per-worker values from "Heap Blocks" into the leader, which means we get stuff like this: Heap Blocks: exact=102 lossy=10995 Worker 0: actual time=50.559..209.773 rows=215253 loops=1 Heap Blocks: exact=207 lossy=19354 Worker 1: actual time=50.543..211.387 rows=162934 loops=1 Heap Blocks: exact=161 lossy=14636 I think this is wrong / confusing, and inconsistent with what we do for other nodes. It's also inconsistent with how we deal e.g. with BUFFERS, where we *do* add the values to the leader: Heap Blocks: exact=125 lossy=10789 Buffers: shared hit=11 read=45420 Worker 0: actual time=51.419..221.904 rows=150437 loops=1 Heap Blocks: exact=136 lossy=13541 Buffers: shared hit=4 read=13541 Worker 1: actual time=56.610..222.469 rows=229738 loops=1 Heap Blocks: exact=209 lossy=20655 Buffers: shared hit=4 read=20655 Here it's not entirely obvious, because leader participates in the execution, but once we disable leader participation, it's clearer: Buffers: shared hit=7 read=45421 Worker 0: actual time=28.540..247.683 rows=309112 loops=1 Heap Blocks: exact=282 lossy=27806 Buffers: shared hit=4 read=28241 Worker 1: actual time=24.290..251.993 rows=190815 loops=1 Heap Blocks: exact=188 lossy=17179 Buffers: shared hit=3 read=17180 Not only is "Buffers" clearly a sum of per-worker stats, but the "Heap Blocks" simply disappeared because the leader does nothing and we don't print zeros. 3) I'm not sure dealing with various EXPLAIN flags may not be entirely correct. Consider this: EXPLAIN (ANALYZE): -> Parallel Bitmap Heap Scan on t (...) Recheck Cond: (a < 5000) Rows Removed by Index Recheck: 246882 Worker 0: Heap Blocks: exact=168 lossy=15648 Worker 1: Heap Blocks: exact=302 lossy=29337 EXPLAIN (ANALYZE, VERBOSE): -> Parallel Bitmap Heap Scan on public.t (...) Recheck Cond: (t.a < 5000) Rows Removed by Index Recheck: 246882 Worker 0: actual time=35.067..300.882 rows=282108 loops=1 Heap Blocks: exact=257 lossy=25358 Worker 1: actual time=32.827..302.224 rows=217819 loops=1 Heap Blocks: exact=213 lossy=19627 EXPLAIN (ANALYZE, BUFFERS): -> Parallel Bitmap Heap Scan on t (...) Recheck Cond: (a < 5000) Rows Removed by Index Recheck: 246882 Buffers: shared hit=7 read=45421 Worker 0: Heap Blocks: exact=236 lossy=21870 Worker 1: Heap Blocks: exact=234 lossy=23115 EXPLAIN (ANALYZE, VERBOSE, BUFFERS): -> Parallel Bitmap Heap Scan on public.t (...) Recheck Cond: (t.a < 5000) Rows Removed by Index Recheck: 246882 Buffers: shared hit=7 read=45421 Worker 0: actual time=28.265..260.381 rows=261264 loops=1 Heap Blocks: exact=260 lossy=23477 Buffers: shared hit=3 read=23478 Worker 1: actual time=28.224..261.627 rows=238663 loops=1 Heap Blocks: exact=210 lossy=21508 Buffers: shared hit=4 read=21943 Why should the per-worker buffer info be shown when combined with the VERBOSE flag, and not just with BUFFERS, when the patch shows the per-worker info always? 4) Now that I think about this, isn't the *main* problem really that we don't display the sum of the per-worker stats (which I think is wrong)? I mean, we already can get the worker details VERBOSEm right? So the only reason to display that by default seems to be that it the values in "Heap Blocks" are from the leader only. BTW doesn't this also suggest some of the code added to explain.c may not be quite necessary? Wouldn't it be enough to just "extend" the existing code printing per-worker stats. (I haven't tried, so maybe I'm wrong and we need the new code.) regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company