From: David Rowley <[email protected]>
To: Tomas Vondra <[email protected]>
Cc: Dilip Kumar <[email protected]>
Cc: David Geier <[email protected]>
Cc: PostgreSQL Developers <[email protected]>
Subject: Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE
Date: Mon, 8 Jul 2024 15:43:01 +1200
Message-ID: <CAApHDvqFtd-9DYH70sbjD7iB-Eq-xSip1LPr=nayfpPd1pkZVw@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <[email protected]>
<CAFiTN-v1yDvU=X+hwfJ+55=sbgDj=_kuvbduEG-F7=BjpWcnuw@mail.gmail.com>
<[email protected]>
On Sun, 18 Feb 2024 at 11:31, Tomas Vondra
<[email protected]> wrote:
> 2) Leader vs. worker counters
>
> It seems to me this does nothing to add the per-worker values from "Heap
> Blocks" into the leader, which means we get stuff like this:
>
> Heap Blocks: exact=102 lossy=10995
> Worker 0: actual time=50.559..209.773 rows=215253 loops=1
> Heap Blocks: exact=207 lossy=19354
> Worker 1: actual time=50.543..211.387 rows=162934 loops=1
> Heap Blocks: exact=161 lossy=14636
>
> I think this is wrong / confusing, and inconsistent with what we do for
> other nodes.
Are you able to share which other nodes you mean here?
I used the following to compare against Sort and Memoize and, as far
as I can see, the behaviour matches with the attached v8 patch.
Is there some inconsistency here that I'm not seeing?
create table mill (a int);
create index on mill(a);
insert into mill select x%1000 from generate_Series(1,10000000)x;
vacuum analyze mill;
create table big (a int primary key);
insert into big select x from generate_series(1,10000000)x;
create table probe (a int);
insert into probe select 1 from generate_Series(1,1000000);
analyze big;
analyze probe;
set parallel_tuple_cost=0;
set parallel_setup_cost=0;
set enable_indexscan=0;
-- compare Parallel Bitmap Heap Scan with Memoize and Sort.
-- each includes "Worker N:" with stats for the operation.
explain (analyze) select * from mill where a < 100;
explain (analyze) select * from big b inner join probe p on b.a=p.a;
explain (analyze) select * from probe order by a;
-- each includes "Worker N:" with stats for the operation
-- also includes actual time and rows for each worker.
explain (analyze, verbose) select * from mill where a < 100;
explain (analyze, verbose) select * from big b inner join probe p on b.a=p.a;
explain (analyze, verbose) select * from probe order by a;
-- each includes "Worker N:" with stats for the operation
-- shows a single total buffers which includes leader and worker buffers.
explain (analyze, buffers) select * from mill where a < 100;
explain (analyze, buffers) select * from big b inner join probe p on b.a=p.a;
explain (analyze, buffers) select * from probe order by a;
-- each includes "Worker N:" with stats for the operation
-- also includes actual time and rows for each worker.
-- shows a single total buffers which includes leader and worker buffers.
-- shows buffer counts for each worker process
explain (analyze, buffers, verbose) select * from mill where a < 100;
explain (analyze, buffers, verbose) select * from big b inner join probe p on b.a=p.a;
explain (analyze, buffers, verbose) select * from probe order by a;
If we did want to adjust things so that the node-level line shows the
total across all workers rather than only the leader's stats, what
would Sort Method show if one worker spilled to disk and another did
not?
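To make the question concrete, here is a minimal standalone sketch. It is not code from the patch: HeapBlockStats, SortMethodSketch, sum_heap_blocks and sort_methods_agree are hypothetical names used only for illustration. It shows that numeric counters such as exact/lossy pages have an obvious total, whereas a categorical value such as the sort method does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for per-process "Heap Blocks" counters. */
typedef struct HeapBlockStats
{
	uint64_t	exact_pages;
	uint64_t	lossy_pages;
} HeapBlockStats;

/* Hypothetical categorical statistic, loosely modelled on "Sort Method". */
typedef enum SortMethodSketch
{
	SKETCH_QUICKSORT,
	SKETCH_EXTERNAL_MERGE
} SortMethodSketch;

/* Numeric counters can simply be added: leader plus every worker. */
HeapBlockStats
sum_heap_blocks(HeapBlockStats leader, const HeapBlockStats *workers,
				int nworkers)
{
	HeapBlockStats total = leader;

	assert(nworkers >= 0);
	for (int i = 0; i < nworkers; i++)
	{
		total.exact_pages += workers[i].exact_pages;
		total.lossy_pages += workers[i].lossy_pages;
	}
	return total;
}

/*
 * A categorical value has no sum.  The most we can report is whether all
 * processes used the same method.
 */
bool
sort_methods_agree(const SortMethodSketch *methods, int n)
{
	for (int i = 1; i < n; i++)
	{
		if (methods[i] != methods[0])
			return false;
	}
	return true;
}
```

With the numbers from the quoted output (leader 102/10995, workers 207/19354 and 161/14636), sum_heap_blocks gives exact=470 lossy=44985. For one worker using an in-memory sort and another spilling to disk, sort_methods_agree returns false, and there is no single value to print.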
David
Attachments:
[application/octet-stream] v8-0001-Show-Parallel-Bitmap-Heap-Scan-worker-stats-in-EX.patch (14.5K, 2-v8-0001-Show-Parallel-Bitmap-Heap-Scan-worker-stats-in-EX.patch)
From 5cd0f8bcb0518ddc999a9cfc9aa488b0817228da Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Tue, 8 Nov 2022 19:40:31 +0100
Subject: [PATCH v8] Show Parallel Bitmap Heap Scan worker stats in EXPLAIN
ANALYZE
Nodes like Memoize report the cache stats for each parallel worker, so it
makes sense to show the exact and lossy pages in Parallel Bitmap Heap Scan
in a similar way. Likewise, Sort shows the method and memory used for
each worker.
Author: David Geier <[email protected]>
Author: Heikki Linnakangas <[email protected]>
Author: Donghang Lin <[email protected]>
Author: Alena Rybakina <[email protected]>
Author: David Rowley <[email protected]>
Reviewed-by: Dmitry Dolgov <[email protected]>
Reviewed-by: Michael Christofides <[email protected]>
Reviewed-by: Robert Haas <[email protected]>
Reviewed-by: Dilip Kumar <[email protected]>
Reviewed-by: Tomas Vondra <[email protected]>
Reviewed-by: Melanie Plageman <[email protected]>
Reviewed-by: Donghang Lin <[email protected]>
Reviewed-by: Masahiro Ikeda <[email protected]>
Discussion: https://postgr.es/m/b3d80961-c2e5-38cc-6a32-61886cdf766d%40gmail.com
---
src/backend/commands/explain.c | 58 +++++++++---
src/backend/executor/execParallel.c | 3 +
src/backend/executor/nodeBitmapHeapscan.c | 105 ++++++++++++++++++++--
src/include/executor/nodeBitmapHeapscan.h | 1 +
src/include/nodes/execnodes.h | 35 +++++++-
src/tools/pgindent/typedefs.list | 2 +
6 files changed, 181 insertions(+), 23 deletions(-)
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index 6defd26df5..118db12903 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -2010,8 +2010,7 @@ ExplainNode(PlanState *planstate, List *ancestors,
if (plan->qual)
show_instrumentation_count("Rows Removed by Filter", 1,
planstate, es);
- if (es->analyze)
- show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
+ show_tidbitmap_info((BitmapHeapScanState *) planstate, es);
break;
case T_SampleScan:
show_tablesample(((SampleScan *) plan)->tablesample,
@@ -3628,31 +3627,70 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
}
/*
- * If it's EXPLAIN ANALYZE, show exact/lossy pages for a BitmapHeapScan node
+ * Show exact/lossy pages for a BitmapHeapScan node
*/
static void
show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
{
+ if (!es->analyze)
+ return;
+
if (es->format != EXPLAIN_FORMAT_TEXT)
{
ExplainPropertyUInteger("Exact Heap Blocks", NULL,
- planstate->exact_pages, es);
+ planstate->stats.exact_pages, es);
ExplainPropertyUInteger("Lossy Heap Blocks", NULL,
- planstate->lossy_pages, es);
+ planstate->stats.lossy_pages, es);
}
else
{
- if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
+ if (planstate->stats.exact_pages > 0 || planstate->stats.lossy_pages > 0)
{
ExplainIndentText(es);
appendStringInfoString(es->str, "Heap Blocks:");
- if (planstate->exact_pages > 0)
- appendStringInfo(es->str, " exact=" UINT64_FORMAT, planstate->exact_pages);
- if (planstate->lossy_pages > 0)
- appendStringInfo(es->str, " lossy=" UINT64_FORMAT, planstate->lossy_pages);
+ if (planstate->stats.exact_pages > 0)
+ appendStringInfo(es->str, " exact=" UINT64_FORMAT, planstate->stats.exact_pages);
+ if (planstate->stats.lossy_pages > 0)
+ appendStringInfo(es->str, " lossy=" UINT64_FORMAT, planstate->stats.lossy_pages);
appendStringInfoChar(es->str, '\n');
}
}
+
+ /* Display stats for each parallel worker */
+ if (planstate->pstate != NULL)
+ {
+ for (int n = 0; n < planstate->sinstrument->num_workers; n++)
+ {
+ BitmapHeapScanInstrumentation *si = &planstate->sinstrument->sinstrument[n];
+
+ if (si->exact_pages == 0 && si->lossy_pages == 0)
+ continue;
+
+ if (es->workers_state)
+ ExplainOpenWorker(n, es);
+
+ if (es->format == EXPLAIN_FORMAT_TEXT)
+ {
+ ExplainIndentText(es);
+ appendStringInfoString(es->str, "Heap Blocks:");
+ if (si->exact_pages > 0)
+ appendStringInfo(es->str, " exact=" UINT64_FORMAT, si->exact_pages);
+ if (si->lossy_pages > 0)
+ appendStringInfo(es->str, " lossy=" UINT64_FORMAT, si->lossy_pages);
+ appendStringInfoChar(es->str, '\n');
+ }
+ else
+ {
+ ExplainPropertyUInteger("Exact Heap Blocks", NULL,
+ si->exact_pages, es);
+ ExplainPropertyUInteger("Lossy Heap Blocks", NULL,
+ si->lossy_pages, es);
+ }
+
+ if (es->workers_state)
+ ExplainCloseWorker(n, es);
+ }
+ }
}
/*
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 8c53d1834e..bfb3419efb 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1076,6 +1076,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
case T_MemoizeState:
ExecMemoizeRetrieveInstrumentation((MemoizeState *) planstate);
break;
+ case T_BitmapHeapScanState:
+ ExecBitmapHeapRetrieveInstrumentation((BitmapHeapScanState *) planstate);
+ break;
default:
break;
}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index 6b48a6d835..3c63bdd93d 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -236,9 +236,9 @@ BitmapHeapNext(BitmapHeapScanState *node)
valid_block = table_scan_bitmap_next_block(scan, tbmres);
if (tbmres->ntuples >= 0)
- node->exact_pages++;
+ node->stats.exact_pages++;
else
- node->lossy_pages++;
+ node->stats.lossy_pages++;
if (!valid_block)
{
@@ -627,6 +627,29 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
{
TableScanDesc scanDesc;
+ /*
+ * When ending a parallel worker, copy the statistics gathered by the
+ * worker back into shared memory so that it can be picked up by the main
+ * process to report in EXPLAIN ANALYZE.
+ */
+ if (node->sinstrument != NULL && IsParallelWorker())
+ {
+ BitmapHeapScanInstrumentation *si;
+
+ Assert(ParallelWorkerNumber <= node->sinstrument->num_workers);
+ si = &node->sinstrument->sinstrument[ParallelWorkerNumber];
+
+ /*
+ * Here we accumulate the stats rather than performing memcpy on
+ * node->stats into si. When a Gather/GatherMerge node finishes it
+ * will perform executor shutdown on the workers. On rescan it will
+ * spin up new workers which will have a new BitmapHeapScanState and
+ * zeroed stats.
+ */
+ si->exact_pages += node->stats.exact_pages;
+ si->lossy_pages += node->stats.lossy_pages;
+ }
+
/*
* extract information from the node
*/
@@ -694,8 +717,10 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
scanstate->tbmiterator = NULL;
scanstate->tbmres = NULL;
scanstate->pvmbuffer = InvalidBuffer;
- scanstate->exact_pages = 0;
- scanstate->lossy_pages = 0;
+
+ /* Zero the statistics counters */
+ memset(&scanstate->stats, 0, sizeof(BitmapHeapScanInstrumentation));
+
scanstate->prefetch_iterator = NULL;
scanstate->prefetch_pages = 0;
scanstate->prefetch_target = 0;
@@ -803,7 +828,18 @@ void
ExecBitmapHeapEstimate(BitmapHeapScanState *node,
ParallelContext *pcxt)
{
- shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelBitmapHeapState));
+ Size size;
+
+ size = MAXALIGN(sizeof(ParallelBitmapHeapState));
+
+ /* account for instrumentation, if required */
+ if (node->ss.ps.instrument && pcxt->nworkers > 0)
+ {
+ size = add_size(size, offsetof(SharedBitmapHeapInstrumentation, sinstrument));
+ size = add_size(size, mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation)));
+ }
+
+ shm_toc_estimate_chunk(&pcxt->estimator, size);
shm_toc_estimate_keys(&pcxt->estimator, 1);
}
@@ -818,13 +854,27 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
ParallelContext *pcxt)
{
ParallelBitmapHeapState *pstate;
+ SharedBitmapHeapInstrumentation *sinstrument = NULL;
dsa_area *dsa = node->ss.ps.state->es_query_dsa;
+ char *ptr;
+ Size size;
/* If there's no DSA, there are no workers; initialize nothing. */
if (dsa == NULL)
return;
- pstate = shm_toc_allocate(pcxt->toc, sizeof(ParallelBitmapHeapState));
+ size = MAXALIGN(sizeof(ParallelBitmapHeapState));
+ if (node->ss.ps.instrument && pcxt->nworkers > 0)
+ {
+ size = add_size(size, offsetof(SharedBitmapHeapInstrumentation, sinstrument));
+ size = add_size(size, mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation)));
+ }
+
+ ptr = shm_toc_allocate(pcxt->toc, size);
+ pstate = (ParallelBitmapHeapState *) ptr;
+ ptr += MAXALIGN(sizeof(ParallelBitmapHeapState));
+ if (node->ss.ps.instrument && pcxt->nworkers > 0)
+ sinstrument = (SharedBitmapHeapInstrumentation *) ptr;
pstate->tbmiterator = 0;
pstate->prefetch_iterator = 0;
@@ -837,8 +887,18 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
ConditionVariableInit(&pstate->cv);
+ if (sinstrument)
+ {
+ sinstrument->num_workers = pcxt->nworkers;
+
+ /* ensure any unfilled slots will contain zeroes */
+ memset(sinstrument->sinstrument, 0,
+ pcxt->nworkers * sizeof(BitmapHeapScanInstrumentation));
+ }
+
shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pstate);
node->pstate = pstate;
+ node->sinstrument = sinstrument;
}
/* ----------------------------------------------------------------
@@ -880,10 +940,37 @@ void
ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
ParallelWorkerContext *pwcxt)
{
- ParallelBitmapHeapState *pstate;
+ char *ptr;
Assert(node->ss.ps.state->es_query_dsa != NULL);
- pstate = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
- node->pstate = pstate;
+ ptr = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
+ node->pstate = (ParallelBitmapHeapState *) ptr;
+ ptr += MAXALIGN(sizeof(ParallelBitmapHeapState));
+
+ if (node->ss.ps.instrument)
+ node->sinstrument = (SharedBitmapHeapInstrumentation *) ptr;
+}
+
+/* ----------------------------------------------------------------
+ * ExecBitmapHeapRetrieveInstrumentation
+ *
+ * Transfer bitmap heap scan statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecBitmapHeapRetrieveInstrumentation(BitmapHeapScanState *node)
+{
+ SharedBitmapHeapInstrumentation *sinstrument = node->sinstrument;
+ Size size;
+
+ if (sinstrument == NULL)
+ return;
+
+ size = offsetof(SharedBitmapHeapInstrumentation, sinstrument)
+ + sinstrument->num_workers * sizeof(BitmapHeapScanInstrumentation);
+
+ node->sinstrument = palloc(size);
+ memcpy(node->sinstrument, sinstrument, size);
}
diff --git a/src/include/executor/nodeBitmapHeapscan.h b/src/include/executor/nodeBitmapHeapscan.h
index ea003a9caa..446a664590 100644
--- a/src/include/executor/nodeBitmapHeapscan.h
+++ b/src/include/executor/nodeBitmapHeapscan.h
@@ -28,5 +28,6 @@ extern void ExecBitmapHeapReInitializeDSM(BitmapHeapScanState *node,
ParallelContext *pcxt);
extern void ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
ParallelWorkerContext *pwcxt);
+extern void ExecBitmapHeapRetrieveInstrumentation(BitmapHeapScanState *node);
#endif /* NODEBITMAPHEAPSCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index abfcd5f590..cac684d9b3 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1746,6 +1746,19 @@ typedef struct BitmapIndexScanState
struct IndexScanDescData *biss_ScanDesc;
} BitmapIndexScanState;
+/* ----------------
+ * BitmapHeapScanInstrumentation information
+ *
+ * exact_pages total number of exact pages retrieved
+ * lossy_pages total number of lossy pages retrieved
+ * ----------------
+ */
+typedef struct BitmapHeapScanInstrumentation
+{
+ uint64 exact_pages;
+ uint64 lossy_pages;
+} BitmapHeapScanInstrumentation;
+
/* ----------------
* SharedBitmapState information
*
@@ -1789,6 +1802,20 @@ typedef struct ParallelBitmapHeapState
ConditionVariable cv;
} ParallelBitmapHeapState;
+/* ----------------
+ * Instrumentation data for a parallel bitmap heap scan.
+ *
+ * A shared memory struct that each parallel worker copies its
+ * BitmapHeapScanInstrumentation information into at executor shutdown to
+ * allow the leader to display the information in EXPLAIN ANALYZE.
+ * ----------------
+ */
+typedef struct SharedBitmapHeapInstrumentation
+{
+ int num_workers;
+ BitmapHeapScanInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedBitmapHeapInstrumentation;
+
/* ----------------
* BitmapHeapScanState information
*
@@ -1797,8 +1824,7 @@ typedef struct ParallelBitmapHeapState
* tbmiterator iterator for scanning current pages
* tbmres current-page data
* pvmbuffer buffer for visibility-map lookups of prefetched pages
- * exact_pages total number of exact pages retrieved
- * lossy_pages total number of lossy pages retrieved
+ * stats execution statistics
* prefetch_iterator iterator for prefetching ahead of current page
* prefetch_pages # pages prefetch iterator is ahead of current
* prefetch_target current target prefetch distance
@@ -1807,6 +1833,7 @@ typedef struct ParallelBitmapHeapState
* shared_tbmiterator shared iterator
* shared_prefetch_iterator shared iterator for prefetching
* pstate shared state for parallel bitmap scan
+ * sinstrument statistics for parallel workers
* ----------------
*/
typedef struct BitmapHeapScanState
@@ -1817,8 +1844,7 @@ typedef struct BitmapHeapScanState
TBMIterator *tbmiterator;
TBMIterateResult *tbmres;
Buffer pvmbuffer;
- uint64 exact_pages;
- uint64 lossy_pages;
+ BitmapHeapScanInstrumentation stats;
TBMIterator *prefetch_iterator;
int prefetch_pages;
int prefetch_target;
@@ -1827,6 +1853,7 @@ typedef struct BitmapHeapScanState
TBMSharedIterator *shared_tbmiterator;
TBMSharedIterator *shared_prefetch_iterator;
ParallelBitmapHeapState *pstate;
+ SharedBitmapHeapInstrumentation *sinstrument;
} BitmapHeapScanState;
/* ----------------
diff --git a/src/tools/pgindent/typedefs.list b/src/tools/pgindent/typedefs.list
index 9320e4d808..635e6d6e21 100644
--- a/src/tools/pgindent/typedefs.list
+++ b/src/tools/pgindent/typedefs.list
@@ -262,6 +262,7 @@ BitmapAndPath
BitmapAndState
BitmapHeapPath
BitmapHeapScan
+BitmapHeapScanInstrumentation
BitmapHeapScanState
BitmapIndexScan
BitmapIndexScanState
@@ -2603,6 +2604,7 @@ SetToDefault
SetupWorkerPtrType
ShDependObjectInfo
SharedAggInfo
+SharedBitmapHeapInstrumentation
SharedBitmapState
SharedDependencyObjectType
SharedDependencyType
--
2.34.1
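The shared-memory handling in the patch above can be illustrated with a minimal standalone sketch. It makes several simplifying assumptions: calloc stands in for shm_toc_allocate, the leading ParallelBitmapHeapState and its MAXALIGN padding are omitted, and shared_instr_size, shared_instr_create and shared_instr_accumulate are hypothetical helper names that do not exist in PostgreSQL. The two structs mirror the ones the patch adds to execnodes.h. The sketch shows how the flexible array is sized with offsetof, and why workers add to their slot rather than overwrite it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Simplified copies of the two structs added by the patch. */
typedef struct BitmapHeapScanInstrumentation
{
	uint64_t	exact_pages;
	uint64_t	lossy_pages;
} BitmapHeapScanInstrumentation;

typedef struct SharedBitmapHeapInstrumentation
{
	int			num_workers;
	BitmapHeapScanInstrumentation sinstrument[];	/* one slot per worker */
} SharedBitmapHeapInstrumentation;

/* Bytes needed for the header plus one slot per worker. */
size_t
shared_instr_size(int nworkers)
{
	return offsetof(SharedBitmapHeapInstrumentation, sinstrument)
		+ (size_t) nworkers * sizeof(BitmapHeapScanInstrumentation);
}

/*
 * Hypothetical helper: calloc stands in for the DSM allocation and also
 * zeroes every slot, so workers that never run report zero pages.
 */
SharedBitmapHeapInstrumentation *
shared_instr_create(int nworkers)
{
	SharedBitmapHeapInstrumentation *si;

	si = calloc(1, shared_instr_size(nworkers));
	if (si != NULL)
		si->num_workers = nworkers;
	return si;
}

/*
 * Hypothetical helper: add a worker's local counters to its slot.  Using
 * += rather than assignment keeps earlier counts when the same slot is
 * reused by a freshly launched worker after a rescan.
 */
void
shared_instr_accumulate(SharedBitmapHeapInstrumentation *si, int worker,
						const BitmapHeapScanInstrumentation *local)
{
	assert(worker >= 0 && worker < si->num_workers);
	si->sinstrument[worker].exact_pages += local->exact_pages;
	si->sinstrument[worker].lossy_pages += local->lossy_pages;
}
```

Calling shared_instr_accumulate twice for the same slot, as would happen if a worker were relaunched on rescan, leaves the slot holding the sum of both runs, while untouched slots stay at zero.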