public inbox for [email protected]  
help / color / mirror / Atom feed
From: Heikki Linnakangas <[email protected]>
To: Tomas Vondra <[email protected]>
To: Dilip Kumar <[email protected]>
To: David Geier <[email protected]>
Cc: PostgreSQL Developers <[email protected]>
Cc: Melanie Plageman <[email protected]>
Subject: Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE
Date: Thu, 14 Mar 2024 17:30:30 +0200
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAFiTN-v1yDvU=X+hwfJ+55=sbgDj=_kuvbduEG-F7=BjpWcnuw@mail.gmail.com>
	<[email protected]>

On 18/02/2024 00:31, Tomas Vondra wrote:
> Do you plan to work continue working on this patch? I did take a look,
> and on the whole it looks reasonable - it modifies the right places etc.

+1

> I think there are two things that may need an improvement:
> 
> 1) Storing variable-length data in ParallelBitmapHeapState
> 
> I agree with Robert the snapshot_and_stats name is not great. I see
> Dmitry mentioned phs_snapshot_off as used by ParallelTableScanDescData -
> the reasons are somewhat different (phs_snapshot_off exists because we
> don't know which exact struct will be allocated), while here we simply
> need to allocate two variable-length pieces of memory. But it seems like
> it would work nicely for this. That is, we could try adding an offset
> for each of those pieces of memory:
> 
>   - snapshot_off
>   - stats_off
> 
> I don't like the GetSharedSnapshotData name very much, it seems very
> close to GetSnapshotData - quite confusing, I think.
> 
> Dmitry also suggested we might add a separate piece of shared memory. I
> don't quite see how that would work for ParallelBitmapHeapState, but I
> doubt it'd be simpler than having two offsets. I don't think the extra
> complexity (paid by everyone) would be worth it just to make EXPLAIN
> ANALYZE work.

I just removed phs_snapshot_data in commit 84c18acaf6. I thought that 
would make this moot, but now that I rebased this, there are stills some 
aesthetic questions on how best to represent this.

In all the other node types that use shared instrumentation like this, 
the pattern is as follows: (using Memoize here as an example, but it's 
similar for Sort, IncrementalSort, Agg and Hash)

/* ----------------
  *	 Shared memory container for per-worker memoize information
  * ----------------
  */
typedef struct SharedMemoizeInfo
{
	int			num_workers;
	MemoizeInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
} SharedMemoizeInfo;

/* this struct is backend-private */
typedef struct MemoizeState
{
	ScanState	ss;				/* its first field is NodeTag */
	...
	MemoizeInstrumentation stats;	/* execution statistics */
	SharedMemoizeInfo *shared_info; /* statistics for parallel workers */
} MemoizeState;

While the scan is running, the node updates its private data in 
MemoizeState->stats. At the end of a parallel scan, the worker process 
copies the MemoizeState->stats to MemoizeState->shared_info->stats, 
which lives in shared memory. The leader process copies 
MemoizeState->shared_info->stats to its own backend-private copy, which 
it then stores in its MemoizeState->shared_info, replacing the pointer 
to the shared memory with a pointer to the private copy. That happens in 
ExecMemoizeRetrieveInstrumentation().

This is a little different for parallel bitmap heap scans, because a 
bitmap heap scan keeps some other data in shared memory too, not just 
instrumentation data. Also, the naming is inconsistent: the equivalent 
of SharedMemoizeInfo is actually called ParallelBitmapHeapState. I think 
we should rename it to SharedBitmapHeapInfo, to make it clear that it 
lives in shared memory, but I digress.

We could now put the new stats at the end of ParallelBitmapHeapState as 
a varlen field. But I'm not sure that's a good idea. In 
ExecBitmapHeapRetrieveInstrumentation(), would we make a backend-private 
copy of the whole ParallelBitmapHeapState struct, even though the other 
fields don't make sense after the shared memory is released? Sounds 
confusing. Or we could introduce a separate struct for the stats, and 
copy just that:

typedef struct SharedBitmapHeapInstrumentation
{
	int			num_workers;
	BitmapHeapScanInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
} SharedBitmapHeapInstrumentation;

typedef struct BitmapHeapScanState
{
	ScanState	ss;				/* its first field is NodeTag */
	...
	SharedBitmapHeapInstrumentation sinstrument;
} BitmapHeapScanState;

that compiles, at least with my compiler, but I find it weird to have a 
variable-length inner struct embedded in an outer struct like that.

Long story short, I think it's still better to store 
ParallelBitmapHeapInstrumentationInfo separately in the DSM chunk, not 
as part of ParallelBitmapHeapState. Attached patch does that, rebased 
over current master.


I didn't address any of the other things that you, Tomas, pointed out, 
but I think they're valid concerns.

-- 
Heikki Linnakangas
Neon (https://neon.tech)


Attachments:

  [text/x-patch] v3-0001-Parallel-Bitmap-Heap-Scan-reports-per-worker-stat.patch (11.3K, 2-v3-0001-Parallel-Bitmap-Heap-Scan-reports-per-worker-stat.patch)
  download | inline diff:
From 04f11a37ee04b282c51e9dae68ead7c9c3d5fb3d Mon Sep 17 00:00:00 2001
From: David Geier <[email protected]>
Date: Tue, 8 Nov 2022 19:40:31 +0100
Subject: [PATCH v3 1/1] Parallel Bitmap Heap Scan reports per-worker stats

Similarly to other nodes (e.g. hash join, sort, memoize),
Bitmap Heap Scan now reports per-worker stats in the EXPLAIN
ANALYZE output. Previously only the heap blocks stats for the
leader were reported which was incomplete in parallel scans.

Discussion: https://www.postgresql.org/message-id/flat/b3d80961-c2e5-38cc-6a32-61886cdf766d%40gmail.com
---
 src/backend/commands/explain.c            | 45 +++++++++--
 src/backend/executor/execParallel.c       |  3 +
 src/backend/executor/nodeBitmapHeapscan.c | 93 ++++++++++++++++++++---
 src/include/executor/nodeBitmapHeapscan.h |  1 +
 src/include/nodes/execnodes.h             | 33 +++++++-
 5 files changed, 157 insertions(+), 18 deletions(-)

diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index a9d5056af48..4c1c1bf6268 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -3476,26 +3476,57 @@ show_hashagg_info(AggState *aggstate, ExplainState *es)
 static void
 show_tidbitmap_info(BitmapHeapScanState *planstate, ExplainState *es)
 {
+	Assert(es->analyze);
+
 	if (es->format != EXPLAIN_FORMAT_TEXT)
 	{
 		ExplainPropertyInteger("Exact Heap Blocks", NULL,
-							   planstate->exact_pages, es);
+							   planstate->stats.exact_pages, es);
 		ExplainPropertyInteger("Lossy Heap Blocks", NULL,
-							   planstate->lossy_pages, es);
+							   planstate->stats.lossy_pages, es);
 	}
 	else
 	{
-		if (planstate->exact_pages > 0 || planstate->lossy_pages > 0)
+		if (planstate->stats.exact_pages > 0 || planstate->stats.lossy_pages > 0)
 		{
 			ExplainIndentText(es);
 			appendStringInfoString(es->str, "Heap Blocks:");
-			if (planstate->exact_pages > 0)
-				appendStringInfo(es->str, " exact=%ld", planstate->exact_pages);
-			if (planstate->lossy_pages > 0)
-				appendStringInfo(es->str, " lossy=%ld", planstate->lossy_pages);
+			if (planstate->stats.exact_pages > 0)
+				appendStringInfo(es->str, " exact=%ld", planstate->stats.exact_pages);
+			if (planstate->stats.lossy_pages > 0)
+				appendStringInfo(es->str, " lossy=%ld", planstate->stats.lossy_pages);
 			appendStringInfoChar(es->str, '\n');
 		}
 	}
+
+	if (planstate->pstate != NULL)
+	{
+		for (int n = 0; n < planstate->sinstrument->num_workers; n++)
+		{
+			BitmapHeapScanInstrumentation *si = &planstate->sinstrument->sinstrument[n];
+
+			if (si->exact_pages == 0 && si->lossy_pages == 0)
+				continue;
+
+			if (es->workers_state)
+				ExplainOpenWorker(n, es);
+
+			if (es->format == EXPLAIN_FORMAT_TEXT)
+			{
+				ExplainIndentText(es);
+				appendStringInfo(es->str, "Heap Blocks: exact=%ld lossy=%ld\n",
+						 si->exact_pages, si->lossy_pages);
+			}
+			else
+			{
+				ExplainPropertyInteger("Exact Heap Blocks", NULL, si->exact_pages, es);
+				ExplainPropertyInteger("Lossy Heap Blocks", NULL, si->lossy_pages, es);
+			}
+
+			if (es->workers_state)
+				ExplainCloseWorker(n, es);
+		}
+	}
 }
 
 /*
diff --git a/src/backend/executor/execParallel.c b/src/backend/executor/execParallel.c
index 8c53d1834e9..bfb3419efb7 100644
--- a/src/backend/executor/execParallel.c
+++ b/src/backend/executor/execParallel.c
@@ -1076,6 +1076,9 @@ ExecParallelRetrieveInstrumentation(PlanState *planstate,
 		case T_MemoizeState:
 			ExecMemoizeRetrieveInstrumentation((MemoizeState *) planstate);
 			break;
+		case T_BitmapHeapScanState:
+			ExecBitmapHeapRetrieveInstrumentation((BitmapHeapScanState *) planstate);
+			break;
 		default:
 			break;
 	}
diff --git a/src/backend/executor/nodeBitmapHeapscan.c b/src/backend/executor/nodeBitmapHeapscan.c
index ca548e44eb4..45de261e2fa 100644
--- a/src/backend/executor/nodeBitmapHeapscan.c
+++ b/src/backend/executor/nodeBitmapHeapscan.c
@@ -236,9 +236,9 @@ BitmapHeapNext(BitmapHeapScanState *node)
 			}
 
 			if (tbmres->ntuples >= 0)
-				node->exact_pages++;
+				node->stats.exact_pages++;
 			else
-				node->lossy_pages++;
+				node->stats.lossy_pages++;
 
 			/* Adjust the prefetch target */
 			BitmapAdjustPrefetchTarget(node);
@@ -647,6 +647,17 @@ ExecEndBitmapHeapScan(BitmapHeapScanState *node)
 {
 	TableScanDesc scanDesc;
 
+	/*
+	 * When ending a parallel worker, copy the statistics gathered by the
+	 * worker back into shared memory so that it can be picked up by the main
+	 * process to report in EXPLAIN ANALYZE.
+	 */
+	if (node->sinstrument != NULL && IsParallelWorker())
+	{
+		Assert(ParallelWorkerNumber <= node->sinstrument->num_workers);
+		node->sinstrument->sinstrument[ParallelWorkerNumber] = node->stats;
+	}
+
 	/*
 	 * extract information from the node
 	 */
@@ -716,8 +727,8 @@ ExecInitBitmapHeapScan(BitmapHeapScan *node, EState *estate, int eflags)
 	scanstate->return_empty_tuples = 0;
 	scanstate->vmbuffer = InvalidBuffer;
 	scanstate->pvmbuffer = InvalidBuffer;
-	scanstate->exact_pages = 0;
-	scanstate->lossy_pages = 0;
+	scanstate->stats.exact_pages = 0;
+	scanstate->stats.lossy_pages = 0;
 	scanstate->prefetch_iterator = NULL;
 	scanstate->prefetch_pages = 0;
 	scanstate->prefetch_target = 0;
@@ -840,7 +851,22 @@ void
 ExecBitmapHeapEstimate(BitmapHeapScanState *node,
 					   ParallelContext *pcxt)
 {
-	shm_toc_estimate_chunk(&pcxt->estimator, sizeof(ParallelBitmapHeapState));
+	Size		size;
+
+	/*
+	 * We store ParallelBitmapHeapState followed by
+	 * SharedBitmapHeapInstrumentationInfo in the same shmem chunk.
+	 */
+	size = MAXALIGN(sizeof(ParallelBitmapHeapState));
+
+	/* don't need this if not instrumenting or no workers */
+	if (node->ss.ps.instrument && pcxt->nworkers > 0)
+	{
+		size = add_size(size, offsetof(SharedBitmapHeapInstrumentation, sinstrument));
+		size = add_size(size, mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation)));
+	}
+
+	shm_toc_estimate_chunk(&pcxt->estimator, size);
 	shm_toc_estimate_keys(&pcxt->estimator, 1);
 }
 
@@ -855,13 +881,27 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
 							ParallelContext *pcxt)
 {
 	ParallelBitmapHeapState *pstate;
+	SharedBitmapHeapInstrumentation *sinstrument = NULL;
 	dsa_area   *dsa = node->ss.ps.state->es_query_dsa;
+	char	   *ptr;
+	Size		size;
 
 	/* If there's no DSA, there are no workers; initialize nothing. */
 	if (dsa == NULL)
 		return;
 
-	pstate = shm_toc_allocate(pcxt->toc, sizeof(ParallelBitmapHeapState));
+	size = MAXALIGN(sizeof(ParallelBitmapHeapState));
+	if (node->ss.ps.instrument && pcxt->nworkers > 0)
+	{
+		size = add_size(size, offsetof(SharedBitmapHeapInstrumentation, sinstrument));
+		size = add_size(size, mul_size(pcxt->nworkers, sizeof(BitmapHeapScanInstrumentation)));
+	}
+
+	ptr = shm_toc_allocate(pcxt->toc, size);
+	pstate = (ParallelBitmapHeapState *) ptr;
+	ptr += MAXALIGN(sizeof(ParallelBitmapHeapState));
+	if (node->ss.ps.instrument && pcxt->nworkers > 0)
+		sinstrument = (SharedBitmapHeapInstrumentation *) ptr;
 
 	pstate->tbmiterator = 0;
 	pstate->prefetch_iterator = 0;
@@ -874,8 +914,16 @@ ExecBitmapHeapInitializeDSM(BitmapHeapScanState *node,
 
 	ConditionVariableInit(&pstate->cv);
 
+	/* ensure any unfilled slots will contain zeroes */
+	if (sinstrument)
+	{
+		sinstrument->num_workers = pcxt->nworkers;
+		memset(sinstrument->sinstrument, 0, pcxt->nworkers * sizeof(BitmapHeapScanInstrumentation));
+	}
+
 	shm_toc_insert(pcxt->toc, node->ss.ps.plan->plan_node_id, pstate);
 	node->pstate = pstate;
+	node->sinstrument = sinstrument;
 }
 
 /* ----------------------------------------------------------------
@@ -917,10 +965,37 @@ void
 ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
 							   ParallelWorkerContext *pwcxt)
 {
-	ParallelBitmapHeapState *pstate;
+	char	   *ptr;
 
 	Assert(node->ss.ps.state->es_query_dsa != NULL);
 
-	pstate = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
-	node->pstate = pstate;
+	ptr = shm_toc_lookup(pwcxt->toc, node->ss.ps.plan->plan_node_id, false);
+
+	node->pstate = (ParallelBitmapHeapState *) ptr;
+	ptr += MAXALIGN(sizeof(ParallelBitmapHeapState));
+
+	if (node->ss.ps.instrument)
+		node->sinstrument = (SharedBitmapHeapInstrumentation *) ptr;
+}
+
+/* ----------------------------------------------------------------
+ *		ExecBitmapHeapRetrieveInstrumentation
+ *
+ *		Transfer bitmap heap scan statistics from DSM to private memory.
+ * ----------------------------------------------------------------
+ */
+void
+ExecBitmapHeapRetrieveInstrumentation(BitmapHeapScanState *node)
+{
+	SharedBitmapHeapInstrumentation *sinstrument = node->sinstrument;
+	Size		size;
+
+	if (sinstrument == NULL)
+		return;
+
+	size = offsetof(SharedBitmapHeapInstrumentation, sinstrument)
+		+ sinstrument->num_workers * sizeof(BitmapHeapScanInstrumentation);
+
+	node->sinstrument = palloc(size);
+	memcpy(node->sinstrument, sinstrument, size);
 }
diff --git a/src/include/executor/nodeBitmapHeapscan.h b/src/include/executor/nodeBitmapHeapscan.h
index ea003a9caae..446a664590a 100644
--- a/src/include/executor/nodeBitmapHeapscan.h
+++ b/src/include/executor/nodeBitmapHeapscan.h
@@ -28,5 +28,6 @@ extern void ExecBitmapHeapReInitializeDSM(BitmapHeapScanState *node,
 										  ParallelContext *pcxt);
 extern void ExecBitmapHeapInitializeWorker(BitmapHeapScanState *node,
 										   ParallelWorkerContext *pwcxt);
+extern void ExecBitmapHeapRetrieveInstrumentation(BitmapHeapScanState *node);
 
 #endif							/* NODEBITMAPHEAPSCAN_H */
diff --git a/src/include/nodes/execnodes.h b/src/include/nodes/execnodes.h
index 27614ab50fb..1124a51cd21 100644
--- a/src/include/nodes/execnodes.h
+++ b/src/include/nodes/execnodes.h
@@ -1659,6 +1659,12 @@ typedef struct BitmapIndexScanState
 	struct IndexScanDescData *biss_ScanDesc;
 } BitmapIndexScanState;
 
+typedef struct BitmapHeapScanInstrumentation
+{
+	long lossy_pages;
+	long exact_pages;
+} BitmapHeapScanInstrumentation;
+
 /* ----------------
  *	 SharedBitmapState information
  *
@@ -1702,6 +1708,20 @@ typedef struct ParallelBitmapHeapState
 	ConditionVariable cv;
 } ParallelBitmapHeapState;
 
+/* ----------------
+ *	 Instrumentation data for a parallel bitmap heap scan.
+ *
+ * During a parallel bitmap heap scan, this lives in shared memory.  At the
+ * end, each worker copies their own stats to the right slot.  Finally, the
+ * leader copies the data to its local memory
+  * ----------------
+ */
+typedef struct SharedBitmapHeapInstrumentation
+{
+	int			num_workers;
+	BitmapHeapScanInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER];
+} SharedBitmapHeapInstrumentation;
+
 /* ----------------
  *	 BitmapHeapScanState information
  *
@@ -1736,8 +1756,6 @@ typedef struct BitmapHeapScanState
 	int			return_empty_tuples;
 	Buffer		vmbuffer;
 	Buffer		pvmbuffer;
-	long		exact_pages;
-	long		lossy_pages;
 	TBMIterator *prefetch_iterator;
 	int			prefetch_pages;
 	int			prefetch_target;
@@ -1746,6 +1764,17 @@ typedef struct BitmapHeapScanState
 	TBMSharedIterator *shared_tbmiterator;
 	TBMSharedIterator *shared_prefetch_iterator;
 	ParallelBitmapHeapState *pstate;
+
+	/*
+	 * In a parallelized bitmap heap scan, each worker sets their
+	 * instrumentaton data in pstate->sinstrument at the end.  The leader
+	 * copies the shared-memory info back to local storage before DSM
+	 * shutdown, and stores it here in 'instrument'.  It remains NULL in
+	 * workers and in non-parallel scans.
+	 */
+	SharedBitmapHeapInstrumentation *sinstrument;
+
+	BitmapHeapScanInstrumentation stats;
 } BitmapHeapScanState;
 
 /* ----------------
-- 
2.39.2



view thread (23+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox