Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rkrFQ-00FxJk-6b for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Mar 2024 20:00:16 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1rkrFN-002nTT-PK for pgsql-hackers@arkaria.postgresql.org; Thu, 14 Mar 2024 20:00:14 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1rkrFN-002nTK-EA for pgsql-hackers@lists.postgresql.org; Thu, 14 Mar 2024 20:00:13 +0000 Received: from mail-qt1-x82e.google.com ([2607:f8b0:4864:20::82e]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1rkrFK-004X8l-HD for pgsql-hackers@lists.postgresql.org; Thu, 14 Mar 2024 20:00:13 +0000 Received: by mail-qt1-x82e.google.com with SMTP id d75a77b69052e-42f2d02fbdeso6969071cf.1 for ; Thu, 14 Mar 2024 13:00:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1710446409; x=1711051209; darn=lists.postgresql.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=gkuLjmiOEuLtYith9uYzyMwXmL2sVwT+/ZBVMR+HnBM=; b=lduAETPUeuebS52ooyNUOMHc60IiHPCIHjdgdPcsiI+Sv2Oiu0fxS5BRQYRxDjcwxC MTihqUUIKirYuj82LF9Dk/2qh3yOupLd1iUVvs2X4bC9S8fdKlHy05wGodgJ9wBFClHw 557hTB2M62XrMNLheI2Mpbuwi0wJ4orSmrIK9jERu+/5Q2oCGl6OpDABgPeDVCk6Tfj9 z/TTtXAPS1I+4fEw6uxdUoe/cXmOtkUwfgm6FHkbEDxV1wBnPUvtfrNCZ3IP/kn/S/ZL wn2i2Uwa5Buc14sDnZXTLB9CEEiRkPcsfLR5nY1A8u2KxXGYhhsBH/5+eCyurOsesSCg Gj6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1710446409; x=1711051209; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=gkuLjmiOEuLtYith9uYzyMwXmL2sVwT+/ZBVMR+HnBM=; b=U5W9kwsL5XCSUW3SuCbHtl7uZ6ZqnxAu+tfAXNLAexxiCA2hIASVymvAAQpDKe7aLp GoJt07yzGugmzC3qVHT6m8pfH+GhMtB0rC5LIDf5Swgmr61QKNapIE0LsEzFQvmb0yFH yhzQmQxoYvppZfq8BDPoZ9jMbuZflETKz0Pwi827sR6fuXQ8ztzHZWRafCcqqotuTJgc RriTg2DQkRcKve5446nip3G/GHUIOtVTaPqhZH+t864kkR3WhJq31Y9kGKlEVXlh02sA 8H9ORzkxmqt1vQG+cnRkEMcq5O4jv1giAZ5zn2EH0UNg4+2wS1xhatfYbIjQVSWSwPcD FoHA== X-Forwarded-Encrypted: i=1; AJvYcCV1IXlVMpuYKLMcDJDElENvOdjPpmMazBlJrdaxL6RwVyz/gAAwSCp3oWzVw+OuS3xCSRsaA4GW0i8JcxyojTxveAOt/wSLkRt5ujxC6Fn4yAtM X-Gm-Message-State: AOJu0YwYOHWMjDiaFmJFvkzEyZC1tlECmy/1hzlriNxBw6BRX7X9OLWB 2CJhlS36aTwrnH/UpkK85dpsooEXS3mRP/BDQwMLtXPJpxzGdSv7 X-Google-Smtp-Source: AGHT+IEepbZF+GUJEhHcjRjb0r4gkRlusuTWjaaTGNf4M5eqWZNkOGwrztiAj81xUFvLTyRZYuq7qw== X-Received: by 2002:a05:622a:1716:b0:42e:e112:11e6 with SMTP id h22-20020a05622a171600b0042ee11211e6mr3249238qtk.11.1710446408644; Thu, 14 Mar 2024 13:00:08 -0700 (PDT) Received: from liskov ([151.197.204.27]) by smtp.gmail.com with ESMTPSA id x15-20020ac87a8f000000b00430ae4a59c2sm77600qtr.23.2024.03.14.13.00.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 14 Mar 2024 13:00:08 -0700 (PDT) Date: Thu, 14 Mar 2024 16:00:06 -0400 From: Melanie Plageman To: Heikki Linnakangas Cc: Tomas Vondra , Dilip Kumar , David Geier , PostgreSQL Developers Subject: Re: Parallel Bitmap Heap Scan reports per-worker stats in EXPLAIN ANALYZE Message-ID: <20240314200006.emadubnx5nukthhi@liskov> References: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Mar 14, 2024 at 05:30:30PM +0200, Heikki Linnakangas wrote: > On 18/02/2024 00:31, Tomas Vondra wrote: > > Do you plan to work continue working on this patch? I did take a look, > > and on the whole it looks reasonable - it modifies the right places etc. > > +1 > > > I think there are two things that may need an improvement: > > > > 1) Storing variable-length data in ParallelBitmapHeapState > > > > I agree with Robert the snapshot_and_stats name is not great. I see > > Dmitry mentioned phs_snapshot_off as used by ParallelTableScanDescData - > > the reasons are somewhat different (phs_snapshot_off exists because we > > don't know which exact struct will be allocated), while here we simply > > need to allocate two variable-length pieces of memory. But it seems like > > it would work nicely for this. That is, we could try adding an offset > > for each of those pieces of memory: > > > > - snapshot_off > > - stats_off > > > > I don't like the GetSharedSnapshotData name very much, it seems very > > close to GetSnapshotData - quite confusing, I think. > > > > Dmitry also suggested we might add a separate piece of shared memory. I > > don't quite see how that would work for ParallelBitmapHeapState, but I > > doubt it'd be simpler than having two offsets. I don't think the extra > > complexity (paid by everyone) would be worth it just to make EXPLAIN > > ANALYZE work. > > I just removed phs_snapshot_data in commit 84c18acaf6. I thought that would > make this moot, but now that I rebased this, there are stills some aesthetic > questions on how best to represent this. > > In all the other node types that use shared instrumentation like this, the > pattern is as follows: (using Memoize here as an example, but it's similar > for Sort, IncrementalSort, Agg and Hash) > > /* ---------------- > * Shared memory container for per-worker memoize information > * ---------------- > */ > typedef struct SharedMemoizeInfo > { > int num_workers; > MemoizeInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER]; > } SharedMemoizeInfo; > > /* this struct is backend-private */ > typedef struct MemoizeState > { > ScanState ss; /* its first field is NodeTag */ > ... > MemoizeInstrumentation stats; /* execution statistics */ > SharedMemoizeInfo *shared_info; /* statistics for parallel workers */ > } MemoizeState; > > While the scan is running, the node updates its private data in > MemoizeState->stats. At the end of a parallel scan, the worker process > copies the MemoizeState->stats to MemoizeState->shared_info->stats, which > lives in shared memory. The leader process copies > MemoizeState->shared_info->stats to its own backend-private copy, which it > then stores in its MemoizeState->shared_info, replacing the pointer to the > shared memory with a pointer to the private copy. That happens in > ExecMemoizeRetrieveInstrumentation(). > > This is a little different for parallel bitmap heap scans, because a bitmap > heap scan keeps some other data in shared memory too, not just > instrumentation data. Also, the naming is inconsistent: the equivalent of > SharedMemoizeInfo is actually called ParallelBitmapHeapState. I think we > should rename it to SharedBitmapHeapInfo, to make it clear that it lives in > shared memory, but I digress. FWIW, if we merge a BHS streaming read user like the one I propose in [1] (not as a pre-condition to this but just as something to make you more comfortable with these names), the ParallelBitmapHeapState will basically only contain the shared iterator and the coordination state for accessing it and could be named as such. Then if you really wanted to be consistent with Memoize, you could name the instrumentation SharedBitmapHeapInfo. But, personally I prefer the name you gave it: SharedBitmapHeapInstrumentation. I think that would have been a better name for SharedMemoizeInfo since num_workers is really just used as the length of the array of instrumentation info. > We could now put the new stats at the end of ParallelBitmapHeapState as a > varlen field. But I'm not sure that's a good idea. In > ExecBitmapHeapRetrieveInstrumentation(), would we make a backend-private > copy of the whole ParallelBitmapHeapState struct, even though the other > fields don't make sense after the shared memory is released? Sounds > confusing. Or we could introduce a separate struct for the stats, and copy > just that: > > typedef struct SharedBitmapHeapInstrumentation > { > int num_workers; > BitmapHeapScanInstrumentation sinstrument[FLEXIBLE_ARRAY_MEMBER]; > } SharedBitmapHeapInstrumentation; > > typedef struct BitmapHeapScanState > { > ScanState ss; /* its first field is NodeTag */ > ... > SharedBitmapHeapInstrumentation sinstrument; > } BitmapHeapScanState; > > that compiles, at least with my compiler, but I find it weird to have a > variable-length inner struct embedded in an outer struct like that. In the attached patch, BitmapHeapScanState->sinstrument is a pointer, though. Or are you proposing the above as an alternative that you decided not to go with? > Long story short, I think it's still better to store > ParallelBitmapHeapInstrumentationInfo separately in the DSM chunk, not as > part of ParallelBitmapHeapState. Attached patch does that, rebased over > current master. The approach in the attached patch looks good to me. > I didn't address any of the other things that you, Tomas, pointed out, but I > think they're valid concerns. I'll send a separate review of these issues that are still present in your patch as well. - Melanie [1] https://www.postgresql.org/message-id/flat/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com