From: Robert Haas
Date: Thu, 19 Mar 2026 09:36:36 -0400
Subject: Re: Better shared data structure management and resizable shared data structures
To: Heikki Linnakangas
Cc: Ashutosh Bapat, Andres Freund, pgsql-hackers, chaturvedipalak1911@gmail.com

On Thu, Mar 19, 2026 at 6:31 AM Heikki Linnakangas wrote:
> We could add more callbacks that get called at different times. For
> example the callback that would get called before shared memory is
> allocated, which could adjust the size according to MaxBackends. That
> would fully replace shmem_request_hook. Or a callback that would get
> called later in the startup sequence, if we wanted to e.g. load the
> pg_stat_statements file later in startup.
>
> This would be a natural place for other resources in future too. We
> could support declaring "named lwlock tranches" here to replace
> RequestNamedLWLockTranche() for example, although I think it's still
> better to encourage embedding the LWLock in the struct instead.
>
> _PG_init in pg_stat_statements still does a lot more than register that
> struct. It declares the GUCs and installs other hooks for example. We
> could perhaps move those to the subsystem descriptor too, although I'm
> not sure if that's worth the code churn.

Without taking a strong position on this particular design idea, I kind of wonder if we should be going the opposite direction: instead of bundling more and more things into descriptors, try to let people write imperative code that does whatever it is that they want to do. I think descriptors are pretty limiting as a concept, because they can only do the exact things that they're designed to do.

For instance, I find the change to use LWLockPadded rather than LWLock * in pgssSharedState to be a clear improvement, because I'd rather have fewer objects and less pointer-chasing. Now, that LWLockPadded is going to need to be initialized. I would rather do that by writing LWLockInitialize(&pgss->lock.lock, tranche_id), as you did, than by adding something to a descriptor, since the latter is almost certainly going to be a less intuitive syntax for the same thing, and I'm going to have to spend time verifying that whatever locution I'm compelled to write actually does what I want. And if somebody adds "really light weight locks" to the system, then we'll have to add RLWLockInitialize() to the things that the descriptor system knows how to do; and if for some reason I want to do LWLockInitialize(&mythingy->lock.lock, some_random_condition ? this_tranche_id : that_tranche_id), the descriptor system will probably become an annoying straitjacket.

Or if that example isn't compelling enough, imagine that I have an array of structs, each of which contains an LWLockPadded, and I need to loop over the array and do all the initializations. Or maybe space is at a premium and I want to use LWLock rather than LWLockPadded. Or maybe something else. Code is just more flexible than having to go through descriptors, which is why a lot of modern languages go to a great deal of trouble to make closures a first-class concept.

Let me take a step back and say what I think the problems in this area are, to see whether we agree on the basics. I suspect the reason you've undertaken this project is that, currently, requesting shared memory and allocating it are totally decoupled. The number of bytes you request and the number of bytes you actually allocate could be totally different, and then some completely unrelated subsystem can break because it's allocating last: even though it requested the bytes it wanted to allocate, some other subsystem under-requested, and now the bytes this subsystem wants are not actually available. Tracking this kind of thing down can be a giant pain in the rear end, bordering on impossible IME. Also, if we want to be able to resize stuff in shared memory in some happy future, the need for precise tracking of these sorts of things presumably goes way up, although the exact details of that are not altogether clear to me. Furthermore, as you point out, even if everyone behaves themselves and requests and allocates the same number of bytes, that's still annoying if it means redoing some computation.

I think the answer to this problem is to make requests into named objects. You're not allowed to request a number of BYTES of shared memory any more; you have to request "the shared memory bytes for the object named XYZ".
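
The failure mode described above, and how tying each request to a name avoids it, can be sketched with a toy bump allocator. This is not PostgreSQL code, and every name in it (toy_request_bytes, toy_request_named, toy_alloc_named, and so on) is hypothetical: the segment is sized from the sum of all requests, and allocations are carved from it in call order.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy model, not PostgreSQL code: the segment size is the sum of all
 * requests, and allocations are carved from it in call order.
 */
#define TOY_MAX_REQUESTS 8

typedef struct ToyRequest
{
	const char *name;
	size_t		size;
} ToyRequest;

static ToyRequest toy_requests[TOY_MAX_REQUESTS];
static int	toy_nrequests = 0;
static size_t toy_total_requested = 0;
static size_t toy_allocated = 0;

static void
toy_reset(void)
{
	toy_nrequests = 0;
	toy_total_requested = 0;
	toy_allocated = 0;
}

/* Anonymous request: only a byte count is recorded, with no owner. */
static void
toy_request_bytes(size_t size)
{
	toy_total_requested += size;
}

/*
 * Anonymous allocation: nothing ties it to any request, so it fails only
 * when the segment runs out, which may happen in an innocent caller.
 */
static bool
toy_alloc_bytes(size_t size)
{
	if (toy_allocated + size > toy_total_requested)
		return false;
	toy_allocated += size;
	return true;
}

/* Named request: the byte count is recorded under the object's name. */
static void
toy_request_named(const char *name, size_t size)
{
	assert(toy_nrequests < TOY_MAX_REQUESTS);
	toy_requests[toy_nrequests].name = name;
	toy_requests[toy_nrequests].size = size;
	toy_nrequests++;
	toy_total_requested += size;
}

/*
 * Named allocation: the size is looked up from the matching request and
 * handed back, so request and allocation cannot disagree, and a missing
 * request is detected against the name that is actually missing.
 */
static bool
toy_alloc_named(const char *name, size_t *size_out)
{
	for (int i = 0; i < toy_nrequests; i++)
	{
		if (strcmp(toy_requests[i].name, name) != 0)
			continue;
		if (!toy_alloc_bytes(toy_requests[i].size))
			return false;
		*size_out = toy_requests[i].size;
		return true;
	}
	return false;				/* never requested */
}
```

In the anonymous version, a subsystem that requests 64 bytes but allocates 96 succeeds, and the well-behaved subsystem that allocates last is the one that fails. In the named version the allocation size comes from the request itself, so the two cannot drift apart.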

So instead of RequestAddinShmemSpace(pgss_memsize()), you would say something like RequestAddinShmemSpace("pg_stat_statements", pgss_memsize()), and then later, instead of saying pgss = ShmemInitStruct("pg_stat_statements", sizeof(pgssSharedState), &found), you say pgss = ShmemInitStruct("pg_stat_statements", &size, &found).

The other big problem that I think we have in this area is that it's unclear what you're allowed to do in _PG_init() vs. some other callback, and sometimes you need IsUnderPostmaster checks or IsPostmasterEnvironment checks or process_shared_preload_libraries_in_progress checks. From my point of view, good goals would include (1) moving as much logic as possible into _PG_init() rather than having to put it elsewhere and (2) removing as many conditional checks as possible from it, aiming for _PG_init() functions that just run from start to finish in all cases.

What _PG_init() already does pretty well is allow you to do per-backend setup. For instance, pg_plan_advice needs no shared resources, so its _PG_init() was easy to write and, IMHO, easy to read. It requests diverse types of resources -- GUCs, an EXPLAIN extension ID, an EXPLAIN option, and hooks -- but it can just do all of those things one after another with no conditional logic and, IMHO, life is great. We fall down a little bit because PGC_POSTMASTER GUCs can't be added after startup -- see autoprewarm.c, for instance, which calls out that problem explicitly; and I suspect that issue is also why pg_stat_statements has the process_shared_preload_libraries_in_progress check at the top, because it looks to me like everything else that the function does would be completely fine to do later. So maybe we could adjust DefineCustomBLAHVariable to do nothing, instead of blowing up, if a PGC_POSTMASTER variable is requested and it's too late to create one. Or create the variable but attach some property to it that causes it to generate an error when set, e.g.

    ERROR: pg_stat_statements.max cannot be changed now because the library that created it was not included in shared_preload_libraries at startup time

(wordsmithing likely needed).

Shared resources do require some split-up of initialization: as you point out, if _PG_init() is called before we know MaxBackends, then we can't even size data structures whose size depends on that quantity yet, and we certainly can't initialize anything, because shared memory might not have been created yet. I don't think we can completely avoid the need for callbacks here, but... just spitballing, how about something like this:

    extern void DefineShmemRegion(char *name, size_t size, void **localptr,
                                  void (*init_callback) (void *), int flags);
    extern void DefineShmemRegionDynamic(char *name,
                                         size_t (*sizing_callback) (void *),
                                         void **localptr,
                                         void (*init_callback) (void *),
                                         int flags);
    extern void *GetShmemRegion(char *name);

    #define DSR_FLAGS_NO_SLOP   0x01
    #define DSR_FLAGS_DSA_OK    0x02

If DefineShmemRegion() or DefineShmemRegionDynamic() is called at shared_preload_libraries time, it arranges to increase the size of the main shared memory segment by the given amount, or by the computed amount (for things that depend on MaxBackends). Then, once the main shared memory segment is created, it invokes the init_callback and sets *localptr. If either of these functions is called after the main shared memory segment has been created, it checks for an existing allocation, and if one is found, it just sets *localptr. (It can probably exit quickly if *localptr is already set.) Otherwise, it tries to allocate from DSA if DSR_FLAGS_DSA_OK is given, or else from the slop space unless DSR_FLAGS_NO_SLOP is given. If that works, it then calls the init_callback under a suitable lock and sets *localptr. If not, it fails silently.
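
A single-process toy model of the proposed DefineShmemRegion() lifecycle may help pin down the intended ordering. It is a sketch under stated assumptions, not an implementation: the toy_* names are hypothetical, "shared memory" is one malloc'd buffer, the 256-byte slop reserve is arbitrary, and neither the DSA fallback nor the lock around the init callback is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Single-process toy model of the proposed DefineShmemRegion() lifecycle.
 * All names are hypothetical; DSA allocation and locking are not modeled.
 */
#define TOY_NO_SLOP 0x01		/* analogous to DSR_FLAGS_NO_SLOP */
#define TOY_MAX_REGIONS 8
#define TOY_SLOP_BYTES 256		/* arbitrary reserve for late definers */
#define TOY_ALIGN 16			/* keep carved regions aligned for the demo */

typedef struct ToyRegion
{
	const char *name;
	size_t		size;
	void	  **localptr;
	void		(*init_callback) (void *);
	void	   *addr;			/* NULL until carved out of the segment */
} ToyRegion;

static ToyRegion regions[TOY_MAX_REGIONS];
static int	nregions = 0;
static size_t preload_bytes = 0;
static char *segment = NULL;	/* NULL until the "postmaster" creates it */
static size_t segment_size = 0;
static size_t segment_used = 0;

static void *
toy_carve(size_t size)
{
	if (segment_used + size > segment_size)
		return NULL;
	segment_used += size;
	return segment + segment_used - size;
}

static void
toy_define_region(const char *name, size_t size, void **localptr,
				  void (*init_callback) (void *), int flags)
{
	for (int i = 0; i < nregions; i++)
	{
		if (strcmp(regions[i].name, name) == 0)
		{
			*localptr = regions[i].addr;	/* already known: re-find it */
			return;
		}
	}
	assert(nregions < TOY_MAX_REGIONS);

	ToyRegion  *r = &regions[nregions++];

	r->name = name;
	r->size = (size + TOY_ALIGN - 1) / TOY_ALIGN * TOY_ALIGN;
	r->localptr = localptr;
	r->init_callback = init_callback;
	r->addr = NULL;

	if (segment == NULL)
	{
		/* Preload time: reserve space; initialization waits for creation. */
		preload_bytes += r->size;
		return;
	}

	/* Too late to grow the segment: try slop space unless forbidden. */
	if (!(flags & TOY_NO_SLOP))
		r->addr = toy_carve(r->size);
	if (r->addr != NULL)
	{
		r->init_callback(r->addr);	/* real code: under a suitable lock */
		*localptr = r->addr;
	}
	/* Otherwise fail silently; a later lookup reports the problem. */
}

/* The "postmaster" creates the segment, then initializes preload regions. */
static void
toy_create_segment(void)
{
	segment_size = preload_bytes + TOY_SLOP_BYTES;
	segment = malloc(segment_size);
	assert(segment != NULL);
	segment_used = 0;
	for (int i = 0; i < nregions; i++)
	{
		regions[i].addr = toy_carve(regions[i].size);
		regions[i].init_callback(regions[i].addr);
		*regions[i].localptr = regions[i].addr;
	}
}

/* Hypothetical init callback that stores a recognizable value. */
static void
demo_init(void *p)
{
	*(long *) p = 42;
}
```

A region defined at preload time stays unmapped until toy_create_segment() runs; a late definition is served from slop space if it fits and is allowed to, and otherwise leaves the local pointer NULL. A repeated definition (as would happen in each EXEC_BACKEND child) just re-finds the existing address.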

Functions that need to use the local pointers do something like this:

    if (unlikely(pgss == NULL))
        pgss = GetShmemRegion("pg_stat_statements");

...which throws a suitable error -- not just a generic one saying that the region doesn't exist, but one that's sensitive to different failure conditions: the region was never registered, the region was registered after shared_preload_libraries time and not enough slop space remains, or whatever. GetShmemRegion() could even retry the initialization in certain cases, e.g. if DSA is OK and we were previously called too early in startup to try DSA, we can try now, or if DSA allocation failed due to OOM, we can try again.

I *think* this design gets rid of all the IsUnderPostmaster and process_shared_preload_libraries_in_progress checks in individual subsystems, and all the use of shmem startup hooks. You just ask for what you want, and if there's a way to get it, the system gives it to you; if there's not, it generates an error at the latest possible time, and also tries to self-heal if that's reasonable. If you do load your module in shared_preload_libraries, then by the time main shared memory initialization completes in the postmaster, everyone's localptr values (like pgss) will be initialized, but if it happens to be an EXEC_BACKEND build, those same calls will also happen in every postmaster child, and will automatically re-find the shared memory areas and reinitialize all the pointers. If you load your module later, your localptr values will hopefully be initialized by the end of _PG_init(), but if that doesn't work out, then the unlikely()-protected calls to GetShmemRegion() will produce suitable errors at a suitable time. And I think it all works out nicely in a standalone backend, too.
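
The lazy-lookup pattern and the idea of failure-specific errors can likewise be sketched in isolation. Again the names are hypothetical. Since a standalone C program has no ereport(), the toy returns a status code where a real GetShmemRegion() would raise an ERROR, and the retry behavior is not modeled. The unlikely() fallback exists only so the sketch compiles outside PostgreSQL, whose c.h supplies its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Outside PostgreSQL we need our own unlikely(); c.h provides the real one. */
#ifndef unlikely
#if defined(__GNUC__)
#define unlikely(x) __builtin_expect((x) != 0, 0)
#else
#define unlikely(x) ((x) != 0)
#endif
#endif

/* Hypothetical failure reasons a GetShmemRegion()-like call could report. */
typedef enum ToyRegionStatus
{
	TOY_REGION_OK,
	TOY_REGION_NOT_REGISTERED,
	TOY_REGION_NO_SLOP_LEFT
} ToyRegionStatus;

typedef struct ToyRegionEntry
{
	const char *name;
	void	   *addr;			/* NULL: registered too late, no slop left */
} ToyRegionEntry;

static ToyRegionEntry toy_registry[8];
static int	toy_nentries = 0;

static void
toy_register(const char *name, void *addr)
{
	assert(toy_nentries < 8);
	toy_registry[toy_nentries].name = name;
	toy_registry[toy_nentries].addr = addr;
	toy_nentries++;
}

/* Real code would ereport(ERROR) with a reason-specific message instead. */
static void *
toy_get_region(const char *name, ToyRegionStatus *status)
{
	for (int i = 0; i < toy_nentries; i++)
	{
		if (strcmp(toy_registry[i].name, name) != 0)
			continue;
		*status = toy_registry[i].addr != NULL ? TOY_REGION_OK
			: TOY_REGION_NO_SLOP_LEFT;
		return toy_registry[i].addr;
	}
	*status = TOY_REGION_NOT_REGISTERED;
	return NULL;
}

static const char *
toy_region_error_message(ToyRegionStatus status)
{
	switch (status)
	{
		case TOY_REGION_NOT_REGISTERED:
			return "region was never registered";
		case TOY_REGION_NO_SLOP_LEFT:
			return "region was registered after shared_preload_libraries time and not enough slop space remains";
		default:
			return NULL;		/* no error */
	}
}

/* Hypothetical backend-local pointer, playing the role of pgss. */
static void *demo_state = NULL;

/* Caller-side pattern: the lookup runs only until it first succeeds. */
static ToyRegionStatus
demo_use_state(void)
{
	ToyRegionStatus status = TOY_REGION_OK;

	if (unlikely(demo_state == NULL))
		demo_state = toy_get_region("demo_state", &status);
	return status;
}
```

Once the pointer is set, the unlikely() branch is skipped entirely, so the steady-state cost is a single NULL test.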

This is all kind of a brain dump and is not fully thought-through and might be riddled with cognitive errors, but what I'm sort of trying to convey is where I think the complexity in the current system comes from (which is that we require every subsystem/extension author to know how the sausage is made, and we don't enforce consistency between requests and allocations) and what I don't really like about descriptors as a solution (which is that they are harder to read than imperative code and can interfere with cases where somebody wants to do something slightly different than what the descriptor-designer had in mind). I hope that some of it is helpful to you...

-- 
Robert Haas
EDB: http://www.enterprisedb.com