Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vTxnz-00EXwZ-31 for pgsql-hackers@arkaria.postgresql.org; Fri, 12 Dec 2025 07:43:12 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vTxnx-006rEs-24 for pgsql-hackers@arkaria.postgresql.org; Fri, 12 Dec 2025 07:43:10 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vTxnx-006rEj-08 for pgsql-hackers@lists.postgresql.org; Fri, 12 Dec 2025 07:43:09 +0000 Received: from mail-pl1-x62d.google.com ([2607:f8b0:4864:20::62d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vTxnv-000H5v-1D for pgsql-hackers@postgresql.org; Fri, 12 Dec 2025 07:43:08 +0000 Received: by mail-pl1-x62d.google.com with SMTP id d9443c01a7336-2958db8ae4fso9510355ad.2 for ; Thu, 11 Dec 2025 23:43:06 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1765525385; x=1766130185; darn=postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=0nTTTzflADbFJt7ywnsyGcY4xkt00ZPNifExbMHjnZ8=; b=nUjUyrd7i2urtua0kBx6lbBJN2AvJ8MVSBAsX8StBUnWb2IQmOQ6MXC2//+7tSND/7 UCMAA842giaEAeLuPd1hM1OL+X40B3gaDxCiIpCWVp8eqf6M4sT4e2lyuzUmPQUNFzfF 4BzIXD3/mz+KL0tT3yXCehq6ZGGIYiblcHspf9pMPQrTmOFNpljg3a7sHnQC1sd1eMls TN5tSoADdIgbOHyS1RD5ONgAAW99AePwo7dXfDYYWOzTsNRqxicGpaag5jG/QGpNPgNa kHMAPSUtlrvlEZEbb3bpt8guzqBeudJPME4bWTWyOW4WwHp2ktgsZCN6JSkV/9rel2Uq lxkQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1765525385; x=1766130185; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=0nTTTzflADbFJt7ywnsyGcY4xkt00ZPNifExbMHjnZ8=; b=Xe7kvV76keMSDwmYHv8PnNkVzPaR7zrizGk+9WQB4Ho+88yfvKJ8JDpwqtldzWjK5C aciwg79FvL8LmwBdhFRMgRvliatdLFxfoc1B4h+mV/LUmwB2Mre7cHsisvqYDO2r4a1F D0ML+OVmLlDIyyvjyhNzl7T3lfGXb794uu6j/4RkTEunUfxx5yhgY6uI0oabZCTRaSXc zHMsI7RdFtj9CAjeuIGZB7TquB/r5Pf0ldibaiNFARs8zZbMDEH5UjWZPQ6h+Jn3f0E3 FbYFbsg+3SAMN4LmEK2EhmPgWFD8qmzw5qctu6NlOjBFbd1VK5mAoofGrrTZQihlbuAn xq8g== X-Gm-Message-State: AOJu0YwVN4qcQhOTz0wiEUG2KmaswH759Z4SAnIFViJ7RpG8AL7G1WGe nCezycfNRjLeImXCbJgs3O25OYQaGgLWcfytgYCgJh19STsmDgVG6lZy X-Gm-Gg: AY/fxX79W7hAfNlQsvDoW3HNCo5/4Cs6b796WUpQoeh847A1vnRwK7frAzi1mpj4GzI 9X5ph/WJGGw7h3TVJiYQAY9UpUkF3zQPQl3iki04ERfMQ4Z+JDxgbnHQqG3os5cY6iCrrR8EWr0 wXTxeQ2W9f7t3vE9jhL+fqGtzlFwSaNIz6Mnw5HHEjjVlHFaNNCOpqpH1W7r0vCR/9R8afb8BR8 F3UxH/Ubnva4YcvaOritq1GPQw6lOmZlgX4D3OgO22kIVVdN+9GYTd87XD/TOsUzc4tTtIeB0gf 5Q13q/gMt/eG1rTb6gwthvVIKWo2rgSm7o9j2Fg93Ns646Z7lvzcYJip+TeqHn6STY+DVdVEeyl CPkWcIBWx5I3Ey/5cqJgj67eq9+Ddl2AmWyMHF5SvA4EhrvJf9M7QhvMsUUwRmDw6xyaqS8JJCh SGqxuRJXa/eqpaQepSjp4= X-Google-Smtp-Source: AGHT+IG73iNWPm4K1vOod4K+DrCgYhqVFpsd/EM3nLTtUwV4qEj0eWfY+ATYEB/soA/NTfJRvmmf0g== X-Received: by 2002:a17:903:2ac4:b0:275:81ca:2c5 with SMTP id d9443c01a7336-29f2448dc54mr13199275ad.59.1765525385109; Thu, 11 Dec 2025 23:43:05 -0800 (PST) Received: from smtpclient.apple ([45.32.121.103]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-29f2ebc340csm5657175ad.28.2025.12.11.23.43.02 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 11 Dec 2025 23:43:04 -0800 (PST) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.700.81\)) Subject: Re: [oauth] Stabilize the libpq-oauth ABI (and allow alternative implementations?) From: Chao Li In-Reply-To: Date: Fri, 12 Dec 2025 15:42:25 +0800 Cc: PostgreSQL Hackers Content-Transfer-Encoding: quoted-printable Message-Id: <3720B2E1-0B96-4063-8D63-B5AE6AFEA159@gmail.com> References: To: Jacob Champion X-Mailer: Apple Mail (2.3826.700.81) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Dec 10, 2025, at 05:00, Jacob Champion = wrote: >=20 > Hi everybody, >=20 > We introduced the libpq-oauth module late in the cycle for PG18, and > to put it mildly, its interface isn't great. The original > implementation depended on libpq internals, and we had to make sure > that we didn't start crashing during major or minor version upgrades. > So there were a bunch of compromises made to keep things safe, > including the restriction that the module name has to change every > major release. >=20 > Separately, but closely related: PG18's OAuth support allows you to > customize the client flow via a libpq hook function. Third-party > applications can make use of that, but our own utilities can still > only use the builtin device flow. That's annoying. >=20 > I've been working to replace the internal ABI with a stable one, > hopefully to solve both problems at the same time. A dlopen() is a > pretty clear seam for other people to use to modify and extend. > Unfortunately my first attempt (not pictured) ended up in a rabbit > hole, because I tried to tackle the third-party use case first. My > second attempt, attached, focuses on the ABI stabilization instead, > which I think is more likely to succeed. >=20 > (This took enough thinking that I'm really glad we didn't try this for > PG18. Thanks for letting me take on some technical debt for a bit.) >=20 > =3D Design =3D >=20 > Here's the train of thought behind the core changes (which are in = patch 0004): >=20 > The builtin-flow code in fe-auth-oauth.c is similar to the custom-flow > code, but it's just ever-so-slightly different. I'd like to unify the > two, so I want libpq-oauth to make use of the public > PGoauthBearerRequest API, and that means that almost all of the > injections made in the PG18 ABI need to be replaced. >=20 > Most of those injections are simply subsumed by the hook API > (hooray!). A couple of others can be replaced by PQconninfo(). Four > are left over: > - pgthreadlock_t > - libpq_gettext > - conn->errorMessage > - conn->oauth_issuer_id >=20 > I think we should keep injecting libpq_gettext; no third-party > implementations would be able to use that. And application hooks are > presumably capable of figuring out top-level locking already, since > the application has to have called PQregisterThreadLock() if it needed > to coordinate with libpq. >=20 > That leaves error messages and the issuer identifier. I think both > would be useful for hooks to have, too, so I'd like to add them to > PQAUTHDATA_OAUTH_BEARER_TOKEN. >=20 > =3D PQAUTHDATA_OAUTH_BEARER_TOKEN, version 2 =3D >=20 > My original plan for authdata extensions was to add new members to the > end of the structs that libpq passes into the hook. Applications would > gate on a feature macro during compilation to see whether they could > use the new members. That should work fine for an application hook; > you're not allowed to downgrade libpq past the version that your > applications are compiled against, lest you lose symbols (or other > feature-flag functionality) you're relying on. >=20 > Plugins, unfortunately, can't rely on the feature macro. As we found > out during the libpq-oauth split [1], we have to handle a long-running > application with an old libpq that loads an upgraded libpq-oauth, even > if the OS package dependencies suggest otherwise. (A plugin > architecture flips the direction of the runtime dependency arrow.) >=20 > There are a couple ways we could handle this. For this draft, I've > implemented what I think is a middle ground between verbosity and > type-safety: introduce a new V2 struct that "inherits" the V1 struct > and can be down-cast in the callbacks, kinda similar to our Node > hierarchy. We could go even more verbose, and duplicate the entire > PGoauthBearerToken struct -- matching the callback parameter types for > maximum safety -- but I'm not convinced that this would be a good use > of maintenance effort. The ability to cast between the two means we > can share v1 and v2 implementations in our tests. >=20 > We could also just add the new members at the end, say that you're > only allowed to use them if the V2 hook type is in use, and scribble > on them in V1 hooks to try to get misbehaving implementations to crash > outright. This arguably has the same amount of type-safety as the > downcast, and the resulting code looks nicer. (The libcurl API we use > does something similar with curl_version_info().) But it is definitely > more "magic". >=20 > Also of note: this adds a PQExpBuffer to libpq-fe.h. Technically, that > type is "internal". But... is it really, though? It doesn't seem > possible for us to make incompatible changes there without crashing > earlier psqls, in which case I would like to make use of it too. Maybe > this deserves its own minithread. >=20 > Okay, on to the full patchset. >=20 > =3D Roadmap: Prep =3D >=20 > The first few patches are bugfixes I intend to backpatch to 18. >=20 > - 0001: I stomped on the SOCKTYPE name in libpq-fe.h, but that's not > in our namespace and it's conceivable that it might collide with > someone else. (It collided with my own test code during my work on > this.) > - 0002 fixes a copy-paste bug in meson.build, which luckily hadn't > caused problems yet. > - 0003 ports Tom's debug2 fix for Test::Cluster::connect_fails() over > to 002_client.pl. (Currently, log checks in this test aren't made > after connection failures, but I don't really want to chase that down > later after I've forgotten about it.) >=20 > =3D Roadmap: Implementation =3D >=20 > Next three patches are the core implementation, which stabilizes the > ABI for libpq-oauth. I feel fairly confident that this, or something > close to it, could land in PG19. >=20 > - 0004 introduces the new PQAUTHDATA_OAUTH_BEARER_TOKEN_V2 API. >=20 > As described above. >=20 > - 0005 makes use of the new API in libpq-oauth. >=20 > This removes more code than it introduces, which is exciting. >=20 > Now we can rename libpq-oauth-19.so to just libpq-oauth.so, since we > no longer depend on anything that could change between major versions. > (We still need lock and locale support from libpq, as mentioned above, > so they continue to be decoupled via injection at init time.) >=20 > Some of the code in this patch overlaps with the translation fixes > thread [2], which I need to get to first. I'm hoping some additional > simplifications can be made after those localization bugs are fixed. >=20 > I think I'd also like to get the threadlock into the module API (but > not the hook API). A third-party flow might need to perform its own > one-time initialization, and if it relies on an old version of Curl > (or worse, Kerberos [3]), it'll need to use the same lock that the > top-level application has registered for libpq. So I imagine I'll need > to break out an initialization function here. Alternatively, I could > introduce an API into libpq to retrieve the threadlock function in > use? >=20 > - 0006 removes a potential ABI-compatibility pitfall for future devs. >=20 > libpq-oauth needs to use the shared-library variant of libpgcommon, > but it can no longer assume that the encoding support exported by > libpq is compatible. So it must not accidentally link against those > functions (see [4]). I don't imagine anyone will try adding code that > does this in practice; we're pretty UTF8-centric in OAuth. But just to > be safe, define USE_PRIVATE_ENCODING_FUNCS so that anyone who tries > will fail the build. >=20 > =3D Roadmap: Plugins? =3D >=20 > So now we have a stable ABI, which technically means that any > enterprising developer who wants to play games with LD_LIBRARY_PATH > could implement their own version of an OAuth flow, and have our > utilities make use of it into the future. >=20 > That brings us to patch 0007, which experimentally promotes the stable > API to a public header, and introduces a really janky environment > variable so that people don't have to play games. It will be obvious > from the code that this is not well-baked yet. >=20 > I hope the ability to dlopen() a custom flow can make it for 19; I > just don't know that this envvar approach is any good. The ideal > situation, IMO, is for a flow to be selected in the connection string. > But we have to lock that down, similarly to how we protect > local_preload_libraries, to prevent horrible exploits. At which point > we'll have essentially designed a generic libpq plugin system. Not > necessarily a terrible thing, but I don't think I have time to take it > on for PG19. >=20 > WDYT? > --Jacob >=20 > [1] https://postgr.es/m/aAkJnDQq3mOUvmQV%40msg.df7cb.de > [2] = https://postgr.es/m/TY4PR01MB1690746DB91991D1E9A47F57E94CDA%40TY4PR01MB169= 07.jpnprd01.prod.outlook.com > [3] https://postgr.es/m/aSSp03wmNMngi/Oe%40ubby > [4] https://postgr.es/c/b6c7cfac8 > = <0001-libpq-fe.h-Don-t-claim-SOCKTYPE-in-the-global-namesp.patch><0002-lib= pq-oauth-use-correct-c_args-in-meson.build.patch><0003-oauth_validator-Avo= id-races-in-log_check.patch><0004-libpq-Introduce-PQAUTHDATA_OAUTH_BEARER_= TOKEN_V2.patch><0005-libpq-oauth-Use-the-PGoauthBearerRequestV2-API.patch>= <0006-libpq-oauth-Never-link-against-libpq-s-encoding-func.patch><0007-WIP= -Introduce-third-party-OAuth-flow-plugins.patch> Hi Jacob, This is a solid patch set. Only a few small comments: 1 - 0001 ``` /* for PGoauthBearerRequest.async() */ #ifdef _WIN32 -#define SOCKTYPE uintptr_t /* avoids depending on = winsock2.h for SOCKET */ +#define PQ_SOCKTYPE uintptr_t /* avoids depending on winsock2.h for = SOCKET */ #else -#define SOCKTYPE int +#define PQ_SOCKTYPE int #endif ``` The commit message has explained why SOCKTYPE is temporary and the = reason why adding prefix =E2=80=9CPG_=E2=80=9D is to avoid collisions. = But I don=E2=80=99t think code readers will always read commit messages, = given the macro is a local and temporary, why adding a prefix starting = with a underscore, like =E2=80=9C_PQ_SOCKTYPE=E2=80=9D, which explicitly = indicates the macro is kinda private. =3D=3D=3D 0002 & 0003 Looks good. =3D=3D=3D 2 - 0004 ``` + * Helper for handling user flow failures. If the implementation put = anything + * into request->error, it's added to conn->errorMessage here. ``` Typo: put -> puts 3 - 0004 ``` +# Make sure the v1 hook continues to work. */ +test( ``` =E2=80=9C*/=E2=80=9C in the end of the comment line seems a typo. 4 - 0005 ``` + PQAUTHDATA_OAUTH_BEARER_TOKEN, /* server requests an OAuth = Bearer token + = * (prefer v2, below, instead) */ ``` "(prefer v2, below, instead)" looks confusing to me, though I can = understand what it means. Maybe make it clearer, like =E2=80=9C(v2 is = preferred; see below)" =3D=3D=3D 0006 Looks good. =3D=3D=3D =3D=3D=3D Not reviewing 0007 as it marks with WIP. =3D=3D=3D Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/