public inbox for [email protected]  
PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times
7+ messages / 4 participants

* PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times
@ 2026-04-15 15:55  Daniel Schreiber <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Daniel Schreiber @ 2026-04-15 15:55 UTC
  To: [email protected]

Dear PostgreSQL developers,

My colleagues and I have probably found a bug in libpq that shows up when
libpq is dlopened and closed multiple times during the lifetime of a
process. In our setup we use a PAM module which links to libpq. The
process using PAM is linked against OpenSSL, so OpenSSL stays loaded for
the complete lifetime of the process, whereas libpq is loaded only during
PAM authentication (and unloaded when PAM has finished).

We observed the bug on a Debian 13 system using libpq from Debian. To
reproduce the bug, compile the attached C file using the following gcc
command line:

gcc libpq1-dlopen.c -Wall -Wextra -o libpq1-dlopen -ldl -lssl -lcrypto

Then run the binary with a PostgreSQL connection string as an argument.
The connection string has to include 'sslmode=require'. In a loop, the
program dlopens libpq, connects to the server, closes the connection,
and unloads libpq again.

According to our findings, every time a connection is established after
dlopening libpq, one of the 127 custom BIO type indices that OpenSSL
hands out via BIO_get_new_index() is consumed:
https://github.com/postgres/postgres/blob/REL_17_9/src/interfaces/libpq/fe-secure-openssl.c#L1987

So after 127 cycles, registering the callbacks fails, and in our use case
the application is no longer able to authenticate using PAM. As a
workaround, we LD_PRELOAD libpq in the application.
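To make the failure mode concrete, here is a simplified model in C. It is not OpenSSL or libpq code: every `model_*` name is hypothetical, and the constants assume that OpenSSL's index counter starts at BIO_TYPE_START (128) and rejects values above 0xFF, which is consistent with the documented limit of 127 calls to BIO_get_new_index(). What it shows is that libpq's cached BIO method is reset by every fresh load, while OpenSSL's counter survives dlclose().

```c
/* Hypothetical constants mirroring OpenSSL's BIO_TYPE_START and BIO_TYPE_MASK. */
#define MODEL_TYPE_START 128
#define MODEL_TYPE_MASK  0xFF

/* Process-wide counter owned by the "OpenSSL" model; unloading libpq never resets it. */
static int model_type_count = MODEL_TYPE_START;

/* Model of BIO_get_new_index(): hands out 129..255, then fails with -1. */
static int
model_get_new_index(void)
{
    int newval = ++model_type_count;

    return newval > MODEL_TYPE_MASK ? -1 : newval;
}

/* Per-load state of the "libpq" model: zeroed again by every fresh dlopen(). */
struct model_libpq
{
    int have_bio_method;        /* stands in for pgconn_bio_method_ptr != NULL */
};

/* The first SSL connection of a load builds (and caches) the BIO method. */
static int
model_connect(struct model_libpq *lib)
{
    if (!lib->have_bio_method)
    {
        if (model_get_new_index() == -1)
            return 0;           /* method creation fails, so SSL setup fails */
        lib->have_bio_method = 1;
    }
    return 1;
}

/* Simulate load/connect/unload cycles; return the first failing cycle, or -1. */
static int
model_first_failing_cycle(int cycles, int conns_per_load)
{
    for (int i = 0; i < cycles; i++)
    {
        struct model_libpq lib = {0};   /* dlopen(): fresh static state */

        for (int c = 0; c < conns_per_load; c++)
            if (!model_connect(&lib))
                return i;
        /* dlclose(): lib is discarded, model_type_count is not */
    }
    return -1;
}
```

With several connections per load, only the first connection of each cycle consumes an index, so model_first_failing_cycle(229, 3) returns 127, in line with the "will fail at i==127" comment in the reproducer below.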

I am not yet subscribed to the mailing list, so please CC me.

Thank you,

Daniel
-- 
Daniel Schreiber
Facharbeitsgruppe Systemsoftware
Universitaetsrechenzentrum

Technische Universität Chemnitz
Straße der Nationen 62 (Raum B303)
09111 Chemnitz
Germany

Tel:     +49 371 531 35444



Attachments:

  [text/x-csrc] libpq1-dlopen.c (3.0K, 2-libpq1-dlopen.c)
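/*
 * libpq1-dlopen.c
 *
 * Reproducer based on src/test/examples/testlibpq.c: in a loop, dlopen()
 * libpq, connect (pass a connection string with sslmode=require), close
 * the connection and dlclose() libpq again, while OpenSSL stays loaded
 * for the whole lifetime of the process.
 */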
#include <stdio.h>
#include <stdlib.h>
#include <postgresql/libpq-fe.h>
#include <openssl/ssl.h>
#include <dlfcn.h>

typedef PGconn* (*PQconnectdb_func)(const char *conninfo);
typedef void (*PQfinish_func)(PGconn *conn);
typedef ConnStatusType (*PQstatus_func)(const PGconn *conn);
typedef char* (*PQerrorMessage_func)(const PGconn *conn);


static void
exit_nicely(PGconn *conn, PQfinish_func finish_func)
{
    finish_func(conn);
    exit(1);
}

/* Dummy call that forces linking against OpenSSL, so it stays loaded for the whole process */
static void
dummy_openssl(void)
{
    SSL_library_init();
}

int
main2(int argc, char **argv)
{
    const char *conninfo;
    PGconn     *conn;
    void       *handle;
    PQconnectdb_func conn_func;
    PQfinish_func finish_func;
    PQstatus_func status_func;
    PQerrorMessage_func errorMessage_func;
    char       *error;

    /* load libpq for this iteration only */
    handle = dlopen("libpq.so", RTLD_LAZY);
    if (!handle) {
        fprintf(stderr, "error loading libpq: %s\n", dlerror());
        return 1;
    }
    
    conn_func = (PQconnectdb_func)dlsym(handle, "PQconnectdb");
    error = dlerror();
    if (error != NULL) {
        fprintf(stderr, "error loading function: %s\n", error);
        dlclose(handle);
        return 1;
    }
    finish_func = (PQfinish_func)dlsym(handle, "PQfinish");
    error = dlerror();
    if (error != NULL) {
        fprintf(stderr, "error loading function: %s\n", error);
        dlclose(handle);
        return 1;
    }
    status_func = (PQstatus_func)dlsym(handle, "PQstatus");
    error = dlerror();
    if (error != NULL) {
        fprintf(stderr, "error loading function: %s\n", error);
        dlclose(handle);
        return 1;
    }
    errorMessage_func = (PQerrorMessage_func)dlsym(handle, "PQerrorMessage");
    error = dlerror();
    if (error != NULL) {
        fprintf(stderr, "error loading function: %s\n", error);
        dlclose(handle);
        return 1;
    }

    /*
     * If the user supplies a parameter on the command line, use it as the
     * conninfo string; otherwise default to setting dbname=postgres and using
     * environment variables or defaults for all other connection parameters.
     */
    if (argc > 1)
        conninfo = argv[1];
    else
        conninfo = "dbname = postgres";

    /* Make a connection to the database */
    conn = conn_func(conninfo);

    /* Check to see that the backend connection was successfully made */
    if (status_func(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "%s", errorMessage_func(conn));
        exit_nicely(conn, finish_func);
    }

    /* close the connection to the database and cleanup */
    finish_func(conn);
    /* unload libpq */
    dlclose(handle);

    return 0;
}

int
main(int argc, char **argv)
{
    int i;

    dummy_openssl();
    for (i = 0; i < 229; i++) {
        printf("%d\n", i);
        /* will fail at i == 127, once OpenSSL has no custom BIO indices left */
        main2(argc, argv);
    }
    return 0;
}


  [application/pkcs7-signature] smime.p7s (4.8K, 3-smime.p7s)


* Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times
@ 2026-04-17 19:14  Jacob Champion <[email protected]>
  parent: Daniel Schreiber <[email protected]>
  0 siblings, 1 reply; 7+ messages in thread

From: Jacob Champion @ 2026-04-17 19:14 UTC
  To: Daniel Schreiber <[email protected]>; +Cc: [email protected]

On Fri, Apr 17, 2026 at 7:33 AM Daniel Schreiber
<[email protected]> wrote:
> my colleagues and I probably found a bug in libpq when libpq is dlopened
> and closed multiple times during the lifetime of a process. In our setup
> we use a PAM module which links to libpq. The process using PAM is
> linked against openssl, so openssl is loaded during the complete
> lifetime of the process whereas libpq is loaded only during PAM
> authentication (and unloaded when PAM has finished).
>
> [snip]
>
> According to our findings every time a connection is established after
> dlopening libpq one of the 127 available BIO_METHOD structures in
> OpenSSL is consumed:
> https://github.com/postgres/postgres/blob/REL_17_9/src/interfaces/libpq/fe-secure-openssl.c#L1987

Right. I think in this *particular* case, we should simply skip the
call to BIO_get_new_index(). We don't need it, IIUC.

But I think we may also need to set expectations on whether or not
infinite dlopen/dlclose loops are supported in general. If we ever
come across a situation in which a call to BIO_get_new_index() is
necessary, that leak just fundamentally can't be plugged. The same is
true for any third-party libraries (or their dependencies, or
theirs...) that require "one-time", irreversible calls which can't be
tracked after we're unloaded. And we can't push these concerns up to
the top level application developer, because they don't know we exist.

(I'd be surprised if this were the only such resource leak across all
supported versions and combinations of Kerberos, OpenSSL, OpenLDAP,
Curl, etc. etc. From a quick search, you're the first to report this
in the ten years since the leak was introduced, so there may be more
dragons where you're headed.)

--Jacob







* Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times
@ 2026-04-22 18:29  Jacob Champion <[email protected]>
  parent: Jacob Champion <[email protected]>
  0 siblings, 2 replies; 7+ messages in thread

From: Jacob Champion @ 2026-04-22 18:29 UTC
  To: PostgreSQL Hackers <[email protected]>; +Cc: Daniel Schreiber <[email protected]>

[moving to -hackers]

On Fri, Apr 17, 2026 at 12:14 PM Jacob Champion
<[email protected]> wrote:
>
> On Fri, Apr 17, 2026 at 7:33 AM Daniel Schreiber
> <[email protected]> wrote:
> > my colleagues and I probably found a bug in libpq when libpq is dlopened
> > and closed multiple times during the lifetime of a process. In our setup
> > we use a PAM module which links to libpq. The process using PAM is
> > linked against openssl, so openssl is loaded during the complete
> > lifetime of the process whereas libpq is loaded only during PAM
> > authentication (and unloaded when PAM has finished).
> >
> > [snip]
> >
> > According to our findings every time a connection is established after
> > dlopening libpq one of the 127 available BIO_METHOD structures in
> > OpenSSL is consumed:
> > https://github.com/postgres/postgres/blob/REL_17_9/src/interfaces/libpq/fe-secure-openssl.c#L1987
>
> Right. I think in this *particular* case, we should simply skip the
> call to BIO_get_new_index(). We don't need it, IIUC.

Attached is a proposal to do that.

> But I think we may also need to set expectations on whether or not
> infinite dlopen/dlclose loops are supported in general. If we ever
> come across a situation in which a call to BIO_get_new_index() is
> necessary, that leak just fundamentally can't be plugged. The same is
> true for any third-party libraries (or their dependencies, or
> theirs...) that require "one-time", irreversible calls which can't be
> tracked after we're unloaded. And we can't push these concerns up to
> the top level application developer, because they don't know we exist.
>
> (I'd be surprised if this were the only such resource leak across all
> supported versions and combinations of Kerberos, OpenSSL, OpenLDAP,
> Curl, etc. etc. From a quick search, you're the first to report this
> in the ten years since the leak was introduced, so there may be more
> dragons where you're headed.)

If anyone has thoughts on that, I'd love to hear them. I don't mind
removing this unnecessary code in HEAD, or even backpatching as a
courtesy -- but if it were up to me, I would not guarantee zero global
resource leaks across libpq and its entire dependency graph. (Even if
we magically had control over all those dependencies, I think it'd
still be reasonable for libpq devs to use "allocate once and move on"
patterns... and I want to continue using those in my new code.)

Thanks,
--Jacob


Attachments:

  [application/octet-stream] 0001-Remove-call-to-BIO_get_new_index-in-OpenSSL-code.patch (3.0K, 2-0001-Remove-call-to-BIO_get_new_index-in-OpenSSL-code.patch)
From 800678db5674b0321f63fb420f942fb543b8d722 Mon Sep 17 00:00:00 2001
From: Jacob Champion <[email protected]>
Date: Mon, 20 Apr 2026 15:29:54 -0700
Subject: [PATCH] Remove call to BIO_get_new_index() in OpenSSL code

BIO_meth_new() takes an "index type" as its first argument. Older
OpenSSL documentation used to suggest that this argument should be
constructed by registering a custom index with BIO_get_new_index() and
combining that with the appropriate "BIO class" bit.

However, custom BIO indices are an extremely limited resource [1], and
newer documentation suggests that clients should only take one if they
expect to search a BIO chain for it later:

  `type` can be set to either `BIO_TYPE_NONE` or via BIO_get_new_index()
  if a unique type is required for searching[...] Note that
  BIO_get_new_index() can only be used 127 times before it returns an
  error.

We don't fall into that category (we immediately discard the index we've
created), and it doesn't look like OpenSSL has ever required a nonzero
index, so avoid registering one altogether.

Per complaint by Daniel Schreiber that libpq eventually breaks OpenSSL
when repeatedly dlopen/dlclose'd. It's not clear to me that we support
that use case in general (related TODO: decide whether to backpatch
this), but this change seems like a clear improvement going forward.

[1] https://github.com/openssl/openssl/issues/23655

Reported-by: Daniel Schreiber <[email protected]>
Discussion: https://postgr.es/m/f7fe39b3-7e99-4939-8852-07350549161d%40hrz.tu-chemnitz.de
Backpatch-through: TODO
---
 src/backend/libpq/be-secure-openssl.c    | 9 ++-------
 src/interfaces/libpq/fe-secure-openssl.c | 8 +-------
 2 files changed, 3 insertions(+), 14 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index a3e222f3a3d..6c3717bc024 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -1419,13 +1419,8 @@ port_bio_method(void)
 {
 	if (!port_bio_method_ptr)
 	{
-		int			my_bio_index;
-
-		my_bio_index = BIO_get_new_index();
-		if (my_bio_index == -1)
-			return NULL;
-		my_bio_index |= BIO_TYPE_SOURCE_SINK;
-		port_bio_method_ptr = BIO_meth_new(my_bio_index, "PostgreSQL backend socket");
+		port_bio_method_ptr = BIO_meth_new(BIO_TYPE_SOURCE_SINK,
+										   "PostgreSQL backend socket");
 		if (!port_bio_method_ptr)
 			return NULL;
 		if (!BIO_meth_set_write(port_bio_method_ptr, port_bio_write) ||
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index fbd3c63fb5d..2214a141847 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -1841,13 +1841,7 @@ pgconn_bio_method(void)
 
 	if (!pgconn_bio_method_ptr)
 	{
-		int			my_bio_index;
-
-		my_bio_index = BIO_get_new_index();
-		if (my_bio_index == -1)
-			goto err;
-		my_bio_index |= BIO_TYPE_SOURCE_SINK;
-		res = BIO_meth_new(my_bio_index, "libpq socket");
+		res = BIO_meth_new(BIO_TYPE_SOURCE_SINK, "libpq socket");
 		if (!res)
 			goto err;
 
-- 
2.34.1




* Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times
@ 2026-04-22 19:22  Nico Williams <[email protected]>
  parent: Jacob Champion <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Nico Williams @ 2026-04-22 19:22 UTC
  To: Jacob Champion <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Daniel Schreiber <[email protected]>

On Wed, Apr 22, 2026 at 11:29:04AM -0700, Jacob Champion wrote:
> > (I'd be surprised if this were the only such resource leak across all
> > supported versions and combinations of Kerberos, OpenSSL, OpenLDAP,
> > Curl, etc. etc. From a quick search, you're the first to report this
> > in the ten years since the leak was introduced, so there may be more
> > dragons where you're headed.)
> 
> If anyone has thoughts on that, I'd love to hear them. I don't mind
> removing this unnecessary code in HEAD, or even backpatching as a
> courtesy -- but if it were up to me, I would not guarantee zero global
> resource leaks across libpq and its entire dependency graph. (Even if
> we magically had control over all those dependencies, I think it'd
> still be reasonable for libpq devs to use "allocate once and move on"
> patterns... and I want to continue using those in my new code.)

Leaking a dl handle is a way to prevent unloading.  Not saying that's a
great answer, just that it's a workaround.
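A minimal sketch of that workaround on a glibc/Linux system, assuming the library is already loaded: take one extra reference with RTLD_NOLOAD and never release it, so the reference count cannot reach zero. The helper names are hypothetical, and the soname to pass (for example libpq.so.5 on Debian) depends on the platform.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Take an extra reference to an already-loaded shared object and
 * deliberately never dlclose() it, so dlclose() calls on other handles
 * cannot drop its reference count to zero.
 */
static int
leak_handle_to(const char *soname)
{
    /* RTLD_NOLOAD: succeed only if the object is already resident. */
    void *handle = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);

    return handle != NULL;      /* handle is intentionally leaked */
}

/* Return 1 if soname is currently loaded, leaving its refcount unchanged. */
static int
is_resident(const char *soname)
{
    void *handle = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);

    if (handle == NULL)
        return 0;
    dlclose(handle);            /* balance the reference we just took */
    return 1;
}
```

The leaked reference keeps the object mapped, so its static state (such as a cached BIO method) survives, but an extra dlclose() from elsewhere can still release it.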







* Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times
@ 2026-04-22 19:23  Tom Lane <[email protected]>
  parent: Jacob Champion <[email protected]>
  1 sibling, 1 reply; 7+ messages in thread

From: Tom Lane @ 2026-04-22 19:23 UTC
  To: Jacob Champion <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Daniel Schreiber <[email protected]>

Jacob Champion <[email protected]> writes:
> If anyone has thoughts on that, I'd love to hear them. I don't mind
> removing this unnecessary code in HEAD, or even backpatching as a
> courtesy -- but if it were up to me, I would not guarantee zero global
> resource leaks across libpq and its entire dependency graph.

I agree that we have no real ability to guarantee that.
Still, as far as the presented patch goes, it seems like a clear
win so I'd vote for fix-and-backpatch.

Should we write the arguments as BIO_TYPE_NONE | BIO_TYPE_SOURCE_SINK
rather than just BIO_TYPE_SOURCE_SINK?

			regards, tom lane







* Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times
@ 2026-04-22 22:10  Jacob Champion <[email protected]>
  parent: Nico Williams <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Jacob Champion @ 2026-04-22 22:10 UTC
  To: Nico Williams <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Daniel Schreiber <[email protected]>

On Wed, Apr 22, 2026 at 12:22 PM Nico Williams <[email protected]> wrote:
> Leaking a dl handle is a way to prevent unloading.  Not saying that's a
> great answer, just that it's a workaround.

Hmm, I did that for our handle to libpq-oauth, but I imagine that
leaking a handle to _ourselves_ may make someone very unhappy with us
at some point? Plus, it might kick off the tiniest, most pointless
arms race:

    // why does libpq do this
    dlclose(libpq_handle);
    dlclose(libpq_handle);

I guess we could play around with RTLD_NODELETE... Something to keep
in the back pocket, maybe?
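As a rough sketch of the RTLD_NODELETE idea on glibc/Linux, assuming the object is already loaded: reopening it with RTLD_NOLOAD | RTLD_NODELETE promotes it to "never unload", and per dlopen(3) its static variables are then not reinitialized on a later dlopen(). The helper names are hypothetical; this is not something libpq does today.

```c
#include <dlfcn.h>
#include <stddef.h>

/*
 * Promote an already-loaded shared object to "never unload". Unlike a
 * leaked handle, which an extra dlclose() could still release, the pin
 * stays even after this handle itself is closed.
 */
static int
pin_resident_object(const char *soname)
{
    void *handle = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD | RTLD_NODELETE);

    if (handle == NULL)
        return 0;               /* not loaded yet; nothing to pin */
    dlclose(handle);            /* the NODELETE flag stays with the object */
    return 1;
}

/* Return 1 if soname is currently loaded, leaving its refcount unchanged. */
static int
still_loaded(const char *soname)
{
    void *handle = dlopen(soname, RTLD_LAZY | RTLD_NOLOAD);

    if (handle == NULL)
        return 0;
    dlclose(handle);
    return 1;
}
```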

--Jacob







* Re: PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times
@ 2026-04-22 22:10  Jacob Champion <[email protected]>
  parent: Tom Lane <[email protected]>
  0 siblings, 0 replies; 7+ messages in thread

From: Jacob Champion @ 2026-04-22 22:10 UTC
  To: Tom Lane <[email protected]>; +Cc: PostgreSQL Hackers <[email protected]>; Daniel Schreiber <[email protected]>

On Wed, Apr 22, 2026 at 12:23 PM Tom Lane <[email protected]> wrote:
> I agree that we have no real ability to guarantee that.
> Still, as far as the presented patch goes, it seems like a clear
> win so I'd vote for fix-and-backpatch.

Sounds good to me.

> Should we write the arguments as BIO_TYPE_NONE | BIO_TYPE_SOURCE_SINK
> rather than just BIO_TYPE_SOURCE_SINK?

Good question... Popularity-wise, the shorter spelling shows up across
quite a few projects on GitHub, but the only spelling of
`BIO_meth_new(BIO_TYPE_NONE | ...)` that I can find is a single place
inside OpenSSL's own test suite -- which also uses the shorter
alternative, in two places. So my vote is BIO_TYPE_SOURCE_SINK; we'll
be in good company.

Thanks,
--Jacob









Thread overview: 7+ messages
2026-04-15 15:55 PostgreSQL 17: Bug in libpq when libpq is dlopened/closed multiple times Daniel Schreiber <[email protected]>
2026-04-17 19:14 ` Jacob Champion <[email protected]>
2026-04-22 18:29   ` Jacob Champion <[email protected]>
2026-04-22 19:22     ` Nico Williams <[email protected]>
2026-04-22 22:10       ` Jacob Champion <[email protected]>
2026-04-22 19:23     ` Tom Lane <[email protected]>
2026-04-22 22:10       ` Jacob Champion <[email protected]>
