public inbox for [email protected]  
libpq maligning postgres stability
5+ messages / 5 participants

* libpq maligning postgres stability
@ 2025-03-27 15:19 Andres Freund <[email protected]>
  2025-03-27 15:48 ` Re: libpq maligning postgres stability Robert Haas <[email protected]>
  2026-05-26 07:22 ` Re: libpq maligning postgres stability Jelte Fennema-Nio <[email protected]>
  0 siblings, 2 replies; 5+ messages in thread

From: Andres Freund @ 2025-03-27 15:19 UTC (permalink / raw)
  To: pgsql-hackers

Hi,

We have several places in libpq where libpq says that a connection closing is
probably due to a server crash with a message like:

server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.


I think this is rather unhelpful, at least these days. There are a lot of
reasons the connection could have failed, the server having terminated
abnormally is just one of them.

It's common to see this due to network issues, for example.  I've fielded
worried questions from postgres users about this message quite a few times.

The reason I was looking at this message just now was a discussion of CI
failures on windows [1], which were likely caused by the known issue of
windows occasionally swallowing the server's last messages before the backend
exits (more detail e.g. in [2]).  Because of the message, it's easy to wrongly
conclude that the failure was caused by a postgres crash, rather than by not
receiving the expected FATAL.


And we don't even just add this message when the connection was actually
closed unexpectedly, we often do it even when we *did* get a FATAL, as in this
example:

psql -c 'select pg_terminate_backend(pg_backend_pid())'
FATAL:  57P01: terminating connection due to administrator command
LOCATION:  ProcessInterrupts, postgres.c:3351
server closed the connection unexpectedly
	This probably means the server terminated abnormally
	before or while processing the request.
connection to server was lost


I think this one is mostly a weakness in how libpq tracks connection state,
but it kind of shows the silliness of claiming postgres probably crashed.


Greetings,

Andres Freund


[1] Via Bilal:

    4 of the failures on the front page are related to Windows:
    https://cirrus-ci.com/build/4878370632105984
    https://cirrus-ci.com/build/5063665856020480
    https://cirrus-ci.com/build/4636858312818688
    https://cirrus-ci.com/build/6385762419081216

[2] https://postgr.es/m/CA%2BhUKGLR10ZqRCvdoRrkQusq75wF5%3DvEetRSs2_u1s%2BFAUosFQ%40mail.gmail.com






* Re: libpq maligning postgres stability
  2025-03-27 15:19 libpq maligning postgres stability Andres Freund <[email protected]>
@ 2025-03-27 15:48 ` Robert Haas <[email protected]>
  2025-03-27 16:03   ` Re: libpq maligning postgres stability Christoph Berg <[email protected]>
  2025-04-07 22:37   ` Re: libpq maligning postgres stability Bruce Momjian <[email protected]>
  1 sibling, 2 replies; 5+ messages in thread

From: Robert Haas @ 2025-03-27 15:48 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: pgsql-hackers

On Thu, Mar 27, 2025 at 11:19 AM Andres Freund <[email protected]> wrote:
> We have several places in libpq where libpq says that a connection closing is
> probably due to a server crash with a message like:
>
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
>
> I think this is rather unhelpful, at least these days. There are a lot of
> reasons the connection could have failed, the server having terminated
> abnormally is just one of them.
>
> It's common to see this due to network issues, for example.  I've fielded
> worried questions from postgres users about this message quite a few times.

Yeah, I agree. I used to think this hint was helpful, but it's gotten
less helpful as the years have passed, because the server is more
stable these days. Another thing that can cause this (as discussed on
Discord) is that the individual backend process can have died, but not
the server as a whole. In that case, the hint is only accurate if you
take "server" to mean your individual server process.

I wonder if, in addition to removing the hint, we could also consider
rewording the message. For example, a slight rewording to "server
connection closed unexpectedly" would avoid implying that it was the
server that took action, which is correct, because it could be a
firewall in between the machines or even security software on the
client side.  Maybe there is some more dramatic rewording that is even
better, but there's probably some value in keeping it similar to what
people are used to seeing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com






* Re: libpq maligning postgres stability
  2025-03-27 15:19 libpq maligning postgres stability Andres Freund <[email protected]>
  2025-03-27 15:48 ` Re: libpq maligning postgres stability Robert Haas <[email protected]>
@ 2025-03-27 16:03   ` Christoph Berg <[email protected]>
  1 sibling, 0 replies; 5+ messages in thread

From: Christoph Berg @ 2025-03-27 16:03 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

Re: Robert Haas
> I wonder if, in addition to removing the hint, we could also consider
> rewording the message. For example, a slight rewording to "server
> connection closed unexpectedly" would avoid implying that it was the

There is a lot of software doing string-parsing of this part of the
message, so it might be advisable to leave the first line alone.

https://sources.debian.org/src/php-laravel-framework/10.48.25+dfsg-2/src/Illuminate/Database/Detects...
https://sources.debian.org/src/python-taskflow/5.9.1-4/taskflow/persistence/backends/impl_sqlalchemy...
https://sources.debian.org/src/gnucash/1:5.10-0.1/libgnucash/backend/dbi/gnc-backend-dbi.cpp/?hl=798...
https://sources.debian.org/src/pgbouncer/1.24.0-3/test/test_misc.py/?hl=301#L301
https://sources.debian.org/src/icingaweb2-module-reporting/1.0.2-2/library/Reporting/RetryConnection...
https://sources.debian.org/src/storm/1.0-1/storm/databases/postgres.py/?hl=353#L353
https://sources.debian.org/src/timescaledb/2.19.0+dfsg-1/test/expected/loader-tsl.out/?hl=473#L473
https://sources.debian.org/src/odoo/18.0.0+dfsg-2/addons/web/tests/test_db_manager.py/?hl=277#L277

https://codesearch.debian.net/search?q=server+closed+the+connection+unexpectedly&literal=1

(There might be room for asking why this string parsing is being done:
is libpq missing "connection lost" detection vs. other errors?)

The remaining message lines are admittedly very pessimistic about
PostgreSQL's stability and should mention networking issues first.

Christoph






* Re: libpq maligning postgres stability
  2025-03-27 15:19 libpq maligning postgres stability Andres Freund <[email protected]>
  2025-03-27 15:48 ` Re: libpq maligning postgres stability Robert Haas <[email protected]>
@ 2025-04-07 22:37   ` Bruce Momjian <[email protected]>
  1 sibling, 0 replies; 5+ messages in thread

From: Bruce Momjian @ 2025-04-07 22:37 UTC (permalink / raw)
  To: Robert Haas <[email protected]>; +Cc: Andres Freund <[email protected]>; pgsql-hackers

On Thu, Mar 27, 2025 at 11:48:26AM -0400, Robert Haas wrote:
> On Thu, Mar 27, 2025 at 11:19 AM Andres Freund <[email protected]> wrote:
> > We have several places in libpq where libpq says that a connection closing is
> > probably due to a server crash with a message like:
> >
> > server closed the connection unexpectedly
> >         This probably means the server terminated abnormally
> >         before or while processing the request.
> >
> > I think this is rather unhelpful, at least these days. There are a lot of
> > reasons the connection could have failed, the server having terminated
> > abnormally is just one of them.
> >
> > It's common to see this due to network issues, for example.  I've fielded
> > worried questions from postgres users about this message quite a few times.
> 
> Yeah, I agree. I used to think this hint was helpful, but it's gotten
> less helpful as the years have passed, because the server is more
> stable these days. Another thing that can cause this (as discussed on
> Discord) is that the individual backend process can have died, but not
> the server as a whole. In that case, the hint is only accurate if you
> take "server" to mean your individual server process.
> 
> I wonder if, in addition to removing the hint, we could also consider
> rewording the message. For example, a slight rewording to "server
> connection closed unexpectedly" would avoid implying that it was the
> server that took action, which is correct, because it could be a
> firewall in between the machines or even security software on the
> client side.  Maybe there is some more dramatic rewording that is even
> better, but there's probably some value in keeping it similar to what
> people are used to seeing.

FYI, I researched these messages in 2023 to see if the message can be
adjusted based on the code line generating the message, but with no
conclusion:

	https://www.postgresql.org/message-id/flat/CA%2BTgmoZYvqmyQpzSUdtDmtk4Aj94MppDGe9qVJczbPLy4G2Yfg%40m...

-- 
  Bruce Momjian  <[email protected]>        https://momjian.us
  EDB                                      https://enterprisedb.com

  Do not let urgent matters crowd out time for investment in the future.






* Re: libpq maligning postgres stability
  2025-03-27 15:19 libpq maligning postgres stability Andres Freund <[email protected]>
@ 2026-05-26 07:22 ` Jelte Fennema-Nio <[email protected]>
  1 sibling, 0 replies; 5+ messages in thread

From: Jelte Fennema-Nio @ 2026-05-26 07:22 UTC (permalink / raw)
  To: Andres Freund <[email protected]>; +Cc: pgsql-hackers; Robert Haas <[email protected]>; Bruce Momjian <[email protected]>

On Thu, 27 Mar 2025 at 16:19, Andres Freund <[email protected]> wrote:
> And we don't even just add this message when the connection was actually
> closed unexpectedly, we often do it even when we *did* get a FATAL, as in this
> example:
>
> psql -c 'select pg_terminate_backend(pg_backend_pid())'
> FATAL:  57P01: terminating connection due to administrator command
> LOCATION:  ProcessInterrupts, postgres.c:3351
> server closed the connection unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> connection to server was lost
>
>
> I think this one is mostly a weakness in how libpq tracks connection state,
> but it kind of shows the silliness of claiming postgres probably crashed.

I ran into this for the nth time (this time while trying to have psql
handle certain FATAL errors differently). Turns out fixing this is
actually really simple. All that's needed is to mark a connection as
CONNECTION_BAD whenever a FATAL or PANIC error is received by the
client.

(this change is intended for PG20)


Attachments:

  [text/x-patch] v1-0001-libpq-Consider-a-connection-with-a-FATAL-error-to.patch (2.5K, 2-v1-0001-libpq-Consider-a-connection-with-a-FATAL-error-to.patch)
  download | inline diff:
From a299e7c7779331dbc09d2c8bfc5923153e15763a Mon Sep 17 00:00:00 2001
From: Jelte Fennema-Nio <[email protected]>
Date: Tue, 26 May 2026 09:05:56 +0200
Subject: [PATCH v1] libpq: Consider a connection with a FATAL error to be
 closed

This starts marking a connection as closed (i.e. CONNECTION_BAD) when
the client receives a FATAL/PANIC error. Previously any FATAL error would get
the "server closed the connection unexpectedly" string appended, like so:

FATAL:  57P01: terminating connection due to administrator command
LOCATION:  ProcessInterrupts, postgres.c:3431
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.

This addition to the error is just plain incorrect: the server told the
client that it was closing the connection. So it's not unexpected, nor
did the server terminate abnormally. It also makes the error harder for a
client to parse, because the client loses the ability to use
PQresultErrorField on the final PGresult.
---
 src/interfaces/libpq/fe-protocol3.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/src/interfaces/libpq/fe-protocol3.c b/src/interfaces/libpq/fe-protocol3.c
index 840e018cd18..504e8592196 100644
--- a/src/interfaces/libpq/fe-protocol3.c
+++ b/src/interfaces/libpq/fe-protocol3.c
@@ -955,6 +955,25 @@ pqGetErrorNotice3(PGconn *conn, bool isError)
 					sizeof(conn->last_sqlstate));
 		else if (id == PG_DIAG_STATEMENT_POSITION)
 			have_position = true;
+		else if (isError && id == PG_DIAG_SEVERITY_NONLOCALIZED &&
+				 (strcmp(workBuf.data, "FATAL") == 0 ||
+				  strcmp(workBuf.data, "PANIC") == 0))
+		{
+			/*
+			 * A FATAL or PANIC from the server means the backend is going to
+			 * tear the connection down right after delivering this message.
+			 * Mark the connection bad immediately so callers that drain
+			 * results (PQexecFinish, PQexecStart's discard loop, etc.) stop
+			 * reading from the socket after receiving this result. Further
+			 * reads from the socket will receive an EOF, which would cause us
+			 * to incorrectly report this as an unexpected connection closure
+			 * by appending "server closed the connection unexpectedly ..." to
+			 * the server's own error message. We read SEVERITY_NONLOCALIZED
+			 * rather than SEVERITY so the check is independent of the
+			 * server's lc_messages setting.
+			 */
+			conn->status = CONNECTION_BAD;
+		}
 	}
 
 	/*

base-commit: 2c4bd2bf5700db98be0602854a8b7fa2c16b5f4a
-- 
2.54.0


