From: Kirill Reshke
Date: Fri, 24 Oct 2025 10:04:50 +0500
Subject: Re: Add GoAway protocol message for graceful but fast server shutdown/switchover
To: Jelte Fennema-Nio
Cc: PostgreSQL Hackers, Dave Cramer, Jacob Champion, Heikki Linnakangas

On Thu, 23 Oct 2025 at 18:05, Jelte Fennema-Nio wrote:
>
> This change introduces a new GoAway backend-to-frontend protocol
> message (byte 'g') that the server can send
> to the client to politely
> request that client to disconnect/reconnect when convenient. This message is
> advisory only - the connection remains fully functional and clients may
> continue executing queries and starting new transactions. "When
> convenient" is obviously not very well defined, but the primary target
> clients are clients that maintain a connection pool. Such clients should
> disconnect/reconnect a connection in the pool when there's no user of
> that connection. This is similar to how such clients often currently
> remove a connection from the pool after the connection hits a maximum
> lifetime of e.g. 1 hour.
>
> This new message is used by Postgres during the already existing "smart"
> shutdown procedure (i.e. when postmaster receives SIGTERM). When
> Postgres is in "smart" shutdown mode existing clients can continue to
> run queries as usual but new connection attempts are rejected. This mode
> is primarily useful when triggering a switchover of a read replica. A
> load balancer can route new connections only to the new read replica,
> while the old load balancer keeps serving the existing connections until
> they disconnect. The problem is that this draining of connections could
> often take a long time. Even when clients only run very short
> queries/transactions because the session can be kept open much longer
> (many connection pools use 1 hour max lifetime of a connection by default).
> With the introduction of the GoAway message Postgres now sends this
> message to all connected clients when it enters smart shutdown mode.
> If these clients respond to the message by reconnecting/disconnecting
> earlier than their maximum connection lifetime the draining can complete
> much quicker. Similar benefits to switchover duration can be achieved
> for other applications or proxies implementing the Postgres protocol,
> like when switching over a cluster of PgBouncer machines to a newer
> version.
>
> Applications/clients that use libpq can periodically check the result of
> the new PQgoAwayReceived() function to find out whether they have been
> asked to reconnect.

Hi! I'm +1 on this idea. This is something I wanted back in 2020, when
implementing the 'online restart' feature for Odyssey [0], but I never
bothered to start a thread. Because of the complexity of its async engine,
Odyssey cannot simply reuse TCP connections from the 'old' binary, so we
accept new connections in the new binary and try to drop connections in
the old binary at some rate.

About the patches:

In 0001:

>+
>+    if (strcmp(value, "latest") == 0)
>+    {
>+        *result = PG_PROTOCOL_LATEST;
>+        return true;
>+    }

Is this needed? We already have this check at the beginning of
pqParseProtocolVersion.

In 0002:

> + The GoAway message is sent by the server during a
> + smart shutdown to politely request that clients disconnect.

I'm not sure this wording is foolproof. First of all, shouldn't it be
'client', not 'clients'? It looks like this doc should describe the
interaction between a single client and a single server. Maybe also
change the end of the sentence to '... to instruct clients to
disconnect.'? That wording may not be great either, but I want the doc
to reflect that disconnecting is strongly advised, yet not obligatory.

> + Applications should check this flag
> + periodically and disconnect gracefully when possible, such as after
> + completing the current transaction or unit of work.

What flag? Also, 'Applications should': no, they shouldn't; isn't it
just an option? Maybe we should change the wording to something like
'Applications may treat it as a recommendation to close (or maybe
re-open) their connection to the server once they have received at
least one GoAway message.'

Also, can the server send more than one GoAway message? If yes, should
we document this?

> - * notice. (An ERROR is very possibly the backend telling us why
> + * notice. (An ERROR is very possibly the backend telling us why

This change is unrelated.

The other coding changes look straightforward and are fine to me.

[0] https://github.com/yandex/odyssey

-- 
Best regards,
Kirill Reshke
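
As an illustration of the pooling behaviour described in the quoted
proposal (recycle a connection once nobody is using it), here is a
minimal sketch in C. It is not taken from the patch: PoolSlot,
slot_on_go_away(), pool_acquire() and pool_release() are hypothetical
names, the go_away field only stands in for whatever the proposed
PQgoAwayReceived() would report, and slot_reconnect() stands in for
closing and reopening a real connection. Treating repeated GoAway
messages as harmless is also an assumption, since that question is
still open in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool slot; a real pool would hold a PGconn * here. */
typedef struct PoolSlot
{
    bool in_use;     /* currently checked out by a caller */
    bool go_away;    /* stand-in for the proposed PQgoAwayReceived() result */
    int  generation; /* bumped every time the slot is reconnected */
} PoolSlot;

/* Hypothetical reconnect: stands in for closing and reopening the connection. */
static void
slot_reconnect(PoolSlot *slot)
{
    slot->go_away = false;
    slot->generation++;
}

/*
 * Record that the server asked this connection to go away.
 * An idle slot is recycled immediately; a busy one is only marked, so the
 * current user can finish. Calling this twice on a busy slot changes nothing
 * (assumption: duplicate GoAway messages are harmless).
 */
static void
slot_on_go_away(PoolSlot *slot)
{
    slot->go_away = true;
    if (!slot->in_use)
        slot_reconnect(slot);
}

/* Hand out the first idle slot, or NULL if every slot is busy. */
static PoolSlot *
pool_acquire(PoolSlot *slots, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (!slots[i].in_use)
        {
            slots[i].in_use = true;
            return &slots[i];
        }
    }
    return NULL;
}

/* Return a slot to the pool; recycle it now if GoAway arrived while busy. */
static void
pool_release(PoolSlot *slot)
{
    slot->in_use = false;
    if (slot->go_away)
        slot_reconnect(slot);
}
```

With this shape a busy connection finishes its current unit of work
before it is recycled, which matches the advisory nature of the
message, while an idle connection is replaced right away.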