Message-ID: <7b0ac868e97508aac3661a6a665fea00be2b923e.camel@cybertec.at>
Subject: Re: Postgresql 16.9 fast shutdown hangs with walsenders eating 100% CPU
From: Laurenz Albe
To: Klaus Darilion, "pgsql-general@lists.postgresql.org"
Date: Mon, 21 Jul 2025 15:35:11 +0200

On Mon, 2025-07-21 at 10:47 +0000, Klaus Darilion wrote:
> (Note: I have also attached the whole email for better readability of the logs)

Your mail looks good enough the way it is:
https://postgr.es/m/DBAPR03MB6358854AD71C8ABA5CA10A8DF15DA%40DBAPR03MB6358.eurprd03.prod.outlook.com

> Our setup: 5 Node Patroni Cluster with PostgreSQL 16.9.
> db1: current leader
> db2: sync-replica
> db3/4/5: replica
>
> The replicas connect to the leader using the host IP of the leader. So there are
> 4 walsender for patroni, 1 sync and 3 async.
>
> The patroni cluster utilizes a service IP-address (VIP). The VIP is used by all
> clients connecting to the current leader. These clients are:
> - some web-apps doing normal DB queries (read/write)
> - 2 barman backup clients using streaming replication
> - 58 logical replication clients
>
> Additionally we use https://github.com/EnterpriseDB/pg_failover_slots to sync and
> advance the logical replication slots on the replicas. The failover_slots plugin
> periodically connects to leader using the VIP.
>
> We had a planned maintenance and wanted to switch the leader from db1 to db2:
> 12:32:18: patronictl switchover --leader db1 --candidate db2
>
> So postmaster received the fast shutdown request from Patroni and started
> shutting down the client connection processes:
>
> Usually the switchover only takes a few seconds. After waiting a few minutes
> we became anxious and started debugging.
>
> Using "ps -Alf|grep postgres" we saw that there were no more normal client
> connections, but still 58 logical replication walsender processes and
> 6 streaming replication walsenders.
> "top" revealed that the walsenders were eating CPU.

We have had a somewhat similar report:
https://www.postgresql.org/message-id/flat/18985-64431d78bcabae95%40postgresql.org

What is the logical decoding plugin you are using?

If it is "pgoutput", what are the walsenders doing?  You can try "strace" and
use "gdb" to break into the walsenders and take a stack trace.

Yours,
Laurenz Albe
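
A minimal sketch of the "strace" and "gdb" suggestion above, not a command sequence taken from the thread. It assumes that strace, gdb, pgrep and timeout are installed on the leader, that the commands are run as root or as the PostgreSQL OS user, and that the walsender process titles contain "walsender". The function names are hypothetical. The functions only print the commands, so they can be reviewed before anything attaches to a live process.

```shell
#!/bin/sh
# Sketch only: print the diagnostic commands for walsender processes.
# Assumption: PostgreSQL debug symbols are installed, otherwise the gdb
# backtrace will show few or no function names.

# Print the two diagnostic commands for one PID (nothing is executed here).
walsender_diag_cmds() {
    pid=$1
    # Attach for 5 seconds to see which system calls the process makes, if any.
    printf 'timeout 5 strace -p %s\n' "$pid"
    # Attach once, print a stack trace, then detach again.
    printf 'gdb --batch -p %s -ex bt\n' "$pid"
}

# Print the commands for every process whose title looks like a walsender.
all_walsender_diag_cmds() {
    for pid in $(pgrep -f 'postgres:.*walsender'); do
        walsender_diag_cmds "$pid"
    done
}
```

Calling all_walsender_diag_cmds lists the commands; piping that output to "sh" would run them. A walsender that spins at 100% CPU while strace shows no system calls is probably looping in user space, and in that case the gdb stack trace is the more informative of the two.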