Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1upR3N-0055rS-Q9 for pgsql-general@arkaria.postgresql.org; Fri, 22 Aug 2025 12:39:35 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1upR3N-005g69-6x for pgsql-general@arkaria.postgresql.org; Fri, 22 Aug 2025 12:39:33 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1upR3M-005g61-TF for pgsql-general@lists.postgresql.org; Fri, 22 Aug 2025 12:39:33 +0000 Received: from lana.depesz.com ([88.198.49.178] helo=depesz.com) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1upR3L-001BwI-0g for pgsql-general@lists.postgresql.org; Fri, 22 Aug 2025 12:39:32 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=depesz.com; s=20170201; h=In-Reply-To:Content-Transfer-Encoding:Content-Type:MIME-Version :References:Reply-To:Message-ID:Subject:Cc:To:Sender:From:Date:Content-ID: Content-Description; bh=N2JLSR1IY5/CRvev74ZeLuaFewd7QQRJo6NODxgYpqA=; b=QfMnw geF68PrcfKj6vyfY4Y3yCsBySTxJ7pORQesFDGKQDFstCC/2WW5HhyqJQuHOj20agu7HgMV2lu+Qg Ejdw/ltQ7ljtUUPS6zkcj1g5orbkkbXAppPo11OAomqzeqHBD734YnUvNrm1Bg39vlafmPpUn9Oul jbkA/Z01kYXc=; Received: from depesz by depesz.com with local (Exim 4.96) (envelope-from ) id 1upR3J-00CGIu-2X; Fri, 22 Aug 2025 14:39:29 +0200 Date: Fri, 22 Aug 2025 14:39:29 +0200 From: hubert depesz lubaczewski Sender: depesz@depesz.com To: Chris Wilson Cc: Adrian Klaver , PostgreSQL General Subject: Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug Message-ID: Reply-To: depesz@depesz.com References: <05969854-0d19-4726-ae1b-586659dd443b@aklaver.com> <25334887-f1c3-40a1-94b0-753c7d67ae2b@aklaver.com> <2a3e4a8d-e8c2-46d6-ad7d-9e631ce6725e@aklaver.com> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Aug 21, 2025 at 07:38:34PM +0100, Chris Wilson wrote: > If all your queries are coming through pgBouncer, and only those hang (the > server itself responds if you connect directly to it), then it might be > this pgBouncer issue: > > https://github.com/pgbouncer/pgbouncer/issues/1054 > > Although that issue is now "closed", because the invisible "debug" log > message was upgraded to a warning (and I don't think that change is in any > released version), the underlying problem still exists: pgbouncer hangs > completely (stops forwarding packets) for a while if the PAM > authentication queue becomes full. > > If you have a relatively slow PAM service (such as pam_ldap) then you can > trigger it by opening ~100 connections to pgBouncer simultaneously (without > waiting for previous ones to authenticate), something like this: > > for i in `seq 1 100`; do psql -h pgbouncer -p 6432 -U user db_name -c > "SELECT 1" & done Please note that during the time of problem *NOTHING* seems to be happening in Pg. There are no messages in logs about anything. We don't have direct access to db, so there is no sane way to check if directly sent queries will work. I can, of course, start shell, and run queries there, but ephemeral nature of the problem makes it more difficult. As for authentication - there is no ldap/pam in place. All auth is based on certs/passowrds-in-file, and we never noticed authentication slowdowns. Plus - queries like 'DISCARD ALL' are not ran because someone connected… Best regards, depesz