Message-ID: <25334887-f1c3-40a1-94b0-753c7d67ae2b@aklaver.com>
Date: Thu, 21 Aug 2025 08:04:25 -0700
Subject: Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
To: depesz@depesz.com
Cc: PostgreSQL General
References:
<05969854-0d19-4726-ae1b-586659dd443b@aklaver.com>
From: Adrian Klaver

On 8/21/25 03:07, hubert depesz lubaczewski wrote:
> On Wed, Aug 20, 2025 at 10:45:13AM -0700, Adrian Klaver wrote:
>> On 8/20/25 09:08, hubert depesz lubaczewski wrote:
>>> On Wed, Aug 20, 2025 at 08:14:47AM -0700, Adrian Klaver wrote:
>>>> Hmm.
>>>>
>>>> From initial post:
>>>>
>>>> "For ~ 1 second there are no logs going to log (we usually have at 5-20
>>>> messages logged per second), no connection, nothing. And then we get
>>>> bunch (30+) messages with the same milisecond time."
>>>> Are the 30+ messages all coming in on one connection or multiple
>>>> connections?
>>> Multiple connections.
>>>> Also to be clear these are statements that are being run on the replica
>>>> locally, correct?
>>> What do you mean locally?
>> I should have been clearer. Are the queries being run against the replica or
>> the primary?
>
> All to replica. Primary has its own work, of course, but the problem
> we're experiencing is on replicas.

If I am following, there is more than one primary --> replica pair and
the problem exists across all the pairs.

>> How many applications servers are hitting the database?
>
> To be honest, I'm not sure. I have visibility into dbs, and bouncers,
> not really into Apps. I know that these are automatically dynamically
> scaled, so number of app server is very varying.
>
> I'd say anything from 40 to 200 app servers hit first layer of bouncers,
> which we usually have 6-9 (2-3 per az).
>
> These go to 2nd layer of bouncers, on the db server itself.

By bouncer I assume you mean something like pgBouncer, a connection
pooler.

Is it possible to determine which bouncer the queries in question are
coming from?
>
> Best regards,
>
> depesz
>

--
Adrian Klaver
adrian.klaver@aklaver.com