Message-ID: <05969854-0d19-4726-ae1b-586659dd443b@aklaver.com>
Date: Wed, 20 Aug 2025 08:14:47 -0700
Subject: Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug
To: depesz@depesz.com
Cc: PostgreSQL General
From: Adrian Klaver

On 8/20/25 04:32, hubert depesz lubaczewski wrote:
> On Tue, Aug 19, 2025 at 11:39:03AM -0700, Adrian Klaver wrote:
>>> Every now and then (usually every 3-5 minutes, but not through the whole
>>> day), we see situations where every query suddenly takes ~ 1 second.
>> Given the subject line, what you are reporting is happening on the replica,
>> correct?
>
> Yes.
>
>> If so where is the replica relative to the primary, in terms of network
>> distance?
>
> =$ ping -c 10 primary
> reports:
> 10 packets transmitted, 10 received, 0% packet loss, time 9181ms
> rtt min/avg/max/mdev = 0.942/0.956/0.991/0.012 ms
>
>> Also what are the 'hardware' specifications on the replica instance?
>
> c8g.48xlarge ec2 instance. It is arm64, 192 cores, with 384 gb of ram.
>
> As for storage, this is relatively slow, because this db is rather
> small:
> gp3 500gb volume, with 6000 iops. At no point is IO in any way close to
> limits, the whole db fits easily in ram.

Hmm.

From initial post:

"For ~ 1 second there are no logs going to log (we usually have at 5-20
messages logged per second), no connection, nothing. And then we get
bunch (30+) messages with the same millisecond time."

Are the 30+ messages all coming in on one connection or multiple
connections?

Also, to be clear, these are statements that are being run on the replica
locally, correct?

Does the AWS monitoring indicate any issues?

>
> Best regards,
>
> depesz
>

--
Adrian Klaver
adrian.klaver@aklaver.com