Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1up3gP-00FXQt-R1 for pgsql-general@arkaria.postgresql.org; Thu, 21 Aug 2025 11:42:19 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1up3gN-00Fzaf-Ll for pgsql-general@arkaria.postgresql.org; Thu, 21 Aug 2025 11:42:16 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1up3gN-00FzZK-Aa for pgsql-general@lists.postgresql.org; Thu, 21 Aug 2025 11:42:15 +0000 Received: from mail-wm1-f51.google.com ([209.85.128.51]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1up3gL-0015PK-2t for pgsql-general@lists.postgresql.org; Thu, 21 Aug 2025 11:42:15 +0000 Received: by mail-wm1-f51.google.com with SMTP id 5b1f17b1804b1-45a1b0c8867so6604645e9.3 for ; Thu, 21 Aug 2025 04:42:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1755776532; x=1756381332; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=UDjhUQgc4OeoEvV5WJcIsHkUlS+5R/ZYUU14bxKwt8o=; b=jgFOuNk16vF6vJCcvOhc6JMQrG6GBH2rRjSDMxosJn4cAFGYuH+Fg7kUQh5UO1bxei rR86p6fXYlg4PnttZ92YcmWaxfVnTIY10HEoCa/sfXcj84IaWjP2QnUH2NeYMuZqqXKB 96Hn6rtJ8+Mxi7VWZoYksKjOj9jXI0pCdEacyAtanLbuHcYg6q8Tkx2Eh955FE+sDbSv F4gxz76LjSzMeG4W9L0KVb1olUr47if4CAjSXn0gK45v77P/UYHnSabqfcFoVeD5J47Y m5N3c9P2UEWUbUrThiZklWz5dPFrZcccvk1ZCNPFFmjH74EJvDUTyul9iyE1qdrH1AW3 t2/g== X-Forwarded-Encrypted: i=1; AJvYcCUM7MaRR/yCTB8Q82kREZJ+HjeDkvLiNLsnwDkaDL4/iXORjH9phomREN+ssgTXlUpMLnWCknlKGerXp5Wf@lists.postgresql.org X-Gm-Message-State: AOJu0YyrVnULC9avdHGp7uiib+PLRtcplkZvISRR71dDTPIfleb5BrvA Etg+MKnEIXYAlbzpqODr2lSDDLLv2QHvKTyDWLoPZSYsPB2CsjKoqt11Po1dhB8X8GDeXwI8GBj LbzLhB1bhhitZkQJS9GZ1R7I03qhBSL3RyR5Qtc1NFcCP X-Gm-Gg: ASbGnctLZWQ6WojM3SwrIiWotfI5z1xdUVXGBNNdxnCf9f2xBHgP1dEt1A5JujAgLMM aodWtcX1HpF2F+EPuK7ubiSkmcJKg+2wXF47Tfz6t1rqj0ggzSa5lHwIg+lrP8974sckMu+bIvB FzoqpMm5Y7tzttJ6o0IbckY+qdAmkp4dZEKVM4bXkIBSN9tjVyk1auF1nd3wqPNvVbpr/OsJACm Zshcnay X-Google-Smtp-Source: AGHT+IHgORFnAMWPVeZUiOyn7dTZsoZEEtWNtEoci9guBG+C8aP3xASD9Mk/LCJ/QQVzlQagTe2YoXCX24MmXEMt3yE= X-Received: by 2002:a05:600c:1392:b0:458:bf9c:274f with SMTP id 5b1f17b1804b1-45b4d859100mr18865185e9.29.1755776532192; Thu, 21 Aug 2025 04:42:12 -0700 (PDT) MIME-Version: 1.0 References: <05969854-0d19-4726-ae1b-586659dd443b@aklaver.com> In-Reply-To: From: Thom Brown Date: Thu, 21 Aug 2025 12:41:44 +0100 X-Gm-Features: Ac12FXwuzyA4LjIXAeYvSz1QJn1LrpKvrCxF4YMj5XFmkUuJ_R89rcS4dYh5ArQ Message-ID: Subject: Re: Streaming replica hangs periodically for ~ 1 second - how to diagnose/debug To: depesz@depesz.com Cc: Adrian Klaver , PostgreSQL General Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, 21 Aug 2025 at 11:03, hubert depesz lubaczewski wrote: > > On Wed, Aug 20, 2025 at 06:30:00PM +0100, Thom Brown wrote: > > Do you have THP enabled? Can you use mpstat and see what %steal shows as? > > Sorry, what is "THP"? I tried searching for "what is thp", and most > common search results are related to some chemical compound. Ah, yeah I meant transparent hugepage: cat /sys/kernel/mm/transparent_hugepage/enabled This should show it being set as "never". > > mpstat, yes, we have it. Let me quickly show what I get: > > Two days ago, at ~ 10:10pm UTC I saw this in Pg logs: > > # grep -oP '^2025-08-19 22:09:2\d\.\d+ UTC' postgresql-2025-08-19_220000.csv | uniq -c | grep -C3 -P '^\s*\d\d' > 2 2025-08-19 22:09:29.084 UTC > 1 2025-08-19 22:09:29.094 UTC > 2 2025-08-19 22:09:29.097 UTC > 70 2025-08-19 22:09:29.109 UTC > 90 2025-08-19 22:09:29.110 UTC > 6 2025-08-19 22:09:29.111 UTC > 1 2025-08-19 22:09:29.153 UTC > 1 2025-08-19 22:09:29.555 UTC > > As you can see we have 70, and then 90 messages all logged with the same > timestamp. All of them (160) were "duration:x" lines. > > At the time, mpstat (called using `mpstat 5`) looked like this: > > CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle > 22:09:04 all 2.25 0.00 0.27 0.24 0.00 0.10 0.00 0.00 0.00 97.14 > 22:09:09 all 2.70 0.00 0.28 0.27 0.00 0.12 0.00 0.00 0.00 96.63 > 22:09:14 all 2.83 0.00 0.63 0.23 0.00 0.09 0.00 0.00 0.00 96.22 > 22:09:19 all 2.66 0.00 0.28 0.23 0.00 0.10 0.00 0.00 0.00 96.74 > 22:09:24 all 2.36 0.00 0.27 0.21 0.00 0.10 0.00 0.00 0.00 97.06 > 22:09:29 all 2.49 0.00 0.40 0.40 0.00 0.11 0.00 0.00 0.00 96.60 > 22:09:34 all 2.65 0.00 0.28 0.23 0.00 0.11 0.00 0.00 0.00 96.73 > 22:09:39 all 2.26 0.00 0.26 0.21 0.00 0.09 0.00 0.00 0.00 97.19 > 22:09:44 all 2.31 0.00 0.25 0.22 0.00 0.09 0.00 0.00 0.00 97.13 > 22:09:49 all 2.44 0.00 0.22 0.24 0.00 0.08 0.00 0.00 0.00 97.02 > 22:09:54 all 2.23 0.00 0.23 0.28 0.00 0.10 0.00 0.00 0.00 97.16 > 22:09:59 all 2.48 0.00 0.26 0.30 0.00 0.11 0.00 0.00 0.00 96.85 > 22:10:04 all 2.29 0.00 0.28 0.26 0.00 0.12 0.00 0.00 0.00 97.06 > 22:10:09 all 2.31 0.00 0.22 0.21 0.00 0.11 0.00 0.00 0.00 97.14 > 22:10:14 all 2.38 0.00 0.25 0.21 0.00 0.10 0.00 0.00 0.00 97.05 > 22:10:19 all 2.54 0.00 0.24 0.20 0.00 0.11 0.00 0.00 0.00 96.91 > 22:10:24 all 2.26 0.00 0.25 0.22 0.00 0.10 0.00 0.00 0.00 97.17 > 22:10:29 all 2.12 0.00 0.27 0.24 0.00 0.09 0.00 0.00 0.00 97.27 > 22:10:34 all 2.56 0.00 0.26 0.26 0.00 0.12 0.00 0.00 0.00 96.80 > 22:10:39 all 2.16 0.00 0.23 0.24 0.00 0.10 0.00 0.00 0.00 97.27 > 22:10:44 all 2.37 0.00 0.26 0.22 0.00 0.10 0.00 0.00 0.00 97.05 > 22:10:49 all 2.25 0.00 0.25 0.61 0.00 0.09 0.00 0.00 0.00 96.80 > 22:10:54 all 2.41 0.00 0.28 0.22 0.00 0.10 0.00 0.00 0.00 96.99 > 22:10:59 all 2.83 0.00 0.29 0.19 0.00 0.12 0.00 0.00 0.00 96.57 This output looks fine, so it doesn't show anything concerning, so suggests the issue is somehow on the Postgres side. Did you happen to poll pg_stat_activity at the time to see whether you had lots of IPC waits? I'm wondering whether the storage layer is freezing up for a moment. Thom