Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ufh3Y-00AVRk-2B for pgsql-general@arkaria.postgresql.org; Sat, 26 Jul 2025 15:43:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ufh3W-00ANmN-5g for pgsql-general@arkaria.postgresql.org; Sat, 26 Jul 2025 15:43:26 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ufh3V-00ANmE-Qn for pgsql-general@lists.postgresql.org; Sat, 26 Jul 2025 15:43:26 +0000 Received: from mail-pf1-x432.google.com ([2607:f8b0:4864:20::432]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1ufh3S-000yjv-2Z for pgsql-general@lists.postgresql.org; Sat, 26 Jul 2025 15:43:25 +0000 Received: by mail-pf1-x432.google.com with SMTP id d2e1a72fcca58-748e63d4b05so1953237b3a.2 for ; Sat, 26 Jul 2025 08:43:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1753544601; x=1754149401; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=mMBBZXQJ7scq5dkZ/J/mzPQnP25pyCuq/04nTFWa1ik=; b=U7/oSkDAHB8TW+xWYlsVGn6UEWyFd8zkZz0m65BDJ+ZBZwm14DvHcu1bWIo6kHI4mD J3pWgRxAXS9pttKfIBY99YxiQC/HuzWoJrE4364N2BiXWZRV3DIjvLX+VmQA1xbn1mO+ SBSMMPCsXjCXcWMF8Sas3Su3N+xgI1nlPtcHvakU74U6v5AP0+VIgubPSaE7tzo91WX4 AK01v+BOV8HFP7LUXsRIP8I+5xolWabtpncw49VuiB1hJ3LlzXHLdaNmqBfRGMnJrG98 L5U30qnJ4AXH0lI2Xp1eWPXdv7/RXiYbj1uwK7UGbGT3QGerP4rsaQYpLoHi1IX3CrRd w80Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753544601; x=1754149401; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=mMBBZXQJ7scq5dkZ/J/mzPQnP25pyCuq/04nTFWa1ik=; b=xIQc/FQrj1nV2y/im4p/KlRv7ijd6t1D4OcfaTs8jAZbwnNqZof8esLUesbrYMIOIG c7Fp0lmkFdZRpGSk68ZDfTqnAZgCYaEVOc1eNo3RT9TJysxt3d+zK7wD17X9mGBtPpDj DZqy8zeVIO4AhpJB74QZT6901kWVkuGT8LfjilVsPTFAtQJT9/aiD2PE/J3wKpwWvPFV wbDGoiLXjJbY5mCR821O96Pa1xtfskHUSic15taxDCCoSQYzKLvavEMAM7ArRorRxpx6 MvFhG4cY4b8hk2PGqRsUwWJKbNoL8wbMgPjpO3XYWTqwmaSOh0eH8SWN/sf4Y92IH4c0 V/DQ== X-Forwarded-Encrypted: i=1; AJvYcCWiDmPzsFKUttSJpxQiA3fpzZ5KAo9K3bDrJT5jjjX5LE7DQAX5Jz1m2VqC5BlqbqgsdnBthOreWxxO3kQA@lists.postgresql.org X-Gm-Message-State: AOJu0YyYczd2/r2Yia9sezE5Ytsnfq30kqiViIp4pu8xbuxcvpVTjnUu YtNZaq2oZ61+jnycQDVAG4ZDTVGk5u6yWdT7VrrKlJB3zbcI49snJLVyv3iAeb8M7AD3MzygLq7 aNvzz43W/pqpqT7KQ1qh4y/F3hOo3l5I= X-Gm-Gg: ASbGncvjGWC9e3OhN1qpPK99mx+9ELhnCQ3TpzD0aVIH7UmHCnaSD28WXbN/kv3OEb5 FMwV4lJpgIEuQfP56CbNq7pB24uxES4WjsUD+DQQwWyAjJqIa2oJoc2whv8UL9emreLOnapoi1y ++7hR6T/pWKm5ZqmJ3eyqusHSTiJNg4sEkryFNGTOwAsOC+/aZHo8SdEV5A8FN6ruTV//bgD871 IAIEzh/dWHFVoSiClrORYOiZPuVufderzUA3sRIx/NFLu9kyA== X-Google-Smtp-Source: AGHT+IEp7k8SkzFfl9Fz6w/zbv0wHbHxTpCfuIvySQITpvtJoG1fXDaB7i3haaVTSmzLIMQg8KI1aS7O52ZFbVzGLtM= X-Received: by 2002:a05:6a21:62c9:b0:23d:781f:1516 with SMTP id adf61e73a8af0-23d781f2791mr7884421637.22.1753544601088; Sat, 26 Jul 2025 08:43:21 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Jon Zeppieri Date: Sat, 26 Jul 2025 11:43:09 -0400 X-Gm-Features: Ac12FXz-ofXbdc20xRPS8KJNVnJYFx8HQalMlP7KHr-D3fN4FYL578bXjQDhVXM Message-ID: Subject: Re: Possible causes of high_replay lag, given replication settings? To: Greg Sabino Mullane Cc: Nick Cleaton , pgsql-general@lists.postgresql.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Fri, Jul 25, 2025 at 7:13=E2=80=AFPM Greg Sabino Mullane wrote: > > On Fri, Jul 25, 2025 at 9:57=E2=80=AFAM Jon Zeppieri = wrote: >> >> Thanks for the response, Nick. I'm curious why the situation you describ= e wouldn't also lead to the write_lag and flush_lag also being >> high. If the problem is simply keeping up with the primary, wouldn't you= expect all three lag times to be elevated? > > > No - write and flush are pretty quick and simple, it's just putting the W= AL onto the local disk. Replay involves a lot more work as we have to parse= the WAL and apply the changes, which means doing a lot of I/O across many = files. Still, *hours* to me indicates more than just a lot of extra traffic= . Check that recovery_min_apply_delay is still 0, then log onto the replica= and see what's going on with regards to open transactions and locks. Thanks Greg. `recovery_min_apply_delay` is 0, just checked. Also, I didn't mention in my initial post that it seemed the cause of the delay was long-running queries on the replica, rather than the primary. It's possible, of course, that I'm wrong, but I was able to get the replica moving again when I killed off old queries on the replica. If those were the problem, though, then I don't understand why the max_standby_streaming_delay didn't prevent that situation. - Jon