Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uAkbM-001hMv-5A for pgsql-general@arkaria.postgresql.org; Fri, 02 May 2025 07:14:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uAkbJ-007mNA-5P for pgsql-general@arkaria.postgresql.org; Fri, 02 May 2025 07:14:26 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uAkbI-007mMz-My for pgsql-general@lists.postgresql.org; Fri, 02 May 2025 07:14:25 +0000 Received: from mail-yb1-xb33.google.com ([2607:f8b0:4864:20::b33]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uAkbH-000eMV-16 for pgsql-general@postgresql.org; Fri, 02 May 2025 07:14:24 +0000 Received: by mail-yb1-xb33.google.com with SMTP id 3f1490d57ef6-e733b858574so1562861276.2 for ; Fri, 02 May 2025 00:14:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1746170063; x=1746774863; darn=postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=0CcWmNH/+3bvJQN17KKmcsXYkZ0XJXsjTmhBUWXOj2M=; b=ZJHCgE2YOboW6jbVqUFoXVq5s/znvbfCYWdNtFDpLKRdd8My6bdEbAv9TeUIFDqqZ/ 37E1GoAdoERcAQ1JMbmkmpfWYrBKF02ZuDT0bq2yxbYtnO+/a+1x4DGtszrOPZZqekrC LF29JtXHH1GJzLH1V1bvu85dzwWG2zBklCxGi/u1FqXzs3uriPC7pqz9bb7/R2PbGd0o OrQHBJWz2dAOiQsNTsz8r1K5OFWYrJS1NbiTckzAXb4TG4G+Oan+2oX1P2cQZPzYD+Sj nJeZIWEz8pEljKHrm9MrV/YZHezAkRuUzyzQDPyopTsq03/v0ws+5X9iN080EU42hxhk eUmA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746170063; x=1746774863; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=0CcWmNH/+3bvJQN17KKmcsXYkZ0XJXsjTmhBUWXOj2M=; b=fc7KyRbPnUH0wFF/w/f8RxObIdqlVvq6rJYmlK5WJJhEb8cDbEHuC0HrKbeKxd2dun C9BOUeMJ5CqB7WBxEAdLuGni0rBpYB96VUQ0GeX6GxsFqyrC4ZTkJaN5mCq1ywjWLv/Y yXPGKhseLStX5Z+PnJDv6aEodgMu9KIQiM5hXNzYpvpdwfmlPyb0zrxwA1GXuEgUc6Wq 48tCgn1wBUdUVXs4M62WR9WAPnPV3OX+d94I+AYQMQdL6F7Dr2U33cv2djz2vmLdx+Pm xqNoYpqODKZq2+X/d7onpIW7LiJCXFZKznBBkgaUOG4xMMsfyLH1s+qKBuWUEY8zExcM Mr7w== X-Gm-Message-State: AOJu0Yz4LyFH0iSu7ujl/zrrVaxr1lX5ssD0G+AI22Otbm0LG/05NIg+ wGWvW4iXkLiwnt0T8uvnkKYGYKiWLKbGJxMixLV2hTi9ng2xLHLKMpI5NsktFCd+1YBhJcDQ3us yEnWEMSGFYtJfUA8KefXwXBIljI+TjNGQ X-Gm-Gg: ASbGncsK9otxNE5l2F8Wu8OMEJ5YSrDp0MZsNR8MfRbWD7ViE/2bTWnPpGQj/t6YAMy uiHhLHiASstTMPAl921+YnNgZK9ZzDTI2IHR44bFQRYPC+TsI514QfSI8Z4FIFN2G/V1WMICTUj sB8ICKr/cyk7j48YaNHVkfcFc= X-Google-Smtp-Source: AGHT+IFQrwTM2yM5XzLBgVstsK6h1kLzeEwi5pU2zdfWIy+rO6opgq9tTGNDytDRxnllL6E12idGmyaf4SKg9dJkYm0= X-Received: by 2002:a05:690c:906:b0:708:be8b:8415 with SMTP id 00721157ae682-708ced10917mr30669827b3.1.1746170062625; Fri, 02 May 2025 00:14:22 -0700 (PDT) MIME-Version: 1.0 From: KK CHN Date: Fri, 2 May 2025 12:46:29 +0530 X-Gm-Features: ATxdqUGexXCBI0gXhG3RTEdfaZiK20yjUM1OSvGjj6clrRPzo6dhica-7TZb15k Message-ID: Subject: checkpoint_timeout parameter & WAL archive delay, pgbackrest fails To: pgsql-general Content-Type: multipart/alternative; boundary="0000000000004ce41b063421e66e" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000004ce41b063421e66e Content-Type: text/plain; charset="UTF-8" Hi folks, My pgbackrest backup on one of my RepoServer fails. The backup fails some times with the error WAL file cannot be archived before 60000 ms timeout. The pgbackrest stanza check command is sometimes successful, but sometimes fails. I don't know why PG is unable to copy WAL files from pg_wal to /data/myarchive_dir in real time. I always observed a delay of around 10 minutes for a wal file in pg_wal to appear in /data/my_archive_dir. On investigation I'hv observed that our DB admin has put checkpoint_timeout = 10 m in the postgresql.conf file. I think this causes the WAL archiving delay and subsequently my pgbackrest fails while trying to backup the DB to a remote RepoServer. What the ideal value needed to be set for "checkpoint_timeout" to overcome this issue. I don't want pgbackrest backup fails due to this parameter ?. ( Is it possible to set a very minimum value for checkpoint_timeout what is the minimum value or can I put 0 ? ) archive_command = 'pgbackrest --stanza=My_Repo archive-push %p && cp %p /data/archive/%f' From postgresql logs I am seeing this .. ERROR: [082]: unable to push WAL file '000000010000026300000002' to the archive asynchronously after 60 second(s) HINT: check '/var/log/pgbackrest/My_Repo-archive-push-async.log' for errors. INFO: archive-push command end: aborted with exception [082] 2025-05-02 12:15:17 IST LOG: archive command failed with exit code 82 2025-05-02 12:15:17 IST DETAIL: The failed archive command was: pgbackrest --stanza=My_Repo archive-push pg_wal/000000010000026300000002 && cp pg_wal/000000010000026300000002 /data/archive/000000010000026300000002 INFO: archive-push command begin 2.52.1: [pg_wal/000000010000026300000002] --archive-async --compress-type=zst --exec-id=2848559-384cf49c --log-level-console=info --log-level-file=debug --log-level-stderr=info --pg1-path= /var/lib/postgres/16/data --pg-version-force=16 --process-max=6 --repo1-host=10.50.12.202 --repo1-host-user=pgbackrest --spool-path=/var/spool/pgbackrest --stanza=My_Repo top output on DB cluster: top - 12:37:00 up 66 days, 17:24, 2 users, load average: 4.04, 4.72, 4.56 Tasks: 902 total, 4 running, 897 sleeping, 0 stopped, 1 zombie %Cpu(s): 7.4 us, 1.7 sy, 0.0 ni, 89.9 id, 0.4 wa, 0.2 hi, 0.4 si, 0.0 st MiB Mem : 31837.6 total, 706.1 free, 15243.0 used, 24741.0 buff/cache MiB Swap: 8060.0 total, 6634.0 free, 1426.0 used. 16608.9 avail Mem PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2839363 postgre+ 20 0 8965608 7.2g 7.1g S 70.2 23.0 2:02.61 postgres 2864108 postgre+ 20 0 8967848 7.1g 7.1g S 64.9 22.8 0:30.04 postgres 2865547 postgre+ 20 0 8965432 7.1g 7.1g S 39.1 22.8 0:32.30 postgres 2865752 postgre+ 20 0 8964352 6.9g 6.9g S 16.6 22.3 0:32.94 postgres Model name: Intel(R) Xeon(R) Gold 6430 BIOS Model name: Intel(R) Xeon(R) Gold 6430 CPU family: 6 Model: 143 Thread(s) per core: 1 Core(s) per socket: 16 These are vCPUs (16 nos) , OS RHEL 9, postgres 16 Any hints on how to make pgbackrest take backup properly are much appreciated. Thanks, Krishane --0000000000004ce41b063421e66e Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi folks,
My pgbackrest=C2=A0 backup=C2=A0 = on one of=C2=A0 my RepoServer fails.=C2=A0 =C2=A0The backup fails some time= s with the=C2=A0 error WAL file cannot be archived before 60000 ms timeout.= =C2=A0=C2=A0

The pgbackrest=C2=A0 stanza check com= mand is sometimes successful, but sometimes fails.

I don't know why=C2=A0 =C2=A0 PG is unable to=C2=A0 =C2=A0copy=C2=A0 W= AL files from=C2=A0 =C2=A0pg_wal to=C2=A0 /data/myarchive_dir in real time.= I always=C2=A0 observed a delay of around 10 minutes for a wal file in pg_= wal to appear in=C2=A0 /data/my_archive_dir.=C2=A0

On investigation I'hv observed that=C2=A0 our DB admin has put=C2=A0 c= heckpoint_timeout =3D 10 m=C2=A0 in the=C2=A0 postgresql.conf=C2=A0 file.= =C2=A0

I think this causes the WAL archiving delay= and=C2=A0 subsequently my pgbackrest=C2=A0 =C2=A0fails=C2=A0 while trying = to backup the DB=C2=A0 to a remote RepoServer.

Wha= t the ideal value needed=C2=A0to be set for=C2=A0 "checkpoint_timeout&= quot;=C2=A0 to overcome this issue.=C2=A0 I don't want=C2=A0 pgbackrest= backup=C2=A0fails due to this parameter ?.=C2=A0 =C2=A0 ( Is it possible t= o set a very minimum value for checkpoint_timeout=C2=A0 what is the minimum= value=C2=A0 or can I put=C2=A0 =C2=A00=C2=A0 ? )

=
archive_command =3D 'pgbackrest --stanza=3DMy_Repo archi= ve-push %p && cp %p /data/archive/%f'=C2=A0


From postgresql logs=C2=A0 I am seeing this ..=C2=A0

ERROR: [082]: unable to push WAL file '0000000100000= 26300000002' to the archive asynchronously after 60 second(s)
=C2=A0= =C2=A0 =C2=A0 =C2=A0HINT: check '/var/log/pgbackrest/My_Repo-archive-p= ush-async.log' for errors.
INFO: archive-push command end: aborted w= ith exception [082]
2025-05-02 12:15:17 IST LOG: =C2=A0archive command f= ailed with exit code 82
2025-05-02 12:15:17 IST DETAIL: =C2=A0The failed= archive command was: pgbackrest --stanza=3DMy_Repo archive-push pg_wal/000= 000010000026300000002 && cp pg_wal/000000010000026300000002 /data/a= rchive/000000010000026300000002
INFO: archive-push command begin 2.52.1:= [pg_wal/000000010000026300000002] --archive-async --compress-type=3Dzst --= exec-id=3D2848559-384cf49c --log-level-console=3Dinfo --log-level-file=3Dde= bug --log-level-stderr=3Dinfo --pg1-path=3D /var/lib/postgres/16/data=C2=A0=C2=A0=C2=A0--pg-version-force=3D16 --proces= s-max=3D6 --repo1-host=3D10.50.12.202 --repo1-host-user=3Dpgbackrest --spoo= l-path=3D/var/spool/pgbackrest --stanza=3DMy_Repo

top=C2=A0 output= =C2=A0 =C2=A0on DB cluster:

top - 12:37:00 up 66 days, 1= 7:24, =C2=A02 users, =C2=A0load average: 4.04, 4.72, 4.56

Tasks: 902 total, =C2=A0 4 running, 897 sleeping, =C2=A0 0 stopped,= =C2=A0 1 zombie
%Cpu(s): =C2=A07.4 us, =C2=A01.7 sy, =C2=A00.0 n= i, 89.9 id, =C2=A00.4 wa, =C2=A00.2 hi, =C2=A00.4 si, =C2=A00.0 st
MiB Mem : =C2=A031837.6 total, =C2=A0 =C2=A0706.1 free, =C2=A015243.0 use= d, =C2=A024741.0 buff/cache
MiB Swap: =C2=A0 8060.0 total, =C2=A0= 6634.0 free, =C2=A0 1426.0 used. =C2=A016608.9 avail Mem

=C2=A0 =C2=A0 PID USER =C2=A0 =C2=A0 =C2=A0PR =C2=A0NI =C2=A0 =C2=A0V= IRT =C2=A0 =C2=A0RES =C2=A0 =C2=A0SHR S =C2=A0%CPU =C2=A0%MEM =C2=A0 =C2=A0= TIME+ COMMAND
2839363 postgre+ =C2=A020 =C2=A0 0 8965608 =C2=A0 7.2g = =C2=A0 7.1g S =C2=A070.2 =C2=A023.0 =C2=A0 2:02.61 postgres
2864108 post= gre+ =C2=A020 =C2=A0 0 8967848 =C2=A0 7.1g =C2=A0 7.1g S =C2=A064.9 =C2=A02= 2.8 =C2=A0 0:30.04 postgres
2865547 postgre+ =C2=A020 =C2=A0 0 8965432 = =C2=A0 7.1g =C2=A0 7.1g S =C2=A039.1 =C2=A022.8 =C2=A0 0:32.30 postgres
= 2865752 postgre+ =C2=A020 =C2=A0 0 8964352 =C2=A0 6.9g =C2=A0 6.9g S =C2=A0= 16.6 =C2=A022.3 =C2=A0 0:32.94 postgres



Model name: =C2=A0 = =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0Intel(R) Xeon(R) Gold 6430
=C2=A0 =C2= =A0 BIOS Model name: =C2=A0 =C2=A0 Intel(R) Xeon(R) Gold 6430
=C2=A0 =C2= =A0 CPU family: =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A06
=C2=A0 =C2=A0 Model:= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 143
=C2=A0 =C2=A0 Thre= ad(s) per core: =C2=A01
=C2=A0 =C2=A0 Core(s) per socket: =C2=A016
These are vCPUs=C2=A0 =C2=A0 (16 nos) , OS RHEL 9,=C2=A0 postgres 16
<= br>Any hints on how to make=C2=A0 pgbackrest take backup properly are much = appreciated.


Thanks,
Krishane
--0000000000004ce41b063421e66e--