Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t92XT-007O5Y-IQ for pgsql-general@arkaria.postgresql.org; Thu, 07 Nov 2024 13:27:07 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t92XP-00FXLS-VP for pgsql-general@arkaria.postgresql.org; Thu, 07 Nov 2024 13:27:04 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t92XP-00FXLK-6O for pgsql-general@lists.postgresql.org; Thu, 07 Nov 2024 13:27:04 +0000 Received: from mail-lj1-x235.google.com ([2a00:1450:4864:20::235]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t92XK-000gDt-Dp for pgsql-general@lists.postgresql.org; Thu, 07 Nov 2024 13:27:02 +0000 Received: by mail-lj1-x235.google.com with SMTP id 38308e7fff4ca-2fb3da341c9so9336511fa.2 for ; Thu, 07 Nov 2024 05:26:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vt-edu.20230601.gappssmtp.com; s=20230601; t=1730986017; x=1731590817; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=7xxfIPsSJStgayHB3clfYu8VNNYMfSWIWYfmdppdRxo=; b=XWJP8Jaikww4dDCnF+20HtoMEb8pcXKDHy6CJIf2kiklogj/m6VWTb7NuwTFaXC/5c 0qDIqlqHRiOgPr+RP1ct2+GHfbc67l/fU4CA2U7eHUQBv+K5jlaeXv5EePLgrp72XluJ YONnpBm6JMFe3gogbaMgwpqhc3VC+ifyQSQjM+u57KL2C5TshYhhd99h16tnH6M7N5cK kS0gopy/FcTyu3Jndw2Zv9QDbkCa2IpAlQgO1B2RBVP+08D5LWdUzChs64y+5Q9Krbj1 znf0Svzy3sxMulJiPFl9LnafVtU/Q56RXRuKctarBOKi61kZyZthEwGcuv9tIKNnHqxB C7Cg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730986017; x=1731590817; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=7xxfIPsSJStgayHB3clfYu8VNNYMfSWIWYfmdppdRxo=; b=BZl6Uw+pX7kvI5MaCpxFj1ivHHwlOfzeJBxatanX/PrEF1tIzs0m79fJUQ33N5TdAk +WJsldNvDog5zHVr94ZKRR44x0oCVmkJLC47Y8ClR8gEq5UsXe6b+J6e+1JMb0jelSK0 6QHDzWOLqL5AGaHraVJeKoqVgEGmBp9fF+3dqKrRwV941XqJtA/7c6rH/OZrp+0dW1xJ zd77aSa3Zh3xEbh+c2SHyld9h3EYtoMM3v4kTD7+H/QlOB5+XFGS7eNi/XfX2Wa1wk6k L/YThDHL7T3N0sXAP5P/6IdqqD0faI8SsAMokH9wjQltVMbo/2kSQMoHjRwGnRRtqeNJ TA7w== X-Gm-Message-State: AOJu0YwMtvHeJl1Q1MmrmGEsHVxJz/ABCIqSA9QaYtyxIeJB6H30zWv8 rifbRGCyblXuyebUZ6IvBNDnaokmGOR8sBVhowoeaFpmt3H9YoAGJL9y9kRwyiiJfXMMPKlSxkL YOLt1ZqIt8SLAorKbyGaIhHWi8QOs+syOzSp4j1xB/KJjqOUG X-Google-Smtp-Source: AGHT+IFkd5x1paSo2QxuS1izZhGIJRQmeEo0+oCfhlVBbHOu0oOmXPkA2EckIYnBsLVHVQqMH5JaqS9/OnzMlZ3eWCI= X-Received: by 2002:a05:651c:1508:b0:2fa:ddb5:77f9 with SMTP id 38308e7fff4ca-2fedb7ec448mr128324481fa.40.1730986017121; Thu, 07 Nov 2024 05:26:57 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Craig McIlwee Date: Thu, 7 Nov 2024 08:26:46 -0500 Message-ID: Subject: Re: Trouble using pg_rewind to undo standby promotion To: =?UTF-8?Q?Torsten_F=C3=B6rtsch?= Cc: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000a97e9f0626529641" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000a97e9f0626529641 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Thu, Nov 7, 2024 at 4:47=E2=80=AFAM Torsten F=C3=B6rtsch wrote: > Your point of divergence is in the middle of the 7718/000000BF file. So, > you should have 2 such files eventually, one on timeline 1 and the other = on > timeline 2. > > Are you archiving WAL on the promoted machine in a way that your > restore_command can find it? Check archive_command and archive_mode on th= e > promoted machine. > No, the promoted machine is not archiving. How should that work? Is it OK for a log shipping standby that uses restore_command to also push to the same directory with an archive_command or would that cause issues of trying to read and write the same file simultaneously during WAL replay? Or should I be setting up an archive_command that pushes to a separate directory and have a restore_command that knows to check both locations? Hmm, as I write that out, I realize that I could use archive_mode =3D on instead of archive_mode =3D always to avoid the potential for read/write conflicts during WAL replay. I can try this later and report back. Also, do your archive/restore scripts work properly for history files? > The scripts don't do anything special with history files. They are based on the continuous archive docs [1] and this [2] article the with slight modification to include a throttled scp since the log shipping server is located in a different data center from the promoted standby and there is limited bandwidth between the two. (Also note that the archive script from [2] is adapted to properly handle file transfer failures - the one in the article will use the exit code of the rm command so postgres won't be informed the file transfer fails resulting in missing WAL in the archive.) Archive script: --- #!/bin/bash # $1 =3D %p # $2 =3D %f limit=3D10240 # 10Mbps gzip < /var/lib/pgsql/13/data/$1 > /tmp/archive/$2.gz scp -l $limit /tmp/archive/$2.gz postgres@x.x.x.x :/data/wal_archive/operational/$2.gz exit_code=3D$? rm /tmp/archive/$2.gz exit $exit_code --- Restore script: --- gunzip < /data/wal_archive/operational/$2.gz > $1 --- [1] https://www.postgresql.org/docs/13/continuous-archiving.html#COMPRESSED-ARC= HIVE-LOGS [2] https://www.rockdata.net/tutorial/admin-archive-command/#compressing-and-ar= chiving Craig > --000000000000a97e9f0626529641 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Thu, Nov 7, 2024 at 4:47=E2=80=AFAM To= rsten F=C3=B6rtsch <tfoertsch1= 23@gmail.com> wrote:
Your point of divergenc= e is in the middle of the 7718/000000BF file. So, you should have 2 such fi= les eventually, one on timeline 1 and the other on timeline 2.

Are you archiving WAL on the promoted machine in a way that your res= tore_command can find it? Check archive_command and archive_mode on the pro= moted machine.

No, the promoted= machine is not archiving.=C2=A0 How should that work?=C2=A0 Is it OK for a= log shipping standby that uses restore_command to also push to the same di= rectory with an archive_command or would that cause issues of trying to rea= d and write the same file simultaneously during WAL replay?=C2=A0 Or should= I be setting up an archive_command that pushes to a separate directory and= have a restore_command that knows to check both locations?

<= /div>
Hmm, as I write that out, I realize that I could use archive_mode= =3D on instead of archive_mode =3D always to avoid the potential for read/= write conflicts during WAL replay.=C2=A0 I can try this later and report ba= ck.

<= div dir=3D"ltr">
Also, do your archive/restore scripts work properly fo= r history files?

The scripts do= n't do anything special with history files.=C2=A0 They are based on the= continuous archive docs [1] and this [2] article the with slight modificat= ion to include a throttled scp since the log shipping server is located in = a different data center from the promoted standby and there is limited band= width between the two.=C2=A0 (Also note that the archive script from [2] is= adapted to properly handle file transfer failures - the one in the article= will use the exit code of the rm command so postgres won't be informed= the file transfer fails resulting in missing WAL in the archive.)

Archive script:
---
#!/bin/bash
# $1 =3D %p
# $2 =3D %f

limit=3D10240 # 10Mbps

gzip <= /var/lib/pgsql/13/data/$1 > /tmp/archive/$2.gz

scp -l $limit /tm= p/archive/$2.gz postgres@x.x.x.x:/data/wal_archive/operational/$2.gz
exi= t_code=3D$?

rm /tmp/archive/$2.gz

exit $exit_code
= ---

Restore script:
---
gunzip < /data/wal_archive/operational/$2.g= z > $1
---


--000000000000a97e9f0626529641--