Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t8z6b-0074ZB-Pt for pgsql-general@arkaria.postgresql.org; Thu, 07 Nov 2024 09:47:09 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t8z6Y-00E7l1-3A for pgsql-general@arkaria.postgresql.org; Thu, 07 Nov 2024 09:47:06 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t8z6X-00E7kt-Fp for pgsql-general@lists.postgresql.org; Thu, 07 Nov 2024 09:47:06 +0000 Received: from mail-lj1-x22f.google.com ([2a00:1450:4864:20::22f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t8z6U-000c1r-JF for pgsql-general@lists.postgresql.org; Thu, 07 Nov 2024 09:47:04 +0000 Received: by mail-lj1-x22f.google.com with SMTP id 38308e7fff4ca-2fb5be4381dso7316571fa.2 for ; Thu, 07 Nov 2024 01:47:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1730972821; x=1731577621; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=YZPMST5Yz7H8scmbZYEparfae+rrxODbuuZgFCT7mEE=; b=dP1qavNFUMzO6ibo2wgIjInhRHH44Z7jaW7+H9GqJoAWvGJqeYx7nisUcOdTGl8FDH C1fUUuCrIamfhmzbEjlUwOiCXWUwTFGVR89R7C8x4OrqiCiiqC/ZGZTU0PmV8QkIuglK 2aKfFKPeapWmRvpSYNlNJ8guN+l3nsuPqUjgO5XHPvAxRWgFW94T6tRD4Ex2Fc/3e5Ic 2BBrWp9pI6DfoXT2CmxxlNHeaB88v4maLD0zzd7OXA/mr9BqASUnvGvGb+uh8YRzfWV6 aIFois4DIszmhosOygzpqjnDqvD8wL9qrEC4QADp2URxZIF6qpI5WWuYlBy1XvSIOZux 7sGw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730972821; x=1731577621; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=YZPMST5Yz7H8scmbZYEparfae+rrxODbuuZgFCT7mEE=; b=j8KXVt2P4w5zNpr5wA4Ca/IAXLTKAaPo10T20FFgDGPZQKLYaJaxdBbXfXaApmS7qF +SVtR1D1f5/EQXo7prXOPBQdwCbFXb8Mx90WmqEeh9SLghYakDa/+yRTOAnNFUFSZ5ha tx0m+cBlv+ITdhP7G686YMDwJE35N9q5PnFXDot8XtcTqRV04Gl7KsKjrAQ9gcB9LRJz DbRDGzCsL+jHnATPiRRid9KDZyAj4s3HHcNDGOscPeTZAIZdu/GUy1uwmK2E6Cyj+aWT TPPasuC1YaFGdVg3q4Jxojq47c3Bp634bHv3Ot77x3j8Rt9s0GdPPhhO/QfmAjI63A2d BBoQ== X-Gm-Message-State: AOJu0Yz39ucw89RlStI8iQGmzFgpUrcLAD4cD4MCInS87xYNCWmVZ7Cs 9UXuE9fdMhnUXi1WZSYJ9kzdYadz53foXgRJvH2EnYUgPx0qu/yASY0/ShvfaVHLGHqtBa9TRGQ j7gnWSei4SdX/SG/QGUXmbgx4qkglauhGgyD9 X-Google-Smtp-Source: AGHT+IFRq9tROl2rt7CRI8kzoxZTZH5ZgQCR4OofAUM8J8lWPNIC+tQ9UmwuvOJA0IibV3tqv9a41ZasD68eexgaBuU= X-Received: by 2002:a05:651c:160f:b0:2fa:c014:4b6b with SMTP id 38308e7fff4ca-2fef43558a2mr81789901fa.41.1730972820408; Thu, 07 Nov 2024 01:47:00 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: =?UTF-8?Q?Torsten_F=C3=B6rtsch?= Date: Thu, 7 Nov 2024 10:46:49 +0100 Message-ID: Subject: Re: Trouble using pg_rewind to undo standby promotion To: Craig McIlwee Cc: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="00000000000013939206264f84ef" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000013939206264f84ef Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Your point of divergence is in the middle of the 7718/000000BF file. So, you should have 2 such files eventually, one on timeline 1 and the other on timeline 2. Are you archiving WAL on the promoted machine in a way that your restore_command can find it? Check archive_command and archive_mode on the promoted machine. Also, do your archive/restore scripts work properly for history files? On Wed, Nov 6, 2024 at 7:48=E2=80=AFPM Craig McIlwee wrote: > I have a primary -> standby 1 -> standby 2 setup with all servers running > PG 13.8 (this effort is part of getting on to a newer version, but I thin= k > those details aren't relevant to this problem). The first standby uses > streaming replication from the primary and the second standby is using a > WAL archive with a restore_command. To make this standby chain work, > standby 1 is set to archive_mode =3D always with a command that populates= the > WAL archive. > > I would like to be able to promote standby 2 (hereon referred to just as > 'standby'), perform some writes, then rewind it back to the point before > promotion so it can become a standby again. The documentation for > pg_rewind says that this is supported and it seems like it should be > straightforward, but I'm not having any luck getting this to work so I'm > hoping someone can point out what I'm doing wrong. Here's what I did: > > First, observe that WAL is properly being applied from the archive. Note > that we are currently on timeline 1. > > 2024-11-06 09:51:23.286 EST [5438] LOG: restored log file > "0000000100007711000000F9" from archive > 2024-11-06 09:51:23.434 EST [5438] LOG: restored log file > "0000000100007711000000FA" from archive > /data/wal_archive/restore_operational.sh: line 2: > /data/wal_archive/operational/0000000100007711000000FB.gz: No such file o= r > directory > /data/wal_archive/restore_operational.sh: line 2: > /data/wal_archive/operational/00000002.history.gz: No such file or direct= ory > > Next, stop postgres, set wal_log_hints =3D on as required by pg_rewind, a= nd > restart postgres. I also make a copy of the data directory while the > postgres is not running so I can repeat my test, which works fine on a > small test database but won't be possible for the multi TB database that = I > will eventually be doing this on. > > Now promote the standby using "select pg_promote()" and see that it > switches to a new timeline. You can also see that the last WAL applied > from the archive is 7718/BF. > > 2024-11-06 12:10:10.831 EST [4336] LOG: restored log file > "0000000100007718000000BD" from archive > 2024-11-06 12:10:10.996 EST [4336] LOG: restored log file > "0000000100007718000000BE" from archive > /data/wal_archive/restore_operational.sh: line 2: > /data/wal_archive/operational/0000000100007718000000BF.gz: No such file o= r > directory > /data/wal_archive/restore_operational.sh: line 2: > /data/wal_archive/operational/00000002.history.gz: No such file or direct= ory > 2024-11-06 12:10:15.384 EST [4336] LOG: restored log file > "0000000100007718000000BF" from archive > /data/wal_archive/restore_operational.sh: line 2: > /data/wal_archive/operational/0000000100007718000000C0.gz: No such file o= r > directory > 2024-11-06 12:10:15.457 EST [4336] LOG: received promote request > 2024-11-06 12:10:15.457 EST [4336] LOG: redo done at 7718/BFFFFF30 > 2024-11-06 12:10:15.457 EST [4336] LOG: last completed transaction was a= t > log time 2024-11-06 12:10:22.627074-05 > 2024-11-06 12:10:15.593 EST [4336] LOG: restored log file > "0000000100007718000000BF" from archive > /data/wal_archive/restore_operational.sh: line 2: > /data/wal_archive/operational/00000002.history.gz: No such file or direct= ory > 2024-11-06 12:10:15.611 EST [4336] LOG: selected new timeline ID: 2 > 2024-11-06 12:10:15.640 EST [4336] LOG: archive recovery complete > /data/wal_archive/restore_operational.sh: line 2: > /data/wal_archive/operational/00000001.history.gz: No such file or direct= ory > 2024-11-06 12:10:17.028 EST [4329] LOG: database system is ready to > accept connections > > Next, insert a record into just to make some changes that I want to > rollback later. (What I will eventually be doing is creating a publicati= on > to ship data to a newer version, but again, that's not what's important > here.) > > Finally, shutdown postgres and attempt a rewind. The address used in the > --source-server connection string is the address of the primary. > > 2024-11-06 12:11:11.139 EST [4329] LOG: received fast shutdown request > 2024-11-06 12:11:11.143 EST [4329] LOG: aborting any active transactions > 2024-11-06 12:11:11.144 EST [4329] LOG: background worker "logical > replication launcher" (PID 5923) exited with exit code 1 > 2024-11-06 12:11:40.933 EST [4342] LOG: shutting down > 2024-11-06 12:11:41.753 EST [4329] LOG: database system is shut down > > /usr/pgsql-13/bin/pg_rewind --target-pgdata=3D/data/pgsql/operational > --source-server=3D"host=3Dx.x.x.x dbname=3Dpostgres user=3Dxxx password= =3Dxxx" > --dry-run --progress --restore-target-wal > > pg_rewind: connected to server > pg_rewind: servers diverged at WAL location 7718/BFFFFFE8 on timeline 1 > /data/wal_archive/restore_operational.sh: line 2: > /data/wal_archive/operational/0000000200007718000000BF.gz: No such file o= r > directory > pg_rewind: error: could not restore file "0000000200007718000000BF" from > archive > pg_rewind: fatal: could not find previous WAL record at 7718/BFFFFFE8 > > pg_rewind shows the point of divergence as 7718/BF on timeline 1, but whe= n > it tries to replay WAL using the restore command it is trying to find WAL > from timeline 2 rather than picking back up on timeline 1. I tried > setting recovery_target_timeline on the target database to 'current' and > '1' but that gave the same result. Searching the archives, [1] mentions t= he > need to force a checkpoint after promotion which I tried even though the > problem description isn't the same. [2] mentions a problem that looks mo= re > like the one I am facing but has no responses. At this point I don't kno= w > what to do next and hope someone can point me in the right direction. > > [1] > https://www.postgresql.org/message-id/e7b16ddea93a92575cb6d143b6ef602cab2= 2432e.camel%40cybertec.at > [2] > https://www.postgresql.org/message-id/CALp3DH1fLZmPvkOteAbUo4TOLZP-LstKOs= 6Gcw3Bm7acmJqk=3Dw@mail.gmail.com > > Craig > --00000000000013939206264f84ef Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Your point of divergence is in the middle of the 7718/0000= 00BF file. So, you should have 2 such files eventually, one on timeline 1 a= nd the other on timeline 2.

Are you archiving WAL on the= promoted machine in a way that your restore_command can find it? Check arc= hive_command and archive_mode on the promoted machine.

=
Also, do your archive/restore scripts work properly for history files?=

On Wed, Nov 6, 2024 at 7:48=E2=80=AFPM Craig McIlwee <craigm@vt.edu> wrote:
I have a primary -&= gt; standby 1 -> standby 2 setup with all servers running PG 13.8 (this = effort is part of getting on to a newer version,=C2=A0but I think those det= ails aren't relevant to this problem).=C2=A0 The first standby uses str= eaming replication from the primary and the second standby is using a WAL a= rchive with a restore_command.=C2=A0 To make this standby chain work, stand= by 1 is set to archive_mode =3D always with a command that populates the WA= L archive.

I would like to be able to promote stan= dby 2 (hereon referred to just as 'standby'), perform some writes, = then rewind it back to the point before promotion so it can become a standb= y again.=C2=A0 The documentation for pg_rewind says that this is supported= =C2=A0and it seems like it should be straightforward, but I'm not havin= g any luck getting this to work so I'm hoping someone can point out wha= t I'm doing wrong.=C2=A0 Here's what I did:

First, observe that WAL is properly being applied from the archive.=C2=A0= Note that we are currently on timeline 1.
=C2=A0
2024-11-06 09:51:23= .286 EST [5438] LOG: =C2=A0restored log file "0000000100007711000000F9= " from archive
2024-11-06 09:51:23.434 EST [5438] LOG: =C2=A0restor= ed log file "0000000100007711000000FA" from archive
/data/wal_= archive/restore_operational.sh: line 2: /data/wal_archive/operational/00000= 00100007711000000FB.gz: No such file or directory
/data/wal_archive/rest= ore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.= gz: No such file or directory

Next, stop postgres,= set wal_log_hints =3D on as required by pg_rewind, and restart postgres.= =C2=A0 I also make a copy of the data directory while the postgres is not r= unning so I can repeat my test, which works fine on a small test database b= ut won't be possible for the multi TB database that I will eventually b= e doing this on.=C2=A0

Now promote the standby usi= ng "select pg_promote()" and see that it switches to a new timeli= ne.=C2=A0 You can also see that the last WAL applied from the archive is 77= 18/BF.

2024-11-06 12:10:10.831 EST [4336] LOG: =C2= =A0restored log file "0000000100007718000000BD" from archive
2= 024-11-06 12:10:10.996 EST [4336] LOG: =C2=A0restored log file "000000= 0100007718000000BE" from archive
/data/wal_archive/restore_operatio= nal.sh: line 2: /data/wal_archive/operational/0000000100007718000000BF.gz: = No such file or directory
/data/wal_archive/restore_operational.sh: line= 2: /data/wal_archive/operational/00000002.history.gz: No such file or dire= ctory
2024-11-06 12:10:15.384 EST [4336] LOG: =C2=A0restored log file &q= uot;0000000100007718000000BF" from archive
/data/wal_archive/restor= e_operational.sh: line 2: /data/wal_archive/operational/0000000100007718000= 000C0.gz: No such file or directory
2024-11-06 12:10:15.457 EST [4336] L= OG: =C2=A0received promote request
2024-11-06 12:10:15.457 EST [4336] LO= G: =C2=A0redo done at 7718/BFFFFF30
2024-11-06 12:10:15.457 EST [4336] L= OG: =C2=A0last completed transaction was at log time 2024-11-06 12:10:22.62= 7074-05
2024-11-06 12:10:15.593 EST [4336] LOG: =C2=A0restored log file = "0000000100007718000000BF" from archive
/data/wal_archive/rest= ore_operational.sh: line 2: /data/wal_archive/operational/00000002.history.= gz: No such file or directory
2024-11-06 12:10:15.611 EST [4336] LOG: = =C2=A0selected new timeline ID: 2
2024-11-06 12:10:15.640 EST [4336] LOG= : =C2=A0archive recovery complete
/data/wal_archive/restore_operational.= sh: line 2: /data/wal_archive/operational/00000001.history.gz: No such file= or directory
2024-11-06 12:10:17.028 EST [4329] LOG: =C2=A0database sys= tem is ready to accept connections

Next, insert a = record into just to make some changes that I want to rollback later.=C2=A0 = (What I will eventually be doing is creating a publication to ship data to = a newer version, but again, that's not what's important here.)

Finally, shutdown postgres and attempt a rewind.=C2=A0= The address used in the --source-server connection string is the address o= f the primary.

2024-11-06 12:11:11.139 EST [4329] = LOG: =C2=A0received fast shutdown request
2024-11-06 12:11:11.143 EST [4= 329] LOG: =C2=A0aborting any active transactions
2024-11-06 12:11:11.144= EST [4329] LOG: =C2=A0background worker "logical replication launcher= " (PID 5923) exited with exit code 1
2024-11-06 12:11:40.933 EST [4= 342] LOG: =C2=A0shutting down
2024-11-06 12:11:41.753 EST [4329] LOG: = =C2=A0database system is shut down

/usr/pgsql-13/b= in/pg_rewind --target-pgdata=3D/data/pgsql/operational --source-server=3D&q= uot;host=3Dx.x.x.x dbname=3Dpostgres user=3Dxxx password=3Dxxx" --dry-= run --progress --restore-target-wal

pg_rewind: con= nected to server
pg_rewind: servers diverged at WAL location 7718/BFFFFF= E8 on timeline 1
/data/wal_archive/restore_operational.sh: line 2: /data= /wal_archive/operational/0000000200007718000000BF.gz: No such file or direc= tory
pg_rewind: error: could not restore file "00000002000077180000= 00BF" from archive
pg_rewind: fatal: could not find previous WAL re= cord at 7718/BFFFFFE8

pg_rewind shows the point of= divergence=C2=A0as 7718/BF on timeline 1, but when it tries to replay WAL = using the restore command it is trying to find WAL from timeline 2 rather t= han picking=C2=A0back up on timeline 1.=C2=A0 I tried setting=C2=A0recovery= _target_timeline on the target database to 'current' and '1'= ; but that gave the same result. Searching the archives, [1] mentions the n= eed to force a checkpoint after promotion which I tried even though the pro= blem description isn't the same.=C2=A0 [2] mentions a problem that look= s more like the one I am facing but has no responses.=C2=A0 At this point I= don't know what to do next and hope someone can point me in the right = direction.


Craig
--00000000000013939206264f84ef--