MIME-Version: 1.0
From: KK CHN <kkchn.in@gmail.com>
Date: Thu, 1 Aug 2024 15:32:46 +0530
Message-ID: <CAKgGyB8rLxoDThtOSvxmc3iVgcNkz5bF6emzc7YyTNJgF51QTg@mail.gmail.com>
Subject: PgBackRest PTR recovery: After table drop to get dropped state
To: pgsql-general <pgsql-general@postgresql.org>
Content-Type: multipart/alternative; boundary="00000000000054fda2061e9c30ee"
Archived-At: <https://www.postgresql.org/message-id/CAKgGyB8rLxoDThtOSvxmc3iVgcNkz5bF6emzc7YyTNJgF51QTg%40mail.gmail.com>
Precedence: bulk

--00000000000054fda2061e9c30ee
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

List,

*Not working (start EPAS server always fails):*

1. Testing PTR using  PgBackRest(2.52.1)  on RHEL9  EPAS-16, and RHEL9 (
Repo       Server)

  When I do a PTR

1.  After doing a table drop and then
2. Noting down the time stamp and then
3. Taking an incremental backup in hope that If I do a restore from this
incr Backup, that won't  contain the  dropped table.
4. Correct me  if I am  conceptually wrong here.
5.  I am *never *successful in restoring the EPAS server in this scenario.


*I know the following will work for me, w*hy not the above one if I really
want that state of cluster also  ?

*This is Working. *
 1. Create table
2. Noting down the timestamp
3.  Taking incremental backup on RepoServer.
4. drop the created table .
5. Then stop the EPAS server and do a  PTR, by the  --set=3Dstep 3 incr
backup  and target=3D step 2 time stamp .. It finished the pgaback restore
and promote command
6. I am able to start back the  EPAS server and see the dropped table
recovered there.

But If I want a PTR as in the first section it fails.. Why ?

Thank you,
Krishane


*What I have done and results obtained: *

Created a table important_table4 in my EPAS and note down the time after
creation of this table it is  ( t1 :  "01-AUG-24 14:08:32.447796+05:30" )

Then I performed an Incremental backup   (incr backup:
20240729-160137F_20240801-141148I )
timestamp start/stop: 2024-08-01 14:11:48+05:30 / 2024-08-01 14:11:52+05:30


Now I dropped the table table4 from the EPAS and noted down the time


I want to  restore the table4,, so I stopped EPAS and executed

$ sudo -u enterprisedb pgbackrest --stanza=3DDemo_Repo --delta
--set=3D20240729-160137F_20240801-141148I  --target-timeline=3Dcurrent
--type=3Dtime  --target=3D"01-AUG-24 14:08:32.447796+05:30"
--target-action=3Dpromote restore

IT WORKS AS EXPECTED .. after restarting the EPAS I am able to get the
important_table4 back.

root@service01 ~]# sudo -u enterprisedb psql edb
psql (16.3.0)
Type "help" for help.

edb=3D# \dt
                List of relations
 Schema |       Name       | Type  |    Owner
--------+------------------+-------+--------------
 public | important_table  | table | enterprisedb
 public | important_table2 | table | enterprisedb
 public | important_table3 | table | enterprisedb
 public | important_table4 | table | enterprisedb
(4 rows)

SO all works fine !!!! .


*But Now the PROBLEM Statement. *

*1. I am dropping the table table 4 again *
edb=3D# \q
[root@service01 ~]# sudo -u enterprisedb psql -c "begin; drop table
important_table4; commit;" edb
BEGIN
DROP TABLE
COMMIT
*2 .  [root@service01 ~]#* sudo -u enterprisedb psql -Atc "select
current_timestamp" edb  01-AUG-24 14:23:22.085076 +05:30
Noting the time as :   (01-AUG-24 14:23:22.085076 +05:30 )

3. Now  I am performing an incremental backup after step 2  on REPO SErver
( Hoping that this latest INCR Backup is without dropped important_table4,
so that a recovery of the cluster  shouldn't show the table4 again. )

        incr backup details. : 20240729-160137F_20240801-142433I
            timestamp start/stop*: 2024-08-01 14:24:33+05:30 / 2024-08-01
14:24:36+05:30*

4. Now I want to test the database recovery  after dropping the table4 in
step1 to verify that my EPAS restores from the backup in step 3 and time
stamp (01-AUG-24 14:23:22.085076 +05:30,   so that  the restored EPAS
cluster doesn't contain the important_table4.

5.  $ sudo -u enterprisedb pgbackrest --stanza=3DDemo_Repo --delta
 --set=3D20240729-160137F_20240801-142433I  --target-timeline=3Dcurrent
--type=3Dtime  --target=3D"01-AUG-24 14:23:22.085076+05:30"
--target-action=3Dpromote restore
 ------------
-------------
INFO: restore command end: completed successfully (1035ms)

*ISSUE:    I am unable to get the EPAS Server* in running state after step
5

 *What am I doing wrong ?  OR am I conceptually wrong ?*


OUTPUT on executing step 5.

[root@service01 ~]# sudo -u enterprisedb pgbackrest --stanza=3DDemo_Repo
--delta --set=3D20240729-160137F_20240801-142433I  --target-timeline=3Dcurr=
ent
--type=3Dtime  --target=3D"01-AUG-24 14:23:22.085076+05:30"
--target-action=3Dpromote restore

2024-08-01 14:30:03.535 P00   INFO: restore command begin 2.52.1: --delta
--exec-id=3D82738-b5fe7415 --log-level-console=3Dinfo --log-level-file=3Dde=
bug
--pg1-path=3D/var/lib/edb/as16/data --pg-version-force=3D16
--repo1-host=3D10.10.20.7 --repo1-host-user=3Dpostgres
--set=3D20240729-160137F_20240801-142433I --stanza=3DDemo_Repo
--target=3D"01-AUG-24 14:23:22.085076+05:30" --target-action=3Dpromote
--target-timeline=3Dcurrent --type=3Dtime
2024-08-01 14:30:03.880 P00   INFO: repo1: restore backup set
20240729-160137F_20240801-142433I, recovery will start at 2024-08-01
14:24:33
2024-08-01 14:30:03.881 P00   INFO: remove invalid files/links/paths from
'/var/lib/edb/as16/data'
2024-08-01 14:30:04.567 P00   INFO: write updated
/var/lib/edb/as16/data/postgresql.auto.conf
2024-08-01 14:30:04.569 P00   INFO: restore global/pg_control (performed
last to ensure aborted restores cannot be started)
2024-08-01 14:30:04.569 P00   INFO: restore size =3D 75.9MB, file total =3D=
 2171
2024-08-01 14:30:04.569 P00   INFO: restore command end: completed
successfully (1035ms)
*[root@service01 ~]# systemctl  start edb-as-16.service*

*Now  If I check the server  status  :   Its dead *

[root@service01 ~]# systemctl  status edb-as-16.service
=C3=97 edb-as-16.service - EDB Postgres Advanced Server 16
     Loaded: loaded (/etc/systemd/system/edb-as-16.service; disabled;
preset: disabled)
     *Active: failed* (Result: exit-code) since Thu 2024-08-01 14:30:58
IST; 4s ago
   Duration: 228ms
    Process: 82752 ExecStartPre=3D/usr/edb/as16/bin/edb-as-16-check-db-dir
${PGDATA} (code=3Dexited, status=3D0/SUCCESS)
    Process: 82757 ExecStart=3D/usr/edb/as16/bin/edb-postgres -D ${PGDATA}
(code=3Dexited, status=3D1/FAILURE)
   Main PID: 82757 (code=3Dexited, status=3D1/FAILURE)
        CPU: 325ms

Aug 01 14:30:56 service01 systemd[1]: Starting EDB Postgres Advanced Server
16...
Aug 01 14:30:56 service01 edb-postgres[82757]: 2024-08-01 14:30:56 IST LOG:
 redirecting log output to logging collector process
Aug 01 14:30:56 rservice01 edb-postgres[82757]: 2024-08-01 14:30:56 IST
HINT:  Future log output will appear in directory "log".
Aug 01 14:30:58 service01 systemd[1]: Started EDB Postgres Advanced Server
16.
Aug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Main process
exited, code=3Dexited, status=3D1/FAILURE
Aug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Killing process
82758 (edb-postgres) with signal SIGKILL.
A*ug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Failed with
result 'exit-code'.*
[root@service01 ~]#

Any hints/guidance most welcome.

--00000000000054fda2061e9c30ee
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div>List,</div><div><br></div><div><b>Not working (start =
EPAS server always fails):</b>=C2=A0</div><div><br></div><div>1. Testing PT=
R using=C2=A0 PgBackRest(2.52.1)=C2=A0 on RHEL9=C2=A0 EPAS-16, and RHEL9 ( =
Repo=C2=A0 =C2=A0 =C2=A0 =C2=A0Server)=C2=A0</div><div><br></div><div>=C2=
=A0 When I do a PTR</div><div><br></div><div>1.=C2=A0 After doing a table d=
rop and then</div><div>2. Noting down the time stamp and then=C2=A0</div><d=
iv>3. Taking an incremental=C2=A0backup in hope that If I do a restore=C2=
=A0from this incr Backup, that won&#39;t=C2=A0 contain the=C2=A0 dropped ta=
ble.=C2=A0</div><div>4.=C2=A0Correct me=C2=A0 if I=C2=A0am=C2=A0 conceptual=
ly wrong here.=C2=A0=C2=A0</div><div>5.=C2=A0 I am <b>never </b>successful =
in restoring the EPAS server in this scenario.</div><div><br></div><div><b>=
<br></b></div><div><b>I know the following=C2=A0will work for me, w</b>hy n=
ot the above one if I really want=C2=A0that state of cluster also=C2=A0 ?=
=C2=A0<br></div><div><br></div><div><b>This is Working.=C2=A0</b></div><div=
>=C2=A01. Create table=C2=A0</div><div>2. Noting down the timestamp</div><d=
iv>3.=C2=A0 Taking incremental backup on RepoServer.</div><div>4. drop the =
created table .</div><div>5. Then stop the EPAS server and do a=C2=A0 PTR, =
by the=C2=A0 --set=3Dstep 3 incr backup=C2=A0 and target=3D step 2 time sta=
mp .. It finished the pgaback=C2=A0restore and promote=C2=A0command</div><d=
iv>6. I am able to start back the=C2=A0 EPAS server and see the dropped tab=
le recovered there.</div><div><br></div><div>But If I want a PTR as in the =
first section it fails.. Why ?=C2=A0</div><div><br></div><div>Thank you,</d=
iv><div>Krishane</div><div><br></div><div><br></div><div><br></div><div><br=
></div><div><br></div><div><b>What I have done and results obtained:=C2=A0<=
/b></div><div><br></div><div>Created a table important_table4 in my EPAS an=
d note down the time after creation of this table it is=C2=A0 ( t1 :=C2=A0 =
&quot;01-AUG-24 14:08:32.447796+05:30&quot; )</div><div><br></div><div>Then=
 I performed an Incremental backup=C2=A0 =C2=A0(incr backup: 20240729-16013=
7F_20240801-141148I )</div><div>timestamp start/stop: 2024-08-01 14:11:48+0=
5:30 / 2024-08-01 14:11:52+05:30<br></div><div><br></div><div><br></div><di=
v>Now I dropped the table table4 from the EPAS and noted down the time=C2=
=A0</div><div><br></div><div><br></div><div>I want to=C2=A0 restore the tab=
le4,, so I stopped EPAS and executed=C2=A0</div><div><br></div><div>$ sudo =
-u enterprisedb pgbackrest --stanza=3DDemo_Repo --delta --set=3D20240729-16=
0137F_20240801-141148I =C2=A0--target-timeline=3Dcurrent --type=3Dtime =C2=
=A0--target=3D&quot;01-AUG-24 14:08:32.447796+05:30&quot; --target-action=
=3Dpromote restore<br></div><div><br></div><div>IT WORKS AS EXPECTED .. aft=
er restarting the EPAS I am able to get the important_table4 back.=C2=A0</d=
iv><div><br></div><div>root@service01 ~]# sudo -u enterprisedb psql edb<br>=
psql (16.3.0)<br>Type &quot;help&quot; for help.<br><br>edb=3D# \dt<br>=C2=
=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 List of relations<br>=
=C2=A0Schema | =C2=A0 =C2=A0 =C2=A0 Name =C2=A0 =C2=A0 =C2=A0 | Type =C2=A0=
| =C2=A0 =C2=A0Owner<br>--------+------------------+-------+--------------<=
br>=C2=A0public | important_table =C2=A0| table | enterprisedb<br>=C2=A0pub=
lic | important_table2 | table | enterprisedb<br>=C2=A0public | important_t=
able3 | table | enterprisedb<br>=C2=A0public | important_table4 | table | e=
nterprisedb<br>(4 rows)<br><br></div><div>SO all works fine !!!! .=C2=A0</d=
iv><div><br></div><div><br></div><div><b><u>But Now the PROBLEM Statement.=
=C2=A0</u></b></div><div><br></div><div><b>1. I am dropping the table table=
 4=C2=A0again=C2=A0</b></div><div>edb=3D# \q<br>[root@service01 ~]# sudo -u=
 enterprisedb psql -c &quot;begin; drop table important_table4; commit;&quo=
t; edb<br>BEGIN<br>DROP TABLE<br>COMMIT<br><b>2 .=C2=A0 [root@service01 ~]#=
</b> sudo -u enterprisedb psql -Atc &quot;select current_timestamp&quot; ed=
b=C2=A0 01-AUG-24 14:23:22.085076 +05:30<br></div><div>Noting the time as :=
=C2=A0 =C2=A0(01-AUG-24 14:23:22.085076 +05:30 )<br></div><div><br></div><d=
iv>3. Now=C2=A0 I am performing an incremental backup after step 2=C2=A0 on=
 REPO SErver ( Hoping that this latest INCR Backup is without dropped impor=
tant_table4, so that a recovery of the cluster=C2=A0 shouldn&#39;t show the=
 table4 again. )=C2=A0</div><div><br></div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0=
 incr backup details. : 20240729-160137F_20240801-142433I<br>=C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 timestamp start/stop<b>: 2024-08-01 14:24:33+05=
:30 / 2024-08-01 14:24:36+05:30</b><br></div><div><br></div><div>4. Now I w=
ant to test the database recovery=C2=A0 after dropping the table4 in step1 =
to verify that my EPAS restores from the backup in step 3 and time stamp (0=
1-AUG-24 14:23:22.085076 +05:30,=C2=A0 =C2=A0so that=C2=A0 the restored EPA=
S cluster doesn&#39;t contain the important_table4.</div><div><br></div><di=
v>5.=C2=A0 $ sudo -u enterprisedb pgbackrest --stanza=3DDemo_Repo --delta=
=C2=A0 =C2=A0--set=3D20240729-160137F_20240801-142433I =C2=A0--target-timel=
ine=3Dcurrent --type=3Dtime =C2=A0--target=3D&quot;01-AUG-24 14:23:22.08507=
6+05:30&quot; --target-action=3Dpromote restore</div><div>=C2=A0-----------=
-</div><div>-------------</div><div>INFO: restore command end: completed su=
ccessfully (1035ms)<br></div><div><br></div><div><b>ISSUE:=C2=A0 =C2=A0 I a=
m unable to get the EPAS Server</b> in running state after step 5=C2=A0</di=
v><div><br></div><div>=C2=A0<b>What am I doing=C2=A0wrong ?=C2=A0 OR am I c=
onceptually wrong ?</b></div><div><br></div><div><br></div><div><br></div><=
div><br></div><div>OUTPUT on executing step 5.=C2=A0</div><div><br></div><d=
iv>[root@service01 ~]# sudo -u enterprisedb pgbackrest --stanza=3DDemo_Repo=
 --delta --set=3D20240729-160137F_20240801-142433I =C2=A0--target-timeline=
=3Dcurrent --type=3Dtime =C2=A0--target=3D&quot;01-AUG-24 14:23:22.085076+0=
5:30&quot; --target-action=3Dpromote restore</div><div><br>2024-08-01 14:30=
:03.535 P00 =C2=A0 INFO: restore command begin 2.52.1: --delta --exec-id=3D=
82738-b5fe7415 --log-level-console=3Dinfo --log-level-file=3Ddebug --pg1-pa=
th=3D/var/lib/edb/as16/data --pg-version-force=3D16 --repo1-host=3D10.10.20=
.7 --repo1-host-user=3Dpostgres --set=3D20240729-160137F_20240801-142433I -=
-stanza=3DDemo_Repo --target=3D&quot;01-AUG-24 14:23:22.085076+05:30&quot; =
--target-action=3Dpromote --target-timeline=3Dcurrent --type=3Dtime<br>2024=
-08-01 14:30:03.880 P00 =C2=A0 INFO: repo1: restore backup set 20240729-160=
137F_20240801-142433I, recovery will start at 2024-08-01 14:24:33<br>2024-0=
8-01 14:30:03.881 P00 =C2=A0 INFO: remove invalid files/links/paths from &#=
39;/var/lib/edb/as16/data&#39;<br>2024-08-01 14:30:04.567 P00 =C2=A0 INFO: =
write updated /var/lib/edb/as16/data/postgresql.auto.conf<br>2024-08-01 14:=
30:04.569 P00 =C2=A0 INFO: restore global/pg_control (performed last to ens=
ure aborted restores cannot be started)<br>2024-08-01 14:30:04.569 P00 =C2=
=A0 INFO: restore size =3D 75.9MB, file total =3D 2171<br>2024-08-01 14:30:=
04.569 P00 =C2=A0 INFO: restore command end: completed successfully (1035ms=
)<br><b>[root@service01 ~]# systemctl =C2=A0start edb-as-16.service</b><br>=
</div><div><br></div><div><b>Now=C2=A0 If I check the server=C2=A0 status=
=C2=A0 :=C2=A0 =C2=A0Its dead=C2=A0</b><br></div><div><br></div><div>[root@=
service01 ~]# systemctl =C2=A0status edb-as-16.service<br>=C3=97 edb-as-16.=
service - EDB Postgres Advanced Server 16<br>=C2=A0 =C2=A0 =C2=A0Loaded: lo=
aded (/etc/systemd/system/edb-as-16.service; disabled; preset: disabled)<br=
>=C2=A0 =C2=A0 =C2=A0<b>Active: failed</b> (Result: exit-code) since Thu 20=
24-08-01 14:30:58 IST; 4s ago<br>=C2=A0 =C2=A0Duration: 228ms<br>=C2=A0 =C2=
=A0 Process: 82752 ExecStartPre=3D/usr/edb/as16/bin/edb-as-16-check-db-dir =
${PGDATA} (code=3Dexited, status=3D0/SUCCESS)<br>=C2=A0 =C2=A0 Process: 827=
57 ExecStart=3D/usr/edb/as16/bin/edb-postgres -D ${PGDATA} (code=3Dexited, =
status=3D1/FAILURE)<br>=C2=A0 =C2=A0Main PID: 82757 (code=3Dexited, status=
=3D1/FAILURE)<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 CPU: 325ms<br><br>Aug 01 14:30=
:56 service01 systemd[1]: Starting EDB Postgres Advanced Server 16...<br>Au=
g 01 14:30:56 service01 edb-postgres[82757]: 2024-08-01 14:30:56 IST LOG: =
=C2=A0redirecting log output to logging collector process<br>Aug 01 14:30:5=
6 rservice01 edb-postgres[82757]: 2024-08-01 14:30:56 IST HINT: =C2=A0Futur=
e log output will appear in directory &quot;log&quot;.<br>Aug 01 14:30:58 s=
ervice01 systemd[1]: Started EDB Postgres Advanced Server 16.<br>Aug 01 14:=
30:58 service01 systemd[1]: edb-as-16.service: Main process exited, code=3D=
exited, status=3D1/FAILURE<br>Aug 01 14:30:58 service01 systemd[1]: edb-as-=
16.service: Killing process 82758 (edb-postgres) with signal SIGKILL.<br>A<=
b>ug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Failed with resul=
t &#39;exit-code&#39;.</b><br>[root@service01 ~]#<br></div><div><br></div><=
div>Any hints/guidance most welcome.=C2=A0</div><div><br></div></div>

--00000000000054fda2061e9c30ee--