MIME-Version: 1.0
References: <CAKgGyB8rLxoDThtOSvxmc3iVgcNkz5bF6emzc7YyTNJgF51QTg@mail.gmail.com>
In-Reply-To: <CAKgGyB8rLxoDThtOSvxmc3iVgcNkz5bF6emzc7YyTNJgF51QTg@mail.gmail.com>
From: Kashif Zeeshan <kashi.zeeshan@gmail.com>
Date: Thu, 1 Aug 2024 15:00:16 +0500
Message-ID: <CAAPsdhcBz9PSbrD4QY0taouACMBqSZcWZZFoo5tY1OUxp10bww@mail.gmail.com>
Subject: Re: PgBackRest PTR recovery: After table drop to get dropped state
To: KK CHN <kkchn.in@gmail.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000ba5f01061e9c47c8"
Archived-At: <https://www.postgresql.org/message-id/CAAPsdhcBz9PSbrD4QY0taouACMBqSZcWZZFoo5tY1OUxp10bww%40mail.gmail.com>
Precedence: bulk

--000000000000ba5f01061e9c47c8
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi

On Thu, Aug 1, 2024 at 2:54=E2=80=AFPM KK CHN <kkchn.in@gmail.com> wrote:

> List,
>
> *Not working (start EPAS server always fails):*
>
> 1. Testing PTR using  PgBackRest(2.52.1)  on RHEL9  EPAS-16, and RHEL9 (
> Repo       Server)
>
>   When I do a PTR
>
> 1.  After doing a table drop and then
> 2. Noting down the time stamp and then
> 3. Taking an incremental backup in hope that If I do a restore from this
> incr Backup, that won't  contain the  dropped table.
> 4. Correct me  if I am  conceptually wrong here.
> 5.  I am *never *successful in restoring the EPAS server in this scenario=
.
>
>
> *I know the following will work for me, w*hy not the above one if I
> really want that state of cluster also  ?
>
> *This is Working. *
>  1. Create table
> 2. Noting down the timestamp
> 3.  Taking incremental backup on RepoServer.
> 4. drop the created table .
> 5. Then stop the EPAS server and do a  PTR, by the  --set=3Dstep 3 incr
> backup  and target=3D step 2 time stamp .. It finished the pgaback restor=
e
> and promote command
> 6. I am able to start back the  EPAS server and see the dropped table
> recovered there.
>
> But If I want a PTR as in the first section it fails.. Why ?
>
> Thank you,
> Krishane
>
>
>
>
>
> *What I have done and results obtained: *
>
> Created a table important_table4 in my EPAS and note down the time after
> creation of this table it is  ( t1 :  "01-AUG-24 14:08:32.447796+05:30" )
>
> Then I performed an Incremental backup   (incr backup:
> 20240729-160137F_20240801-141148I )
> timestamp start/stop: 2024-08-01 14:11:48+05:30 / 2024-08-01 14:11:52+05:=
30
>
>
> Now I dropped the table table4 from the EPAS and noted down the time
>
>
> I want to  restore the table4,, so I stopped EPAS and executed
>
> $ sudo -u enterprisedb pgbackrest --stanza=3DDemo_Repo --delta
> --set=3D20240729-160137F_20240801-141148I  --target-timeline=3Dcurrent
> --type=3Dtime  --target=3D"01-AUG-24 14:08:32.447796+05:30"
> --target-action=3Dpromote restore
>
> IT WORKS AS EXPECTED .. after restarting the EPAS I am able to get the
> important_table4 back.
>
> root@service01 ~]# sudo -u enterprisedb psql edb
> psql (16.3.0)
> Type "help" for help.
>
> edb=3D# \dt
>                 List of relations
>  Schema |       Name       | Type  |    Owner
> --------+------------------+-------+--------------
>  public | important_table  | table | enterprisedb
>  public | important_table2 | table | enterprisedb
>  public | important_table3 | table | enterprisedb
>  public | important_table4 | table | enterprisedb
> (4 rows)
>
> SO all works fine !!!! .
>
>
> *But Now the PROBLEM Statement. *
>
> *1. I am dropping the table table 4 again *
> edb=3D# \q
> [root@service01 ~]# sudo -u enterprisedb psql -c "begin; drop table
> important_table4; commit;" edb
> BEGIN
> DROP TABLE
> COMMIT
> *2 .  [root@service01 ~]#* sudo -u enterprisedb psql -Atc "select
> current_timestamp" edb  01-AUG-24 14:23:22.085076 +05:30
> Noting the time as :   (01-AUG-24 14:23:22.085076 +05:30 )
>
> 3. Now  I am performing an incremental backup after step 2  on REPO SErve=
r
> ( Hoping that this latest INCR Backup is without dropped important_table4=
,
> so that a recovery of the cluster  shouldn't show the table4 again. )
>
>         incr backup details. : 20240729-160137F_20240801-142433I
>             timestamp start/stop*: 2024-08-01 14:24:33+05:30 / 2024-08-01
> 14:24:36+05:30*
>
> 4. Now I want to test the database recovery  after dropping the table4 in
> step1 to verify that my EPAS restores from the backup in step 3 and time
> stamp (01-AUG-24 14:23:22.085076 +05:30,   so that  the restored EPAS
> cluster doesn't contain the important_table4.
>
> 5.  $ sudo -u enterprisedb pgbackrest --stanza=3DDemo_Repo --delta
>  --set=3D20240729-160137F_20240801-142433I  --target-timeline=3Dcurrent
> --type=3Dtime  --target=3D"01-AUG-24 14:23:22.085076+05:30"
> --target-action=3Dpromote restore
>  ------------
> -------------
> INFO: restore command end: completed successfully (1035ms)
>
> *ISSUE:    I am unable to get the EPAS Server* in running state after
> step 5
>
>  *What am I doing wrong ?  OR am I conceptually wrong ?*
>
>
>
>
> OUTPUT on executing step 5.
>
> [root@service01 ~]# sudo -u enterprisedb pgbackrest --stanza=3DDemo_Repo
> --delta --set=3D20240729-160137F_20240801-142433I  --target-timeline=3Dcu=
rrent
> --type=3Dtime  --target=3D"01-AUG-24 14:23:22.085076+05:30"
> --target-action=3Dpromote restore
>
> 2024-08-01 14:30:03.535 P00   INFO: restore command begin 2.52.1: --delta
> --exec-id=3D82738-b5fe7415 --log-level-console=3Dinfo --log-level-file=3D=
debug
> --pg1-path=3D/var/lib/edb/as16/data --pg-version-force=3D16
> --repo1-host=3D10.10.20.7 --repo1-host-user=3Dpostgres
> --set=3D20240729-160137F_20240801-142433I --stanza=3DDemo_Repo
> --target=3D"01-AUG-24 14:23:22.085076+05:30" --target-action=3Dpromote
> --target-timeline=3Dcurrent --type=3Dtime
> 2024-08-01 14:30:03.880 P00   INFO: repo1: restore backup set
> 20240729-160137F_20240801-142433I, recovery will start at 2024-08-01
> 14:24:33
> 2024-08-01 14:30:03.881 P00   INFO: remove invalid files/links/paths from
> '/var/lib/edb/as16/data'
> 2024-08-01 14:30:04.567 P00   INFO: write updated
> /var/lib/edb/as16/data/postgresql.auto.conf
> 2024-08-01 14:30:04.569 P00   INFO: restore global/pg_control (performed
> last to ensure aborted restores cannot be started)
> 2024-08-01 14:30:04.569 P00   INFO: restore size =3D 75.9MB, file total =
=3D
> 2171
> 2024-08-01 14:30:04.569 P00   INFO: restore command end: completed
> successfully (1035ms)
> *[root@service01 ~]# systemctl  start edb-as-16.service*
>
> *Now  If I check the server  status  :   Its dead *
>
> [root@service01 ~]# systemctl  status edb-as-16.service
> =C3=97 edb-as-16.service - EDB Postgres Advanced Server 16
>      Loaded: loaded (/etc/systemd/system/edb-as-16.service; disabled;
> preset: disabled)
>      *Active: failed* (Result: exit-code) since Thu 2024-08-01 14:30:58
> IST; 4s ago
>    Duration: 228ms
>     Process: 82752 ExecStartPre=3D/usr/edb/as16/bin/edb-as-16-check-db-di=
r
> ${PGDATA} (code=3Dexited, status=3D0/SUCCESS)
>     Process: 82757 ExecStart=3D/usr/edb/as16/bin/edb-postgres -D ${PGDATA=
}
> (code=3Dexited, status=3D1/FAILURE)
>    Main PID: 82757 (code=3Dexited, status=3D1/FAILURE)
>         CPU: 325ms
>
> Aug 01 14:30:56 service01 systemd[1]: Starting EDB Postgres Advanced
> Server 16...
> Aug 01 14:30:56 service01 edb-postgres[82757]: 2024-08-01 14:30:56 IST
> LOG:  redirecting log output to logging collector process
> Aug 01 14:30:56 rservice01 edb-postgres[82757]: 2024-08-01 14:30:56 IST
> HINT:  Future log output will appear in directory "log".
> Aug 01 14:30:58 service01 systemd[1]: Started EDB Postgres Advanced Serve=
r
> 16.
> Aug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Main process
> exited, code=3Dexited, status=3D1/FAILURE
> Aug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Killing process
> 82758 (edb-postgres) with signal SIGKILL.
> A*ug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Failed with
> result 'exit-code'.*
> [root@service01 ~]#
>
> Any hints/guidance most welcome.
>
> Can you please share the DB Server log as it contains the exact error
which is causing the server not to start.

Thanks

--000000000000ba5f01061e9c47c8
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr"><div dir=3D"ltr">Hi</div><br><div class=3D"gmail_quote"><d=
iv dir=3D"ltr" class=3D"gmail_attr">On Thu, Aug 1, 2024 at 2:54=E2=80=AFPM =
KK CHN &lt;<a href=3D"mailto:kkchn.in@gmail.com">kkchn.in@gmail.com</a>&gt;=
 wrote:<br></div><blockquote class=3D"gmail_quote" style=3D"margin:0px 0px =
0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir=
=3D"ltr"><div>List,</div><div><br></div><div><b>Not working (start EPAS ser=
ver always fails):</b>=C2=A0</div><div><br></div><div>1. Testing PTR using=
=C2=A0 PgBackRest(2.52.1)=C2=A0 on RHEL9=C2=A0 EPAS-16, and RHEL9 ( Repo=C2=
=A0 =C2=A0 =C2=A0 =C2=A0Server)=C2=A0</div><div><br></div><div>=C2=A0 When =
I do a PTR</div><div><br></div><div>1.=C2=A0 After doing a table drop and t=
hen</div><div>2. Noting down the time stamp and then=C2=A0</div><div>3. Tak=
ing an incremental=C2=A0backup in hope that If I do a restore=C2=A0from thi=
s incr Backup, that won&#39;t=C2=A0 contain the=C2=A0 dropped table.=C2=A0<=
/div><div>4.=C2=A0Correct me=C2=A0 if I=C2=A0am=C2=A0 conceptually wrong he=
re.=C2=A0=C2=A0</div><div>5.=C2=A0 I am <b>never </b>successful in restorin=
g the EPAS server in this scenario.</div><div><br></div><div><b><br></b></d=
iv><div><b>I know the following=C2=A0will work for me, w</b>hy not the abov=
e one if I really want=C2=A0that state of cluster also=C2=A0 ?=C2=A0<br></d=
iv><div><br></div><div><b>This is Working.=C2=A0</b></div><div>=C2=A01. Cre=
ate table=C2=A0</div><div>2. Noting down the timestamp</div><div>3.=C2=A0 T=
aking incremental backup on RepoServer.</div><div>4. drop the created table=
 .</div><div>5. Then stop the EPAS server and do a=C2=A0 PTR, by the=C2=A0 =
--set=3Dstep 3 incr backup=C2=A0 and target=3D step 2 time stamp .. It fini=
shed the pgaback=C2=A0restore and promote=C2=A0command</div><div>6. I am ab=
le to start back the=C2=A0 EPAS server and see the dropped table recovered =
there.</div><div><br></div><div>But If I want a PTR as in the first section=
 it fails.. Why ?=C2=A0</div><div><br></div><div>Thank you,</div><div>Krish=
ane</div><div><br></div><div><br></div><div><br></div><div><br></div><div><=
br></div><div><b>What I have done and results obtained:=C2=A0</b></div><div=
><br></div><div>Created a table important_table4 in my EPAS and note down t=
he time after creation of this table it is=C2=A0 ( t1 :=C2=A0 &quot;01-AUG-=
24 14:08:32.447796+05:30&quot; )</div><div><br></div><div>Then I performed =
an Incremental backup=C2=A0 =C2=A0(incr backup: 20240729-160137F_20240801-1=
41148I )</div><div>timestamp start/stop: 2024-08-01 14:11:48+05:30 / 2024-0=
8-01 14:11:52+05:30<br></div><div><br></div><div><br></div><div>Now I dropp=
ed the table table4 from the EPAS and noted down the time=C2=A0</div><div><=
br></div><div><br></div><div>I want to=C2=A0 restore the table4,, so I stop=
ped EPAS and executed=C2=A0</div><div><br></div><div>$ sudo -u enterprisedb=
 pgbackrest --stanza=3DDemo_Repo --delta --set=3D20240729-160137F_20240801-=
141148I =C2=A0--target-timeline=3Dcurrent --type=3Dtime =C2=A0--target=3D&q=
uot;01-AUG-24 14:08:32.447796+05:30&quot; --target-action=3Dpromote restore=
<br></div><div><br></div><div>IT WORKS AS EXPECTED .. after restarting the =
EPAS I am able to get the important_table4 back.=C2=A0</div><div><br></div>=
<div>root@service01 ~]# sudo -u enterprisedb psql edb<br>psql (16.3.0)<br>T=
ype &quot;help&quot; for help.<br><br>edb=3D# \dt<br>=C2=A0 =C2=A0 =C2=A0 =
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 List of relations<br>=C2=A0Schema | =C2=
=A0 =C2=A0 =C2=A0 Name =C2=A0 =C2=A0 =C2=A0 | Type =C2=A0| =C2=A0 =C2=A0Own=
er<br>--------+------------------+-------+--------------<br>=C2=A0public | =
important_table =C2=A0| table | enterprisedb<br>=C2=A0public | important_ta=
ble2 | table | enterprisedb<br>=C2=A0public | important_table3 | table | en=
terprisedb<br>=C2=A0public | important_table4 | table | enterprisedb<br>(4 =
rows)<br><br></div><div>SO all works fine !!!! .=C2=A0</div><div><br></div>=
<div><br></div><div><b><u>But Now the PROBLEM Statement.=C2=A0</u></b></div=
><div><br></div><div><b>1. I am dropping the table table 4=C2=A0again=C2=A0=
</b></div><div>edb=3D# \q<br>[root@service01 ~]# sudo -u enterprisedb psql =
-c &quot;begin; drop table important_table4; commit;&quot; edb<br>BEGIN<br>=
DROP TABLE<br>COMMIT<br><b>2 .=C2=A0 [root@service01 ~]#</b> sudo -u enterp=
risedb psql -Atc &quot;select current_timestamp&quot; edb=C2=A0 01-AUG-24 1=
4:23:22.085076 +05:30<br></div><div>Noting the time as :=C2=A0 =C2=A0(01-AU=
G-24 14:23:22.085076 +05:30 )<br></div><div><br></div><div>3. Now=C2=A0 I a=
m performing an incremental backup after step 2=C2=A0 on REPO SErver ( Hopi=
ng that this latest INCR Backup is without dropped important_table4, so tha=
t a recovery of the cluster=C2=A0 shouldn&#39;t show the table4 again. )=C2=
=A0</div><div><br></div><div>=C2=A0 =C2=A0 =C2=A0 =C2=A0 incr backup detail=
s. : 20240729-160137F_20240801-142433I<br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 =C2=A0 timestamp start/stop<b>: 2024-08-01 14:24:33+05:30 / 2024-08-01 =
14:24:36+05:30</b><br></div><div><br></div><div>4. Now I want to test the d=
atabase recovery=C2=A0 after dropping the table4 in step1 to verify that my=
 EPAS restores from the backup in step 3 and time stamp (01-AUG-24 14:23:22=
.085076 +05:30,=C2=A0 =C2=A0so that=C2=A0 the restored EPAS cluster doesn&#=
39;t contain the important_table4.</div><div><br></div><div>5.=C2=A0 $ sudo=
 -u enterprisedb pgbackrest --stanza=3DDemo_Repo --delta=C2=A0 =C2=A0--set=
=3D20240729-160137F_20240801-142433I =C2=A0--target-timeline=3Dcurrent --ty=
pe=3Dtime =C2=A0--target=3D&quot;01-AUG-24 14:23:22.085076+05:30&quot; --ta=
rget-action=3Dpromote restore</div><div>=C2=A0------------</div><div>------=
-------</div><div>INFO: restore command end: completed successfully (1035ms=
)<br></div><div><br></div><div><b>ISSUE:=C2=A0 =C2=A0 I am unable to get th=
e EPAS Server</b> in running state after step 5=C2=A0</div><div><br></div><=
div>=C2=A0<b>What am I doing=C2=A0wrong ?=C2=A0 OR am I conceptually wrong =
?</b></div><div><br></div><div><br></div><div><br></div><div><br></div><div=
>OUTPUT on executing step 5.=C2=A0</div><div><br></div><div>[root@service01=
 ~]# sudo -u enterprisedb pgbackrest --stanza=3DDemo_Repo --delta --set=3D2=
0240729-160137F_20240801-142433I =C2=A0--target-timeline=3Dcurrent --type=
=3Dtime =C2=A0--target=3D&quot;01-AUG-24 14:23:22.085076+05:30&quot; --targ=
et-action=3Dpromote restore</div><div><br>2024-08-01 14:30:03.535 P00 =C2=
=A0 INFO: restore command begin 2.52.1: --delta --exec-id=3D82738-b5fe7415 =
--log-level-console=3Dinfo --log-level-file=3Ddebug --pg1-path=3D/var/lib/e=
db/as16/data --pg-version-force=3D16 --repo1-host=3D10.10.20.7 --repo1-host=
-user=3Dpostgres --set=3D20240729-160137F_20240801-142433I --stanza=3DDemo_=
Repo --target=3D&quot;01-AUG-24 14:23:22.085076+05:30&quot; --target-action=
=3Dpromote --target-timeline=3Dcurrent --type=3Dtime<br>2024-08-01 14:30:03=
.880 P00 =C2=A0 INFO: repo1: restore backup set 20240729-160137F_20240801-1=
42433I, recovery will start at 2024-08-01 14:24:33<br>2024-08-01 14:30:03.8=
81 P00 =C2=A0 INFO: remove invalid files/links/paths from &#39;/var/lib/edb=
/as16/data&#39;<br>2024-08-01 14:30:04.567 P00 =C2=A0 INFO: write updated /=
var/lib/edb/as16/data/postgresql.auto.conf<br>2024-08-01 14:30:04.569 P00 =
=C2=A0 INFO: restore global/pg_control (performed last to ensure aborted re=
stores cannot be started)<br>2024-08-01 14:30:04.569 P00 =C2=A0 INFO: resto=
re size =3D 75.9MB, file total =3D 2171<br>2024-08-01 14:30:04.569 P00 =C2=
=A0 INFO: restore command end: completed successfully (1035ms)<br><b>[root@=
service01 ~]# systemctl =C2=A0start edb-as-16.service</b><br></div><div><br=
></div><div><b>Now=C2=A0 If I check the server=C2=A0 status=C2=A0 :=C2=A0 =
=C2=A0Its dead=C2=A0</b><br></div><div><br></div><div>[root@service01 ~]# s=
ystemctl =C2=A0status edb-as-16.service<br>=C3=97 edb-as-16.service - EDB P=
ostgres Advanced Server 16<br>=C2=A0 =C2=A0 =C2=A0Loaded: loaded (/etc/syst=
emd/system/edb-as-16.service; disabled; preset: disabled)<br>=C2=A0 =C2=A0 =
=C2=A0<b>Active: failed</b> (Result: exit-code) since Thu 2024-08-01 14:30:=
58 IST; 4s ago<br>=C2=A0 =C2=A0Duration: 228ms<br>=C2=A0 =C2=A0 Process: 82=
752 ExecStartPre=3D/usr/edb/as16/bin/edb-as-16-check-db-dir ${PGDATA} (code=
=3Dexited, status=3D0/SUCCESS)<br>=C2=A0 =C2=A0 Process: 82757 ExecStart=3D=
/usr/edb/as16/bin/edb-postgres -D ${PGDATA} (code=3Dexited, status=3D1/FAIL=
URE)<br>=C2=A0 =C2=A0Main PID: 82757 (code=3Dexited, status=3D1/FAILURE)<br=
>=C2=A0 =C2=A0 =C2=A0 =C2=A0 CPU: 325ms<br><br>Aug 01 14:30:56 service01 sy=
stemd[1]: Starting EDB Postgres Advanced Server 16...<br>Aug 01 14:30:56 se=
rvice01 edb-postgres[82757]: 2024-08-01 14:30:56 IST LOG: =C2=A0redirecting=
 log output to logging collector process<br>Aug 01 14:30:56 rservice01 edb-=
postgres[82757]: 2024-08-01 14:30:56 IST HINT: =C2=A0Future log output will=
 appear in directory &quot;log&quot;.<br>Aug 01 14:30:58 service01 systemd[=
1]: Started EDB Postgres Advanced Server 16.<br>Aug 01 14:30:58 service01 s=
ystemd[1]: edb-as-16.service: Main process exited, code=3Dexited, status=3D=
1/FAILURE<br>Aug 01 14:30:58 service01 systemd[1]: edb-as-16.service: Killi=
ng process 82758 (edb-postgres) with signal SIGKILL.<br>A<b>ug 01 14:30:58 =
service01 systemd[1]: edb-as-16.service: Failed with result &#39;exit-code&=
#39;.</b><br>[root@service01 ~]#<br></div><div><br></div><div>Any hints/gui=
dance most welcome.=C2=A0</div><div><br></div></div></blockquote><div>Can y=
ou please share the DB Server log as it contains the exact error which is c=
ausing the server not to start.</div><div><br></div><div>Thanks=C2=A0</div>=
</div></div>

--000000000000ba5f01061e9c47c8--