From: Nick Renders <postgres@arcict.com>
To: pgsql-general@lists.postgresql.org
Subject: Re: could not open file "global/pg_filenode.map": Operation not permitted
Date: Tue, 12 Mar 2024 10:57:19 +0100
Message-ID: <19556056-40E7-4FA3-A2A1-0A345AEBFD9E@arcict.com>
In-Reply-To: <a08f7e54-0a6f-466b-b3fc-08165076c605@aklaver.com>
References: <4D67E594-098F-4234-87D8-68F827AF2531@arcict.com>
 <d947712004aa39633639bd5a23d6a6d6b6b2665a.camel@cybertec.at>
 <2E2F11F8-718A-4E6A-81E0-4F5CC1F1273A@arcict.com>
 <a08f7e54-0a6f-466b-b3fc-08165076c605@aklaver.com>
Archived-At: <https://www.postgresql.org/message-id/19556056-40E7-4FA3-A2A1-0A345AEBFD9E%40arcict.com>

On 11 Mar 2024, at 16:04, Adrian Klaver wrote:

> On 3/11/24 03:11, Nick Renders wrote:
>> Thank you for your reply Laurenz.
>> I don't think it is related to any third-party security software. We have several other machines with a similar setup, but this is the only server that has this issue.
>>
>> The one thing different about this machine, however, is that it runs 2 instances of Postgres:
>> - cluster A on port 165
>> - cluster B on port 164
>> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we log in remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest backup into the Data folder, and start up the service again.
>> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
>>
>> Over the past 2 weeks, with our restore script suspended, the issue did not occur.
>> I have just performed another restore on cluster A, and now cluster B is throwing errors in the log again.
>
> Since it seems to be the trigger, what are the contents of the restore script?
>
>>
>> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
>>
>> Thanks,
>>
>> Nick Renders
>>
>
>
> -- 
> Adrian Klaver
> adrian.klaver@aklaver.com


> ...how are A and B connected?

The 2 clusters are not connected. They run on the same macOS 14 machine with a single Postgres installation (/Library/PostgreSQL/16/), and their respective Data folders are located on the same volume (/Volumes/Postgres_Data/PostgreSQL/16/data and /Volumes/Postgres_Data/PostgreSQL/16-DML/data). Besides that, they run independently on 2 different ports, specified in each cluster's postgresql.conf.
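Since both clusters share one installation and differ only in their data directories, the port each one listens on comes from the configuration files inside its own data directory (for cluster A, the included ARC_postgresql_16.conf). A small helper that prints the effective `port` assignment from those files is a quick way to confirm the two data directories are configured independently. This is a minimal sketch: `conf_port` is a hypothetical name, it assumes the last uncommented `port = N` line wins when the files are passed in the order PostgreSQL reads them, and it does not follow `include` directives or handle quoted values.

```shell
# Hypothetical helper (not part of the original setup):
# print the last uncommented "port = N" assignment found across the
# given config files, scanned in the order they are passed.
conf_port() {
  awk '
    { sub(/#.*/, "") }                                # strip comments
    /^[ \t]*port[ \t]*=?[ \t]*[0-9]+/ {
      v = $0
      gsub(/[^0-9]/, "", v)                           # keep only the digits
      port = v                                        # later lines override
    }
    END { if (port != "") print port }
  ' "$@"
}
```

For example, `conf_port postgresql.conf ARC_postgresql_16.conf`, run inside cluster A's data directory, should print 165 if that is where its port is set.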


> ...run them under different users on the system.

Are you referring to the "postgres" user/role? Does that also mean setting up 2 Postgres installation directories?


> ...what are the contents of the restore script?

## stop cluster A
ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data stop'

## save config files (ARC_postgresql_16.conf is included in postgresql.conf and contains cluster-specific information like the port number)
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp ARC_postgresql_16.conf ../ARC_postgresql_16.conf'
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp pg_hba.conf ../pg_hba.conf'

## clear data directory
ssh postgres@10.0.0.1 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'

## transfer recovery (this will copy the backup "20240312T040106" and any lingering WAL files into the Data folder)
barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data

## restore config files
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv ARC_postgresql_16.conf /Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv pg_hba.conf /Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'

## start cluster A
ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data start > /dev/null'
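The host-side steps of the script above (saving the two config files, clearing the data directory, moving the files back) can also be written as POSIX sh functions that take the data directory as an argument, with the pg_ctl and barman recover steps left as they are. This is a minimal sketch; the function names and the CONF_FILES variable are hypothetical. One deliberate difference from the script: `clear_data_dir` uses `find` so that entries whose names start with a dot are removed too, whereas the glob in `rm -r .../data/*` skips them.

```shell
# Hypothetical helpers mirroring the host-side steps of the restore script.
# File names are the ones used in the script.
CONF_FILES="ARC_postgresql_16.conf pg_hba.conf"

# Copy the cluster-specific config files to the parent of the data directory.
save_configs() {
  data_dir=$1
  for f in $CONF_FILES; do
    cp "$data_dir/$f" "$data_dir/../$f" || return 1
  done
}

# Delete everything inside the data directory, dot-files included,
# while keeping the directory itself.
clear_data_dir() {
  find "$1" -mindepth 1 -delete
}

# Move the saved config files back into the restored data directory.
restore_configs() {
  data_dir=$1
  for f in $CONF_FILES; do
    mv "$data_dir/../$f" "$data_dir/$f" || return 1
  done
}
```

On the database host, the sequence would be `save_configs "$DATA" && clear_data_dir "$DATA"`, then `barman recover` from the Barman server, then `restore_configs "$DATA"`, where `DATA` is cluster A's data directory.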


This script runs daily at 4:30 AM. It ran this morning, and there was no issue with cluster B. So even though the issue is most likely related to the script, the script does not trigger it every time.
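Because the errors do not follow every run, one way to compare them against the 4:30 AM schedule is to count the matching lines in cluster B's server log per hour. This is a minimal sketch. It assumes each log line begins with a `YYYY-MM-DD HH:MM:SS` timestamp, as with PostgreSQL 16's default `log_line_prefix` of `'%m [%p] '`. The function name `errors_per_hour` is hypothetical.

```shell
# Hypothetical helper: count "Operation not permitted" log lines per date and hour.
# Assumes field 1 is the date and field 2 starts with HH:MM:SS.
errors_per_hour() {
  grep 'Operation not permitted' "$1" |
    awk '{ print $1, substr($2, 1, 2) ":00" }' |
    sort | uniq -c
}
```

It would be called as `errors_per_hour <log file>` with the path of cluster B's current server log. A cluster of counts in the 04:00 bucket on restore days would support the link to the script.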


Best regards,

Nick Renders