public inbox for [email protected]  
From: Adrian Klaver <[email protected]>
To: Nick Renders <[email protected]>
To: [email protected]
Subject: Re: could not open file "global/pg_filenode.map": Operation not permitted
Date: Tue, 12 Mar 2024 07:58:29 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

On 3/12/24 02:57, Nick Renders wrote:
> On 11 Mar 2024, at 16:04, Adrian Klaver wrote:
> 
>> On 3/11/24 03:11, Nick Renders wrote:
>>> Thank you for your reply Laurenz.
>>> I don't think it is related to any third party security software. We have several other machines with a similar setup, but this is the only server that has this issue.
>>>
>>> The one thing different about this machine however, is that it runs 2 instances of Postgres:
>>> - cluster A on port 165
>>> - cluster B on port 164
>>> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we log in remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest backup into the Data folder, and start up the service again.
>>> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
>>>
>>> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
>>> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>>
>> Since it seems to be the trigger, what are the contents of the restore script?
>>
>>>
>>> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
>>>
>>> Thanks,
>>>
>>> Nick Renders
>>>
>>
>>
>> -- 
>> Adrian Klaver
>> [email protected]
> 
> 
> 
>> ...how are A and B connected?
> 
> The 2 clusters are not connected. They run on the same macOS 14 machine with a single Postgres installation ( /Library/PostgreSQL/16/ ) and their respective Data folders are located on the same volume ( /Volumes/Postgres_Data/PostgreSQL/16/data and /Volumes/Postgres_Data/PostgreSQL/16-DML/data ). Besides that, they run independently on 2 different ports, specified in the postgresql.conf.
> 
> 
>> ...run them under different users on the system.
> 
> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?
> 
> 
>> ...what are the contents of the restore script?
> 
> ## stop cluster A
> ssh [email protected] '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data stop'
> 
> ## save config files (ARC_postgresql_16.conf is included in postgresql.conf and contains cluster-specific information like the port number)
> ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp ARC_postgresql_16.conf ../ARC_postgresql_16.conf'
> ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp pg_hba.conf ../pg_hba.conf'
> 
> ## clear data directory
> ssh [email protected] 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'
> 
> ## transfer recovery (this will copy the backup "20240312T040106" and any lingering WAL files into the Data folder)
> barman recover --remote-ssh-command 'ssh [email protected]' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data
> 
> ## restore config files
> ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv ARC_postgresql_16.conf /Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
> ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv pg_hba.conf /Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'
> 
> ## start cluster A
> ssh [email protected] '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data start > /dev/null'
> 
> 
> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.

I'm not seeing anything obvious, with the caveat that I'm on my first
cup of coffee.

From your first post:

2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway

For now the only suggestion I have is to note the presence, ownership
and privileges of the above files while the setup is working. Then,
when it fails, do the same and see if there is a difference. My hunch
is that the problem is in this step:

barman recover --remote-ssh-command 'ssh [email protected]' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data

If it is not the step itself, then it is in the process that creates
20240312T040106.
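
A rough sketch of how the before/after check on those two files could be
captured, so the states can be compared with diff. The function name
snapshot_file_state and the /tmp output files are made up for
illustration, and I am assuming that
/Volumes/Postgres_Data/PostgreSQL/16-DML/data is cluster B's data
directory, since the restore script uses the other one for cluster A:

```shell
#!/bin/sh
# Hypothetical helper (not part of PostgreSQL or Barman): print mode,
# numeric owner/group, size and mtime for the two files named in the log.
snapshot_file_state() {
    datadir=$1
    for f in global/pg_filenode.map postmaster.pid; do
        # Keep the ls error text in the output, so that a missing file and
        # a permission failure remain distinguishable in the snapshot.
        if ! ls -ln "$datadir/$f" 2>&1; then
            echo "CANNOT STAT $datadir/$f"
        fi
    done
}

# Usage, while cluster B is healthy (directory is an assumption, see above):
#   snapshot_file_state /Volumes/Postgres_Data/PostgreSQL/16-DML/data > /tmp/state_ok.txt
# Then again once the error shows up in the log:
#   snapshot_file_state /Volumes/Postgres_Data/PostgreSQL/16-DML/data > /tmp/state_bad.txt
#   diff /tmp/state_ok.txt /tmp/state_bad.txt
```

Run it as the same OS user the server runs under, so the check sees what
the server sees. On macOS it may also be worth looking at ACLs, extended
attributes and file flags (ls -le@O), since plain mode bits do not show
those.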

> 
> 
> Best regards,
> 
> Nick Renders
> 
> 
> 

-- 
Adrian Klaver
[email protected]






