public inbox for [email protected]  
help / color / mirror / Atom feed
Re: could not open file "global/pg_filenode.map": Operation not permitted
15+ messages / 6 participants
[nested] [flat]

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-11 10:11  Nick Renders <[email protected]>
  0 siblings, 3 replies; 15+ messages in thread

From: Nick Renders @ 2024-03-11 10:11 UTC (permalink / raw)
  To: [email protected]

Thank you for your reply Laurenz.
I don't think it is related to any third party security software. We have several other machines with a similar setup, but this is the only server that has this issue.

The one thing different about this machine however, is that it runs 2 instances of Postgres:
- cluster A on port 165
- cluster B on port 164
Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we login remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest back into the Data folder, and start up the service again.
Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.

Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.

Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.

Thanks,

Nick Renders


On 26 Feb 2024, at 16:29, Laurenz Albe wrote:

> On Mon, 2024-02-26 at 15:14 +0100, Nick Renders wrote:
>> We have a Postgres server that intermittently logs the following:
>>
>> 2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
>> 2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway
>>
>> This has happened 3 times over the last 2 weeks now, without any indication what caused it.
>> The privileges of those 2 files are all in order.
>> When this happens, the server is no longer accessible, and we need to restart the service (pg_ctl restart).
>> Once restarted, Popstgres runs fine again for a couple of days.
>>
>> We are running PostgreSQL 16.2 on macOS 14.3.1.
>
> Perhaps that is some kind of virus checker or something else that locks files.
>
> Yours,
> Laurenz Albe






^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-11 11:10  Laurenz Albe <[email protected]>
  parent: Nick Renders <[email protected]>
  2 siblings, 0 replies; 15+ messages in thread

From: Laurenz Albe @ 2024-03-11 11:10 UTC (permalink / raw)
  To: Nick Renders <[email protected]>; [email protected]

On Mon, 2024-03-11 at 11:11 +0100, Nick Renders wrote:
> We have several other machines with a similar setup, but this is the only server that has this issue.
> 
> [...] Cluster A is actually a backup from another Postgres server that is restored on a
> daily basis via Barman. This means that we login remotely from the Barman server over SSH,
> stop cluster A's service (port 165), clear the Data folder, restore the latest back into
> the Data folder, and start up the service again.
> Cluster B's Data and service (port 164) remain untouched during all this time. This is
> the cluster that experiences the intermittent "operation not permitted" issue.
> 
> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
> 
> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.

I don't know Barman, but with that incomplete description anybody will have
problems determining the cause.  For example, how are A and B connected?

Yours,
Laurenz Albe






^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-11 11:56  Stephen Frost <[email protected]>
  parent: Nick Renders <[email protected]>
  2 siblings, 0 replies; 15+ messages in thread

From: Stephen Frost @ 2024-03-11 11:56 UTC (permalink / raw)
  To: Nick Renders <[email protected]>; +Cc: [email protected]

Greetings,

* Nick Renders ([email protected]) wrote:
> The one thing different about this machine however, is that it runs 2 instances of Postgres:
> - cluster A on port 165
> - cluster B on port 164
> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we login remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest back into the Data folder, and start up the service again.
> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
> 
> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
> 
> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.

Not sure why it's happening but they certainly sound related based on
the correlation.

One thing that I suggest doing when you're running multiple independent
PG clusters on the same host is to run them under different users on the
system.  Perhaps if you move cluster B to a distinct user account,
you'll have better luck figuring out what's going on since something
will presumably start throwing permission denied errors.

Not sure if it's an option or not, but you might also consider using
Linux instead of MacOS..

Thanks,

Stephen


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-11 15:04  Adrian Klaver <[email protected]>
  parent: Nick Renders <[email protected]>
  2 siblings, 1 reply; 15+ messages in thread

From: Adrian Klaver @ 2024-03-11 15:04 UTC (permalink / raw)
  To: Nick Renders <[email protected]>; [email protected]

On 3/11/24 03:11, Nick Renders wrote:
> Thank you for your reply Laurenz.
> I don't think it is related to any third party security software. We have several other machines with a similar setup, but this is the only server that has this issue.
> 
> The one thing different about this machine however, is that it runs 2 instances of Postgres:
> - cluster A on port 165
> - cluster B on port 164
> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we login remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest back into the Data folder, and start up the service again.
> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
> 
> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.

Since it seems to be the trigger, what are the contents of the restore 
script?

> 
> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
> 
> Thanks,
> 
> Nick Renders
> 


-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-12 09:57  Nick Renders <[email protected]>
  parent: Adrian Klaver <[email protected]>
  0 siblings, 2 replies; 15+ messages in thread

From: Nick Renders @ 2024-03-12 09:57 UTC (permalink / raw)
  To: [email protected]

On 11 Mar 2024, at 16:04, Adrian Klaver wrote:

> On 3/11/24 03:11, Nick Renders wrote:
>> Thank you for your reply Laurenz.
>> I don't think it is related to any third party security software. We have several other machines with a similar setup, but this is the only server that has this issue.
>>
>> The one thing different about this machine however, is that it runs 2 instances of Postgres:
>> - cluster A on port 165
>> - cluster B on port 164
>> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we login remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest back into the Data folder, and start up the service again.
>> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
>>
>> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
>> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>
> Since it seems to be the trigger, what are the contents of the restore script?
>
>>
>> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
>>
>> Thanks,
>>
>> Nick Renders
>>
>
>
> -- 
> Adrian Klaver
> [email protected]



> ...how are A and B connected?

The 2 cluster are not connected. They run on the same macOS 14 machine with a single Postgres installation ( /Library/PostgreSQL/16/ ) and their respective Data folders are located on the same volume ( /Volumes/Postgres_Data/PostgreSQL/16/data and /Volumes/Postgres_Data/PostgreSQL/16-DML/data ). Beside that, they run independently on 2 different ports, specified in the postgresql.conf.


> ...run them under different users on the system.

Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?


> ...what are the contents of the restore script?

## stop cluster A
ssh [email protected] '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data stop'

## save config files (ARC_postgresql_16.conf is included in postgresql.conf and contains cluster-specific information like the port number)
ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp ARC_postgresql_16.conf ../ARC_postgresql_16.conf'
ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp pg_hba.conf ../pg_hba.conf'

## clear data directory
ssh [email protected] 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'

## transfer recovery (this will copy the backup "20240312T040106" and any lingering WAL files into the Data folder)
barman recover --remote-ssh-command 'ssh [email protected]' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data

## restore config files
ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv ARC_postgresql_16.conf /Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv pg_hba.conf /Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'

## start cluster A
ssh [email protected] '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data start > /dev/null'


This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.


Best regards,

Nick Renders







^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-12 14:58  Adrian Klaver <[email protected]>
  parent: Nick Renders <[email protected]>
  1 sibling, 0 replies; 15+ messages in thread

From: Adrian Klaver @ 2024-03-12 14:58 UTC (permalink / raw)
  To: Nick Renders <[email protected]>; [email protected]

On 3/12/24 02:57, Nick Renders wrote:
> On 11 Mar 2024, at 16:04, Adrian Klaver wrote:
> 
>> On 3/11/24 03:11, Nick Renders wrote:
>>> Thank you for your reply Laurenz.
>>> I don't think it is related to any third party security software. We have several other machines with a similar setup, but this is the only server that has this issue.
>>>
>>> The one thing different about this machine however, is that it runs 2 instances of Postgres:
>>> - cluster A on port 165
>>> - cluster B on port 164
>>> Cluster A is actually a backup from another Postgres server that is restored on a daily basis via Barman. This means that we login remotely from the Barman server over SSH, stop cluster A's service (port 165), clear the Data folder, restore the latest back into the Data folder, and start up the service again.
>>> Cluster B's Data and service (port 164) remain untouched during all this time. This is the cluster that experiences the intermittent "operation not permitted" issue.
>>>
>>> Over the past 2 weeks, I have suspended our restore script and the issue did not occur.
>>> I have just performed another restore on cluster A and now cluster B is throwing errors in the log again.
>>
>> Since it seems to be the trigger, what are the contents of the restore script?
>>
>>>
>>> Any idea why this is happening? It does not occur with every restore, but it seems to be related anyway.
>>>
>>> Thanks,
>>>
>>> Nick Renders
>>>
>>
>>
>> -- 
>> Adrian Klaver
>> [email protected]
> 
> 
> 
>> ...how are A and B connected?
> 
> The 2 cluster are not connected. They run on the same macOS 14 machine with a single Postgres installation ( /Library/PostgreSQL/16/ ) and their respective Data folders are located on the same volume ( /Volumes/Postgres_Data/PostgreSQL/16/data and /Volumes/Postgres_Data/PostgreSQL/16-DML/data ). Beside that, they run independently on 2 different ports, specified in the postgresql.conf.
> 
> 
>> ...run them under different users on the system.
> 
> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?
> 
> 
>> ...what are the contents of the restore script?
> 
> ## stop cluster A
> ssh [email protected] '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data stop'
> 
> ## save config files (ARC_postgresql_16.conf is included in postgresql.conf and contains cluster-specific information like the port number)
> ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp ARC_postgresql_16.conf ../ARC_postgresql_16.conf'
> ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp pg_hba.conf ../pg_hba.conf'
> 
> ## clear data directory
> ssh [email protected] 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'
> 
> ## transfer recovery (this will copy the backup "20240312T040106" and any lingering WAL files into the Data folder)
> barman recover --remote-ssh-command 'ssh [email protected]' pg 20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data
> 
> ## restore config files
> ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv ARC_postgresql_16.conf /Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
> ssh [email protected] 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && mv pg_hba.conf /Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'
> 
> ## start cluster A
> ssh [email protected] '/Library/PostgreSQL/16/bin/pg_ctl -D /Volumes/Postgres_Data/PostgreSQL/16/data start > /dev/null'
> 
> 
> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.

I'm not seeing anything obvious, caveat I'm on my first cup of coffee.

 From your first post:

2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file 
"postmaster.pid": Operation not permitted; continuing anyway

For now the only suggestion I have is note the presence, ownership and 
privileges of the above files in the present working setup. Then when it 
fails do the same and see if there is a difference. My hunch it is in 
this step:

barman recover --remote-ssh-command 'ssh [email protected]' pg 
20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data

If not the step itself then in the process that creates 20240312T040106.

> 
> 
> Best regards,
> 
> Nick Renders
> 
> 
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-13 11:35  Stephen Frost <[email protected]>
  parent: Nick Renders <[email protected]>
  1 sibling, 1 reply; 15+ messages in thread

From: Stephen Frost @ 2024-03-13 11:35 UTC (permalink / raw)
  To: Nick Renders <[email protected]>; +Cc: [email protected]

Greetings,

* Nick Renders ([email protected]) wrote:
> > ...run them under different users on the system.
> 
> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?

Yes, two separate MacOS user accounts is what I was suggesting.  You
could use the same postgres binaries though, no need to have two
installation of them.  You'd need seperate data directories, of course,
as you have currently.

> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.

Seems likely that it's some sort of race condition.

Thanks,

Stephen


Attachments:

  [application/pgp-signature] signature.asc (833B, 2-signature.asc)
  download

^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-22 14:01  Nick Renders <[email protected]>
  parent: Stephen Frost <[email protected]>
  0 siblings, 3 replies; 15+ messages in thread

From: Nick Renders @ 2024-03-22 14:01 UTC (permalink / raw)
  To: [email protected]

On 13 Mar 2024, at 12:35, Stephen Frost wrote:

> Greetings,
>
> * Nick Renders ([email protected]) wrote:
>>> ...run them under different users on the system.
>>
>> Are you referring to the "postgres" user / role? Does that also mean setting up 2 postgres installation directories?
>
> Yes, two separate MacOS user accounts is what I was suggesting.  You
> could use the same postgres binaries though, no need to have two
> installation of them.  You'd need seperate data directories, of course,
> as you have currently.
>
>> This script runs on a daily basis at 4:30 AM. It did so this morning and there was no issue with cluster B. So even though the issue is most likely related to the script, it does not cause it every time.
>
> Seems likely that it's some sort of race condition.
>
> Thanks,
>
> Stephen

We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL 16.2.
This one only has a single Data directory, so there are no multiple instances running.

I installed Postgres yesterday and restored a copy from our live database in the Data directory. The Postgres process started up without problems, but after 40 minutes it started throwing the same errors in the log:

	2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
	2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
	2024-03-21 11:50:07.398 CET [965] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway

I stopped and started the process, and it continued working again until around 21:20, when the issue popped up again. I wasn't doing anything on the machine at that time, so I have no idea what might have triggered it.

Is there perhaps some feature that I can enable that logs which processes use these 2 files?

Thanks,

Nick Renders






^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-22 15:33  Adrian Klaver <[email protected]>
  parent: Nick Renders <[email protected]>
  2 siblings, 0 replies; 15+ messages in thread

From: Adrian Klaver @ 2024-03-22 15:33 UTC (permalink / raw)
  To: Nick Renders <[email protected]>; [email protected]

On 3/22/24 07:01, Nick Renders wrote:
> On 13 Mar 2024, at 12:35, Stephen Frost wrote:

> We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL 16.2.
> This one only has a single Data directory, so there are no multiple instances running.
> 
> I installed Postgres yesterday and restored a copy from our live database in the Data directory. The Postgres process started up without problems, but after 40 minutes it started throwing the same errors in the log:
> 
> 	2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
> 	2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open file "global/pg_filenode.map": Operation not permitted
> 	2024-03-21 11:50:07.398 CET [965] LOG:  could not open file "postmaster.pid": Operation not permitted; continuing anyway
> 
> I stopped and started the process, and it continued working again until around 21:20, when the issue popped up again. I wasn't doing anything on the machine at that time, so I have no idea what might have triggered it.

Have you looked at the OS system logs?

> 
> Is there perhaps some feature that I can enable that logs which processes use these 2 files?
> 
> Thanks,
> 
> Nick Renders
> 
> 

-- 
Adrian Klaver
[email protected]







^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-22 16:00  Alban Hertroys <[email protected]>
  parent: Nick Renders <[email protected]>
  2 siblings, 1 reply; 15+ messages in thread

From: Alban Hertroys @ 2024-03-22 16:00 UTC (permalink / raw)
  To: Nick Renders <[email protected]>; +Cc: [email protected]

On Fri, 22 Mar 2024 at 15:01, Nick Renders <[email protected]> wrote:

>
> We now have a second machine with this issue: it is an Intel Mac mini
> running macOS Sonoma (14.4) and PostgreSQL 16.2.
> This one only has a single Data directory, so there are no multiple
> instances running.
>

I don't think that having a single Data directory prevents multiple
instances from running. That's more of a matter of how often pg_ctl was
called with the start command for that particular data directory.


> I installed Postgres yesterday and restored a copy from our live database
> in the Data directory.


How did you restore that copy? Was that a file-based copy perhaps? Your
files may have incorrect owners or permissions in that case.


> The Postgres process started up without problems, but after 40 minutes it
> started throwing the same errors in the log:
>
>         2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
>         2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
>         2024-03-21 11:50:07.398 CET [965] LOG:  could not open file
> "postmaster.pid": Operation not permitted; continuing anyway
>

It's possible that some other process put a lock on these files. Spotlight
perhaps? Or TimeMachine?


> I stopped and started the process, and it continued working again until
> around 21:20, when the issue popped up again. I wasn't doing anything on
> the machine at that time, so I have no idea what might have triggered it.
>
> Is there perhaps some feature that I can enable that logs which processes
> use these 2 files?
>

IIRC, MacOS comes shipped with the lsof command, which will tell you which
processes have a given file open. See man lsof.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-28 15:47  Nick Renders <[email protected]>
  parent: Alban Hertroys <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: Nick Renders @ 2024-03-28 15:47 UTC (permalink / raw)
  To: [email protected]

On 22 Mar 2024, at 17:00, Alban Hertroys wrote:

> On Fri, 22 Mar 2024 at 15:01, Nick Renders <[email protected]> 
> wrote:
>
>>
>> We now have a second machine with this issue: it is an Intel Mac mini
>> running macOS Sonoma (14.4) and PostgreSQL 16.2.
>> This one only has a single Data directory, so there are no multiple
>> instances running.
>>
>
> I don't think that having a single Data directory prevents multiple
> instances from running. That's more of a matter of how often pg_ctl 
> was
> called with the start command for that particular data directory.
>
>
>> I installed Postgres yesterday and restored a copy from our live 
>> database
>> in the Data directory.
>
>
> How did you restore that copy? Was that a file-based copy perhaps? 
> Your
> files may have incorrect owners or permissions in that case.
>
>
>> The Postgres process started up without problems, but after 40 
>> minutes it
>> started throwing the same errors in the log:
>>
>>         2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open 
>> file
>> "global/pg_filenode.map": Operation not permitted
>>         2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open 
>> file
>> "global/pg_filenode.map": Operation not permitted
>>         2024-03-21 11:50:07.398 CET [965] LOG:  could not open file
>> "postmaster.pid": Operation not permitted; continuing anyway
>>
>
> It's possible that some other process put a lock on these files. 
> Spotlight
> perhaps? Or TimeMachine?
>
>
>> I stopped and started the process, and it continued working again 
>> until
>> around 21:20, when the issue popped up again. I wasn't doing anything 
>> on
>> the machine at that time, so I have no idea what might have triggered 
>> it.
>>
>> Is there perhaps some feature that I can enable that logs which 
>> processes
>> use these 2 files?
>>
>
> IIRC, MacOS comes shipped with the lsof command, which will tell you 
> which
> processes have a given file open. See man lsof.
>
> -- 
> If you can't see the forest for the trees,
> Cut the trees and you'll see there is no forest.


I have tried the lsof command, but it returns no info about the 
postmaster.pid and global/pg_filenode.map files, so I take they are not 
open at that moment.

Spotlight indexing has been disabled, and TimeMachine takes no snapshots 
of the volume where the data resides.

Looking at the 2 machines that are having this issue (and the others 
that don't), I think it is somehow related to the following setup:
- macOS Sonoma (14.4 and 14.4.1)
- data directory on an external drive

That external drive (a Promise RAID system in one case, a simple SSD in 
the other) has the option "ignore ownership" on by default. I have tried 
disabling that, and updating the data directory to have owner + 
read/write access for the postgres user. It seemed to work at first, but 
just now the issue re-appeared again.

Any other suggestions?

Thanks,

Nick Renders

^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-03-29 03:25  Thomas Munro <[email protected]>
  parent: Nick Renders <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: Thomas Munro @ 2024-03-29 03:25 UTC (permalink / raw)
  To: Nick Renders <[email protected]>; +Cc: [email protected]

On Fri, Mar 29, 2024 at 4:47 AM Nick Renders <[email protected]> wrote:
> Looking at the 2 machines that are having this issue (and the others that don't), I think it is somehow related to the following setup:
> - macOS Sonoma (14.4 and 14.4.1)
> - data directory on an external drive
>
> That external drive (a Promise RAID system in one case, a simple SSD in the other) has the option "ignore ownership" on by default. I have tried disabling that, and updating the data directory to have owner + read/write access for the postgres user. It seemed to work at first, but just now the issue re-appeared again.
>
> Any other suggestions?

I don't have any specific ideas and I have no idea what "ignore
ownership" means ... what kind of filesystem is running on it?  For
the simple SSD, is it directly connected, running a normal Apple APFS
filesystem, or something more complicated?

I wonder if this could be related to the change in 16 which started to
rename that file:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d8cd0c6c95c0120168df93aae095df4e0...

Did you ever run 15 or earlier on that system?






^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-04-03 14:11  Nick Renders <[email protected]>
  parent: Thomas Munro <[email protected]>
  0 siblings, 1 reply; 15+ messages in thread

From: Nick Renders @ 2024-04-03 14:11 UTC (permalink / raw)
  To: [email protected]

On 29 Mar 2024, at 4:25, Thomas Munro wrote:
>
> I don't have any specific ideas and I have no idea what "ignore
> ownership" means ... what kind of filesystem is running on it?  For
> the simple SSD, is it directly connected, running a normal Apple APFS
> filesystem, or something more complicated?
>
> I wonder if this could be related to the change in 16 which started to
> rename that file:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d8cd0c6c95c0120168df93aae095df4e0...
>
> Did you ever run 15 or earlier on that system?


In the macOS Finder, when you show the Info (command+i) for an external drive (or any partition that is not the boot drive), there is a checkbox "Ignore ownership on this volume" in the Permissions section. I think it is by default "on" for external drives.

The external SSD is an Orico drive that is connected with USB-C. It is initialised as a GUID Partition Map with a single AFPS partition.

We have run PostgreSQL 15 and earlier, before upgrading to 16 when it came out last year. We didn't have any problems with 16 until recently, after upgrading to Sonoma.


Nick






^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-04-03 20:56  Thomas Munro <[email protected]>
  parent: Nick Renders <[email protected]>
  0 siblings, 0 replies; 15+ messages in thread

From: Thomas Munro @ 2024-04-03 20:56 UTC (permalink / raw)
  To: Nick Renders <[email protected]>; +Cc: [email protected]

On Thu, Apr 4, 2024 at 3:11 AM Nick Renders <[email protected]> wrote:
> In the macOS Finder, when you show the Info (command+i) for an external drive (or any partition that is not the boot drive), there is a checkbox "Ignore ownership on this volume" in the Permissions section. I think it is by default "on" for external drives.

Hmm.  Sounds suspicious, but why would only this file be affected?

> The external SSD is an Orico drive that is connected with USB-C. It is initialised as a GUID Partition Map with a single AFPS partition.
>
> We have run PostgreSQL 15 and earlier, before upgrading to 16 when it came out last year. We didn't have any problems with 16 until recently, after upgrading to Sonoma.

Interesting.  So the rename might have something to do with it, though
I don't have a theory for how,.

Can you show what the permissions and ownership looks like for pg_*
under there, normally, and once the system reaches this state?
Something like:

tmunro@phonebox postgresql % ls -slap pgdata/global/pg_*
16 -rw-------  1 tmunro  staff  8192  4 Apr 09:50 pgdata/global/pg_control
 8 -rw-------  1 tmunro  staff   524  4 Apr 09:50 pgdata/global/pg_filenode.map

I'm asking for "pg_*" because I want to see pg_control as well, to
understand the permissions for the other files in the cluster, and
because I want to see if there are any stray remnants of a temporary
file, which would be called pg_filenode.map.tmp.






^ permalink  raw  reply  [nested|flat] 15+ messages in thread

* Re: could not open file "global/pg_filenode.map": Operation not permitted
@ 2024-04-03 21:14  Thomas Munro <[email protected]>
  parent: Nick Renders <[email protected]>
  2 siblings, 0 replies; 15+ messages in thread

From: Thomas Munro @ 2024-04-03 21:14 UTC (permalink / raw)
  To: Nick Renders <[email protected]>; +Cc: [email protected]

On Sat, Mar 23, 2024 at 3:01 AM Nick Renders <[email protected]> wrote:
> We now have a second machine with this issue: it is an Intel Mac mini running macOS Sonoma (14.4) and PostgreSQL 16.2.
> This one only has a single Data directory, so there are no multiple instances running.

BTW if you're running databases on mains-powered Macs, I have a patch
that you might be interested in, which so far hasn't attracted any
reviews.  The short version is that I bet you can at least lose many
seconds of commits (because WAL doesn't really hit durable part of
disk), and possibly also fail to recover (pg_control hits disk before
WAL, not sure if this is really possible), if you yank the power and
you're using the default settings for wal_sync_method.  I'd like to
rationalise the settings for that stuff and make it safe by default.

I don't know anything about the USB storage pathway but I'd be
surprised if it's different.

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BF0EL4Up6yVYbbcWse4xKaqW4wc2xpw67Pq9FjmByWVg%4...






^ permalink  raw  reply  [nested|flat] 15+ messages in thread


end of thread, other threads:[~2024-04-03 21:14 UTC | newest]

Thread overview: 15+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-03-11 10:11 Re: could not open file "global/pg_filenode.map": Operation not permitted Nick Renders <[email protected]>
2024-03-11 11:10 ` Laurenz Albe <[email protected]>
2024-03-11 11:56 ` Stephen Frost <[email protected]>
2024-03-11 15:04 ` Adrian Klaver <[email protected]>
2024-03-12 09:57   ` Nick Renders <[email protected]>
2024-03-12 14:58     ` Adrian Klaver <[email protected]>
2024-03-13 11:35     ` Stephen Frost <[email protected]>
2024-03-22 14:01       ` Nick Renders <[email protected]>
2024-03-22 15:33         ` Adrian Klaver <[email protected]>
2024-03-22 16:00         ` Alban Hertroys <[email protected]>
2024-03-28 15:47           ` Nick Renders <[email protected]>
2024-03-29 03:25             ` Thomas Munro <[email protected]>
2024-04-03 14:11               ` Nick Renders <[email protected]>
2024-04-03 20:56                 ` Thomas Munro <[email protected]>
2024-04-03 21:14         ` Thomas Munro <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox