public inbox for [email protected]  
help / color / mirror / Atom feed
pg_restore causing ENOSPACE on the WAL partition
2+ messages / 1 participants
[nested] [flat]

* pg_restore causing ENOSPACE on the WAL partition
@ 2025-04-10 11:19 Dimitrios Apostolou <[email protected]>
  2025-04-10 12:11 ` Re: pg_restore causing ENOSPACE on the WAL partition Dimitrios Apostolou <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: Dimitrios Apostolou @ 2025-04-10 11:19 UTC (permalink / raw)
  To: [email protected]

Hello list,

last night I got ENOSPACE on the WAL partition while running a huge
pg_restore on an empty and idle database.
The checkpoint that started 7 minutes earlier never finished:

   04:31:09 LOG:  checkpoint starting: wal
   ...
   04:38:04 PANIC:  could not write to file "pg_wal/xlogtemp.546204": No space left on device
            :CONTEXT:  COPY table_partition_123, line 120872534

This is most likely because most of the data is written to a very slow
"archival" tablespace. No I/O errors on the system logs, I know things go
slow on that device (iSCSI drive over a shared spinning-disks pool and
busy network).

This happened even though I'm keeping the WAL in its own partition (128GB
partition size), with a lot of free space above max_wal_size (64GB).
Somehow it managed to grow above max_wal_size and fill 100% of the
partition.

I'm running latest PostgreSQL 17 and the settings have been temporarily
tweaked for fastest pg_restore:

max_wal_size=64GB
max_replication_slots = 0
max_logical_replication_workers = 0
max_wal_senders = 0
wal_level = minimal
autovacuum = off

Several things seem to have gone wrong here. Questions:

+ The WAL partition is much faster than the archival tablespace. Am I in
   constant danger of overruning max_wal_size? How to make 100% sure this
   never happens again?

+ After recovery, with the database idling, I notice that WAL space usage
   is constant at 64GB. Why doesn't it free up space down to min_wal_size (1GB)?

+ I just created a 13GB zstd-compressed tarball of those 64GB WAL
   files. This indicates that the files are compressible despite using
   wal_compression=zstd setting. Could it be that postgres ignores the flag
   and does not compress the WAL? How to check?

+ I'm using parallel pg_restore --data-only, can't avoid that for now.
   Even though all the tables are empty (I truncated everything before
   starting pg_restore), I can't find a way to avoid going through the WAL.
   Ideas?


Thanks in advance,
Dimitris







^ permalink  raw  reply  [nested|flat] 2+ messages in thread

* Re: pg_restore causing ENOSPACE on the WAL partition
  2025-04-10 11:19 pg_restore causing ENOSPACE on the WAL partition Dimitrios Apostolou <[email protected]>
@ 2025-04-10 12:11 ` Dimitrios Apostolou <[email protected]>
  0 siblings, 0 replies; 2+ messages in thread

From: Dimitrios Apostolou @ 2025-04-10 12:11 UTC (permalink / raw)
  To: [email protected]

Forgot an important setting I have during pg_restore:

On Thu, 10 Apr 2025, Dimitrios Apostolou wrote:

> max_wal_size=64GB
> max_replication_slots = 0
> max_logical_replication_workers = 0
> max_wal_senders = 0
> wal_level = minimal
> autovacuum = off

checkpoint_completion_target = 0

so the checkpoint should finish ASAP and any delays are purely because of
I/O limitations. And for the sake of completion:

checkpoint_timeout = 600s


Regards,
Dimitris






^ permalink  raw  reply  [nested|flat] 2+ messages in thread


end of thread, other threads:[~2025-04-10 12:11 UTC | newest]

Thread overview: 2+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-04-10 11:19 pg_restore causing ENOSPACE on the WAL partition Dimitrios Apostolou <[email protected]>
2025-04-10 12:11 ` Dimitrios Apostolou <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox