Date: Fri, 11 Apr 2025 13:38:26 +0200 (CEST)
From: Dimitrios Apostolou <jimis@gmx.net>
To: pgsql-general@lists.postgresql.org
Subject: Re: pg_restore causing ENOSPACE on the WAL partition
In-Reply-To: <076464ad-3d70-dd25-9e8f-e84f27decfba@gmx.net>
Message-ID: <738db902-9462-8566-3133-8f5f127352d0@gmx.net>
References: <076464ad-3d70-dd25-9e8f-e84f27decfba@gmx.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII; format=flowed
UI-OutboundReport: notjunk:1;M01:P0:uorlErIsibU=;0vpB2BKir6QFhUcgX8gd9gF/Wr6
 q6u1lq91jLmHR/F+OzGkQszvBLBJNk9pxQUY+xvJj0Jfi5DW9SjXaNlnRFl1TxI8ajSaJ6z0h
 5WcnO2Rq7Z8KQuyBBa15Fut+C+tK3UlSXRFG8i95ii0MrU6JZ27F6i1Ds/sdvH0ZmJKDLx/hU
 c9Bpykr3C4wWdcMOoOxMCaTS5+5zPl91ScRrMCF5jdBenGMVm23Ig8CL5YYy8+K3KEq5fMrht
 NB01MxiPu0GvPO3qbOnJI1Aek3EuJVz3OX4nuAJEnhIw8+fxiwUqgxePNhQg3AJzYu+BXk21F
 S7TijR7sLmk+dklV7ytXloftLxBqXMiwRvi0P7oMxnBa6o3jUtRht6/av5DmLPTtdKrP8MYLa
 Dabny48bIZM1mbhOfTtubTeGltQq2qHCm1EQyJ038O7OQEl5WJ0+i/p08zlMTfbOr7oJyi0VQ
 uCpn7DEQZV73c+6hnk+vsAMjpizLoOxUWjeKcLGIMxedcEIAoVWIQeyxgA9+M7Rfd9T/+U2Qi
 2vyMx3qrZI5T2Z3ZhyXaQUMh4wLqnGqgla1fb76Ks/3IN8xBN/S0NwNE8BN6J02ojSwAp7z5S
 pJ7VvXAUFBbZwHZnwt0/w2W7crNMPHN2tuvk241FdhVU5Gl2yKjiCQTtm4N2aOAT4LK4aW4mU
 fcy30Y74fA5hk2tDtoylIbwP5zicI35ygN2eBaE7tIwtQSUHwB+WWkAIu2/U9aZiHJVdV9zZS
 ihmJ5iG/Fxx+Xnbud+5kNxT4di6cqaiMtibpZwz4TVGlKP3mje8LRNwP36sidjO9cStGInCqH
 TMY7SjVnX1rzjPpwSE+QZX8/EMOrqNqRlFiuArh/mrfMuUttc2b9C8ZqArn+a3YCgxxnDMmyO
 82JVIKRNrGd7G+Fo7ZmA3cDCdiQRNBTc8dSaWlT3nQwnzrSOnZemcc16wzQltHGpetf3gucYw
 qibxoxHF3VGTPzAAylkXcQEJGJtwCJX5Tl3hvoDNXPV1vbuvzzBelzWPlfI5NVT4VFdeVhCYy
 IXeGp/64cfUy8IxCkwGiqlksWsWYigVZj0vCZ4Jghkxv3WNcHlIArwKwqxDqnIJpi4rD8QZUD
 lxcgh7ubJoMZi+HZz8cezdEBZPUX4LHsOO7X/X2sPSfKb3ycQSBWtfij7e3hEOcDMhUdbuN+D
 59XuVKKRdH5Je6s0CJn2qWiObP0Vv458MpJ7BUyqPGA8re9SFemkls+MZUfPqOQKl3yn/9nBw
 Y6mbNlMeMIzCR0K53ZSNHxh8lzVXbIiedZu5Uv7Z6EL9AmFHq6zNJUNsnXBXgrRnDW6m6nAxT
 /hHmDD+3E+NoM3Sk4aQ3RTnb7W/NDiWWvWOHgc+KbqgkdlPWEAORwRIOqtKVuZoTiHuLjQcUP
 jJUNywAzXV8JexZu/l5h+vfSfqqvFWm20OZDssgU8CwXWJsbodT0sPtEPjxb9z1AGjgSABug4
 MukvZWFKa/Gv3LnCDBQ8AntVb4QnS8MX2ntx+hq0Cwkyps/XVBNYRnnj5U9gpOmt+IjL6MopZ
 zWspNdRImGUzw+L5oPf7pJBX/InEKD3uMF1skpcDKDoXxtYDhPLq4teyf6a6lPOaKU4tbtb91
 1evYP0WXubeiufJzBgpLOSoeLBH8Zp1yyuuGmTl0rYzl1Omy93/OcP6wOjsN6G/gc3URRXwvG
 CJ27/ta5ywgGgTM1vpRciRupQkmd5MW2/BGY4bwu0yWEg7b64qawz+bj7+2L/2Ptknarynq7X
 LtTcpbY76pIr+yWMZoSYiItIwLXTXRVOIgLTtPSNMtV7Sc+WDf9o9eubkaJQXPgm2oJ1wF76f
 GFaVc9b/a/ryJfJYHGZq1Oot9KVZDjWqmNizL/V5ogvKauXMBLUasW6e/lTvw9nC6GEvDNSRx
 09TLH3VgdMMkIjZPmo2vhNWX4mRhlci9DRoxn6e18K/hjjbPC7IVibcfQY0jq5HkHnvcFORHK
 y9yqCWAo/KhWP6qQ1IZh1jeVRcjibf/kGj/rnSfJKg0LaTXk1KsRMRsZmVIqpIRjMCtBlAmM4
 6c2Ew82fn3RWF3kf/9bJK8Z+GW0XLMT47WE7slFi7CfgAq6e/MdHucHgvDgs49H8PBOSQHTMQ
 nqdAkAE+ybqJGqGysLHKqH9uA9lJco6EITTHwf5lhWiMkypq6Kl0MSHN/l9nnmVA/VLTqfaAU
 bGVT0jYT7BHLYUkA2BHmBJnatbHRa2cBL4iuqrN/jWg3UATVldXoQKAsI0pNs+2bWEakq4909
 bOskqzXg/j9laSopUPmr5b5cHeEjn0tX0rQTTY4ZQ+RrBSRApWiSAy+waU1V1hW9dgM+SBOmb
 n8ZUHnHK2LR+Mox2AJlJz65b26vceqywEed5RARoNAp+v24SicSuHDjft8dI/88uEmRCA4x1w
 95WzSc9dDo9YVusAVJ30FqtiSZmm4+4uFxwAwPb5+7fpXIanmM82KU0MS9SZYjrIwbCEyHb8s
 squtK4F9rAb1W4qmrsihjAc3rbi7/a4XLZhOZJHRSP5ZvAxX2lgrnd31dcOYzxpR8xoUulXCy
 VNlf6lwKuRZTgF5xdsVdGv23sBjlDEuG9oz3HUDZ40BQOq3VmRf9zpKUfYsSNWDQ5NNDxwAWm
 uZDeOUjMbFWz9V4oRLc18SMGu4KYMKQ7/eFbHJlZ0XMgSH0fVWdG8y3vC1qQNxH/yDEdUCufC
 Guy3mjrRnFsRtHdHZ0gxJXWQ6xbS5Ex9R1uHSQAxaVNK2IdUYwuGGKbGI3qKl/EgnBsz0Pr23
 sNF7yUCDDsQPTaab8+GwbrClE8xR7Cnkym1Dps4rVcz6cSlrOAsjYxVcaiHLr4mMrN5mEtdNX
 O2KHitsWOBbFZOC1yW40MIdnYBDYs6uv8EV8EpD3nWMF4Ly5JddaDqY2XHyyjJ5odmUnz5pz9
 jdog+ZfXnXi0Myj3jm2ilxeeeqC87y6VDJ7tX4w45sLVEQ1Hk+/0XzlZR60rVPX0wCjJHeK2Q
 1GGaTGaAtQ==
Content-Transfer-Encoding: quoted-printable
Archived-At: <https://www.postgresql.org/message-id/738db902-9462-8566-3133-8f5f127352d0%40gmx.net>
Precedence: bulk

Happened again even with max_wal_size=3D32GB on the 128GB WAL partition. A=
t
this point I'm quite sure what happens:

+ a checkpoint is going on for several minutes (because of I/O contention
   on the archival drive)

+ meanwhile data keeps coming in through several processes of pg_restore

+ data is coming in much faster because the WAL partition is high perf

+ The checkpoint makes it before full and finishes and frees up 32GB
   (max_wal_size) from the WAL drive. Lets say the WAL partition is now at
   80/128GB full.

+ Immediately another checkpoint starts. This one has to checkpoint 80GB
   of data.

+ This doesn't make it in time and the WAL partition fills up.

+ Painful recovery follows, the database server won't even succeed in
   doing recovery when starting up...


I'm surprised that there is no mechanism for the backends to block while
the WAL is overflowing.

Dimitris


On Thu, 10 Apr 2025, Dimitrios Apostolou wrote:

> Hello list,
>
> last night I got ENOSPACE on the WAL partition while running a huge
> pg_restore on an empty and idle database.
> The checkpoint that started 7 minutes earlier never finished:
>
>   04:31:09 LOG:  checkpoint starting: wal
>   ...
>   04:38:04 PANIC:  could not write to file "pg_wal/xlogtemp.546204": No
>   space left on device
>           : CONTEXT:  COPY table_partition_123, line 120872534
>
> This is most likely because most of the data is written to a very slow
> "archival" tablespace. No I/O errors on the system logs, I know things g=
o
> slow on that device (iSCSI drive over a shared spinning-disks pool and b=
usy
> network).
>
> This happened even though I'm keeping the WAL in its own partition (128G=
B
> partition size), with a lot of free space above max_wal_size (64GB). Som=
ehow
> it managed to grow above max_wal_size and fill 100% of the partition.
>
> I'm running latest PostgreSQL 17 and the settings have been temporarily
> tweaked for fastest pg_restore:
>
> max_wal_size=3D64GB
> max_replication_slots =3D 0
> max_logical_replication_workers =3D 0
> max_wal_senders =3D 0
> wal_level =3D minimal
> autovacuum =3D off
>
> Several things seem to have gone wrong here. Questions:
>
> + The WAL partition is much faster than the archival tablespace. Am I in
>   constant danger of overruning max_wal_size? How to make 100% sure this
>   never happens again?
>
> + After recovery, with the database idling, I notice that WAL space usag=
e
>   is constant at 64GB. Why doesn't it free up space down to min_wal_size
>   (1GB)?
>
> + I just created a 13GB zstd-compressed tarball of those 64GB WAL
>   files. This indicates that the files are compressible despite using
>   wal_compression=3Dzstd setting. Could it be that postgres ignores the =
flag
>   and does not compress the WAL? How to check?
>
> + I'm using parallel pg_restore --data-only, can't avoid that for now.
>   Even though all the tables are empty (I truncated everything before
>   starting pg_restore), I can't find a way to avoid going through the WA=
L.
>   Ideas?
>
>
> Thanks in advance,
> Dimitris
>
>
>