Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tCipM-00BUOm-G0 for pgsql-admin@arkaria.postgresql.org; Sun, 17 Nov 2024 17:12:48 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tCipJ-000XIs-M2 for pgsql-admin@arkaria.postgresql.org; Sun, 17 Nov 2024 17:12:46 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tCipJ-000XIk-7p for pgsql-admin@lists.postgresql.org; Sun, 17 Nov 2024 17:12:45 +0000 Received: from mail-ed1-x531.google.com ([2a00:1450:4864:20::531]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tCipG-002RQ3-4R for pgsql-admin@postgresql.org; Sun, 17 Nov 2024 17:12:44 +0000 Received: by mail-ed1-x531.google.com with SMTP id 4fb4d7f45d1cf-5cef772621eso3514410a12.3 for ; Sun, 17 Nov 2024 09:12:43 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731863561; x=1732468361; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=jC0faYpvq8FTwe7SYILqgcc8T6mQtwEaJ26Ho9Aa96Q=; b=dSTWxupJsP8qpva2KVzptQkSarVy1u6eMBn1/EgUI3DlLD076miV/1Uws9vntY/F+K fKnLkg63NSrqjex2xmRiro6iAs2auvpAnU61Hzgsp0a+M7Fdifa7D8KDQprJsVz06Fmm gtUmkCjg1Tezb6918t0EEnBuunvlsvkyUuvnVhYWzMQGeeAKrdau0mDttgDViPd0yeXx sKf9XicWgHDlXiVVMg0yNQg2tIF6EXGlkhPT0RoeWO2iesYHmh3h/1SF39vxmo1LuA4T y91LZ5UzMfJxCWYphs/VMjBgr8k+LSdJNJCvsWP93JVhRIdOZR1BvP3An/Ljlbz1Z6Lo BBbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731863561; x=1732468361; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=jC0faYpvq8FTwe7SYILqgcc8T6mQtwEaJ26Ho9Aa96Q=; b=m3BhWqwDK/Q8jwIEQwxIXdcfvG5Qv+INcO+N3DNvMr+4lzBkgfA9diEGUy55e7MXlK 2uxElxlvjmhgkAeF4H2ZLhWt3Cj3xrNP0iGqhp/we5C6erpJmYKjWBmBlYOHqfl0a9+o MtHmK1Ae+Ra3hkI4YBU+aT/+kgoemk7a4Jsyg0707Kn9VMZWp8XGogOjYvUwXKhryAvf vQFbzZiWrr2obAZe3R1j5EljFBzMZXYrc3Kth2e0yHHX4jA2HYTHnFCKoQu6ufpxU5hP VP8vIzK8cmS7FTKEUf7pUf/iz646VoV7zfP/7xN+muDDNqlrqdNtIHozLqUIqhEK0nig 4H3Q== X-Gm-Message-State: AOJu0YxAbt6p2s/cEmpsxYWaJoSH/RZE0T+VJAVZV5BK8O5ie51Hehya mSPvx8o5jY33BaZPXgKRk++K2l8KIzj84U8syE7s1V9NjItIwGDX1kqWFgzqXanhtB9Cm/xJAAz 3O+xeRR4mgfuzfMb7I5TIzFf7lJ0= X-Google-Smtp-Source: AGHT+IGkzE9YMDn7fE0XhWCNVDIvy5PfvVXfBW04fE4IGPMqDbtpCiOOHjV7WZBykV8ttSJoUXz4z+cEpQ7DbN3Kj5E= X-Received: by 2002:a17:907:7ea5:b0:a9e:85f8:2a35 with SMTP id a640c23a62f3a-aa483509986mr762843366b.37.1731863561244; Sun, 17 Nov 2024 09:12:41 -0800 (PST) MIME-Version: 1.0 References: <14770c231bf27ea6d22376395ac8f02e41462ed5.camel@cybertec.at> In-Reply-To: <14770c231bf27ea6d22376395ac8f02e41462ed5.camel@cybertec.at> From: Wells Oliver Date: Sun, 17 Nov 2024 09:12:05 -0800 Message-ID: Subject: Re: RDS restore failed due to WAL log and disk space-- any tidy fixes? To: Laurenz Albe Cc: pgsql-admin Content-Type: multipart/alternative; boundary="0000000000005e173d06271ee839" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000005e173d06271ee839 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Interesting. I am migrating a pg_dump archive to a new server, in a single go. Does it make sense to disable (or speed up?) WAL archiving during the restore, then reenable it after the restore so a future replica could work? What would be the steps here? Would disabling or "speeding up" be faster? max_slot_wal_keep_size is -1 at the moment so I think that's why it kept a ton of WAL and ran out of space. On Sun, Nov 17, 2024 at 7:41=E2=80=AFAM Laurenz Albe wrote: > On Sat, 2024-11-16 at 16:33 -0800, Wells Oliver wrote: > > I provisioned an RDS instance with 2500GB space and began the restore o= f > a database I know to be about 1750 GB using 16 jobs. > > > > Unfortunately, it died very near the end when it ran out of disk space > due to WAL log usage. Lots of: > > > > 2024-11-17 00:07:09 UTC::@:[19861]:PANIC: could not write to file > "pg_wal/xlogtemp.19861": No space left on device > > > > > > And then kaboom. > > > > I'm wondering what my course of action should be. Can I disable/reduce > WAL during a restore? > > wal_level is set to replica, can this temporarily be set to minimal? > Should I just eat the extra > > costs to add headroom for the WAL? Would using fewer jobs during a > restore reduce the amount of WAL > > created? > > If you are using minimal WAL logging and you restore the dump in a single > transaction, you > should see way less WAL generated, because data inserted into the table i= n > the same transaction > as the CREATE TABLE statement need not be WAL logged. > > But you might more easily solve the problem by speeding up or disabling > the WAL archiver, > so that PostgreSQL removes old WAL after the next checkpoint. > > Yours, > Laurenz Albe > --=20 Wells Oliver wells.oliver@gmail.com --0000000000005e173d06271ee839 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Int= eresting. I am migrating a pg_dump archive to a new server,=C2=A0in a singl= e go. Does it make=C2=A0sense to disable (or speed up?) WAL archiving durin= g the restore, then reenable it after the restore so a future replica could= work? What would be the steps here? Would disabling or "speeding up&q= uot; be faster?

max_slot_= wal_keep_size is -1 at the moment so I think that's why it kept a ton o= f WAL and ran out of space.

On Sun, Nov 17, 2024 at 7:41=E2=80=AFAM La= urenz Albe <laurenz.albe@cyb= ertec.at> wrote:
On Sat, 2024-11-16 at 16:33 -0800, Wells Oliver wrote:
> I provisioned an RDS instance with 2500GB space and began the restore = of a database I know to be about 1750 GB using 16 jobs.
>
> Unfortunately, it died very near the end when it ran out of disk space= due to WAL log usage. Lots of:
>
> 2024-11-17 00:07:09 UTC::@:[19861]:PANIC:=C2=A0 could not write to fil= e "pg_wal/xlogtemp.19861": No space left on device
>
>
> And then kaboom.
>
> I'm wondering what my course of action should be. Can I disable/re= duce WAL during a restore?
> wal_level is set to replica, can this temporarily be set to minimal? S= hould I just eat the extra
> costs to add headroom for the WAL? Would using fewer jobs during a res= tore reduce the amount of WAL
> created?

If you are using minimal WAL logging and you restore the dump in a single t= ransaction, you
should see way less WAL generated, because data inserted into the table in = the same transaction
as the CREATE TABLE statement need not be WAL logged.

But you might more easily solve the problem by speeding up or disabling the= WAL archiver,
so that PostgreSQL removes old WAL after the next checkpoint.

Yours,
Laurenz Albe


--
--0000000000005e173d06271ee839--