Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tCixe-00BYwK-Bl for pgsql-admin@arkaria.postgresql.org; Sun, 17 Nov 2024 17:21:21 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tCixb-000eQj-Q2 for pgsql-admin@arkaria.postgresql.org; Sun, 17 Nov 2024 17:21:20 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tCixb-000eQb-9a for pgsql-admin@lists.postgresql.org; Sun, 17 Nov 2024 17:21:20 +0000 Received: from mail-ot1-x335.google.com ([2607:f8b0:4864:20::335]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tCixZ-002NoF-BW for pgsql-admin@postgresql.org; Sun, 17 Nov 2024 17:21:18 +0000 Received: by mail-ot1-x335.google.com with SMTP id 46e09a7af769-71811c7eb8dso376077a34.0 for ; Sun, 17 Nov 2024 09:21:17 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731864076; x=1732468876; darn=postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=ebTcnxOybS2IpAy8KbsO2t5x6TBZZvoyIKmkGzUn/fA=; b=QbQvH4X5uaYab1feIciLwFvQQbr2yHa6mEhxQz/sCfXvOvkeneBydSbyNIWqNHZ3T1 v5Dvdz6rcw870Wjnwk+7RqL66J29pBtYvdfua1hAyPN6S3GvZofryrJPONeYXa66ShIt 1B3+mTEdIUyMxBdIjA++Y/r2lnrMqlBkfDrb3JRpXWIYHlJsVoB9kEmKtVnyqYxybeEm g2Lwyx6WQm8HjDsECMY8U4xRDaJkNOe31l3Cy/fwe6qOydUf21aqZBFaQFRw96sAqlMG o0x3gLs48HPu5uuugT7aW+OiGyhmyGDEy1SmFachdCrwGl8B9tQGi4kQGKp1f7sPDtDT E8gQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731864076; x=1732468876; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=ebTcnxOybS2IpAy8KbsO2t5x6TBZZvoyIKmkGzUn/fA=; b=RYlY/dnAGu9SXWg3Cl+c75Mwzrqo7q5fPD9SF5qqpawmXGEjL6lUzyuVcbG7sAe89E SqCJ0569khJ7fHznK/rSV2zsadtYa+AkKEG8C6z1O4sXR0YSEZdgsb2lBL7UgYgdtdjA B7+BBL77JrhzT9if/58+b2IWgftsQ3NZ/QwT/lGZbhktGzfAy3RmgFfmMC4jUkUncumq 1dmLS8gxYYMo7O8plsmj+55SXtNiCEHmtn903dkgsba7jXvOqDt08B2Xnse8//CSRzyS yipsTb1KNXsi6XZiMsgc5GbJSCEY3rWUvmdeIqkkOYD2CsP5N6Zt91ib0PmfOn6Qsj2H v12w== X-Gm-Message-State: AOJu0Yyz9RB9FO/pIjO1EAB7EpoHkxV2tL6unxP8ZYwb/vvJD8pzOZlF V9Tqi4z9uYT3MswEsiqilRJPDI0No7BDuV+eBulyyzGvHK6Rfvzr0wwI4vR8wavPSmDLSde4TiH C996CtU5tXJG7ZvQOviHjZWxQPjVA0DG5 X-Google-Smtp-Source: AGHT+IGcqCH9F9s6qwYcnQVBqOkQaleipwNZ1a+ugl8pkTCOwIY56NdTZrCWmfJ9iAys7BIf8JoBYHoUIbCIa2yVFLU= X-Received: by 2002:a05:6830:6811:b0:710:ec4a:b394 with SMTP id 46e09a7af769-71a77a075f2mr8422034a34.29.1731864076047; Sun, 17 Nov 2024 09:21:16 -0800 (PST) MIME-Version: 1.0 References: <14770c231bf27ea6d22376395ac8f02e41462ed5.camel@cybertec.at> In-Reply-To: From: Ron Johnson Date: Sun, 17 Nov 2024 12:21:05 -0500 Message-ID: Subject: Re: RDS restore failed due to WAL log and disk space-- any tidy fixes? To: pgsql-admin Content-Type: multipart/alternative; boundary="0000000000000d5bfe06271f0700" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000000d5bfe06271f0700 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Doesn't RDS have its own replication? Anyway, for pg_restore, I'd absolutely set archive_mode=3Doff and wal_level=3Dminimal, then set them to their production values when it's finished. On Sun, Nov 17, 2024 at 12:12=E2=80=AFPM Wells Oliver wrote: > Interesting. I am migrating a pg_dump archive to a new server, in a singl= e > go. Does it make sense to disable (or speed up?) WAL archiving during the > restore, then reenable it after the restore so a future replica could wor= k? > What would be the steps here? Would disabling or "speeding up" be faster? > > max_slot_wal_keep_size is -1 at the moment so I think that's why it kept = a > ton of WAL and ran out of space. > > On Sun, Nov 17, 2024 at 7:41=E2=80=AFAM Laurenz Albe > wrote: > >> On Sat, 2024-11-16 at 16:33 -0800, Wells Oliver wrote: >> > I provisioned an RDS instance with 2500GB space and began the restore >> of a database I know to be about 1750 GB using 16 jobs. >> > >> > Unfortunately, it died very near the end when it ran out of disk space >> due to WAL log usage. Lots of: >> > >> > 2024-11-17 00:07:09 UTC::@:[19861]:PANIC: could not write to file >> "pg_wal/xlogtemp.19861": No space left on device >> > >> > >> > And then kaboom. >> > >> > I'm wondering what my course of action should be. Can I disable/reduce >> WAL during a restore? >> > wal_level is set to replica, can this temporarily be set to minimal? >> Should I just eat the extra >> > costs to add headroom for the WAL? Would using fewer jobs during a >> restore reduce the amount of WAL >> > created? >> >> If you are using minimal WAL logging and you restore the dump in a singl= e >> transaction, you >> should see way less WAL generated, because data inserted into the table >> in the same transaction >> as the CREATE TABLE statement need not be WAL logged. >> >> But you might more easily solve the problem by speeding up or disabling >> the WAL archiver, >> so that PostgreSQL removes old WAL after the next checkpoint. >> >> Yours, >> Laurenz Albe >> > > > -- > Wells Oliver > wells.oliver@gmail.com > --=20 Death to , and butter sauce. Don't boil me, I'm still alive. lobster! --0000000000000d5bfe06271f0700 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Doesn't RDS have its own replication?
<= br>
Anyway, for pg_restore, I'd absolutely set=C2=A0archive_m= ode=3Doff and=C2=A0wal_level=3Dminimal, then set them to their production v= alues when it's finished.

On Sun, = Nov 17, 2024 at 12:12=E2=80=AFPM Wells Oliver <wells.oliver@gmail.com> wrote:
Interesting. I am migrating a pg_du= mp archive to a new server,=C2=A0in a single go. Does it make=C2=A0sense to= disable (or speed up?) WAL archiving during the restore, then reenable it = after the restore so a future replica could work? What would be the steps h= ere? Would disabling or "speeding up" be faster?

max_slot_wal_= keep_size is -1 at the moment so I think that's why it kept a ton of WA= L and ran out of space.

On Sun, Nov 17, 2024 at 7:41=E2=80=AFAM Laure= nz Albe <l= aurenz.albe@cybertec.at> wrote:
On Sat, 2024-11-16 at 16:33 -0800, Wells Oliver wrot= e:
> I provisioned an RDS instance with 2500GB space and began the restore = of a database I know to be about 1750 GB using 16 jobs.
>
> Unfortunately, it died very near the end when it ran out of disk space= due to WAL log usage. Lots of:
>
> 2024-11-17 00:07:09 UTC::@:[19861]:PANIC:=C2=A0 could not write to fil= e "pg_wal/xlogtemp.19861": No space left on device
>
>
> And then kaboom.
>
> I'm wondering what my course of action should be. Can I disable/re= duce WAL during a restore?
> wal_level is set to replica, can this temporarily be set to minimal? S= hould I just eat the extra
> costs to add headroom for the WAL? Would using fewer jobs during a res= tore reduce the amount of WAL
> created?

If you are using minimal WAL logging and you restore the dump in a single t= ransaction, you
should see way less WAL generated, because data inserted into the table in = the same transaction
as the CREATE TABLE statement need not be WAL logged.

But you might more easily solve the problem by speeding up or disabling the= WAL archiver,
so that PostgreSQL removes old WAL after the next checkpoint.

Yours,
Laurenz Albe


--


--
Death to <Redacted>, and butter sauce.Don't boil me, I'm still alive.
<Redacted> lobs= ter!
--0000000000000d5bfe06271f0700--