Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tCjBJ-00Bfkq-5H for pgsql-admin@arkaria.postgresql.org; Sun, 17 Nov 2024 17:35:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tCjBF-000sjb-N1 for pgsql-admin@arkaria.postgresql.org; Sun, 17 Nov 2024 17:35:26 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tCjBF-000sjT-AB for pgsql-admin@lists.postgresql.org; Sun, 17 Nov 2024 17:35:25 +0000 Received: from mail-ej1-x62a.google.com ([2a00:1450:4864:20::62a]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tCjB8-002RbP-AO for pgsql-admin@postgresql.org; Sun, 17 Nov 2024 17:35:25 +0000 Received: by mail-ej1-x62a.google.com with SMTP id a640c23a62f3a-a9ed7d8c86cso478663866b.2 for ; Sun, 17 Nov 2024 09:35:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731864918; x=1732469718; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=04r422g17h4r+hpd8hxNZRacfb9sAaBuR+fkCnDBSSg=; b=UjCzgVC4yzYy3xu5Llqu8St/DISgo3ueUYElriI1gJkWTOoXOD6b635dAp6wiT5XvP IRED2fE6BH4UMf2xYjRrPm6tWDmVD+TAMe6kCIoXJrBvy8tqqUjAZNPZHqISsyeoFR/K BGSiYDyE+LTTH8hJbgDjmjxOtv1uX0fsuGlItjB1MFV+NiaNC74TkPQqa96HDHcSx5bk 5zeaRCD5s5y+qTbhNEJ3PpF3DJ6k7zLsJISw4/XM60pMv2/qEq0SHXK8OfsRQ8DV3luE 8hIeV5JgK7GMSHI0jElFWZSvbDMdGnckxuerO6qcKMtUeJQgaHedvHw63OIZw7dPzvmY brjg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731864918; x=1732469718; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=04r422g17h4r+hpd8hxNZRacfb9sAaBuR+fkCnDBSSg=; b=YlQxXmSiowToWggoz/Ox8HH39sLejWQBh2HxVXItspoFV2DnbKbZGK8KOOfYZLRT5H u9XSDY6DEbX3QS7tcbZe+9Bgemzs6JgKafieuvKN6S/Skio8+trd4XQApmg3VhbEfzuJ q5JG+tO4op6W+aEN0emoCSgzvFCMKY8bgi69OPfjjpBbQDVqGXcJ+1YPZs6jYcnck5jb Km7nTQDEpu5Y6XPXrULQXccfGo/nUi4f8GR8CbiwVdYPH8JM24u1TyEFv9vFZ8Fo2sZ5 Kg4sCY+nGvw9PfGXuursdZWQuzoZDz219+kU9e/Vp7R4s++VSanLKd1NfWrdRsnIYwwa ykAQ== X-Gm-Message-State: AOJu0Yz0quCenTfFFuxGQrZ7rK5iVZETYvxfM9j43ro7iPlS5KVKCZLE fT7oqta7QKNNOwN6R9jZEazQstmb4dCWPR47I+YpPyXrvtZoCQG2JxojFYYdD9qb+01j0CYoFpK IukiXGlL6+EQ7wQXHfe3vN2fPUMstUXrL X-Google-Smtp-Source: AGHT+IHEW/bE65nHG/aAsnNNlg+ysrMCWJnaMlF0SpYrBfz1wPQS0Kv1v5q+HT//KfLABfL3XCP1Y0Xe8F7yIbetkig= X-Received: by 2002:a17:906:6a20:b0:a99:e4db:4909 with SMTP id a640c23a62f3a-aa48340fc2dmr1011807966b.15.1731864918109; Sun, 17 Nov 2024 09:35:18 -0800 (PST) MIME-Version: 1.0 References: <14770c231bf27ea6d22376395ac8f02e41462ed5.camel@cybertec.at> In-Reply-To: From: Wells Oliver Date: Sun, 17 Nov 2024 09:34:42 -0800 Message-ID: Subject: Re: RDS restore failed due to WAL log and disk space-- any tidy fixes? To: Ron Johnson Cc: pgsql-admin Content-Type: multipart/alternative; boundary="0000000000003e33cd06271f3908" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000003e33cd06271f3908 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Would setting max_slot_wal_keep_size to something like 1GB ensure that WAL logs don't cause runaway disk use during restore? It's currently -1... On Sun, Nov 17, 2024 at 9:31=E2=80=AFAM Wells Oliver wrote: > Actually, in RDS it seems you cannot set archive_mode either. > > On Sun, Nov 17, 2024 at 9:23=E2=80=AFAM Wells Oliver > wrote: > >> It does. I think it uses WAL behind the scenes. In RDS unfortunately >> cannot set wal_level, but you can set archive_mode. >> >> On Sun, Nov 17, 2024 at 9:21=E2=80=AFAM Ron Johnson >> wrote: >> >>> Doesn't RDS have its own replication? >>> >>> Anyway, for pg_restore, I'd absolutely set archive_mode=3Doff >>> and wal_level=3Dminimal, then set them to their production values when = it's >>> finished. >>> >>> On Sun, Nov 17, 2024 at 12:12=E2=80=AFPM Wells Oliver >>> wrote: >>> >>>> Interesting. I am migrating a pg_dump archive to a new server, in a >>>> single go. Does it make sense to disable (or speed up?) WAL archiving >>>> during the restore, then reenable it after the restore so a future rep= lica >>>> could work? What would be the steps here? Would disabling or "speeding= up" >>>> be faster? >>>> >>>> max_slot_wal_keep_size is -1 at the moment so I think that's why it >>>> kept a ton of WAL and ran out of space. >>>> >>>> On Sun, Nov 17, 2024 at 7:41=E2=80=AFAM Laurenz Albe >>>> wrote: >>>> >>>>> On Sat, 2024-11-16 at 16:33 -0800, Wells Oliver wrote: >>>>> > I provisioned an RDS instance with 2500GB space and began the >>>>> restore of a database I know to be about 1750 GB using 16 jobs. >>>>> > >>>>> > Unfortunately, it died very near the end when it ran out of disk >>>>> space due to WAL log usage. Lots of: >>>>> > >>>>> > 2024-11-17 00:07:09 UTC::@:[19861]:PANIC: could not write to file >>>>> "pg_wal/xlogtemp.19861": No space left on device >>>>> > >>>>> > >>>>> > And then kaboom. >>>>> > >>>>> > I'm wondering what my course of action should be. Can I >>>>> disable/reduce WAL during a restore? >>>>> > wal_level is set to replica, can this temporarily be set to minimal= ? >>>>> Should I just eat the extra >>>>> > costs to add headroom for the WAL? Would using fewer jobs during a >>>>> restore reduce the amount of WAL >>>>> > created? >>>>> >>>>> If you are using minimal WAL logging and you restore the dump in a >>>>> single transaction, you >>>>> should see way less WAL generated, because data inserted into the >>>>> table in the same transaction >>>>> as the CREATE TABLE statement need not be WAL logged. >>>>> >>>>> But you might more easily solve the problem by speeding up or >>>>> disabling the WAL archiver, >>>>> so that PostgreSQL removes old WAL after the next checkpoint. >>>>> >>>>> Yours, >>>>> Laurenz Albe >>>>> >>>> >>>> >>>> -- >>>> Wells Oliver >>>> wells.oliver@gmail.com >>>> >>> >>> >>> -- >>> Death to , and butter sauce. >>> Don't boil me, I'm still alive. >>> lobster! >>> >> >> >> -- >> Wells Oliver >> wells.oliver@gmail.com >> > > > -- > Wells Oliver > wells.oliver@gmail.com > --=20 Wells Oliver wells.oliver@gmail.com --0000000000003e33cd06271f3908 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Wou= ld setting=C2=A0max_slot_wal_keep_size to something like 1GB ensure that WA= L logs don't cause runaway disk use during restore? It's currently = -1...

On Sun, Nov 17, 2024 at 9:31=E2=80=AFAM Wells Oliver <wells.oliver@gmail.com> wrote:
<= div class=3D"gmail_default" style=3D"font-size:small">Actually, in RDS it s= eems you cannot set=C2=A0archive_mode either.

On Sun, Nov 17, 2024 at = 9:23=E2=80=AFAM Wells Oliver <wells.oliver@gmail.com> wrote:
It does. I think it uses WAL behind th= e scenes. In RDS unfortunately cannot set wal_level, but you can set archiv= e_mode.

On Sun, Nov 17, 2024 at 9:21=E2=80=AFAM Ron Johnson <ronljohnsonjr@gmail.= com> wrote:
Doesn't RDS have its own replication?

Anyway, for pg_restore, I'd absolutely set=C2=A0archiv= e_mode=3Doff and=C2=A0wal_level=3Dminimal, then set them to their productio= n values when it's finished.

On Su= n, Nov 17, 2024 at 12:12=E2=80=AFPM Wells Oliver <wells.oliver@gmail.com> wrote:=
Interesting. I = am migrating a pg_dump archive to a new server,=C2=A0in a single go. Does i= t make=C2=A0sense to disable (or speed up?) WAL archiving during the restor= e, then reenable it after the restore so a future replica could work? What = would be the steps here? Would disabling or "speeding up" be fast= er?

max_slot_wal_keep_size is -1 at the moment so I think that's why = it kept a ton of WAL and ran out of space.

On Sun, Nov 17, 2024 at 7:4= 1=E2=80=AFAM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
On Sat, 2024-11-16 at 16:33 -0800,= Wells Oliver wrote:
> I provisioned an RDS instance with 2500GB space and began the restore = of a database I know to be about 1750 GB using 16 jobs.
>
> Unfortunately, it died very near the end when it ran out of disk space= due to WAL log usage. Lots of:
>
> 2024-11-17 00:07:09 UTC::@:[19861]:PANIC:=C2=A0 could not write to fil= e "pg_wal/xlogtemp.19861": No space left on device
>
>
> And then kaboom.
>
> I'm wondering what my course of action should be. Can I disable/re= duce WAL during a restore?
> wal_level is set to replica, can this temporarily be set to minimal? S= hould I just eat the extra
> costs to add headroom for the WAL? Would using fewer jobs during a res= tore reduce the amount of WAL
> created?

If you are using minimal WAL logging and you restore the dump in a single t= ransaction, you
should see way less WAL generated, because data inserted into the table in = the same transaction
as the CREATE TABLE statement need not be WAL logged.

But you might more easily solve the problem by speeding up or disabling the= WAL archiver,
so that PostgreSQL removes old WAL after the next checkpoint.

Yours,
Laurenz Albe


--


--
Death to <Redacted>, and butter sauce.Don't boil me, I'm still alive.
<Redacted> lobs= ter!


--


--


--
--0000000000003e33cd06271f3908--