public inbox for [email protected]  
help / color / mirror / Atom feed
From: Adrian Klaver <[email protected]>
To: Dimitrios Apostolou <[email protected]>
To: [email protected]
Subject: Re: Experience and feedback on pg_restore --data-only
Date: Sun, 23 Mar 2025 08:37:11 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>

On 3/20/25 15:48, Dimitrios Apostolou wrote:
> Rationale:
> 
> When restoring a backup in an emergency situation, it's fine to run
> pg_restore as superuser and get an exact replica of the dumped db.
> AFAICT pg_restore (without --data-only) is optimised for such case.
> 
> But pg_dump/restore can be used as a generic data-copying utility, and in
> those cases it makes often sense to get rid of the churn and create a
> clean database by running the SQL schema definition from version control,
> and then copy the data for only the tables created.
> 
> For this case, I choose to run pg_restore --data-only, and run it as the
> user who owns the database (dbowner), not as a superuser, in order to
> avoid changes being introduced under the radar.
> 
> Things that made my life hard:
> 
> * plenty of permission denials for both ALTER OWNER or SET SESSION
>    AUTHORIZATION (depending on command line switches).  Both of these
>    require superuser privilege, but in my case this is not really needed.
>    Dbowner has CREATEROLE and is the one who creates all the roles (WITH
>    SET TRUE), and their private schemata in the specific database.  Things
>    would work if pg_restore did "SET ROLE" instead of "SET SESSION
>    AUTHORIZATION" to switch user. Is this a straightforward change or there
>    are issues I don't see?

If this is --data-only what are the ALTER OWNER and SET SESSION 
AUTHORIZATION for?

> 
> * After each failed attempt, I need to issue a TRUNCATE table1,table2,...
>    before I try again.  I wrote my own function for that. It would help if
>    pg_restore would optionally truncate before COPY.  I believe it would
>    require superuser privilege for it, that could achieve using the
>    --superuser=username option used today for disabling the triggers.

That is what --clean is for, though it needs to have the objects(tables) 
be in the restore e.g. not just --data-only.


> 
> Performance issues: (important as my db size is >5TB)
> 
> * WAL writes: I didn't manage to avoid writing to the WAL, despite having
>    setting wal_level=minimal. I even wrote my own function to ALTER all
>    tables to UNLOGGED, but failed with "could not change table T to
>    unlogged because it references logged table".  I'm out of ideas on this
>    one.
> 
> * Indices: Could pg_restore have a switch to DROP indices before each
>    COPY, and re-CREATE them after, exactly as they were?  This would speed
>    up the process quite a bit.
> 
> 
> Any feedback for improving my process? Should I put these ideas somewhere
> as ideas for improvement on pg_restore?
> 
> Thank you in advance,
> Dimitris
> 
> 
> 

-- 
Adrian Klaver
[email protected]







view thread (2+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Experience and feedback on pg_restore --data-only
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox