public inbox for [email protected]
help / color / mirror / Atom feedRe: Experience and feedback on pg_restore --data-only
7+ messages / 4 participants
[nested] [flat]
* Re: Experience and feedback on pg_restore --data-only
@ 2025-03-24 15:31 Adrian Klaver <[email protected]>
2025-03-24 15:51 ` Re: Experience and feedback on pg_restore --data-only Dimitrios Apostolou <[email protected]>
2025-03-24 16:15 ` Re: Experience and feedback on pg_restore --data-only Dimitrios Apostolou <[email protected]>
0 siblings, 2 replies; 7+ messages in thread
From: Adrian Klaver @ 2025-03-24 15:31 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; Laurenz Albe <[email protected]>; +Cc: [email protected]
On 3/24/25 07:24, Dimitrios Apostolou wrote:
> On Sun, 23 Mar 2025, Laurenz Albe wrote:
>
>> On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:
>>> Performance issues: (important as my db size is >5TB)
>>>
>>> * WAL writes: I didn't manage to avoid writing to the WAL, despite
>>> having
>>> setting wal_level=minimal. I even wrote my own function to ALTER all
>>> tables to UNLOGGED, but failed with "could not change table T to
>>> unlogged because it references logged table". I'm out of ideas on
>>> this
>>> one.
>>
>> You'd have to create an load the table in the same transaction, that is,
>> you'd have to run pg_restore with --single-transaction.
>
> That would restore the schema from the dump, while I want to create the
> schema from the SQL code in version control.
I am not following, from your original post:
"
... create a
clean database by running the SQL schema definition from version
control, and then copy the data for only the tables created.
For this case, I choose to run pg_restore --data-only, and run it as the
user who owns the database (dbowner), not as a superuser, in order to
avoid changes being introduced under the radar.
"
You are running the process in two steps, where the first does not
involve pg_restore. Not sure why doing the pg_restore --data-only
portion in single transaction is not possible?
>
> Something that might work, would be for pg_restore to issue a TRUNCATE
> before the COPY. I believe this would require superuser privelege though,
> that I would prefer to avoid. Currently I issue TRUNCATE for all tables
> manually before running pg_restore, but of course this is in a different
> transaction so it doesn't help.
>
> By the way do you see potential problems with using --single-transaction
> to restore billion-rows tables?
COPY is all or none(version 17+ caveat(see
https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so if
the data dump fails in --single-transaction everything rolls back.
>
>
> Thank you,
> Dimitris
--
Adrian Klaver
[email protected]
^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only
2025-03-24 15:31 Re: Experience and feedback on pg_restore --data-only Adrian Klaver <[email protected]>
@ 2025-03-24 15:51 ` Dimitrios Apostolou <[email protected]>
2025-03-24 16:00 ` Re: Experience and feedback on pg_restore --data-only Ron Johnson <[email protected]>
2025-03-24 16:09 ` Re: Experience and feedback on pg_restore --data-only Adrian Klaver <[email protected]>
2025-03-24 18:41 ` Re: Experience and feedback on pg_restore --data-only Laurenz Albe <[email protected]>
1 sibling, 3 replies; 7+ messages in thread
From: Dimitrios Apostolou @ 2025-03-24 15:51 UTC (permalink / raw)
To: Adrian Klaver <[email protected]>; +Cc: Laurenz Albe <[email protected]>; [email protected]
On Mon, 24 Mar 2025, Adrian Klaver wrote:
> On 3/24/25 07:24, Dimitrios Apostolou wrote:
>> On Sun, 23 Mar 2025, Laurenz Albe wrote:
>>
>>> On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:
>>>> Performance issues: (important as my db size is >5TB)
>>>>
>>>> * WAL writes: I didn't manage to avoid writing to the WAL, despite
>>>> having
>>>> setting wal_level=minimal. I even wrote my own function to ALTER all
>>>> tables to UNLOGGED, but failed with "could not change table T to
>>>> unlogged because it references logged table". I'm out of ideas on
>>>> this
>>>> one.
>>>
>>> You'd have to create an load the table in the same transaction, that is,
>>> you'd have to run pg_restore with --single-transaction.
>>
>> That would restore the schema from the dump, while I want to create the
>> schema from the SQL code in version control.
>
>
> I am not following, from your original post:
>
> "
> ... create a
> clean database by running the SQL schema definition from version control, and
> then copy the data for only the tables created.
>
> For this case, I choose to run pg_restore --data-only, and run it as the user
> who owns the database (dbowner), not as a superuser, in order to avoid
> changes being introduced under the radar.
> "
>
> You are running the process in two steps, where the first does not involve
> pg_restore. Not sure why doing the pg_restore --data-only portion in single
> transaction is not possible?
Laurenz informed me that I could avoid writing to the WAL if I "create and
load the table in a single transaction".
I haven't tried, but here is what I would do to try --single-transaction:
Transaction 1: manually issuing all of CREATE TABLE etc.
Transaction 2: pg_restore --single-transaction --data-only
The COPY command in transaction 2 would still need to write to WAL, since
it's separate from the CREATE TABLE.
Am I wrong somewhere?
>> Something that might work, would be for pg_restore to issue a TRUNCATE
>> before the COPY. I believe this would require superuser privelege though,
>> that I would prefer to avoid. Currently I issue TRUNCATE for all tables
>> manually before running pg_restore, but of course this is in a different
>> transaction so it doesn't help.
>>
>> By the way do you see potential problems with using --single-transaction
>> to restore billion-rows tables?
>
> COPY is all or none(version 17+ caveat(see
> https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so if the
> data dump fails in --single-transaction everything rolls back.
So if I restore all tables, then an error about a "table not found" would
not roll back already copied tables, since it's not part of a COPY?
Thank you for the feedback,
Dimitris
^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only
2025-03-24 15:31 Re: Experience and feedback on pg_restore --data-only Adrian Klaver <[email protected]>
2025-03-24 15:51 ` Re: Experience and feedback on pg_restore --data-only Dimitrios Apostolou <[email protected]>
@ 2025-03-24 16:00 ` Ron Johnson <[email protected]>
2 siblings, 0 replies; 7+ messages in thread
From: Ron Johnson @ 2025-03-24 16:00 UTC (permalink / raw)
To: pgsql-general
Why are you regularly having emergencies requiring the restoration of
multi-TB tables to databases with lots of cruft?
Fixing that would go a long way towards eliminating your problems with
pg_restore.
On Mon, Mar 24, 2025 at 11:51 AM Dimitrios Apostolou <[email protected]> wrote:
> On Mon, 24 Mar 2025, Adrian Klaver wrote:
>
> > On 3/24/25 07:24, Dimitrios Apostolou wrote:
> >> On Sun, 23 Mar 2025, Laurenz Albe wrote:
> >>
> >>> On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:
> >>>> Performance issues: (important as my db size is >5TB)
> >>>>
> >>>> * WAL writes: I didn't manage to avoid writing to the WAL, despite
> >>>> having
> >>>> setting wal_level=minimal. I even wrote my own function to ALTER
> all
> >>>> tables to UNLOGGED, but failed with "could not change table T to
> >>>> unlogged because it references logged table". I'm out of ideas on
> >>>> this
> >>>> one.
> >>>
> >>> You'd have to create an load the table in the same transaction, that
> is,
> >>> you'd have to run pg_restore with --single-transaction.
> >>
> >> That would restore the schema from the dump, while I want to create the
> >> schema from the SQL code in version control.
> >
> >
> > I am not following, from your original post:
> >
> > "
> > ... create a
> > clean database by running the SQL schema definition from version
> control, and
> > then copy the data for only the tables created.
> >
> > For this case, I choose to run pg_restore --data-only, and run it as the
> user
> > who owns the database (dbowner), not as a superuser, in order to avoid
> > changes being introduced under the radar.
> > "
> >
> > You are running the process in two steps, where the first does not
> involve
> > pg_restore. Not sure why doing the pg_restore --data-only portion in
> single
> > transaction is not possible?
>
> Laurenz informed me that I could avoid writing to the WAL if I "create and
> load the table in a single transaction".
> I haven't tried, but here is what I would do to try --single-transaction:
>
> Transaction 1: manually issuing all of CREATE TABLE etc.
>
> Transaction 2: pg_restore --single-transaction --data-only
>
> The COPY command in transaction 2 would still need to write to WAL, since
> it's separate from the CREATE TABLE.
>
> Am I wrong somewhere?
>
> >> Something that might work, would be for pg_restore to issue a TRUNCATE
> >> before the COPY. I believe this would require superuser privelege
> though,
> >> that I would prefer to avoid. Currently I issue TRUNCATE for all tables
> >> manually before running pg_restore, but of course this is in a
> different
> >> transaction so it doesn't help.
> >>
> >> By the way do you see potential problems with using
> --single-transaction
> >> to restore billion-rows tables?
> >
> > COPY is all or none(version 17+ caveat(see
> > https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so
> if the
> > data dump fails in --single-transaction everything rolls back.
>
> So if I restore all tables, then an error about a "table not found" would
> not roll back already copied tables, since it's not part of a COPY?
>
>
> Thank you for the feedback,
> Dimitris
>
>
--
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only
2025-03-24 15:31 Re: Experience and feedback on pg_restore --data-only Adrian Klaver <[email protected]>
2025-03-24 15:51 ` Re: Experience and feedback on pg_restore --data-only Dimitrios Apostolou <[email protected]>
@ 2025-03-24 16:09 ` Adrian Klaver <[email protected]>
2 siblings, 0 replies; 7+ messages in thread
From: Adrian Klaver @ 2025-03-24 16:09 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; +Cc: Laurenz Albe <[email protected]>; [email protected]
On 3/24/25 08:51, Dimitrios Apostolou wrote:
> On Mon, 24 Mar 2025, Adrian Klaver wrote:
>
>> On 3/24/25 07:24, Dimitrios Apostolou wrote:
>>> On Sun, 23 Mar 2025, Laurenz Albe wrote:
>>>
>>>> On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:
>>>>> Performance issues: (important as my db size is >5TB)
>>>>>
>>>>> * WAL writes: I didn't manage to avoid writing to the WAL, despite
>>>>> having
>>>>> setting wal_level=minimal. I even wrote my own function to
>>>>> ALTER all
>>>>> tables to UNLOGGED, but failed with "could not change table T to
>>>>> unlogged because it references logged table". I'm out of ideas on
>>>>> this
>>>>> one.
>>>>
>>>> You'd have to create an load the table in the same transaction,
>>>> that is,
>>>> you'd have to run pg_restore with --single-transaction.
>>>
>>> That would restore the schema from the dump, while I want to create the
>>> schema from the SQL code in version control.
>>
>>
>> I am not following, from your original post:
>>
>> "
>> ... create a
>> clean database by running the SQL schema definition from version
>> control, and
>> then copy the data for only the tables created.
>>
>> For this case, I choose to run pg_restore --data-only, and run it as
>> the user
>> who owns the database (dbowner), not as a superuser, in order to avoid
>> changes being introduced under the radar.
>> "
>>
>> You are running the process in two steps, where the first does not
>> involve
>> pg_restore. Not sure why doing the pg_restore --data-only portion in
>> single
>> transaction is not possible?
>
> Laurenz informed me that I could avoid writing to the WAL if I "create and
> load the table in a single transaction".
From here:
https://www.postgresql.org/message-id/455d28421ae33c73b73a6f527d2f72816ca5dd29.camel%40cybertec.at
What he said was:
"You'd have to create an load the table in the same transaction, that
is, you'd have to run pg_restore with --single-transaction."
Where I assume he meant '... create and load ...'. That is not the same
as what you are doing below.
> I haven't tried, but here is what I would do to try --single-transaction:
>
> Transaction 1: manually issuing all of CREATE TABLE etc.
>
> Transaction 2: pg_restore --single-transaction --data-only
>
> The COPY command in transaction 2 would still need to write to WAL, since
> it's separate from the CREATE TABLE.
>
> Am I wrong somewhere?
>
>> COPY is all or none(version 17+ caveat(see
>> https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so
>> if the
>> data dump fails in --single-transaction everything rolls back.
>
> So if I restore all tables, then an error about a "table not found" would
> not roll back already copied tables, since it's not part of a COPY?
If you are following what you show above then the tables and other
objects would be created manually from the version control outside of
pg_restore and on successful completion and commit of that transaction
they would persist until such time as you change them. The second step
pg_restore --single-transaction --data-only is where you could 99% of
the way through and have a failure that rolls back all the data entered
in the tables.
>
>
> Thank you for the feedback,
> Dimitris
>
--
Adrian Klaver
[email protected]
^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only
2025-03-24 15:31 Re: Experience and feedback on pg_restore --data-only Adrian Klaver <[email protected]>
2025-03-24 15:51 ` Re: Experience and feedback on pg_restore --data-only Dimitrios Apostolou <[email protected]>
@ 2025-03-24 18:41 ` Laurenz Albe <[email protected]>
2 siblings, 0 replies; 7+ messages in thread
From: Laurenz Albe @ 2025-03-24 18:41 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; Adrian Klaver <[email protected]>; +Cc: [email protected]
On Mon, 2025-03-24 at 16:51 +0100, Dimitrios Apostolou wrote:
> Laurenz informed me that I could avoid writing to the WAL if I "create and
> load the table in a single transaction".
> I haven't tried, but here is what I would do to try --single-transaction:
>
> Transaction 1: manually issuing all of CREATE TABLE etc.
>
> Transaction 2: pg_restore --single-transaction --data-only
>
> The COPY command in transaction 2 would still need to write to WAL, since
> it's separate from the CREATE TABLE.
>
> Am I wrong somewhere?
No, that is correct.
Yours,
Laurenz Albe
^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only
2025-03-24 15:31 Re: Experience and feedback on pg_restore --data-only Adrian Klaver <[email protected]>
@ 2025-03-24 16:15 ` Dimitrios Apostolou <[email protected]>
2025-03-25 15:27 ` Re: Experience and feedback on pg_restore --data-only Adrian Klaver <[email protected]>
1 sibling, 1 reply; 7+ messages in thread
From: Dimitrios Apostolou @ 2025-03-24 16:15 UTC (permalink / raw)
To: Ron Johnson <[email protected]>; +Cc: Laurenz Albe <[email protected]>; Adrian Klaver <[email protected]>; [email protected]
Hi Ron,
I read your reply in the mailing list archives as I'm not subscribed to
the list, and I'm copy-pasting a response here. Please include me as a
recipient in further replies.
> Why are you regularly having emergencies requiring the restoration of
> multi-TB tables to databases with lots of cruft?
>
> Fixing that would go a long way towards eliminating your problems with
> pg_restore.
I don't have emergencies yet. I'm testing the process of restoring the
database dump, and it takes more than 24 hours currently. A successful
test is vital to approve the process.
But the primary usage of pg_restore that I have is not to save me from
emergencies but to populate the dev database with recent data.
Regards,
Dimitris
^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only
2025-03-24 15:31 Re: Experience and feedback on pg_restore --data-only Adrian Klaver <[email protected]>
2025-03-24 16:15 ` Re: Experience and feedback on pg_restore --data-only Dimitrios Apostolou <[email protected]>
@ 2025-03-25 15:27 ` Adrian Klaver <[email protected]>
0 siblings, 0 replies; 7+ messages in thread
From: Adrian Klaver @ 2025-03-25 15:27 UTC (permalink / raw)
To: Dimitrios Apostolou <[email protected]>; Ron Johnson <[email protected]>; +Cc: Laurenz Albe <[email protected]>; [email protected]
On 3/24/25 09:15, Dimitrios Apostolou wrote:
> Hi Ron,
>
> I read your reply in the mailing list archives as I'm not subscribed to
> the list, and I'm copy-pasting a response here. Please include me as a
> recipient in further replies.
>
>> Why are you regularly having emergencies requiring the restoration of
>> multi-TB tables to databases with lots of cruft?
>>
>> Fixing that would go a long way towards eliminating your problems with
>> pg_restore.
>
> I don't have emergencies yet. I'm testing the process of restoring the
> database dump, and it takes more than 24 hours currently. A successful
> test is vital to approve the process.
It is doubtful that pg_dump/pg_restore will meet the requirements. You
are probably looking at some process that does incremental updates and
then restores from that. Something like pgbackrest:
https://pgbackrest.org/
comes to mind.
>
> But the primary usage of pg_restore that I have is not to save me from
> emergencies but to populate the dev database with recent data.
>
>
> Regards,
> Dimitris
>
--
Adrian Klaver
[email protected]
^ permalink raw reply [nested|flat] 7+ messages in thread
end of thread, other threads:[~2025-03-25 15:27 UTC | newest]
Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-03-24 15:31 Re: Experience and feedback on pg_restore --data-only Adrian Klaver <[email protected]>
2025-03-24 15:51 ` Dimitrios Apostolou <[email protected]>
2025-03-24 16:00 ` Ron Johnson <[email protected]>
2025-03-24 16:09 ` Adrian Klaver <[email protected]>
2025-03-24 18:41 ` Laurenz Albe <[email protected]>
2025-03-24 16:15 ` Dimitrios Apostolou <[email protected]>
2025-03-25 15:27 ` Adrian Klaver <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox