public inbox for [email protected]  
help / color / mirror / Atom feed
Re: Experience and feedback on  pg_restore --data-only
3+ messages / 2 participants
[nested] [flat]

* Re: Experience and feedback on  pg_restore --data-only
@ 2025-03-23 16:00 Laurenz Albe <[email protected]>
  2025-03-24 14:24 ` Re: Experience and feedback on pg_restore --data-only Dimitrios Apostolou <[email protected]>
  2025-04-02 17:42 ` Re: Experience and feedback on pg_restore --data-only Dimitrios Apostolou <[email protected]>
  0 siblings, 2 replies; 3+ messages in thread

From: Laurenz Albe @ 2025-03-23 16:00 UTC (permalink / raw)
  To: Dimitrios Apostolou <[email protected]>; [email protected]

On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:
> Performance issues: (important as my db size is >5TB)
> 
> * WAL writes: I didn't manage to avoid writing to the WAL, despite having
>    setting wal_level=minimal. I even wrote my own function to ALTER all
>    tables to UNLOGGED, but failed with "could not change table T to
>    unlogged because it references logged table".  I'm out of ideas on this
>    one.

You'd have to create an load the table in the same transaction, that is,
you'd have to run pg_restore with --single-transaction.

> Any feedback for improving my process? Should I put these ideas somewhere
> as ideas for improvement on pg_restore?

You put your ideas in the right place.

Yours,
Laurenz Albe






^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Experience and feedback on pg_restore --data-only
  2025-03-23 16:00 Re: Experience and feedback on  pg_restore --data-only Laurenz Albe <[email protected]>
@ 2025-03-24 14:24 ` Dimitrios Apostolou <[email protected]>
  1 sibling, 0 replies; 3+ messages in thread

From: Dimitrios Apostolou @ 2025-03-24 14:24 UTC (permalink / raw)
  To: Laurenz Albe <[email protected]>; +Cc: [email protected]

On Sun, 23 Mar 2025, Laurenz Albe wrote:

> On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:
>> Performance issues: (important as my db size is >5TB)
>>
>> * WAL writes: I didn't manage to avoid writing to the WAL, despite having
>>    setting wal_level=minimal. I even wrote my own function to ALTER all
>>    tables to UNLOGGED, but failed with "could not change table T to
>>    unlogged because it references logged table".  I'm out of ideas on this
>>    one.
>
> You'd have to create an load the table in the same transaction, that is,
> you'd have to run pg_restore with --single-transaction.

That would restore the schema from the dump, while I want to create the
schema from the SQL code in version control.

Something that might work, would be for pg_restore to issue a TRUNCATE
before the COPY. I believe this would require superuser privelege though,
that I would prefer to avoid. Currently I issue TRUNCATE for all tables
manually before running pg_restore, but of course this is in a different
transaction so it doesn't help.

By the way do you see potential problems with using --single-transaction
to restore billion-rows tables?


Thank you,
Dimitris


^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Experience and feedback on pg_restore --data-only
  2025-03-23 16:00 Re: Experience and feedback on  pg_restore --data-only Laurenz Albe <[email protected]>
@ 2025-04-02 17:42 ` Dimitrios Apostolou <[email protected]>
  1 sibling, 0 replies; 3+ messages in thread

From: Dimitrios Apostolou @ 2025-04-02 17:42 UTC (permalink / raw)
  To: Laurenz Albe <[email protected]>; +Cc: [email protected]

Hi Laurenz,

On Sun, 23 Mar 2025, Laurenz Albe wrote:

> On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote:
>> Performance issues: (important as my db size is >5TB)
>>
>> * WAL writes: I didn't manage to avoid writing to the WAL, despite having
>>    setting wal_level=minimal. I even wrote my own function to ALTER all
>>    tables to UNLOGGED, but failed with "could not change table T to
>>    unlogged because it references logged table".  I'm out of ideas on this
>>    one.
>
> You'd have to create an load the table in the same transaction, that is,
> you'd have to run pg_restore with --single-transaction.

Do you know why --single-transaction can't be combined with -j ?
I'm thinking it's because every process would need to have its own
transaction, which apparently results to more than one transaction.

Do you think this could be alleviated somehow with subtransactions?

I'm trying to think of a way to enjoy the performance benefits of
--single-transaction, while also parallelizing the pg_restore. For this
use case I don't care about the content appearing all in one transaction.
Would it be possible then to have dependency resolution in such a way that
we split the tasks into one transaction per worker process?

Just thinking out loud here, I'd be interested in feedback.

Thanks,
Dimitris


^ permalink  raw  reply  [nested|flat] 3+ messages in thread


end of thread, other threads:[~2025-04-02 17:42 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-03-23 16:00 Re: Experience and feedback on  pg_restore --data-only Laurenz Albe <[email protected]>
2025-03-24 14:24 ` Re: Experience and feedback on pg_restore --data-only Dimitrios Apostolou <[email protected]>
2025-04-02 17:42 ` Re: Experience and feedback on pg_restore --data-only Dimitrios Apostolou <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox