public inbox for [email protected]help / color / mirror / Atom feed
Re: Experience and feedback on pg_restore --data-only 7+ messages / 4 participants [nested] [flat]
* Re: Experience and feedback on pg_restore --data-only @ 2025-03-24 15:31 Adrian Klaver <[email protected]> 0 siblings, 2 replies; 7+ messages in thread From: Adrian Klaver @ 2025-03-24 15:31 UTC (permalink / raw) To: Dimitrios Apostolou <[email protected]>; Laurenz Albe <[email protected]>; +Cc: [email protected] On 3/24/25 07:24, Dimitrios Apostolou wrote: > On Sun, 23 Mar 2025, Laurenz Albe wrote: > >> On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote: >>> Performance issues: (important as my db size is >5TB) >>> >>> * WAL writes: I didn't manage to avoid writing to the WAL, despite >>> having >>> setting wal_level=minimal. I even wrote my own function to ALTER all >>> tables to UNLOGGED, but failed with "could not change table T to >>> unlogged because it references logged table". I'm out of ideas on >>> this >>> one. >> >> You'd have to create an load the table in the same transaction, that is, >> you'd have to run pg_restore with --single-transaction. > > That would restore the schema from the dump, while I want to create the > schema from the SQL code in version control. I am not following, from your original post: " ... create a clean database by running the SQL schema definition from version control, and then copy the data for only the tables created. For this case, I choose to run pg_restore --data-only, and run it as the user who owns the database (dbowner), not as a superuser, in order to avoid changes being introduced under the radar. " You are running the process in two steps, where the first does not involve pg_restore. Not sure why doing the pg_restore --data-only portion in single transaction is not possible? > > Something that might work, would be for pg_restore to issue a TRUNCATE > before the COPY. I believe this would require superuser privelege though, > that I would prefer to avoid. Currently I issue TRUNCATE for all tables > manually before running pg_restore, but of course this is in a different > transaction so it doesn't help. > > By the way do you see potential problems with using --single-transaction > to restore billion-rows tables? COPY is all or none(version 17+ caveat(see https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so if the data dump fails in --single-transaction everything rolls back. > > > Thank you, > Dimitris -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only @ 2025-03-24 15:51 Dimitrios Apostolou <[email protected]> parent: Adrian Klaver <[email protected]> 1 sibling, 3 replies; 7+ messages in thread From: Dimitrios Apostolou @ 2025-03-24 15:51 UTC (permalink / raw) To: Adrian Klaver <[email protected]>; +Cc: Laurenz Albe <[email protected]>; [email protected] On Mon, 24 Mar 2025, Adrian Klaver wrote: > On 3/24/25 07:24, Dimitrios Apostolou wrote: >> On Sun, 23 Mar 2025, Laurenz Albe wrote: >> >>> On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote: >>>> Performance issues: (important as my db size is >5TB) >>>> >>>> * WAL writes: I didn't manage to avoid writing to the WAL, despite >>>> having >>>> setting wal_level=minimal. I even wrote my own function to ALTER all >>>> tables to UNLOGGED, but failed with "could not change table T to >>>> unlogged because it references logged table". I'm out of ideas on >>>> this >>>> one. >>> >>> You'd have to create an load the table in the same transaction, that is, >>> you'd have to run pg_restore with --single-transaction. >> >> That would restore the schema from the dump, while I want to create the >> schema from the SQL code in version control. > > > I am not following, from your original post: > > " > ... create a > clean database by running the SQL schema definition from version control, and > then copy the data for only the tables created. > > For this case, I choose to run pg_restore --data-only, and run it as the user > who owns the database (dbowner), not as a superuser, in order to avoid > changes being introduced under the radar. > " > > You are running the process in two steps, where the first does not involve > pg_restore. Not sure why doing the pg_restore --data-only portion in single > transaction is not possible? Laurenz informed me that I could avoid writing to the WAL if I "create and load the table in a single transaction". I haven't tried, but here is what I would do to try --single-transaction: Transaction 1: manually issuing all of CREATE TABLE etc. Transaction 2: pg_restore --single-transaction --data-only The COPY command in transaction 2 would still need to write to WAL, since it's separate from the CREATE TABLE. Am I wrong somewhere? >> Something that might work, would be for pg_restore to issue a TRUNCATE >> before the COPY. I believe this would require superuser privelege though, >> that I would prefer to avoid. Currently I issue TRUNCATE for all tables >> manually before running pg_restore, but of course this is in a different >> transaction so it doesn't help. >> >> By the way do you see potential problems with using --single-transaction >> to restore billion-rows tables? > > COPY is all or none(version 17+ caveat(see > https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so if the > data dump fails in --single-transaction everything rolls back. So if I restore all tables, then an error about a "table not found" would not roll back already copied tables, since it's not part of a COPY? Thank you for the feedback, Dimitris ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only @ 2025-03-24 16:00 Ron Johnson <[email protected]> parent: Dimitrios Apostolou <[email protected]> 2 siblings, 0 replies; 7+ messages in thread From: Ron Johnson @ 2025-03-24 16:00 UTC (permalink / raw) To: pgsql-general Why are you regularly having emergencies requiring the restoration of multi-TB tables to databases with lots of cruft? Fixing that would go a long way towards eliminating your problems with pg_restore. On Mon, Mar 24, 2025 at 11:51 AM Dimitrios Apostolou <[email protected]> wrote: > On Mon, 24 Mar 2025, Adrian Klaver wrote: > > > On 3/24/25 07:24, Dimitrios Apostolou wrote: > >> On Sun, 23 Mar 2025, Laurenz Albe wrote: > >> > >>> On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote: > >>>> Performance issues: (important as my db size is >5TB) > >>>> > >>>> * WAL writes: I didn't manage to avoid writing to the WAL, despite > >>>> having > >>>> setting wal_level=minimal. I even wrote my own function to ALTER > all > >>>> tables to UNLOGGED, but failed with "could not change table T to > >>>> unlogged because it references logged table". I'm out of ideas on > >>>> this > >>>> one. > >>> > >>> You'd have to create an load the table in the same transaction, that > is, > >>> you'd have to run pg_restore with --single-transaction. > >> > >> That would restore the schema from the dump, while I want to create the > >> schema from the SQL code in version control. > > > > > > I am not following, from your original post: > > > > " > > ... create a > > clean database by running the SQL schema definition from version > control, and > > then copy the data for only the tables created. > > > > For this case, I choose to run pg_restore --data-only, and run it as the > user > > who owns the database (dbowner), not as a superuser, in order to avoid > > changes being introduced under the radar. > > " > > > > You are running the process in two steps, where the first does not > involve > > pg_restore. Not sure why doing the pg_restore --data-only portion in > single > > transaction is not possible? > > Laurenz informed me that I could avoid writing to the WAL if I "create and > load the table in a single transaction". > I haven't tried, but here is what I would do to try --single-transaction: > > Transaction 1: manually issuing all of CREATE TABLE etc. > > Transaction 2: pg_restore --single-transaction --data-only > > The COPY command in transaction 2 would still need to write to WAL, since > it's separate from the CREATE TABLE. > > Am I wrong somewhere? > > >> Something that might work, would be for pg_restore to issue a TRUNCATE > >> before the COPY. I believe this would require superuser privelege > though, > >> that I would prefer to avoid. Currently I issue TRUNCATE for all tables > >> manually before running pg_restore, but of course this is in a > different > >> transaction so it doesn't help. > >> > >> By the way do you see potential problems with using > --single-transaction > >> to restore billion-rows tables? > > > > COPY is all or none(version 17+ caveat(see > > https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so > if the > > data dump fails in --single-transaction everything rolls back. > > So if I restore all tables, then an error about a "table not found" would > not roll back already copied tables, since it's not part of a COPY? > > > Thank you for the feedback, > Dimitris > > -- Death to <Redacted>, and butter sauce. Don't boil me, I'm still alive. <Redacted> lobster! ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only @ 2025-03-24 16:09 Adrian Klaver <[email protected]> parent: Dimitrios Apostolou <[email protected]> 2 siblings, 0 replies; 7+ messages in thread From: Adrian Klaver @ 2025-03-24 16:09 UTC (permalink / raw) To: Dimitrios Apostolou <[email protected]>; +Cc: Laurenz Albe <[email protected]>; [email protected] On 3/24/25 08:51, Dimitrios Apostolou wrote: > On Mon, 24 Mar 2025, Adrian Klaver wrote: > >> On 3/24/25 07:24, Dimitrios Apostolou wrote: >>> On Sun, 23 Mar 2025, Laurenz Albe wrote: >>> >>>> On Thu, 2025-03-20 at 23:48 +0100, Dimitrios Apostolou wrote: >>>>> Performance issues: (important as my db size is >5TB) >>>>> >>>>> * WAL writes: I didn't manage to avoid writing to the WAL, despite >>>>> having >>>>> setting wal_level=minimal. I even wrote my own function to >>>>> ALTER all >>>>> tables to UNLOGGED, but failed with "could not change table T to >>>>> unlogged because it references logged table". I'm out of ideas on >>>>> this >>>>> one. >>>> >>>> You'd have to create an load the table in the same transaction, >>>> that is, >>>> you'd have to run pg_restore with --single-transaction. >>> >>> That would restore the schema from the dump, while I want to create the >>> schema from the SQL code in version control. >> >> >> I am not following, from your original post: >> >> " >> ... create a >> clean database by running the SQL schema definition from version >> control, and >> then copy the data for only the tables created. >> >> For this case, I choose to run pg_restore --data-only, and run it as >> the user >> who owns the database (dbowner), not as a superuser, in order to avoid >> changes being introduced under the radar. >> " >> >> You are running the process in two steps, where the first does not >> involve >> pg_restore. Not sure why doing the pg_restore --data-only portion in >> single >> transaction is not possible? > > Laurenz informed me that I could avoid writing to the WAL if I "create and > load the table in a single transaction". From here: https://www.postgresql.org/message-id/455d28421ae33c73b73a6f527d2f72816ca5dd29.camel%40cybertec.at What he said was: "You'd have to create an load the table in the same transaction, that is, you'd have to run pg_restore with --single-transaction." Where I assume he meant '... create and load ...'. That is not the same as what you are doing below. > I haven't tried, but here is what I would do to try --single-transaction: > > Transaction 1: manually issuing all of CREATE TABLE etc. > > Transaction 2: pg_restore --single-transaction --data-only > > The COPY command in transaction 2 would still need to write to WAL, since > it's separate from the CREATE TABLE. > > Am I wrong somewhere? > >> COPY is all or none(version 17+ caveat(see >> https://www.postgresql.org/docs/current/sql-copy.html ON_ERROR)), so >> if the >> data dump fails in --single-transaction everything rolls back. > > So if I restore all tables, then an error about a "table not found" would > not roll back already copied tables, since it's not part of a COPY? If you are following what you show above then the tables and other objects would be created manually from the version control outside of pg_restore and on successful completion and commit of that transaction they would persist until such time as you change them. The second step pg_restore --single-transaction --data-only is where you could 99% of the way through and have a failure that rolls back all the data entered in the tables. > > > Thank you for the feedback, > Dimitris > -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only @ 2025-03-24 16:15 Dimitrios Apostolou <[email protected]> parent: Adrian Klaver <[email protected]> 1 sibling, 1 reply; 7+ messages in thread From: Dimitrios Apostolou @ 2025-03-24 16:15 UTC (permalink / raw) To: Ron Johnson <[email protected]>; +Cc: Laurenz Albe <[email protected]>; Adrian Klaver <[email protected]>; [email protected] Hi Ron, I read your reply in the mailing list archives as I'm not subscribed to the list, and I'm copy-pasting a response here. Please include me as a recipient in further replies. > Why are you regularly having emergencies requiring the restoration of > multi-TB tables to databases with lots of cruft? > > Fixing that would go a long way towards eliminating your problems with > pg_restore. I don't have emergencies yet. I'm testing the process of restoring the database dump, and it takes more than 24 hours currently. A successful test is vital to approve the process. But the primary usage of pg_restore that I have is not to save me from emergencies but to populate the dev database with recent data. Regards, Dimitris ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only @ 2025-03-24 18:41 Laurenz Albe <[email protected]> parent: Dimitrios Apostolou <[email protected]> 2 siblings, 0 replies; 7+ messages in thread From: Laurenz Albe @ 2025-03-24 18:41 UTC (permalink / raw) To: Dimitrios Apostolou <[email protected]>; Adrian Klaver <[email protected]>; +Cc: [email protected] On Mon, 2025-03-24 at 16:51 +0100, Dimitrios Apostolou wrote: > Laurenz informed me that I could avoid writing to the WAL if I "create and > load the table in a single transaction". > I haven't tried, but here is what I would do to try --single-transaction: > > Transaction 1: manually issuing all of CREATE TABLE etc. > > Transaction 2: pg_restore --single-transaction --data-only > > The COPY command in transaction 2 would still need to write to WAL, since > it's separate from the CREATE TABLE. > > Am I wrong somewhere? No, that is correct. Yours, Laurenz Albe ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Experience and feedback on pg_restore --data-only @ 2025-03-25 15:27 Adrian Klaver <[email protected]> parent: Dimitrios Apostolou <[email protected]> 0 siblings, 0 replies; 7+ messages in thread From: Adrian Klaver @ 2025-03-25 15:27 UTC (permalink / raw) To: Dimitrios Apostolou <[email protected]>; Ron Johnson <[email protected]>; +Cc: Laurenz Albe <[email protected]>; [email protected] On 3/24/25 09:15, Dimitrios Apostolou wrote: > Hi Ron, > > I read your reply in the mailing list archives as I'm not subscribed to > the list, and I'm copy-pasting a response here. Please include me as a > recipient in further replies. > >> Why are you regularly having emergencies requiring the restoration of >> multi-TB tables to databases with lots of cruft? >> >> Fixing that would go a long way towards eliminating your problems with >> pg_restore. > > I don't have emergencies yet. I'm testing the process of restoring the > database dump, and it takes more than 24 hours currently. A successful > test is vital to approve the process. It is doubtful that pg_dump/pg_restore will meet the requirements. You are probably looking at some process that does incremental updates and then restores from that. Something like pgbackrest: https://pgbackrest.org/ comes to mind. > > But the primary usage of pg_restore that I have is not to save me from > emergencies but to populate the dev database with recent data. > > > Regards, > Dimitris > -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 7+ messages in thread
end of thread, other threads:[~2025-03-25 15:27 UTC | newest] Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2025-03-24 15:31 Re: Experience and feedback on pg_restore --data-only Adrian Klaver <[email protected]> 2025-03-24 15:51 ` Dimitrios Apostolou <[email protected]> 2025-03-24 16:00 ` Ron Johnson <[email protected]> 2025-03-24 16:09 ` Adrian Klaver <[email protected]> 2025-03-24 18:41 ` Laurenz Albe <[email protected]> 2025-03-24 16:15 ` Dimitrios Apostolou <[email protected]> 2025-03-25 15:27 ` Adrian Klaver <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox