Re: pg_upgrade failing for 200+ million Large Objects

public inbox for [email protected]  

From: Tom Lane <[email protected]>
To: Kumar, Sachin <[email protected]>
Cc: Robins Tharakan <[email protected]>
Cc: Nathan Bossart <[email protected]>
Cc: Jan Wieck <[email protected]>
Cc: Bruce Momjian <[email protected]>
Cc: Zhihong Yu <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Magnus Hagander <[email protected]>
Cc: Peter Eisentraut <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: pg_upgrade failing for 200+ million Large Objects
Date: Fri, 05 Jan 2024 15:02:34 -0500
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<20220825003227.GA1456581@nathanxps13>
	<[email protected]>
	<20220908231807.GA2242918@nathanxps13>
	<CAAWbhmgUb8p7ff_ZX5jCvqM=ipPxbbDJTXMNVzH-Ho_CXVkRHA@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

I wrote:
> "Kumar, Sachin" <[email protected]> writes:
>> I was not able to find an email thread that explains why we are not using
>> parallel pg_restore for pg_upgrade.

> Well, it's pretty obvious, isn't it?  The parallelism is being applied
> at the per-database level instead.

On further reflection, there is a very good reason why it's done like
that.  Because pg_upgrade is doing schema-only dump and restore,
there's next to no opportunity for parallelism within either pg_dump
or pg_restore.  There are no data-loading steps, and there's no
index-building either, so the time-consuming stuff that could be
parallelized just isn't happening in pg_upgrade's usage.

Now it's true that my 0003 patch moves the needle a little bit:
since it makes BLOB creation (as opposed to loading) parallelizable,
there'd be some hope for parallel pg_restore doing something useful in
a database with very many blobs.  But it makes no sense to remove the
existing cross-database parallelism in pursuit of that; you'd make
many more people unhappy than happy.
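
Here is a rough model of the cross-database scheme. It is a hypothetical Python sketch, not pg_upgrade's real scheduling code. Each database's schema-only dump and restore is treated as one indivisible job, and `jobs` stands in for pg_upgrade's `--jobs` option. The `makespan` helper and the per-database costs are illustrative assumptions. Handing the largest remaining job to the earliest-free worker shows both sides of the tradeoff. Many small databases spread nicely across workers, but a single database with very many blobs still runs on one worker, so total time can never drop below that database's own cost.

```python
import heapq
from typing import Sequence


def makespan(db_costs: Sequence[float], jobs: int) -> float:
    """Wall-clock time when each database is one indivisible job.

    db_costs: hypothetical per-database cost of a schema-only dump+restore.
    jobs: number of parallel workers (standing in for pg_upgrade's --jobs).
    """
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    # Min-heap holding the time at which each worker next becomes free.
    workers = [0.0] * jobs
    heapq.heapify(workers)
    # Costliest databases first; each goes to the earliest-free worker.
    for cost in sorted(db_costs, reverse=True):
        free_at = heapq.heappop(workers)
        heapq.heappush(workers, free_at + cost)
    # No database is ever split, so the result is >= max(db_costs).
    return max(workers)
```

For example, `makespan([1, 1, 1, 1], jobs=4)` gives `1.0`, while `makespan([100, 1, 1, 1], jobs=4)` gives `100.0`. The extra workers finish the small databases and then sit idle.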

Conceivably something could be salvaged of your idea by having
pg_upgrade handle databases with many blobs differently from
those without, applying parallelism within pg_restore for the
first kind and then using cross-database parallelism for the
rest.  But that seems like a lot of complexity compared to the
possible win.
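
The split described above could begin with a classification step like the following hypothetical Python sketch. `Database`, `blob_count`, `plan_upgrade`, and the threshold are illustrative names and do not exist in pg_upgrade. The sketch covers only the easy part. It leaves out running the blob-heavy databases with `pg_restore --jobs` and then scheduling the remainder across databases, which is where the complexity lies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Database:
    name: str
    blob_count: int  # number of large objects (hypothetical input)


def plan_upgrade(dbs: List[Database], blob_threshold: int) -> Tuple[List[str], List[str]]:
    """Hypothetical policy: split databases into two restore phases.

    Phase 1 ("heavy"): blob-heavy databases, restored one at a time with
    parallelism inside pg_restore.
    Phase 2 ("light"): all remaining databases, restored concurrently,
    one worker per database.
    """
    heavy = [db.name for db in dbs if db.blob_count >= blob_threshold]
    light = [db.name for db in dbs if db.blob_count < blob_threshold]
    return heavy, light
```

With a threshold of 1,000,000, a cluster holding `app` (200 million blobs) plus `postgres` and `template1` (no blobs) yields `(['app'], ['postgres', 'template1'])`.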

In any case I'd stay far away from using --section in pg_upgrade.
Too many moving parts there.

			regards, tom lane
