Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lNest-0002Ty-Jr for pgsql-hackers@arkaria.postgresql.org; Sat, 20 Mar 2021 16:55:31 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1lNess-0003qo-HE for pgsql-hackers@arkaria.postgresql.org; Sat, 20 Mar 2021 16:55:30 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lNess-0003qh-4u for pgsql-hackers@lists.postgresql.org; Sat, 20 Mar 2021 16:55:30 +0000 Received: from mail-qk1-x733.google.com ([2607:f8b0:4864:20::733]) by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1lNesp-0004bU-7i for pgsql-hackers@postgresql.org; Sat, 20 Mar 2021 16:55:29 +0000 Received: by mail-qk1-x733.google.com with SMTP id i9so6268744qka.2 for ; Sat, 20 Mar 2021 09:55:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=wi3ck-info.20150623.gappssmtp.com; s=20150623; h=subject:to:cc:references:from:message-id:date:user-agent :mime-version:in-reply-to:content-language:content-transfer-encoding; bh=svk3uVjsif9uh2dSfl4mCJbnHB6rSwm0fibFDdt0yWo=; b=ZLGCQW1hC5ZWBI24bkLdfVx4vKm6rsVNsigt7lky5OPZ+GNSdWKTomu8FY+DvPiOaH zYqoTl2jd2WZR4HjhK9tl40OnGrgaRVPPxTBDCSQm3tHVz9zewv4LKJ06HhjnU/9yxaJ w6JUwGnYKM/rpat13p9DRnLS0+S05MdpiTVc3mPSeoPGe+sacizaVbbMLt0cmepFuo+1 ytF+BeMHvXCyjiS0MM1zfwSb1JOFXpc9HepNIVDrbUEkR2NYEgoecvMjY4QSRDfwzsAw zaxHi0nFaFgSsP1Q6JaZ+1R+03aEa6j8W+eAUIdLbHQU8rf+U5+JYBkrxfbWA6FN1ZVm /QPg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:subject:to:cc:references:from:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=svk3uVjsif9uh2dSfl4mCJbnHB6rSwm0fibFDdt0yWo=; b=dNHFijkdawZ6ybP5Qbk/5BAZPFSOqXer1H2t+iOUxOzS940NQsjx4PUX4xLEN5yePV ZTgLNu8AoDLtptYlGID5jb/AEz7PUZKKxlmB5vEnV8WLoLwsbBRiEfk5jbBCZoTYbv4p airQWyfxiVUILbH5y6ae1iqKGLtGJjQNt8DGEA3zgT29TxE2RQah42l1weEHPJqNUFwD jNs6iG+dpj2f0y7775vYLTb1mQK+wWXeIz1dSJEnsyaGOUsnRKKmd23l0/aGLT1b+cK9 gTNyAzlfrOfvZyY/Xlwyhjyfd6T14Hmoxg4Q0W59piIvDbQEtJ8lxvWmR7aA2hjcN6OH /Q/w== X-Gm-Message-State: AOAM530jk0M9yc6HfXkOc8Bd3tVa2Sa1xmeuQ3ePIfmSsFZQ/g7+e/oB 5qfbtG4eH77GYUwiFFLADR90Ahb90+UNzBxg X-Google-Smtp-Source: ABdhPJwBOf7QC1aGmD9wen/q2TcyViHJjiCMh8dZ78JTJTDF+wRZBXvUyLgzO4VkoOwKiNrAirP1zg== X-Received: by 2002:a05:620a:84a:: with SMTP id u10mr3755583qku.389.1616259325656; Sat, 20 Mar 2021 09:55:25 -0700 (PDT) Received: from jupiter.onmars.janwieck.no-ip.info (pool-98-114-241-134.phlapa.fios.verizon.net. [98.114.241.134]) by smtp.gmail.com with ESMTPSA id t21sm6004494qtr.67.2021.03.20.09.55.24 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 20 Mar 2021 09:55:25 -0700 (PDT) Subject: Re: pg_upgrade failing for 200+ million Large Objects To: Tom Lane Cc: Magnus Hagander , Robins Tharakan , Peter Eisentraut , "pgsql-hackers@postgresql.org" References: <1742698.1615221182@sss.pgh.pa.us> <1743618.1615222719@sss.pgh.pa.us> <4fbf92f2-ec91-f4fa-a259-f0968e34f3d7@wi3ck.info> <181907.1616253799@sss.pgh.pa.us> From: Jan Wieck Message-ID: <147fa478-510b-18ef-5323-9c1725b2493c@wi3ck.info> Date: Sat, 20 Mar 2021 12:55:24 -0400 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.8.0 MIME-Version: 1.0 In-Reply-To: <181907.1616253799@sss.pgh.pa.us> Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 3/20/21 11:23 AM, Tom Lane wrote: > Jan Wieck writes: >> All that aside, the entire approach doesn't scale. > > Yeah, agreed. When we gave large objects individual ownership and ACL > info, it was argued that pg_dump could afford to treat each one as a > separate TOC entry because "you wouldn't have that many of them, if > they're large". The limits of that approach were obvious even at the > time, and I think now we're starting to see people for whom it really > doesn't work. It actually looks more like some users have millions of "small objects". I am still wondering where that is coming from and why they are abusing LOs in that way, but that is more out of curiosity. Fact is that they are out there and that they cannot upgrade from their 9.5 databases, which are now past EOL. > > I wonder if pg_dump could improve matters cheaply by aggregating the > large objects by owner and ACL contents. That is, do > > select distinct lomowner, lomacl from pg_largeobject_metadata; > > and make just *one* BLOB TOC entry for each result. Then dump out > all the matching blobs under that heading. What I am currently experimenting with is moving the BLOB TOC entries into the parallel data phase of pg_restore "when doing binary upgrade". It seems to scale nicely with the number of cores in the system. In addition to that have options for pg_upgrade and pg_restore that cause the restore to batch them into transactions, like 10,000 objects at a time. There was a separate thread for that but I guess it is better to keep it all together here now. > > A possible objection is that it'd reduce the ability to restore blobs > selectively, so maybe we'd need to make it optional. I fully intend to make all this into new "options". I am afraid that there is no one-size-fits-all solution here. > > Of course, that just reduces the memory consumption on the client > side; it does nothing for the locks. Can we get away with releasing the > lock immediately after doing an ALTER OWNER or GRANT/REVOKE on a blob? I'm not very fond of the idea going lockless when at the same time trying to parallelize the restore phase. That can lead to really nasty race conditions. For now I'm aiming at batches in transactions. Regards, Jan -- Jan Wieck Principle Database Engineer Amazon Web Services