Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lJief-000140-Nf for pgsql-hackers@arkaria.postgresql.org; Tue, 09 Mar 2021 20:08:33 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1lJiee-0005DR-Hc for pgsql-hackers@arkaria.postgresql.org; Tue, 09 Mar 2021 20:08:32 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1lJiee-0005DK-6F for pgsql-hackers@lists.postgresql.org; Tue, 09 Mar 2021 20:08:32 +0000 Received: from mail-il1-x130.google.com ([2607:f8b0:4864:20::130]) by magus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1lJieW-0005l2-Ml for pgsql-hackers@postgresql.org; Tue, 09 Mar 2021 20:08:31 +0000 Received: by mail-il1-x130.google.com with SMTP id d5so13349507iln.6 for ; Tue, 09 Mar 2021 12:08:24 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=telsasoft-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to:user-agent; bh=7UmrgEEaeannOMaTJ+BdoJY0HdcDzH9qXMzJJ4F+iL8=; b=D3bD+vSLprg+RPiT1DYt+OCGjGTqlGTPbyvmKsAyLUFhdUzwwIVQgFaG5VcFmdvJ1H zhua1NkIAo35So2Yg8RJ9ws0Q9NBu9HfAoLr4fy6BtACZJrwD3Ns1c7Kmj96LNq5UWKi s1A6kEjLszgmMlQPEn3awvvHVNv5ATFBaUIY+QoccbvPNCsRmtmNcS5t9hElYJaydRyZ nVJeqFIgKHvqyX1xYlRlEyMbnq+2Hgt13lk/NfgEFG8G7Exktf0EAcfKv/KYjMmTx6XR +tx2yo8DRpfr4ewtvqJxoRB1LtYfaHbiJd1fnj0OEH/YcMXP6Gx/0j5cOgzSjQTEKf9m +cRg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to:user-agent; bh=7UmrgEEaeannOMaTJ+BdoJY0HdcDzH9qXMzJJ4F+iL8=; b=WzYcrrD/LfktYSjWcpdgqrD8W6dgcp51NFvLRjkShQoo8O2EU8S1JH2DGCO4TxHvyO GjsXWStYoxkCRd190oqlAxvpnPVwT8tsGjZUjH9h7SWTcw3s/EQ5WmEBhe6GsJJY3BoG JDA8yRT60eHrOKFQXC9vzk5FA5+3coP/wG4MoOMSefkmcEowwb4QWO+E7FPBSRstpAVz p8M+94ooZvoMkCVV3TH5mwqf+3wth8/MOqW0cdDHYYDUUXYlkUQsSjBzlmseiCXHwVP/ aP9thyce1n0N9qqPTAQhDoLWngRdCz0juOn+sgKl8ntmoQs+yjdv56R3py4GXJU4NpCw K8fA== X-Gm-Message-State: AOAM530U4vFY6hp+h1KwTyH5SC/3HGfdH81jdn1wqTxN60OX1gfAKAZw +VONqGa+RkRpNEdRO2lP/jeeYA== X-Google-Smtp-Source: ABdhPJy9QwGRuNRokB0PkHaooPo1zFPUfp/KPS43Uc+xrzT6jHPQBQDmBnQ7MQ9egtwPcBKgiF9+oA== X-Received: by 2002:a92:d084:: with SMTP id h4mr24990799ilh.290.1615320502539; Tue, 09 Mar 2021 12:08:22 -0800 (PST) Received: from pryzbyj.telsasoft (charmander.telsasoft.com. [50.244.222.1]) by smtp.gmail.com with ESMTPSA id g6sm8199845ilj.28.2021.03.09.12.08.21 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 09 Mar 2021 12:08:21 -0800 (PST) Received: by pryzbyj.telsasoft (Postfix, from userid 1000) id E42488009DB; Tue, 9 Mar 2021 14:08:19 -0600 (CST) Date: Tue, 9 Mar 2021 14:08:19 -0600 From: Justin Pryzby To: "Tharakan, Robins" Cc: pgsql-hackers@postgresql.org Subject: Re: pg_upgrade failing for 200+ million Large Objects Message-ID: <20210309200819.GO2021@telsasoft.com> References: <12601596dbbc4c01b86b4ac4d2bd4d48@EX13D05UWC001.ant.amazon.com> MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="gatW/ieO32f1wygP" Content-Disposition: inline In-Reply-To: <12601596dbbc4c01b86b4ac4d2bd4d48@EX13D05UWC001.ant.amazon.com> User-Agent: Mutt/1.9.4 (2018-02-28) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --gatW/ieO32f1wygP Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Wed, Mar 03, 2021 at 11:36:26AM +0000, Tharakan, Robins wrote: > While reviewing a failed upgrade from Postgres v9.5 (to v9.6) I saw that the > instance had ~200 million (in-use) Large Objects. I was able to reproduce > this on a test instance which too fails with a similar error. If pg_upgrade can't handle millions of objects/transactions/XIDs, that seems like a legitimate complaint, since apparently the system is working okay otherwise. But it also seems like you're using it outside the range of its intended use (See also [1]). I'm guessing that not many people are going to spend time running tests of pg_upgrade, each of which takes 25hr, not to mention some multiple of 128GB RAM+swap. Creating millions of large objects was too slow for me to test like this: | time { echo 'begin;'; for a in `seq 1 99999`; do echo '\lo_import /dev/null'; done; echo 'commit;'; } |psql -qh /tmp postgres& This seems to be enough for what's needed: | ALTER SYSTEM SET fsync=no; ALTER SYSTEM SET full_page_writes=no; SELECT pg_reload_conf(); | INSERT INTO pg_largeobject_metadata SELECT a, 0 FROM generate_series(100000, 200111222)a; Now, testing the pg_upgrade was killed after runnning 100min and using 60GB RAM, so you might say that's a problem too. I converted getBlobs() to use a cursor, like dumpBlobs(), but it was still killed. I think a test case and a way to exercizes this failure with a more reasonable amount of time and resources might be a prerequisite for a patch to fix it. pg_upgrade is meant for "immediate" upgrades, frequently allowing upgrade in minutes, where pg_dump |pg_restore might take hours or days. There's two components to consider: the catalog/metadata part, and the data part. If the data is large (let's say more than 100GB), then pg_upgrade is expected to be an improvement over the "dump and restore" process, which is usually infeasible for large DBs measure in TB. But the *catalog* part is large, and pg_upgrade still has to run pg_dump, and pg_restore. The time to do this might dominate over the data part. Our own customers DBs are 100s of GB to 10TB. For large customers, pg_upgrade takes 45min. In the past, we had tables with many column defaults, which caused the dump+restore to be slow at a larger fraction of customers. If it were me, in an EOL situation, I would look at either: 1) find a way to do dump+restore rather than pg_upgrade; and/or, 2) separately pg_dump the large objects, drop as many as you can, then pg_upgrade the DB, then restore the large objects. (And find a better way to store them in the future). I was able to hack pg_upgrade to call pg_restore --single (with a separate invocation to handle --create). That passes tests...but I can't say much beyond that. Regarding your existing patch: "make check" only tests SQL features. For development, you'll want to configure like: |./configure --enable-debug --enable-cassert --enable-tap-tests And then use "make check-world", and in particular: time make check -C src/bin/pg_resetwal time make check -C src/bin/pg_upgrade I don't think pg_restore needs a user-facing option for XIDs. I think it should "just work", since a user might be as likely to shoot themselves in the foot with a commandline option as they are to make an upgrade succeed that would otherwise fail. pg_upgrade has a --check mode, and if that passes, the upgrade is intended to work, and not fail halfway through between the schema dump and restore, with the expectation that the user know to rerun with some commandline flags. If you pursue the patch with setting a different XID threshold, maybe you could count the number of objects to be created, or transactions to be used, and use that as the argument to resetxlog ? I'm not sure, but pg_restore -l might be a good place to start looking. I think a goal for this patch should be to allow an increased number of objects to be handled by pg_upgrade. Large objects may be a special case, and increasing the number of other objects to be restored to the 100s of millions might be unimportant. -- Justin [1] https://www.postgresql.org/message-id/502641.1606334432%40sss.pgh.pa.us | Does pg_dump really have sane performance for that situation, or | are we soon going to be fielding requests to make it not be O(N^2) | in the number of listed tables? --gatW/ieO32f1wygP Content-Type: text/x-diff; charset=us-ascii Content-Disposition: attachment; filename="0001-pg_dump-use-a-cursor-in-getBlobs.patch"