Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tgAI2-008ElM-W9 for pgsql-general@arkaria.postgresql.org; Thu, 06 Feb 2025 22:24:07 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tgAI1-00Ccfj-J6 for pgsql-general@arkaria.postgresql.org; Thu, 06 Feb 2025 22:24:05 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tgAI0-00Ccfb-Fe for pgsql-general@lists.postgresql.org; Thu, 06 Feb 2025 22:24:05 +0000 Received: from fout-a7-smtp.messagingengine.com ([103.168.172.150]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1tgAHw-0048aH-2B for pgsql-general@postgresql.org; Thu, 06 Feb 2025 22:24:03 +0000 Received: from phl-compute-01.internal (phl-compute-01.phl.internal [10.202.2.41]) by mailfout.phl.internal (Postfix) with ESMTP id 929DF13801E8; Thu, 6 Feb 2025 17:23:59 -0500 (EST) Received: from phl-mailfrontend-02 ([10.202.2.163]) by phl-compute-01.internal (MEProxy); Thu, 06 Feb 2025 17:23:59 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=aklaver.com; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm2; t=1738880639; x=1738967039; bh=XoZ+DIQaTBDZizy9uDJnFMmiOhEzmLEEuXTiSZw5l50=; b= nKHpWk9no/ywoCqBF3522HlmiArE+DDcGgNvtHWvYmv5Ja2g54GpE162Qc/QKIto 0o/70NWncoqS7Bg2vhjdl3GzBNYk8Zc4QiOTWvKai/XrMPlB/N2Wmk+h1ZPFc46y B0woBavboo/sIoIg6E293yiV1nWB+ZfPd08ZJyGAMC7p3zFNHlW/nd2Dffs9vMAH CcmNLiBHlU2Al/19zZKCNi3nG0m0tqKI/doHjNWQ/GOdGQU08FhHRK4g9RRZV7pR pR6S6MeOe8kyaYI0vsLxr7d6P1kpoXQYXazAuuBrAfUTHA6+752siIb3hmsbHyzC sAHfCFAyKsjL1ST5lOVWvA== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1738880639; x= 1738967039; bh=XoZ+DIQaTBDZizy9uDJnFMmiOhEzmLEEuXTiSZw5l50=; b=v LWDz+QUSpDZyOYFQx3uivPqDwRAautxe1dveA73S07jHfmFWsRQyo7dW/4pnzvma fM7J/89b06xLea2QsdqpYBXDpAwlnEKTLCZ3Uon7qn+Yi+1BSRrLxbnV7TLIkjMT KP2IH00pXms1Fj6a1Uj+OXgrexkxjH++Sg8GQ5HUfXbTPus4NEGg+ceMw1spTkwm FvF2HOIfc4ik6tqQxnPkqnmR1l1heVdMZSLux9dYKm3ojsB5e3BnZrq44WFNs3rs MnN5jq8zmPZ1q4/5xXQp2koe5+WABhdRtSgniH7D9WzFkM1QQcsIvxnKra/S0t6o 6PKrjeXBKKsSlSFLLui6w== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvjeehjecutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpggftfghnshhusghstghrihgsvgdp uffrtefokffrpgfnqfghnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivg hnthhsucdlqddutddtmdenucfjughrpefkffggfgfuvfevfhfhjggtgfesthekredttddv jeenucfhrhhomheptegurhhirghnucfmlhgrvhgvrhcuoegrughrihgrnhdrkhhlrghvvg hrsegrkhhlrghvvghrrdgtohhmqeenucggtffrrghtthgvrhhnpefggeduffetvdekheej hfffteeiheegudetfedvudfhleeghfeifeeuffdvvdehtdenucffohhmrghinhepphhosh htghhrvghsqhhlrdhorhhgpdgthigsvghrthgvtgdqphhoshhtghhrvghsqhhlrdgtohhm necuvehluhhsthgvrhfuihiivgeptdenucfrrghrrghmpehmrghilhhfrhhomheprggurh hirghnrdhklhgrvhgvrhesrghklhgrvhgvrhdrtghomhdpnhgspghrtghpthhtohepvddp mhhouggvpehsmhhtphhouhhtpdhrtghpthhtoheprhhlvggrtghhsehprhhinhgtvghtoh hnrdgvughupdhrtghpthhtohepphhgshhqlhdqghgvnhgvrhgrlhesphhoshhtghhrvghs qhhlrdhorhhg X-ME-Proxy: Feedback-ID: i76984098:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 6 Feb 2025 17:23:58 -0500 (EST) Message-ID: Date: Thu, 6 Feb 2025 14:23:57 -0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: How to perform a long running dry run transaction without blocking To: Robert Leach Cc: pgsql-general References: <88d60ace-45e6-4d41-afc4-113df7219c4d@aklaver.com> <4000D0EE-B250-4E9E-831F-00C034D6D0B5@princeton.edu> <6d833658-f461-4ad4-a3e1-86d3c515bc18@aklaver.com> <0FE9C709-A108-4ED5-8132-B802B8D9908F@princeton.edu> <00DF52D1-5ED0-4599-8480-8C671434EE4E@princeton.edu> Content-Language: en-US From: Adrian Klaver In-Reply-To: <00DF52D1-5ED0-4599-8480-8C671434EE4E@princeton.edu> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 2/6/25 12:08, Robert Leach wrote: >> Alright I am trying to reconcile this with from below, 'The largest >> studies take just under a minute'. > > The context of the 'The largest studies take just under a minute' > statement is that it's not loading the hefty/time-consuming raw data. >  It's only validating the metadata.  That's fast (5-60s).  And that > data is a portion of the transaction in the back-end load.  There are > errors that validation can miss that are due to not touching the raw > data, and in fact, those errors are addressed by curators editing the > excel sheets.  That's why it's all in the load transaction instead of As a scientist that makes me start to twitch. Is there an audit trail for that? > > I'm unfamiliar with retry functionality, but those options sound logical > to me as a good path forward, particularly using celery to spread out > validations and doing the back end loads at night (or using some sort of > fast dump/load).  The thing that bothers me about the celery solution is > that most of the time, 2 users validating different data will not block, > so I would be making users wait for no reason.  Ideally, I could > anticipate the block and only at that point, separate those validations. Aah, time travel. For fast dump/load on validated data see: https://www.postgresql.org/docs/current/sql-copy.html Though note in Postgres 16- COPY is all or nothing, so if there is an error nothing will be loaded. With version 17 you get ON_ERROR and LOG_VERBOSITY. One way to deal with is to load to a staging table and do your validation there and then move the data to the final table. As to retry that depends on where you want to do it. For subtransactions (SAVEPOINT) see: https://www.postgresql.org/docs/current/sql-savepoint.html https://www.cybertec-postgresql.com/en/subtransactions-and-performance-in-postgresql/ In Python there is try/except. > > This brings up a question though about a possibility I suspect is not > practical.  My initial read of the isolation levels documentation found > this section really promising: > > > The Repeatable Read isolation level only sees data committed before > the transaction began; it never sees either uncommitted data or changes > committed during transaction execution by concurrent transactions. > > This was before I realized that the actions of the previously started > transaction would include "locks" that would block validation even > though the load transaction hasn't committed yet: > > > a target row might have already been updated (or deleted or *locked*) > by another concurrent transaction by the time it is found. In this case, > the repeatable read transaction will wait for the first updating > transaction to commit or roll back > > Other documentation I read referred to the state of the DB (when a > transaction starts) as a "snapshot" and I thought... what if I could > save such a snapshot automatically just *before* a back-end load starts, > and use that snapshot for validation, such that my validation processes > could use that to validate against and not encounter any locks?  The > validation will never commit, so there's no risk. Hmm. I don't think so. > > I know Django's ORM wouldn't support that, but I kind of hoped that > someone in this email list might suggest a snapshot functionality as a > possible solution.  Since the validations never commit, the only > downside would be if the backend load changed something that introduces > a problem with the validated data that would not be fixed until we > actually attempt to load it. > > Is that too science-fictiony of an idea? -- Adrian Klaver adrian.klaver@aklaver.com