Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tgRix-00AkOW-9B for pgsql-general@arkaria.postgresql.org; Fri, 07 Feb 2025 17:01:03 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tgRiw-003Iz4-Bo for pgsql-general@arkaria.postgresql.org; Fri, 07 Feb 2025 17:01:02 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tgRiv-003Iyw-LJ for pgsql-general@lists.postgresql.org; Fri, 07 Feb 2025 17:01:02 +0000 Received: from fout-b6-smtp.messagingengine.com ([202.12.124.149]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1tgRio-003nec-2W for pgsql-general@postgresql.org; Fri, 07 Feb 2025 17:01:00 +0000 Received: from phl-compute-04.internal (phl-compute-04.phl.internal [10.202.2.44]) by mailfout.stl.internal (Postfix) with ESMTP id BF04511401B6; Fri, 7 Feb 2025 12:00:53 -0500 (EST) Received: from phl-mailfrontend-01 ([10.202.2.162]) by phl-compute-04.internal (MEProxy); Fri, 07 Feb 2025 12:00:53 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=aklaver.com; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm2; t=1738947653; x=1739034053; bh=Tqwa0+B4CKGNlt4miBd8+YUlwTf440fTarct4u+IqXg=; b= eR9afq0D1jdJ0DPxyyaYrwZutelgpelJfLTcn3jChAy70n3fgJaUKiayBz7xPwfw NbjLL18cQLCQzb4+U4Xepf/mLU2Gd5vNPDieiHRkHN4Rye/fOsGllu/DbHT6q+qG 6OPXrYoHygw5XIe65eZ/fDPOHvAOl3etImwONYav++q77Si6cxK2qD9GxDGpqobw tY4SOSTzJmxeFcXMZGVsd1x++9rkNljerWjA65PbHjf38PnDsQvZzXatMJXIj4bh U9v5yyhX96TabPSs1QP62ornd0ATHeclTlVibhJO5zMzk2814yxL6W3fZyOOiKlI ZWc/qCdD48zbkf/tOWPXIg== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1738947653; x= 1739034053; bh=Tqwa0+B4CKGNlt4miBd8+YUlwTf440fTarct4u+IqXg=; b=z XYOiA3jtqVs5KRwZeKep3aTZewcTpU2T/uIwpw4DTblMWeTzkNnnXpoTo7gTgZ0Y 7di1CtCJxqvpyf/c8MsRT4vtyYTr66Ot1SHg+EPn6+Y+SsAHGnS3bgKmYROwRsiU si2tnahxss3Bst7L7/fkENFgdDEnkzYWaS82XbJ1Zv4wCiGjBYi16tmywjYn1Kl4 TSpGYjFfJZ66A6fvVSpQfhcgmCPrKZb+wp7aE/DV4XHCvRf9uZjYOitN73J5WtWR TIk/4M6Psb1E9v4gdc0PPFEsDw974HuAGCH4AsQL4A3JcfbAIbh32ZKJXYiV2JfG pB8w5X1PcbFegUGQ7popw== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvleekvdcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpggftfghnshhusghstghrihgsvgdp uffrtefokffrpgfnqfghnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivg hnthhsucdlqddutddtmdenucfjughrpefkffggfgfuvfhfvefhjggtgfesthejredttddv jeenucfhrhhomheptegurhhirghnucfmlhgrvhgvrhcuoegrughrihgrnhdrkhhlrghvvg hrsegrkhhlrghvvghrrdgtohhmqeenucggtffrrghtthgvrhhnpefgieevffffheefheei hfekvdeuheefleefjeefgeffudeuiefhvdeifefhvedvgfenucevlhhushhtvghrufhiii gvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegrughrihgrnhdrkhhlrghvvghrsegr khhlrghvvghrrdgtohhmpdhnsggprhgtphhtthhopedvpdhmohguvgepshhmthhpohhuth dprhgtphhtthhopehrlhgvrggthhesphhrihhntggvthhonhdrvgguuhdprhgtphhtthho pehpghhsqhhlqdhgvghnvghrrghlsehpohhsthhgrhgvshhqlhdrohhrgh X-ME-Proxy: Feedback-ID: i76984098:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Fri, 7 Feb 2025 12:00:52 -0500 (EST) Message-ID: <047f05ca-1519-410b-abe6-3773b6533b52@aklaver.com> Date: Fri, 7 Feb 2025 09:00:52 -0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: How to perform a long running dry run transaction without blocking To: Robert Leach References: <88d60ace-45e6-4d41-afc4-113df7219c4d@aklaver.com> <4000D0EE-B250-4E9E-831F-00C034D6D0B5@princeton.edu> <6d833658-f461-4ad4-a3e1-86d3c515bc18@aklaver.com> <0FE9C709-A108-4ED5-8132-B802B8D9908F@princeton.edu> <00DF52D1-5ED0-4599-8480-8C671434EE4E@princeton.edu> <90368E5C-6F0C-45AE-B73A-5A467EC97043@princeton.edu> Content-Language: en-US Cc: pgsql-general From: Adrian Klaver In-Reply-To: <90368E5C-6F0C-45AE-B73A-5A467EC97043@princeton.edu> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 2/7/25 05:49, Robert Leach wrote: Ccing list >>>> Alright I am trying to reconcile this with from below, 'The largest studies take just under a minute'. >>> The context of the 'The largest studies take just under a minute' statement is that it's not loading the hefty/time-consuming raw data. It's only validating the metadata. That's fast (5-60s). And that data is a portion of the transaction in the back-end load. There are errors that validation can miss that are due to not touching the raw data, and in fact, those errors are addressed by curators editing the excel sheets. That's why it's all in the load transaction instead of >> >> As a scientist that makes me start to twitch. >> >> Is there an audit trail for that? > > We have a well defined curation process. Original files are checked into a private data repo. There is a CHANGES.md files that details any/every change a curator makes, and the curator coordinates these changes with the researcher. > >> Aah, time travel. > > Sorry, "anticipate" wasn't the best word choice. What I was thinking was more along the lines of "detect" when a block happens and at that point, either stop and queue up the validation or provide a status/progress that shows it's waiting on validations in line before it. Probably not possible. It's just what I'm imagining would be the most efficient strategy with the least wait time. > >>> Other documentation I read referred to the state of the DB (when a transaction starts) as a "snapshot" and I thought... what if I could save such a snapshot automatically just *before* a back-end load starts, and use that snapshot for validation, such that my validation processes could use that to validate against and not encounter any locks? The validation will never commit, so there's no risk. >> >> Hmm. I don't think so. > > From a separate thread, which seems analogous...: > >> Seems to me this could be dealt with using a schema named validate that contains 'shadow' tables of those in the live schema(s). Import into their and see what fails. > > What is a "shadow table"? Is that a technical thing? Could these shadow tables be in the same database? (Trying to wrap my head around what this implies is possible.) No a concept of my own making. Basically create copies of the tables you are loading into now in a separate schema and load into them instead. Then do the validation against those tables, that would reduce the contention with the existing data. It is something I have done to validate data from another data source before loading into production tables. A possible issue to deal with is how much, if at all, you depend on other tables for the validation process. -- Adrian Klaver adrian.klaver@aklaver.com