Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tgAVR-008GY9-7N for pgsql-general@arkaria.postgresql.org; Thu, 06 Feb 2025 22:37:57 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tgAVQ-00Cofm-7D for pgsql-general@arkaria.postgresql.org; Thu, 06 Feb 2025 22:37:56 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tgAVP-00CofW-TH for pgsql-general@lists.postgresql.org; Thu, 06 Feb 2025 22:37:55 +0000 Received: from fhigh-a2-smtp.messagingengine.com ([103.168.172.153]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1tgAVN-003cwo-2X for pgsql-general@postgresql.org; Thu, 06 Feb 2025 22:37:55 +0000 Received: from phl-compute-12.internal (phl-compute-12.phl.internal [10.202.2.52]) by mailfhigh.phl.internal (Postfix) with ESMTP id 5DFDB1140106; Thu, 6 Feb 2025 17:37:53 -0500 (EST) Received: from phl-mailfrontend-02 ([10.202.2.163]) by phl-compute-12.internal (MEProxy); Thu, 06 Feb 2025 17:37:53 -0500 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=aklaver.com; h= cc:cc:content-transfer-encoding:content-type:content-type:date :date:from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to; s=fm2; t=1738881473; x=1738967873; bh=L97NKbU6/pHL5YJQMm62OHGtLflJlsJgJj0VoJWtWMc=; b= fRhLPnKWuOJ/JKSiGPKTg0NmLxD2xeCwkSCGaexE+UFvKuXRpTbinPrhnfDFkouG 7NztfJiZrFLiTv7X1/fLhOVXNWxFlkkFVLUw0ZWjntD7VCvWdtb8U2HqMUW9MEoJ 3sxkT/HkB/ozzfqKj6p9jDFS96jNcJt1vpUIkdLAxQxvg19MuUkS2YmX1qCU9vfZ TLz3RYMO30QUe7g6BEPK7TqRDYT4m46aswaKDphmJyad62WxZko/I0N/89EDOYlY rpTrOpkwuccseD/MXbcHK8YrHtxBaw6nAwHYvMhnhoU+FSSkGGWBmKEhUKTLmzDG sy5b+xjucNb1eblILVyTBQ== DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d= messagingengine.com; h=cc:cc:content-transfer-encoding :content-type:content-type:date:date:feedback-id:feedback-id :from:from:in-reply-to:in-reply-to:message-id:mime-version :references:reply-to:subject:subject:to:to:x-me-proxy :x-me-sender:x-me-sender:x-sasl-enc; s=fm3; t=1738881473; x= 1738967873; bh=L97NKbU6/pHL5YJQMm62OHGtLflJlsJgJj0VoJWtWMc=; b=E 2z6+OqWZsgQ1eYdsYVpN6HVnhTtn6FlAF9cz1rKh3xZGYI409UfSfJC4Zqyhy+/x 3uIQIAplgLah4ZgoiYTZ7eKv9lQzkrV3Gz3ZpBluXwGoZjfe4GIRUGUVQAlxcsYA M+iObXQZYUAidmcGf27zU8H7YdndIrOwh4m63wEFJn2/GXyZafwHKHC80OWYJ0aR glfLGWgT+MDDgNuwvQil1JHiOL9AxikKQDEsJlOuHTEY2llLcadktmHCjoBFezIJ 1Chb8o431cp9Y0DcBL8yalGhB4jt3fwaYyJDXfoHFrxqsx1/IdirzTonFJIGrV2v eVNj6IpaEmv3aR+K9eixA== X-ME-Sender: X-ME-Received: X-ME-Proxy-Cause: gggruggvucftvghtrhhoucdtuddrgeefvddrtddtgddvjeeiudcutefuodetggdotefrod ftvfcurfhrohhfihhlvgemucfhrghsthforghilhdpggftfghnshhusghstghrihgsvgdp uffrtefokffrpgfnqfghnecuuegrihhlohhuthemuceftddtnecusecvtfgvtghiphhivg hnthhsucdlqddutddtmdenucfjughrpefkffggfgfuvfevfhfhjggtgfesthejredttddv jeenucfhrhhomheptegurhhirghnucfmlhgrvhgvrhcuoegrughrihgrnhdrkhhlrghvvg hrsegrkhhlrghvvghrrdgtohhmqeenucggtffrrghtthgvrhhnpefhveeihfevlefffedt hedukeevfefhfedtgeeifeevteffgfefhedtffetudettdenucevlhhushhtvghrufhiii gvpedtnecurfgrrhgrmhepmhgrihhlfhhrohhmpegrughrihgrnhdrkhhlrghvvghrsegr khhlrghvvghrrdgtohhmpdhnsggprhgtphhtthhopeefpdhmohguvgepshhmthhpohhuth dprhgtphhtthhopehrlhgvrggthhesphhrihhntggvthhonhdrvgguuhdprhgtphhtthho pehrohgsjhhsrghrghgvnhhtsehgmhgrihhlrdgtohhmpdhrtghpthhtohepphhgshhqlh dqghgvnhgvrhgrlhesphhoshhtghhrvghsqhhlrdhorhhg X-ME-Proxy: Feedback-ID: i76984098:Fastmail Received: by mail.messagingengine.com (Postfix) with ESMTPA; Thu, 6 Feb 2025 17:37:52 -0500 (EST) Message-ID: Date: Thu, 6 Feb 2025 14:37:51 -0800 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: How to perform a long running dry run transaction without blocking To: Robert Leach , Rob Sargent Cc: pgsql-general References: <0FE9C709-A108-4ED5-8132-B802B8D9908F@princeton.edu> <798DA50B-7932-4CF5-A16F-9131881EDDF6@gmail.com> Content-Language: en-US From: Adrian Klaver In-Reply-To: Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 2/6/25 11:25, Robert Leach wrote: >> Have you considered a validation app? Have it read the inputs and look in db for conflicts, rather than attempt the insert. Zero transactions necessary > > > I did consider that about a year or two ago when I first conceived the data validation interface. Doing that now would solve the problem of concurrent validations blocking one another, but selects would still get blocked if their result is pending the outcome of a load running in the background. If I mitigate that issue by running those loads over night on a schedule, I still lose out on the benefits of having the loading code do the validation for me... > > I would have to explicitly find and report on problems that the load exceptions currently do for me, without any extra code. > > So what I'm saying is that the data being validated is inter-dependent. There are about a dozen different loading scripts (one for each sheet of the uploaded excel file) whose runs are orchestrated by a master load script that ensures they are run in the right order so that the interdependent data can be checked. For example, these are some relative orders of what needs to be loaded so that data can be validated: > > Study > Animals > Tracers > Infusates > Animals > Treatments > Animals > Tissues > Samples > > The Animal load script would fail if the new data in (the tables) Study, Tracers, Infusates, and Treatments aren't inserted, because it links to those newly created records. And there's no way to detect problems in those new relationships in the unchanged database if they aren't inserted. That's what doing this all in a transaction, and actually doing the inserts (for which I use Django `get_or_create` method calls) provides. > > In other words, I would have to save and explicitly check the inter-related sheet data in data structures independent of the database in order to find the equivalent of (for example) `ObjectDoesNotExist` errors that originate from the database. Right now, I get those errors caught "for free". All I have to do is tell the user what sheet/row/column is related to that error. And it saves me the overhead of having to maintain synchronicity between separate validation code and loading code when the loading code changes. Seems to me this could be dealt with using a schema named validate that contains 'shadow' tables of those in the live schema(s). Import into their and see what fails. > > Robert William Leach > Research Software Engineer > 133 Carl C. Icahn Lab > Lewis-Sigler Institute for Integrative Genomics > Princeton University > Princeton, NJ 08544 > -- Adrian Klaver adrian.klaver@aklaver.com