Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tg7Vy-007sMU-Fb for pgsql-general@arkaria.postgresql.org; Thu, 06 Feb 2025 19:26:18 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tg7Vu-00Anfn-SA for pgsql-general@arkaria.postgresql.org; Thu, 06 Feb 2025 19:26:14 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tg7Vu-00AnfW-Cr for pgsql-general@lists.postgresql.org; Thu, 06 Feb 2025 19:26:14 +0000 Received: from mail-qk1-x729.google.com ([2607:f8b0:4864:20::729]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tg7Vr-0044jf-0P for pgsql-general@postgresql.org; Thu, 06 Feb 2025 19:26:13 +0000 Received: by mail-qk1-x729.google.com with SMTP id af79cd13be357-7b6eb531e13so66766585a.0 for ; Thu, 06 Feb 2025 11:26:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=princeton.edu; s=google; t=1738869970; x=1739474770; darn=postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=LFEYDlhwukc68y7gxoH3/y/nGGobYM+AYW3qx676a+E=; b=gci7h3uxlbdiF2H0vMuWCGCcy7sHnRbkPetVXPTpuhfL8K96iFX+id/97qT6XUEnyV vXre1C38ClHcHtRKSmsRAVl1IOeyoBzlwJr6N1H1OgSwX8dFIZ50dKmYcOugTXB5NHhu 0SJp9cu6i5wkpGK3lawznPBDrXe5KtKjw7D5dwYFR2AzSazjDZPwTue5UmFh+28w4TKy FKbk8hgtAtEh+ocBCkJlXV1QIjGnfMC0jtomGXv1Kii5IFDGtzjycB8EdSQIYleDOcTM 0qGXISt9OFBy+2zJ+m6RZ9ZOQTu6C6TR3Ajp1c6zWDhvEUMmjbUgwlaf8xSKcOrD0ZI7 vprw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738869970; x=1739474770; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=LFEYDlhwukc68y7gxoH3/y/nGGobYM+AYW3qx676a+E=; b=MKLqf6MLC3wD+AsaPIjOjZFfoGWSdA1kymTL24uV6G99vC+/1NyE0kPbswqSOVp5Eh mbq20Ik5wkWGqg9zH1TGlmMi4XZ38hE4Ru5Cj6MF23SVe85yHlO+Hkj6e5NSap8jD8XB 6oX7FQrRenhskaF3idkpl8InFhJ4fR7Hoa4DqMe7fA9U4f41qxYIK3SL/ybpcCC6PPV4 EqrmEywMOsnZsN02d3EhmjkPBaGxUR7xYKVA+GCsigbPbhTLgdd66fTJcZH2+KY3ooxn lglvY4K8hOJMR+PjnHNSw8cbH01NKo6KNcxWUSfV2eL3dyE4L1VFkRstVjUeAW044Y2V rktQ== X-Forwarded-Encrypted: i=1; AJvYcCVZC9mGWwPuwYl8GQ2cBW1RkNSd7jcqqeCYBbyyEeKkgNZPPNRACiicPsdYTo7bsv/ixhyeO4U5b92MoGrE@postgresql.org X-Gm-Message-State: AOJu0YzYPWIQUJZsLwwVenZi8coYj7+5BkKVnz+1rVDFmpi33PPJVNJ6 5SIJF4xdUmm8jdNU/BLgXDKqr9WTMj9USr7X2f85o0chcRn2EbfNPTxKK2bo9Bo= X-Gm-Gg: ASbGncs1wTLctnAxcWo0CmR2WNECMLHDjsE2CDB98Wb6cWR1m4uK0vR2aQVn1q4QOOf wvCy1NON6saSYBidyeVHl8n5A8KieSiuSPUrj8KBK9w5jpOsjpumxO7zt0RUFyVo5gEp66YVaEV pY7Gb16ziC+CqNm+dCcsigiiW/rPErvEeZwi3T+9ucFeCEj6mOs60y544vsZQPJnz8htWwRKQFm cPPgLzbhienCYODLceLEMfILGztUC6h+Ud52TJE6o8eJ3plH/+ybm8QgQzLUcLBGao3QbvVh4DY N++ciwWBLMN1VTHjYYbZIDXWocv+k00= X-Google-Smtp-Source: AGHT+IE9wRLPKT8YGqCKWfxN/oMAe1n1IJIdiVrfdX03lRiP6zm94VQvAhjsguueILsRGz7Z8Th6Jg== X-Received: by 2002:a05:620a:4307:b0:7b6:d4b9:b5a6 with SMTP id af79cd13be357-7c047ba5582mr47208485a.11.1738869969980; Thu, 06 Feb 2025 11:26:09 -0800 (PST) Received: from smtpclient.apple ([165.1.205.67]) by smtp.gmail.com with ESMTPSA id af79cd13be357-7c041e9f9b0sm92566585a.82.2025.02.06.11.26.09 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 06 Feb 2025 11:26:09 -0800 (PST) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3774.500.171.1.1\)) Subject: Re: How to perform a long running dry run transaction without blocking From: Robert Leach In-Reply-To: <798DA50B-7932-4CF5-A16F-9131881EDDF6@gmail.com> Date: Thu, 6 Feb 2025 14:25:57 -0500 Cc: Adrian Klaver , pgsql-general Content-Transfer-Encoding: quoted-printable Message-Id: References: <0FE9C709-A108-4ED5-8132-B802B8D9908F@princeton.edu> <798DA50B-7932-4CF5-A16F-9131881EDDF6@gmail.com> To: Rob Sargent X-Mailer: Apple Mail (2.3774.500.171.1.1) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > Have you considered a validation app? Have it read the inputs and look = in db for conflicts, rather than attempt the insert. Zero transactions = necessary I did consider that about a year or two ago when I first conceived the = data validation interface. Doing that now would solve the problem of = concurrent validations blocking one another, but selects would still get = blocked if their result is pending the outcome of a load running in the = background. If I mitigate that issue by running those loads over night = on a schedule, I still lose out on the benefits of having the loading = code do the validation for me... I would have to explicitly find and report on problems that the load = exceptions currently do for me, without any extra code. So what I'm saying is that the data being validated is inter-dependent. = There are about a dozen different loading scripts (one for each sheet of = the uploaded excel file) whose runs are orchestrated by a master load = script that ensures they are run in the right order so that the = interdependent data can be checked. For example, these are some = relative orders of what needs to be loaded so that data can be = validated: Study > Animals Tracers > Infusates > Animals Treatments > Animals Tissues > Samples The Animal load script would fail if the new data in (the tables) Study, = Tracers, Infusates, and Treatments aren't inserted, because it links to = those newly created records. And there's no way to detect problems in = those new relationships in the unchanged database if they aren't = inserted. That's what doing this all in a transaction, and actually = doing the inserts (for which I use Django `get_or_create` method calls) = provides. In other words, I would have to save and explicitly check the = inter-related sheet data in data structures independent of the database = in order to find the equivalent of (for example) `ObjectDoesNotExist` = errors that originate from the database. Right now, I get those errors = caught "for free". All I have to do is tell the user what = sheet/row/column is related to that error. And it saves me the overhead = of having to maintain synchronicity between separate validation code and = loading code when the loading code changes. Robert William Leach Research Software Engineer 133 Carl C. Icahn Lab Lewis-Sigler Institute for Integrative Genomics Princeton University Princeton, NJ 08544