Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tgSxk-00AvhF-Lx for pgsql-general@arkaria.postgresql.org; Fri, 07 Feb 2025 18:20:25 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tgSxj-004D7T-Fs for pgsql-general@arkaria.postgresql.org; Fri, 07 Feb 2025 18:20:23 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tgSxi-004D7L-OC for pgsql-general@lists.postgresql.org; Fri, 07 Feb 2025 18:20:23 +0000 Received: from mail-qt1-x82b.google.com ([2607:f8b0:4864:20::82b]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tgSxf-003oMh-2J for pgsql-general@postgresql.org; Fri, 07 Feb 2025 18:20:21 +0000 Received: by mail-qt1-x82b.google.com with SMTP id d75a77b69052e-46785fbb949so27153291cf.3 for ; Fri, 07 Feb 2025 10:20:19 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=princeton.edu; s=google; t=1738952419; x=1739557219; darn=postgresql.org; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=3RqdWz+WlCFWy9R9LQG7njGg8BzeV0eMDZ5ajPG1mSE=; b=NdKoSm4VCm4BJg2atv52I25/R0vhxQcCV4bMZSepYL7L8NHKvq+QsJto/zRDbk7xDB Vyo7qxIWbb7OlzuQzSQzg9vHut/n7frvNtNrASG/sI0SOYCpqXNjlcX5LZwwVzm2nRjV naYAMahOpTv3ZYQ54GQbO6570IDsC3opB3XBr7sczE0283ABHIVhDnJcB/v0rkd1NfNd yt/eVO81XiwawddsNhO15NLrFZp9jnUVNgp+om0hwLb5LzYu57KTgXGRmIRd4dpgGvOc DU7A3xmjrs6uLmy2houXhcS6tNOOL2G5GbmkuE1mqJ8VRLVxyb2mhw9vtgFLIwpPnfUo DXww== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738952419; x=1739557219; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=3RqdWz+WlCFWy9R9LQG7njGg8BzeV0eMDZ5ajPG1mSE=; b=mwb2x5g94OoHE5o3tiYUi9nmcXYuQal653UG9OGmmIULmGGy3xsH+ScDTN1cM9JJ85 7pdiSMZYQ4x/Py0sp1OX5RL88LJY3kp2QZaP2/rJROK4p/aKxAc8la3sxLZbpk14S6vt 2jIUfr0FkYrFbZXLkw/WTSPC2xFHaozM23n/7AJEKuzhi6ggcdJUuo3Y8tMU+Yj+uaqd f7QoyJbyEz7MqJH93HNWikmLI31mJrs59j3DXhCnCbiDEjrMFyFM2A+nG9oK0m9U3tad OpO8s7TcYJw5Qpjjnvu8L0Nh3uIFdeenQoQ7mKbZQ5C1BTK784AsP4tT+9ML31qkS28a 5QAQ== X-Gm-Message-State: AOJu0YzGoHYI3wdgnX5HZ2Ce6r9aE2esNTz8bjeM0YjFxMaQatCscKgo /mWqS8qywEpxIiUMd6RTRJwRLBL6C+QvLNEWWlQ+zYdMGcpK9FqPNl3wcz4rhhk= X-Gm-Gg: ASbGncu7uqamxOqjbYw5/sLWDJITFcPeyaCz+2lPeOTSjyajmuen0j/GW2nkg9jTJrl qcaUEYLqyOyC2YSFitxOKOW5Ptw5fjVeQUqszTF+ezY3njltDKOb5maW/YGgucZd5gU1Asjoihe IHMim6FTHHeX1s27dyG4w78YHE6ruiNvZTY4f6jfQONhdBwhigwEbgXFO39EjfRTDHKYMTa3fDs ykLlv+szPgwlp5Tw5kJh7F6g4mo9eFZr/mfaHnshwR+HoEtTu1A5OkpGZZnKFqo9atzOtbaKkTh 9Ds3MU50AQGnfVQxHvMcYR+Il8V6z9uP00RNIFbZEEZT3cPOkZTUFbq6Gg== X-Google-Smtp-Source: AGHT+IGEcEgHolv4XiioIhnvEXSP4gfU6Eh21ILzgCZRqQGc8l4D9UiV07q9H20KRCKIomXjZg1rqQ== X-Received: by 2002:a05:622a:4890:b0:467:7401:ce95 with SMTP id d75a77b69052e-471679eeff9mr63418141cf.21.1738952418596; Fri, 07 Feb 2025 10:20:18 -0800 (PST) Received: from smtpclient.apple (gen-rl-imac.princeton.edu. [128.112.117.2]) by smtp.gmail.com with ESMTPSA id d75a77b69052e-4714928ebc7sm19503511cf.20.2025.02.07.10.20.17 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Fri, 07 Feb 2025 10:20:18 -0800 (PST) From: Robert Leach Message-Id: <67D44BCB-71FA-4EF6-8C41-53F852683CFA@princeton.edu> Content-Type: multipart/alternative; boundary="Apple-Mail=_8743ED4D-162F-4E0F-A95F-C73CE4BF27C0" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.1\)) Subject: Re: How to perform a long running dry run transaction without blocking Date: Fri, 7 Feb 2025 13:20:16 -0500 In-Reply-To: <5b7b7573-3b75-4061-a6fe-1387d1e0e51c@aklaver.com> Cc: pgsql-general@postgresql.org To: Adrian Klaver References: <88d60ace-45e6-4d41-afc4-113df7219c4d@aklaver.com> <4000D0EE-B250-4E9E-831F-00C034D6D0B5@princeton.edu> <6d833658-f461-4ad4-a3e1-86d3c515bc18@aklaver.com> <0FE9C709-A108-4ED5-8132-B802B8D9908F@princeton.edu> <00DF52D1-5ED0-4599-8480-8C671434EE4E@princeton.edu> <90368E5C-6F0C-45AE-B73A-5A467EC97043@princeton.edu> <5b7b7573-3b75-4061-a6fe-1387d1e0e51c@aklaver.com> X-Mailer: Apple Mail (2.3696.120.41.1.1) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Apple-Mail=_8743ED4D-162F-4E0F-A95F-C73CE4BF27C0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii >> I'd run across a stackexchange answer = that suggested setting a = statement timeout (specific to postgres) in a django view. If I did = that for the validation view, it wouldn't necessarily 100% correspond to = a blocking issue, but since the overall goal is the prevent a gateway = timeout, I'm fine with that. >=20 > When you do a SET statement_timeout is session/connection specific. >=20 > For what that means in Django see: >=20 > https://docs.djangoproject.com/en/5.1/ref/databases/ >=20 > Personally I think this will introduce more complexity then it's = worth. >=20 > Further down the page at above link it shows how you can use a = psycopg(2) pool, but that is another layer added to an already complex = setup. TBF, adding a celery strategy to the codebase to queue jobs is already = comparatively very complex to start out. It's a pretty heavy handed = solution to what seems to be a relatively rare occurrence (though rather = recurrent when it does happen). By that metric, I would think that = queuing jobs at all is too complex to be worth it. So I'm not sure that = catching a timeout exception to queue a job instead of queuing the job = at the outset is that much more complex. I can just create a view that = inherits from the current view (to which the statement timeout is = applied) which is *not* subject to the statement timeout when I want to = queue a longer running validation due to the block. I think I should be = able to redirect the request to that derived class view. Not sure = exactly how ATM, but it shouldn't be more than 1 line of code/statement, = I would think. Regarding statement timeouts in general, I found a pretty great article = that suggests setting site-wide statement timeouts as a = general rule, which I think makes a lot of sense. In fact, some of the = pages on our site (that I didn't write) are not going to scale well as = the DB grows. I already feel like the samples page is way too slow. = They should probably all gracefully handle timeouts so that users don't = encounter raw 500 errors. So whether or not I decide to "queue on = statement timeout", I think using statement timeouts is worthwhile. I take your point though. How many more lines of code do you think = would make the strategy worthwhile? I'd be willing to bet I could keep = it under 20 compared to straight up queuing every validation. In fact, = I'm leaning toward implementing your suggestion of queuing everything as = a "phase 1" of this endeavor, and then implement my proposed "queue on = timeout" idea as phase 2 (perhaps as a configurable option - at least = until I'm satisfied it works well). >> I don't know if that stack exchange suggestion does what it claims to = do or not. I haven't tried it. But they key claim of interest here is = that the timeout is specific to the view. I.e. any other views or = background processes wouldn't be subject to the timeout. I'm suspicious = as to whether this is actually true. My novice understanding of = Django's cursors is that it maintains a pool of connections, and I'm not=20= >=20 > Actually cursors are attributes of connections. Yeah, thus my suspicion that it may not apply to only the view in the = example. I suspected their "proof" that it didn't affect other views = was due to randomly getting a connection to which that statement timeout = had not yet been applied. *However*, I just read this article = which describes Django's connection management and I = realized that Django's default behavior (which is what we use) is to = create a new connection for every request, so I think that, with the = caveat that it wouldn't work with persistent connections, the stack = exchange claim is actually correct. I would just have to figure out how best to apply the timeout to = everything except those queued validations. And I think maybe a context = manager might be the simplest way to do it. Anyway, thanks so much for your help. This discussion has been very = useful, and I think I will proceed at first, exactly how you suggested, = by queuing every validation job (using celery). Then I will explore = whether or not I can apply the "on timeout" strategy in a small patch. Incidentally, during our Wednesday meeting this week, we actually opened = our public instance to the world for the first time, in preparation for = the upcoming publication. This discussion is about the data submission = interface, but that interface is actually disabled on the public-facing = instance. The other part of the codebase that I was primarily = responsible for was the advanced search. Everything else was primarily = by other team members. If you would like to check it out, let me know = what you think: http://tracebase.princeton.edu = Cheers, Rob Robert William Leach Research Software Engineer 133 Carl C. Icahn Lab Lewis-Sigler Institute for Integrative Genomics Princeton University Princeton, NJ 08544 --Apple-Mail=_8743ED4D-162F-4E0F-A95F-C73CE4BF27C0 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=us-ascii
I'd run across a = stackexchange answer <https://stackoverflow.com/a/75037925/2057516> that = suggested setting a statement timeout (specific to postgres) in a = django view.  If I did that for the validation view, it wouldn't = necessarily 100% correspond to a blocking issue, but since the overall = goal is the prevent a gateway timeout, I'm fine with that.

When you do a SET = statement_timeout is session/connection specific.

For what that means in Django see:

https://docs.djangoproject.com/en/5.1/ref/databases/

Personally I think this will introduce more = complexity then it's worth.

Further down = the page at above link it shows how you can use a psycopg(2) pool, but = that is another layer added to an already complex setup.

TBF, = adding a celery strategy to the codebase to queue jobs is already = comparatively very complex to start out.  It's a pretty heavy = handed solution to what seems to be a relatively rare occurrence (though = rather recurrent when it does happen).  By that metric, I would = think that queuing jobs at all is too complex to be worth it.  So = I'm not sure that catching a timeout exception to queue a job instead of = queuing the job at the outset is that much more complex.  I can = just create a view that inherits from the current view (to which the = statement timeout is applied) which is *not* subject to the statement = timeout when I want to queue a longer running validation due to the = block.  I think I should be able to redirect the request to that = derived class view.  Not sure exactly how ATM, but it shouldn't be = more than 1 line of code/statement, I would think.

Regarding statement timeouts in general, I found a = pretty great article that suggests setting = site-wide statement timeouts as a general rule, which I think makes a = lot of sense.  In fact, some of the pages on our site (that I = didn't write) are not going to scale well as the DB grows.  I = already feel like the samples page is way too slow.  They should = probably all gracefully handle timeouts so that users don't encounter = raw 500 errors.  So whether or not I decide to "queue on statement = timeout", I think using statement timeouts is worthwhile.

I take your point though.  How many more = lines of code do you think would make the strategy worthwhile?  I'd = be willing to bet I could keep it under 20 compared to straight up = queuing every validation.  In fact, I'm leaning toward implementing = your suggestion of queuing everything as a "phase 1" of this endeavor, = and then implement my proposed "queue on timeout" idea as phase 2 = (perhaps as a configurable option - at least until I'm satisfied it = works well).

I don't = know if that stack exchange suggestion does what it claims to do or not. =  I haven't tried it.  But they key claim of interest here is = that the timeout is specific to the view.  I.e. any other views or = background processes wouldn't be subject to the timeout.  I'm = suspicious as to whether this is actually true.  My novice = understanding of Django's cursors is that it maintains a pool of = connections, and I'm not

Actually cursors are attributes of connections.

Yeah, thus my suspicion that it may not apply to only the = view in the example.  I suspected their "proof" that it didn't = affect other views was due to randomly getting a connection to which = that statement timeout had not yet been applied.  *However*, I just = read this article which describes = Django's connection management and I realized that Django's default = behavior (which is what we use) is to create a new connection for every = request, so I think that, with the caveat that it wouldn't work with = persistent connections, the stack exchange claim is actually = correct.

I = would just have to figure out how best to apply the timeout to = everything except those queued validations.  And I think maybe a = context manager might be the simplest way to do it.

Anyway, thanks so much = for your help.  This discussion has been very useful, and I think I = will proceed at first, exactly how you suggested, by queuing every = validation job (using celery).  Then I will explore whether or not = I can apply the "on timeout" strategy in a small patch.

Incidentally, during our = Wednesday meeting this week, we actually opened our public instance to = the world for the first time, in preparation for the upcoming = publication.  This discussion is about the data submission = interface, but that interface is actually disabled on the public-facing = instance.  The other part of the codebase that I was primarily = responsible for was the advanced search.  Everything else was = primarily by other team members.  If you would like to check it = out, let me know what you think: http://tracebase.princeton.edu

Cheers,
Rob


Robert = William Leach
Research Software Engineer
133 = Carl C. Icahn Lab
Lewis-Sigler Institute for Integrative = Genomics
Princeton University
Princeton, NJ = 08544

= --Apple-Mail=_8743ED4D-162F-4E0F-A95F-C73CE4BF27C0--