Re: How to perform a long running dry run transaction without blocking

From: Robert Leach <[email protected]>
To: Adrian Klaver <[email protected]>
Cc: [email protected]
Subject: Re: How to perform a long running dry run transaction without blocking
Date: Fri, 7 Feb 2025 17:02:03 -0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>

>> Anyway, thanks so much for your help.  This discussion has been very useful, and I think I will proceed at first, exactly how you suggested, by queuing every validation job (using celery).  Then I will explore whether or not I can apply the "on timeout" strategy in a small patch.
>> Incidentally, during our Wednesday meeting this week, we actually opened our public instance to the world for the first time, in preparation for the upcoming publication.  This discussion is about the data submission interface, but that interface is actually disabled on the public-facing instance.  The other part of the codebase that I was primarily responsible for was the advanced search.  Everything else was primarily by other team members.  If you would like to check it out, let me know what you think: http://tracebase.princeton.edu
> 
> I would have to hit the books again to understand all of what is going on here.

It's a mass spec tracing database.  Animals are infused with radiolabeled compounds and mass spec is used to see what the animal's biochemistry turns those compounds into.  (My undergrad was biochem, so I've been resurrecting my biochem knowledge, as needed for this project.  I've been mostly doing RNA and DNA sequence analysis since undergrad, and most of that was prokaryotic.)

> One quibble with the Download tab, there is no indication of the size of the datasets. I generally like to know what I am getting into before I start a download. Also, is there explicit throttling going on? I am seeing 10.2kb/sec, whereas from here https://www.nyc.gov/site/tlc/about/tlc-trip-record-data.page I downloaded a 47.65M file at 41.9MB/s

Thank you!  Not knowing the download size is exactly a complaint I had.  That download actually uses my advanced search interface (in browse mode).  There is the same issue with the download buttons on the advanced search.  With the streaming, we're not dealing with temp files, which is nice, at least for the advanced search, but we can't know the download size that way.  So I had wanted a progress bar to at least show progress (current record per total).  I could even estimate the size (an option I explored for a few days).  Eventually, I proposed a celery solution for that and I was overruled.
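
To make the "current record per total" idea concrete, here is a minimal sketch of how a streamed download could report progress and a running size estimate.  It is not the TraceBase code.  It assumes the total record count is available up front (e.g. from a separate count query), and the names row_to_csv and stream_with_progress are hypothetical.  The size estimate is just the average bytes per record so far multiplied by the total, so it only settles down as more records are streamed.

```python
import csv
import io
from typing import Iterable, Iterator, Sequence, Tuple


def row_to_csv(row: Sequence[object]) -> str:
    """Serialize one record as a single CSV line (hypothetical helper)."""
    buf = io.StringIO()
    csv.writer(buf).writerow(row)
    return buf.getvalue()


def stream_with_progress(
    rows: Iterable[Sequence[object]], total: int
) -> Iterator[Tuple[str, int, int]]:
    """Yield (chunk, records_done, estimated_total_bytes) for each record.

    `total` is assumed to be known before streaming starts; nothing is
    written to a temp file, so the true size is never known in advance.
    """
    bytes_sent = 0
    for done, row in enumerate(rows, start=1):
        chunk = row_to_csv(row)
        bytes_sent += len(chunk.encode("utf-8"))
        # Extrapolate: average bytes per record so far, times total records.
        estimate = round(bytes_sent / done * total)
        yield chunk, done, estimate


if __name__ == "__main__":
    records = [(i, "x") for i in range(10)]
    for chunk, done, estimate in stream_with_progress(records, total=len(records)):
        print(f"record {done}/{len(records)}, estimated size {estimate} bytes")
```

In a web view, the chunk would go to the streaming response and the (done, estimate) pair would have to reach the browser some other way, which is the part a task queue or similar side channel would need to handle.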

As for the download in the nav bar, we have an issue to change that to a listing of actual files broken down by study (3 files per study).  There's not much actual utility from a user perspective for downloading everything anyway.  We've just been focussed on other things.  In fact, we have a request from a user for that specific feature, done in a way that's compatible with curl/scp.  We just have to figure out how to not have to CAS authenticate each command, something I don't have experience with.
