public inbox for [email protected]  
help / color / mirror / Atom feed
From: Moreno Andreo <[email protected]>
To: [email protected]
Subject: Re: Architectural question
Date: Wed, 23 Mar 2016 19:06:20 +0100
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
	<[email protected]>
List-Unsubscribe:  <mailto:[email protected]?body=unsub%20pgsql-performance>

Il 23/03/2016 13:29, Mike Sofen ha scritto:
>> -----Original Message-----
>> Thomas Kellerer Wednesday, March 23, 2016 2:51 AM
>>
>> Jim Nasby schrieb am 11.03.2016 um 17:37:
>>> If the blob is in the database then you have nothing extra to do. It's handled
>> just like all your other data.
>>> If it's a file in a file system then you need to:
>>>
>>> - Have application code that knows how and where to get at the file
>>> - Have a way to make those files available on all your webservers
>>> - Have completely separate backup and recovery plans for those files
>>>
>>> That's a lot of extra work. Sometimes it's necessary, but many times it's not.
>> Don't forget the code you need to write to properly handle transactional access
>> (writing, deleting) to the files
>>
>> You usually also need to distribute the files over many directories.
>> Having millions of files in a single directory is usually not such a good idea.
>>
>> In my experience you also need some cleanup job that removes orphaned files
>> from the file system.
>> Because no matter how hard you try, to get updates/writes to the file system
>> right, at some point this fails.
>>
>> Also from a security point of view having this in the database is more robust
>> then in the file system.
>>
>> The downside of bytea is that you can't stream them to the client. The
>> application always needs to read the whole blob into memory before it can be
>> used. This might put some memory pressure on the application server.
>>
>> Thomas
> This is really an excellent conversation, and highlights the never-ending contemplation
> of blob storage.
That seems like discussing about politics or religion :-)
> I've had to go through this dialog in two different industries - healthcare
> and now genomics, creating a new EMR (electronic medical record) system and storing
> and manipulating huge genomic data sets.
>
> I have, in both cases, ended up leaving the blob-type data outside of the database.  Even
> though, as Thomas mentioned, it requires more database and app code to manage, it
> ends up allowing for both systems to be optimized for their respective duties.
Our approach, still mantaining BLOBs in databases, is quite an hybrid, 
because BLOBs are not spread among DB tables, but we have a dedicated 
table, with an appropriate indexing, where 95% of our blobs (and 99% of 
blob storage) reside, so if we need to have a quick dump, we can exclude 
BLOBs table or treat it in a separate way (i.e. backup util in our app 
is made of two separate steps, clinical data and blobs).

As I wrote in a previous post, we have our blobs encrypted, so it's more 
handy keeping them in DB rather than saving to a file (and, I think, 
quicker when the user request for any of these)
> In addition, the vastly smaller database sizes result in far faster backups and restores,
> transactional replication maintains it's speed, and in general, I find the fault tolerant
> behaviors to be excellent.
>
> Yes, losing track of a file would be very bad, and...we're only storing things like xray photos
> or ct scans (healthcare), or genomic processing results.  In both cases, usually, the results
> can be recreated.  That said, I've never lost a file so haven't needed to pull on that lever.
In our case we have to assume that blob contents cannot be recreated. 
Patients can change family doctor... if a trial arise and a critical 
document is lost, he's on his own. That's why we have a daily-based 
automatic backup policy on the customer local server.
> My latest model is placing large genomic data onto the AWS S3 file system, keeping all of
> the metadata inside the database.  It's working very well so far, but we're still in development.
>
> Mike
>
>
>




-- 
Sent via pgsql-performance mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance



reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected]
  Subject: Re: Architectural question
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox