From: Juho Saarikko <[email protected]>
To: [email protected]
Subject: Re: BUG #3965: UNIQUE constraint fails on long column values
Date: Tue, 19 Feb 2008 01:21:11 +0200
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

Tom Lane wrote:
> Bruce Momjian <[email protected]> writes:
>> Juho Saarikko wrote:
>>> While I didn't test, I'd imagine that this would also mean that any attempt
>>> to insert such values to an already unique column would fail.
>>
>> Works here in 8.3:
>>
>> 	test=> create table test (x text unique);
>> 	NOTICE:  CREATE TABLE / UNIQUE will create implicit index "test_x_key" for table "test"
>> 	CREATE TABLE
>> 	test=> insert into test values (repeat('a', 50000));
>> 	INSERT 0 1
>
> That test only works because it's eminently compressible.
>
>
> The short answer to this bug report is that we're not very concerned
> about fixing this because there is seldom a good reason to have an
> index (unique or not) on fields that can get so wide.  As was already
> noted, if you do need a uniqueness check you can easily make a 99.9999%
> solution by indexing the md5 hash (or some similar digest) of the
> column.  It doesn't really seem worthwhile to expend development work
> on something that would benefit so few people.
>
> 			regards, tom lane
Nonetheless, the documentation should be updated to mention this
limitation. It is a nasty surprise when it hits you unawares.

Besides, it's not such an impossible scenario. I encountered this bug
while building a Usenet image archival system. Since the same images tend
to be reposted a lot, it makes sense to store each one only once and simply
reference the stored image from each context it was posted in. Currently
my program enforces uniqueness itself; I was looking into having the
database enforce it when I ran into this issue.
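
To illustrate the store-once idea, here is a minimal sketch of
application-side deduplication keyed on a content digest, in the spirit
of the "md5 hash (or some similar digest)" suggestion quoted above. It is
not my actual program: the ImageStore class, its in-memory dicts, the
choice of SHA-256 and the example posting names are all assumptions made
for illustration.

```python
import hashlib


class ImageStore:
    """Hypothetical in-memory store that keeps each distinct image once."""

    def __init__(self):
        self.blobs = {}     # digest -> image bytes (stored only once)
        self.postings = {}  # digest -> list of contexts it was posted in

    def add(self, data: bytes, context: str) -> str:
        # Assumption: SHA-256 stands in for "md5 or some similar digest".
        digest = hashlib.sha256(data).hexdigest()
        if digest in self.blobs:
            # Same digest seen before: compare the bytes so that a
            # collision is reported instead of silently merging images.
            if self.blobs[digest] != data:
                raise ValueError("digest collision for different content")
        else:
            self.blobs[digest] = data
        # Record the posting context whether or not the image was new.
        self.postings.setdefault(digest, []).append(context)
        return digest


store = ImageStore()
first = store.add(b"example-image-bytes", "example.group/post-1")
repost = store.add(b"example-image-bytes", "example.group/post-2")
other = store.add(b"different-image-bytes", "example.group/post-3")
```

The repost maps to the same digest as the first posting, so the image
bytes are kept once while both contexts are recorded against it.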

Such applications are not exactly rare: bayimg, img.google.com, and of
course the innumerable Usenet archival sites could all conceivably want
to do something like this. So could any application that monitors
potentially repeating phenomena, for that matter. After all, saving each
distinct state of the system only once not only reduces the amount of
data stored, but could also help in analyzing it, since it becomes
trivial to recognize the most and least frequently recurring states.


