Received: from localhost (unknown [200.46.204.184]) by postgresql.org (Postfix) with ESMTP id DD54C2E0047 for ; Mon, 18 Feb 2008 19:21:19 -0400 (AST) Received: from postgresql.org ([200.46.204.71]) by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024) with ESMTP id 80380-07 for ; Mon, 18 Feb 2008 19:21:11 -0400 (AST) Received: from emh07.mail.saunalahti.fi (emh07.mail.saunalahti.fi [62.142.5.117]) by postgresql.org (Postfix) with ESMTP id AD78C2E0042 for ; Mon, 18 Feb 2008 19:21:16 -0400 (AST) Received: from saunalahti-vams (vs3-10.mail.saunalahti.fi [62.142.5.94]) by emh07-2.mail.saunalahti.fi (Postfix) with SMTP id 2F27718DA65 for ; Tue, 19 Feb 2008 01:21:14 +0200 (EET) Received: from emh06.mail.saunalahti.fi ([62.142.5.116]) by vs3-10.mail.saunalahti.fi ([62.142.5.94]) with SMTP (gateway) id A055052B52B; Tue, 19 Feb 2008 01:21:14 +0200 Received: from a88-115-47-232.elisa-laajakaista.fi (a88-115-47-232.elisa-laajakaista.fi [88.115.47.232]) by emh06.mail.saunalahti.fi (Postfix) with ESMTP id 029F1E51B3 for ; Tue, 19 Feb 2008 01:21:12 +0200 (EET) Message-ID: <47BA12E7.4020506@mbnet.fi> Date: Tue, 19 Feb 2008 01:21:11 +0200 From: Juho Saarikko User-Agent: Thunderbird 2.0.0.9 (X11/20071031) MIME-Version: 1.0 To: pgsql-bugs@postgresql.org Subject: Re: BUG #3965: UNIQUE constraint fails on long column values References: <200802182212.m1IMCSD14587@momjian.us> <10241.1203373546@sss.pgh.pa.us> In-Reply-To: <10241.1203373546@sss.pgh.pa.us> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit X-Antivirus: VAMS X-Virus-Scanned: Maia Mailguard 1.0.1 X-Archive-Number: 200802/170 X-Sequence-Number: 19737 Tom Lane wrote: > Bruce Momjian writes: > >> Juho Saarikko wrote: >> >>> While I didn't test, I'd imagine that this would also mean that any attempt >>> to insert such values to an already unique column would fail. >>> > > >> Works here in 8.3: >> > > >> test=> create table test (x text unique); >> NOTICE: CREATE TABLE / UNIQUE will create implicit index "test_x_key" for table "test" >> CREATE TABLE >> test=> insert into test values (repeat('a', 50000)); >> INSERT 0 1 >> > > That test only works because it's eminently compressible. > > > The short answer to this bug report is that we're not very concerned > about fixing this because there is seldom a good reason to have an > index (unique or not) on fields that can get so wide. As was already > noted, if you do need a uniqueness check you can easily make a 99.9999% > solution by indexing the md5 hash (or some similar digest) of the > column. It doesn't really seem worthwhile to expend development work > on something that would benefit so few people. > > regards, tom lane > > But the documentation needs to be updated to mention this nonetheless. It is a nasty surprise if it hits unawares. Besides, it's not such an impossible scenario. I encountered this bug when making an Usenet image archival system. Since the same images tend to be reposted a lot, it makes sense to store them only once, and simply reference the stored image from each context it was posted in. Currently my program does the uniqueness constraining by itself; I was examining having the database enforce it when I ran into this issue. Such applications are not exactly rare: bayimg, img.google.com, etc. and of course the innumerable Usenet archival sites could all conceivably want to do something like this. So could any application which monitors potentially repeating phenomena, for that matter. After all, saving a single state of the system only once not only reduces the amount of data stored, but could also help in actual analysis of it, since it becomes trivial to recognize most and least often recurring states.