Received: from localhost (unknown [200.46.204.184]) by postgresql.org (Postfix) with ESMTP id D7AA82E002E for ; Wed, 5 Mar 2008 12:20:48 -0400 (AST) Received: from postgresql.org ([200.46.204.71]) by localhost (mx1.hub.org [200.46.204.184]) (amavisd-maia, port 10024) with ESMTP id 59098-04 for ; Wed, 5 Mar 2008 12:20:39 -0400 (AST) X-Greylist: from auto-whitelisted by SQLgrey-1.7.5 Received: from momjian.us (momjian.us [70.90.9.53]) by postgresql.org (Postfix) with ESMTP id E0E022E0234 for ; Wed, 5 Mar 2008 12:20:38 -0400 (AST) Received: (from bruce@localhost) by momjian.us (8.11.6/8.11.6) id m25GKW328140; Wed, 5 Mar 2008 11:20:32 -0500 (EST) From: Bruce Momjian Message-Id: <200803051620.m25GKW328140@momjian.us> Subject: Re: [BUGS] BUG #3965: UNIQUE constraint fails on long column values In-Reply-To: <47BA12E7.4020506@mbnet.fi> To: Juho Saarikko Date: Wed, 5 Mar 2008 11:20:32 -0500 (EST) CC: PostgreSQL-documentation X-Mailer: ELM [version 2.4ME+ PL124 (25)] MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset="US-ASCII" X-Virus-Scanned: Maia Mailguard 1.0.1 X-Archive-Number: 200803/9 X-Sequence-Number: 4821 Does anyone think it is a good idea to document that our indexes cannot index arbirarily-long strings? I see nothing in the documentation now about it. --------------------------------------------------------------------------- Juho Saarikko wrote: > Tom Lane wrote: > > Bruce Momjian writes: > > > >> Juho Saarikko wrote: > >> > >>> While I didn't test, I'd imagine that this would also mean that any attempt > >>> to insert such values to an already unique column would fail. > >>> > > > > > >> Works here in 8.3: > >> > > > > > >> test=> create table test (x text unique); > >> NOTICE: CREATE TABLE / UNIQUE will create implicit index "test_x_key" for table "test" > >> CREATE TABLE > >> test=> insert into test values (repeat('a', 50000)); > >> INSERT 0 1 > >> > > > > That test only works because it's eminently compressible. > > > > > > The short answer to this bug report is that we're not very concerned > > about fixing this because there is seldom a good reason to have an > > index (unique or not) on fields that can get so wide. As was already > > noted, if you do need a uniqueness check you can easily make a 99.9999% > > solution by indexing the md5 hash (or some similar digest) of the > > column. It doesn't really seem worthwhile to expend development work > > on something that would benefit so few people. > > > > regards, tom lane > > > > > But the documentation needs to be updated to mention this nonetheless. > It is a nasty surprise if it hits unawares. > > Besides, it's not such an impossible scenario. I encountered this bug > when making an Usenet image archival system. Since the same images tend > to be reposted a lot, it makes sense to store them only once, and simply > reference the stored image from each context it was posted in. Currently > my program does the uniqueness constraining by itself; I was examining > having the database enforce it when I ran into this issue. > > Such applications are not exactly rare: bayimg, img.google.com, etc. and > of course the innumerable Usenet archival sites could all conceivably > want to do something like this. So could any application which monitors > potentially repeating phenomena, for that matter. After all, saving a > single state of the system only once not only reduces the amount of data > stored, but could also help in actual analysis of it, since it becomes > trivial to recognize most and least often recurring states. > > ---------------------------(end of broadcast)--------------------------- > TIP 5: don't forget to increase your free space map settings -- Bruce Momjian http://momjian.us EnterpriseDB http://postgres.enterprisedb.com + If your life is a hard drive, Christ can be your backup. +