Received: from maia.hub.org (maia-5.hub.org [200.46.204.29]) by mail.postgresql.org (Postfix) with ESMTP id 81876B5DC00 for ; Tue, 24 May 2011 04:57:09 -0300 (ADT) Received: from mail.postgresql.org ([200.46.204.86]) by maia.hub.org (mx1.hub.org [200.46.204.29]) (amavisd-maia, port 10024) with ESMTP id 25184-04-7 for ; Tue, 24 May 2011 07:57:02 +0000 (UTC) X-Greylist: from auto-whitelisted by SQLgrey-1.7.6 Received: from moutng.kundenserver.de (moutng.kundenserver.de [212.227.126.187]) by mail.postgresql.org (Postfix) with ESMTP id CAD5BB5DC0E for ; Tue, 24 May 2011 04:57:01 -0300 (ADT) Received: from [192.168.1.6] (mail.highperformancepostgresql.com [71.179.240.8]) by mrelayeu.kundenserver.de (node=mreu1) with ESMTP (Nemesis) id 0MPt6U-1QJqo53sLy-0051oS; Tue, 24 May 2011 09:57:00 +0200 Message-ID: <4DDB64CB.7070109@2ndQuadrant.com> Date: Tue, 24 May 2011 03:56:59 -0400 From: Greg Smith User-Agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.16) Gecko/20110505 Icedove/3.0.11 MIME-Version: 1.0 To: pgsql-docs Subject: Improve warnings around CREATE INDEX CONCURRENTLY Content-Type: multipart/mixed; boundary="------------050405050001070303010307" X-Provags-ID: V02:K0:GBO3HT3kAzJMT0AVAe+fM0W0LAhrROnH4awLKxNoG69 ZbRESVpixzkPK3KDBZaJwAgbV3+eiFEOiEN2wIqloWkckZSaTC wzYqvTZzkZcjpLNOMgPonJxSWlswTFI96me9EdYN6a/9lr2yPx mpvJAigYI/Skxx8pPDCsJweYHpNzvVIoQobH8owffWDN2oKQL7 my8gHah/A+d8BzaqcS8Dcq9QxPH3rtByXOqPI0peT8= X-Virus-Scanned: Maia Mailguard 1.0.1 X-Spam-Status: No, hits=-1.9 tagged_above=-5 required=5 tests=BAYES_00=-1.9, RCVD_IN_DNSWL_NONE=-0.0001 X-Spam-Level: X-Archive-Number: 201105/78 X-Sequence-Number: 6753 This is a multi-part message in MIME format. --------------050405050001070303010307 Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit When running CREATE INDEX CONCURRENTLY, the DefineIndex() code in src/backend/commands/indexcmds.c does a few things that one would expect from the documentation. And then at the end it executes code described like this: "The index is now valid in the sense that it contains all currently interesting tuples. But since it might not contain tuples deleted just before the reference snap was taken, we have to wait out any transactions that might have older snapshots. Obtain a list of VXIDs of such transactions, and wait for them individually." It's possible to end up with a long series in pg_locks waiting for virtualxid entries at this point, for as long as some set of giant queries takes to execute, and you'll only see them one at a time. The documentation warns: "PostgreSQL must perform two scans of the table, and in addition it must wait for all existing transactions that could potentially use the index to terminate." That's correct, but easy to read the wrong way. I always assumed that this meant it was going to wait behind anything that had a shared lock or such on the table, things that had already accessed it. This is the case with some earlier parts of this same code path. But when it comes to the end here, the scope is actually broader than that. And since there's a session-level lock on the table the whole time this wait loop is executing, that makes considerable secondary havoc possible here. You can end up waiting an unbounded amount of time for some long-running transaction, one not even expected to enter into the rebuild work, to finish, which leaves you no idea what's happening unless you know just what to look for. (Watching such havoc actually happen is what prompted this investigation) What makes it worse is that the wait shows up as a virtualxid one, which doesn't pop up on many common samples of things to look for in pg_locks. It would be reasonable but also incorrect for admins to assume a table one would be visible if running into the case alluded to in the docs. The serial way locks are obtained is unexpected too. Attached patch expands the warnings around this command to reflect both issues: -The waiting time is not necessarily limited to just things that involve the table -The locks it obtains while running this phase of the rebuild are unusual -- Greg Smith 2ndQuadrant US greg@2ndQuadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us --------------050405050001070303010307 Content-Type: text/x-patch; name="concurrent-index-wait.patch" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="concurrent-index-wait.patch" diff --git a/doc/src/sgml/ref/create_index.sgml b/doc/src/sgml/ref/create_index.sgml index 43b6499..cb3334b 100644 --- a/doc/src/sgml/ref/create_index.sgml +++ b/doc/src/sgml/ref/create_index.sgml @@ -381,7 +381,17 @@ CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ name In a concurrent index build, the index is actually entered into the system catalogs in one transaction, then the two table scans occur in a - second and third transaction. + second and third transaction. All active transactions at the time the + second table scan starts, not just ones that already involve the table, + have the potential to block the concurrent index creation for however + long it takes for them to finish. When checking for transactions that + could still use the original index, concurrent index creation advances + through potentially interfering older transactions one at a time, + obtaining locks on their virtual transaction identifiers to wait for + them to complete. + + + If a problem arises while scanning the table, such as a uniqueness violation in a unique index, the CREATE INDEX command will fail but leave behind an invalid index. This index --------------050405050001070303010307--