Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.80) (envelope-from ) id 1aZyH2-0006L0-HU for pgsql-performance@arkaria.postgresql.org; Sun, 28 Feb 2016 10:08:24 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84) (envelope-from ) id 1aZyH2-0006wn-22 for pgsql-performance@arkaria.postgresql.org; Sun, 28 Feb 2016 10:08:24 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84) (envelope-from ) id 1aZyFM-00056j-KO for pgsql-performance@postgresql.org; Sun, 28 Feb 2016 10:06:40 +0000 Received: from mail-wm0-x230.google.com ([2a00:1450:400c:c09::230]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84) (envelope-from ) id 1aZyFI-0008BM-Jm for pgsql-performance@postgresql.org; Sun, 28 Feb 2016 10:06:40 +0000 Received: by mail-wm0-x230.google.com with SMTP id n186so11564375wmn.1 for ; Sun, 28 Feb 2016 02:06:36 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=2ndquadrant-com.20150623.gappssmtp.com; s=20150623; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc; bh=E9XJsi2OxeOrol+JH+pjwFAuFn/HlkDLEllcedaiSAs=; b=rxTZz1W0YoWcA7vbK9iHMp4047w0Y1QJgGC+hYU7Bi/EkkmIJubcewuDqRFXZgGBRK YpQZkFPnSv04t+xtOyP3dZMCYNZdUfpFCmrfTEIxWvZP315CqCoLXGie8f1C78/oV9AK B5jzaVBUbvL8C5Vp9jl7q3Az5kjtzNT/NLyWGyqYtfgxMGMHWdzVsnfTJr8NSgT3JXBq DjDGusabU9ijlb11lH9yOYDOuzKpRb/CXi/9ax0Zc+jGJJyli8YX7nNb8XP/AXD9GCw2 U2W5Rde05+/WcWwcBnnHjkZj5/4AqdKCiURv1BOf2rFckpWzvBJvw1HxNPhnJQ3i5U6n ADog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20130820; h=x-gm-message-state:mime-version:in-reply-to:references:date :message-id:subject:from:to:cc; bh=E9XJsi2OxeOrol+JH+pjwFAuFn/HlkDLEllcedaiSAs=; b=YArl2mjL/R21KZdjmF67U9NPuwUIIlfSX3Vj7CgxsudTil3nDMRSaCW8RA6cMDeFYt 2CFDkJt1UXaFw9Aw23J8F9/+jC3Kw96GOKs8xee4Tu+1BDy3nfVFO9lz1w2a3S4y0trI 2M+3jADvwUIvGtrOALQ1D6vM+dB2/Eavqvwy8Jd+Z9zZhSD5JUbmI/RGlgIBCAgzu73O wpznrlEmkP7PqjToEc+ctwNE7EA7qpBzq+17li93gnbSKGap/kMB36+mRuvwQeQwBwf1 qFiUwSYbOA3djMicKJpCAsZf8x3FoteMdREdNoARcAvXSh7j0rIPF8vehSzKPutl+Okf vdjQ== X-Gm-Message-State: AD7BkJIzQYDlz27E3wc/KvkkTHQTwjITsIh+j29esQ/B+K2y+rw/M5vyQW9eId44bmvBmWg12BhM7xxhTrowin6c MIME-Version: 1.0 X-Received: by 10.194.2.41 with SMTP id 9mr10713813wjr.10.1456653995584; Sun, 28 Feb 2016 02:06:35 -0800 (PST) Received: by 10.27.16.13 with HTTP; Sun, 28 Feb 2016 02:06:35 -0800 (PST) In-Reply-To: References: Date: Sun, 28 Feb 2016 23:06:35 +1300 Message-ID: Subject: Re: Merge joins on index scans From: David Rowley To: James Parks Cc: postgres performance list Content-Type: text/plain; charset=UTF-8 X-Pg-Spam-Score: -2.6 (--) List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org On 27 February 2016 at 11:07, James Parks wrote: > > CREATE TABLE a (id bigint primary key, nonce bigint); > CREATE TABLE b (id bigint primary key, a_id bigint not null); > CREATE INDEX a_idx ON b (a_id); > > The query: > > SELECT b.* FROM b JOIN a ON b.a_id = a.id WHERE a.nonce = ? ORDER BY b.id > ASC; > > (skip down to [1] and [2] to see the query performance) > > What I know: > > If you force the query planner to use a merge join on the above query, it > takes 10+ minutes to complete using the data as per below. If you force the > query planner to use a hash join on the same data, it takes ~200 > milliseconds. I believe I know what is going on here, but can you please test; SELECT b.* FROM b WHERE EXISTS (SELECT 1 FROM a ON b.a_id = a.id AND a.nonce = ?) ORDER BY b.id ASC; using the merge join plan. If this performs much better then the problem is due to the merge join mark/restore causing the join to have to transition through many tuples which don't match the a.nonce = ? predicate. The mark and restore is not required for the rewritten query, as this use a semi join rather than a regular inner join. With the semi join the executor knows that it's only meant to be matching a single tuple in "a", so once the first match is found it can move to the next row in the outer relation without having to restore the scan back to where it started matching that inner row again. If I'm right, to get around the problem you could; create index on a (nonce, id); If such an index is out of the question then a patch has been submitted for review which should fix this problem in (hopefully) either 9.6 or 9.7 https://commitfest.postgresql.org/9/129/ If you have a test environment handy, it would be nice if you could test the patch on the current git head to see if this fixes your problem. The findings would be quite interesting for me. Please note this patch is for test environments only at this stage, not for production use. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance