Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1e2DOW-0008Pw-Eg for pgsql-performance@arkaria.postgresql.org; Wed, 11 Oct 2017 09:33:40 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1e2DOV-0005ZG-Eg for pgsql-performance@arkaria.postgresql.org; Wed, 11 Oct 2017 09:33:39 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1e2DOT-0005Y0-Dk for pgsql-performance@postgresql.org; Wed, 11 Oct 2017 09:33:37 +0000 Received: from mail-it0-x235.google.com ([2607:f8b0:4001:c0b::235]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA1:256) (Exim 4.84_2) (envelope-from ) id 1e2DOQ-0001ma-Fb for pgsql-performance@postgresql.org; Wed, 11 Oct 2017 09:33:36 +0000 Received: by mail-it0-x235.google.com with SMTP id 72so2173974itl.5 for ; Wed, 11 Oct 2017 02:33:33 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:sender:from:date:message-id:subject:to; bh=/x4vtbEKwDapPTKFC646uZj9Q6HLusJdevS52KhEC4I=; b=LgO9pIT9C5+Lvz9WjeN7V7mjApO3jJmeTb6rl4ULHxaDSDLou5n3aUJz+dWMWuIKrs ek7YxC6/haO2HiZzLL7Hw6ygpbxkNS5Q+ipvGj0gqyR9MRYwHE4lioriZprTgYNYIOhz fyIrLITjlH3K7GKgOxBb9BmGDyiov7v2kImyhDhdCQC5vdjVUF+9oraIh+WjnrUvIY5X ng+073fgBB6VY0q8/GrbzOnwNJYnGHlgL0D1k0Zws53hQJWULQXGZgPJOpOjVb/ASwCd Ui9KoOa4kqw7dGuJgGnBTpD/TUJlQ0Hox1UtGl5V+sXR/ZRirGbh4ZSn7rOk6azKqVi5 syCA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:sender:from:date:message-id:subject :to; bh=/x4vtbEKwDapPTKFC646uZj9Q6HLusJdevS52KhEC4I=; b=HzOEB9sxqvzWXw7IEFNfHXZ/b7J7G93TIFU386NTVSPWzTWvFLO+BxDVFmusnnvQgL IoJcMiXNXTAl5KX+LiUDweMP0Unl4NCGiB2am+rtEfjUV+njEUUy2/YA28zILSxipZ1I 8GTbCQtOOeBh17wQ9mXS+yT2fwGC8WUHzor76HN0YhmZiHaUa/OHQPJPBUhUJjzo1BjY 2vON6wfJJjTLn24lgun8IhoCIxC7p4MIwpeS4GFjXuYCnjeUoxwDsyj8gYDIkXv28/Bj X0p5lYIKIzeHq6jn8M72OFkpjYahjPVpMSexwbpLjQ3GIL/U56dhRtcSXUIC8J7E91Gt oh2A== X-Gm-Message-State: AMCzsaVP9SBYjQgo1fu/WbQIX7r87PiKYqlPmfQ5UC3fxz7A9wkcWwxK BUh4OgDs8E/mJ2pf/RjHct2uX+3qwlhElDq0ifhkYg== X-Google-Smtp-Source: AOwi7QCyacYy+3/SSfXL9Rk++LaWzyI4dSOnDntg+xDnkqj0WpmJpwnq2r1VnmI9MD578lmLFuEWI2NrT4RjZx2WT+M= X-Received: by 10.36.245.76 with SMTP id k73mr8459497ith.126.1507714411710; Wed, 11 Oct 2017 02:33:31 -0700 (PDT) MIME-Version: 1.0 Received: by 10.107.188.197 with HTTP; Wed, 11 Oct 2017 02:33:31 -0700 (PDT) From: Ants Aasma Date: Wed, 11 Oct 2017 12:33:31 +0300 X-Google-Sender-Auth: MWU4MSJ2le4qJwEA-TD8KZjZD2E Message-ID: Subject: Rowcount estimation changes based on from clause order To: "pgsql-performance@postgresql.org" Content-Type: text/plain; charset="UTF-8" List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org I stumbled upon a severe row count underestimation that confusingly went away when two inner joins in the from clause were reordered. I whittled it down to a reproducible test case. Schema: CREATE TABLE small (id serial primary key, ref_id int not null, subset int not null); CREATE TABLE big (id serial primary key, small_id int not null); INSERT INTO small (ref_id, subset) SELECT i/2+1, i/2+1 FROM generate_series(1,1000) i; INSERT INTO big (small_id) SELECT (i % 1000) + 1 FROM generate_series(1,1000000) i; CREATE INDEX ON small (ref_id); CREATE INDEX ON big (small_id); ANALYZE; And the queries, differing in only the order of joins: SELECT * FROM small INNER JOIN big ON small.id = big.small_id INNER JOIN (SELECT 1 UNION ALL SELECT 2) lookup(ref) ON lookup.ref = small.ref_id WHERE small.subset = 42; SELECT * FROM small INNER JOIN (SELECT 1 UNION ALL SELECT 2) lookup(ref) ON lookup.ref = small.ref_id INNER JOIN big ON small.id = big.small_id WHERE small.subset = 42; Resulting plan for the first case: Nested Loop (cost=20.45..2272.13 rows=8 width=24) -> Nested Loop (cost=0.28..16.69 rows=1 width=16) -> Append (cost=0.00..0.04 rows=2 width=4) -> Result (cost=0.00..0.01 rows=1 width=4) -> Result (cost=0.00..0.01 rows=1 width=4) -> Index Scan using small_ref_id_idx on small (cost=0.28..8.32 rows=1 width=12) Index Cond: (ref_id = (1)) Filter: (subset = 42) -> Bitmap Heap Scan on big (cost=20.18..2245.44 rows=1000 width=8) Recheck Cond: (small_id = small.id) -> Bitmap Index Scan on big_small_id_idx (cost=0.00..19.93 rows=1000 width=0) Index Cond: (small_id = small.id) Second case plan is identical except row count of the topmost nest loop: Nested Loop (cost=20.45..2272.13 rows=1000 width=24) The union subselect was in reality somewhat more complicated, but for the row count issue the simplification does not seem to matter. The behavior is seen both on 9.4 and on master. Does anybody have any idea what is going on here? In the real world case this is based on the estimation was 5 rows instead of 200k, which resulted in quite bad plan choices downstream. Regards, Ants Aasma -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance