Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1eAd5x-0003P1-Ov for pgsql-performance@arkaria.postgresql.org; Fri, 03 Nov 2017 14:37:18 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1eAd5x-000131-Bi for pgsql-performance@arkaria.postgresql.org; Fri, 03 Nov 2017 14:37:17 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1eAd5v-0000sN-Be for pgsql-performance@postgresql.org; Fri, 03 Nov 2017 14:37:15 +0000 Received: from gproxy5-pub.mail.unifiedlayer.com ([67.222.38.55]) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1eAd5r-00084A-9m for pgsql-performance@postgresql.org; Fri, 03 Nov 2017 14:37:14 +0000 Received: from cmgw3 (unknown [10.0.90.84]) by gproxy5.mail.unifiedlayer.com (Postfix) with ESMTP id A13D7140769 for ; Fri, 3 Nov 2017 08:37:05 -0600 (MDT) Received: from host214.hostmonster.com ([74.220.215.214]) by cmgw3 with id VSd11w00C4e7MuJ01Sd4T7; Fri, 03 Nov 2017 08:37:05 -0600 X-Authority-Analysis: v=2.2 cv=H76r+6Qi c=1 sm=1 tr=0 a=E7lA7DdVRVhKYBNJxJVZOg==:117 a=E7lA7DdVRVhKYBNJxJVZOg==:17 a=IkcTkHD0fZMA:10 a=sC3jslCIGhcA:10 a=VVlED5B4AAAA:8 a=BA-StuRC2SNYTJrQDCQA:9 a=QEXdDO2ut3YA:10 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=gusw.net; s=default; h=Content-Transfer-Encoding:Content-Type:In-Reply-To:MIME-Version :Date:Message-ID:From:References:To:Subject:Sender:Reply-To:Cc:Content-ID: Content-Description:Resent-Date:Resent-From:Resent-Sender:Resent-To:Resent-Cc :Resent-Message-ID:List-Id:List-Help:List-Unsubscribe:List-Subscribe: List-Post:List-Owner:List-Archive; bh=/2otXtOnnmioDneMAx85ugBKAd2k4Z4qcK/QGkh/Viw=; b=gcY3yzQRYTxH/q+wMs6dK+XWIt iZuhUtB+wU3nQXe4YqPBIEGuMprNb/FnUdPN1n6IjJvDIRAKw7y0NMjWpv+TkVAczYb/VwLiJ8I9Z ykiRLS/EQYlB7qX21b7KgQatT; Received: from [191.7.145.23] (port=60531 helo=[192.168.9.4]) by host214.hostmonster.com with esmtpsa (TLSv1.2:ECDHE-RSA-AES128-GCM-SHA256:128) (Exim 4.87) (envelope-from ) id 1eAd5h-001mow-26 for pgsql-performance@postgresql.org; Fri, 03 Nov 2017 08:37:01 -0600 Subject: Re: Re: OLAP/reporting queries fall into nested loops over seq scans or other horrible planner choices To: pgsql-performance@postgresql.org References: <1509611428.3268.5.camel@cybertec.at> <1509701327868-0.post@n3.nabble.com> From: Gunther Message-ID: <9a76f13e-cdb1-1d8d-2178-67e6dcf169bc@gusw.net> Date: Fri, 3 Nov 2017 10:36:57 -0400 User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101 Thunderbird/52.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 8bit Content-Language: en-US X-AntiAbuse: This header was added to track abuse, please include it with any abuse report X-AntiAbuse: Primary Hostname - host214.hostmonster.com X-AntiAbuse: Original Domain - postgresql.org X-AntiAbuse: Originator/Caller UID/GID - [47 12] / [47 12] X-AntiAbuse: Sender Address Domain - gusw.net X-BWhitelist: no X-Source-IP: 191.7.145.23 X-Exim-ID: 1eAd5h-001mow-26 X-Source: X-Source-Args: X-Source-Dir: X-Source-Sender: ([192.168.9.4]) [191.7.145.23]:60531 X-Source-Auth: gunther+pragmaticdata.com X-Email-Count: 1 X-Source-Cap: cHJhZ21hdDE7cHJhZ21hdDE7aG9zdDIxNC5ob3N0bW9uc3Rlci5jb20= X-Local-Domain: yes List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org On Fri, Nov 3, 2017 at 5:28 AM, Thomas Kellerer wrote: >> I do like Oracle's approach with SQL profiles, where you can force the >> optimizer to try harder to find a good execution plan. I _think_ it even >> runs the statement with multiple plans and compares the expected outcome >> with the actual values. Once a better plan is found that plan can be >> attached to that query and the planner will use that plan with subsequent >> executions. I have used that approach with Oracle. I didn't like it. It is too difficult, too complicated. Requires all sorts of DBA privileges. Nothing that would help a lowly user trying his ad-hoc queries. I think a "dynamic feedback plan optimization" would be more innovative and ultimately deliver better on the original RDBMS vision. The RDBMS should exert all intelligence that it can to optimize the query execution. (I know that means: no reliance on hints.) There is so much more that could be done, such as materialized and potentially indexed partial results. (I know Oracle as materialized partial results). But the dynamic feedback plan would be even cooler.  So that means the outer relation should be built or sampled to estimate the selectivity, the inner relation should be built completely, and if it is too large, it should be thrown back to the optimizer to change the plan. Or may be the planner needs some second look pattern matching criticizer: Any pattern of Nested Loop I would re-check and possibly sample a few rows. And Nested Loop with costly inner loop should almost always be avoided. Nested Loop of Seq Scan is a no-no unless it can be proven that the cardinality of the inner relation to scan is less than 100. But even more, once you have the inner and outer table of a Nested Loop built or sampled, there should be no reason not to run the Hash Join. I guess I still don't get why the optimizer even today would EVER consider a Nested Loop over a Hash Join, unless there is some clear indication that the query will be used to just get the FIRST ROWS (Oracle hint) and that those first rows will actually exist (user waits 30 minutes at 100% CPU only to be informed that the query has no results!), and that the results are likely to come out early in the Nested Loop! So many constraints to make that Nested Loop plan a successful strategy. Why ever choose it??? I guess, hints or no hints, I think Nested Loops should not be used by the optimizer unless it has positive indication  that it meets all the criteria for being a good strategy, i.e., that there is a continuous path of indexed columns starting with constant query parameters. This is the usual OLTP query. And that is what Nested Loops are for. But in all other cases, and if space allows at all, always use Hash Joins. It is even cheaper to do a trial and error! Assume that space will allow, and quit if it doesn't, rather than being sheepish and going to a 1 hour CPU bound operation. Because if space does not allow, the chance for Nested Loop being a good idea is also close to nil! So if space doesn't allow, it would be Sort-Merge on Disk. Especially if the query has a DISTINCT or ORDER BY clause anyway! Why is that not always a better strategy? And yes, until all this is figured out: by all means include the pg_hint_plan.c -- pretty please! regards, -Gunther -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance