Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtp (Exim 4.84_2) (envelope-from ) id 1dOY6j-0005Xy-U9 for pgsql-performance@arkaria.postgresql.org; Fri, 23 Jun 2017 23:35:22 +0000 Received: from localhost ([127.0.0.1] helo=postgresql.org) by malur.postgresql.org with smtp (Exim 4.84_2) (envelope-from ) id 1dOY6j-0005jN-H1 for pgsql-performance@arkaria.postgresql.org; Fri, 23 Jun 2017 23:35:21 +0000 Received: from makus.postgresql.org ([2001:4800:1501:1::229]) by malur.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1dOY4v-0006c7-K5 for pgsql-performance@postgresql.org; Fri, 23 Jun 2017 23:33:29 +0000 Received: from sss.pgh.pa.us ([66.207.139.130]) by makus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_CBC_SHA384:256) (Exim 4.84_2) (envelope-from ) id 1dOY4t-00057Z-9S for pgsql-performance@postgresql.org; Fri, 23 Jun 2017 23:33:28 +0000 Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1]) by sss.pgh.pa.us (8.14.4/8.14.4) with ESMTP id v5NNXPeN018038; Fri, 23 Jun 2017 19:33:25 -0400 From: Tom Lane To: Clint Miller cc: pgsql-performance@postgresql.org Subject: Re: Efficiently merging and sorting collections of sorted rows In-reply-to: References: Comments: In-reply-to Clint Miller message dated "Fri, 23 Jun 2017 17:58:28 -0500" MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <18036.1498260805.1@sss.pgh.pa.us> Content-Transfer-Encoding: quoted-printable Date: Fri, 23 Jun 2017 19:33:25 -0400 Message-ID: <18037.1498260805@sss.pgh.pa.us> List-Archive: List-Help: List-ID: List-Owner: List-Post: List-Subscribe: List-Unsubscribe: X-Mailing-List: pgsql-performance Precedence: bulk Sender: pgsql-performance-owner@postgresql.org Clint Miller writes: > That's a good plan because it's not doing a quick sort. Instead, it's just > reading the sort order off of the index, which is exactly what I want. (I > had to disable enable_sort because I didn't have enough rows of test data > in the table to get Postgres to use the index. But if I had enough rows, > the enable_sort stuff wouldn't be necessary. My real table has lots of ro= ws > and doesn't need enable_sort turned off to do the sort with the index.) TBH, I think this whole argument is proceeding from false premises. Using an indexscan as a substitute for an explicit sort of lots of rows isn't all that attractive, because it implies a whole lot of random access to the table (unless the table is nearly in index order, which isn't a condition you can count on without expending a lot of maintenance effort to keep it that way). seqscan-and-sort is often a superior alternative, especially if you're willing to give the sort a reasonable amount of work_mem. > What I'd really like Postgres to do is use the index to get a sorted list > of rows where s =3D 'a'. Then, use the index again to get a sorted list of > rows where s =3D 'b'. Then it seems like Postgres should be able to merge= the > sorted lists into a single sorted result set in O(n) time and O(1) memory > using a single merge operation. If there's no duplicates to remove, I think this will work: explain (select * from foo a where s =3D 'a' order by i) union all (select * from foo b where s =3D 'b' order by i) order by i; Merge Append (cost=3D0.32..48.73 rows=3D12 width=3D36) Sort Key: a.i -> Index Only Scan using foo_idx on foo a (cost=3D0.15..24.26 rows=3D6= width=3D36) Index Cond: (s =3D 'a'::text) -> Index Only Scan using foo_idx on foo b (cost=3D0.15..24.26 rows=3D6= width=3D36) Index Cond: (s =3D 'b'::text) In this case it's pretty obvious that the two union arms can never return the same row, but optimizing OR into UNION in general is difficult because of the possibility of duplicates. I wouldn't recommend holding your breath waiting for the planner to do this for you. regards, tom lane --=20 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance