Efficiently merging and sorting collections of sorted rows

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Clint Miller <[email protected]>
To: [email protected]
Subject: Efficiently merging and sorting collections of sorted rows
Date: Fri, 23 Jun 2017 17:58:28 -0500
Message-ID: <CALck7q=QCruONy5WSbyiSrGUyrHRxWfaLcT=TkDNEyozOVomTA@mail.gmail.com> (raw)
List-Unsubscribe:  <mailto:[email protected]?body=unsub%20pgsql-performance>

Let's say I have the following table and index:

create table foo(s text, i integer);
create index foo_idx on foo (s, i);

If I run the following commands:

start transaction;
set local enable_sort = off;
explain analyze select * from foo where s = 'a' order by i;
end;

I get the following query plan:

Index Only Scan using foo_idx on foo (cost=0.14..8. 15 row=1 width=36)
(actual time=0.008..0.0 10 rows=3 loops=1)

  Index Cond: (s = 'a'::text)

  Heap Fetches: 3

That's a good plan because it's not doing a quick sort. Instead, it's just
reading the sort order off of the index, which is exactly what I want. (I
had to disable enable_sort because I didn't have enough rows of test data
in the table to get Postgres to use the index. But if I had enough rows,
the enable_sort stuff wouldn't be necessary. My real table has lots of rows
and doesn't need enable_sort turned off to do the sort with the index.)

But, if I run the following commands:

start transaction;
set local enable_sort = off;
explain analyze select * from foo where s = 'a' or s = 'b' order by i;
end;

I get the following plan:

Sort  (cost=10000000001.16..10000000001.16 rows=2 width=36) (actual
time=0.020..0.021 rows=7 loops=1)
  Sort Key: i
  Sort Method: quicksort  Memory: 25kB
  ->  Seq Scan on foo  (cost=0.00..1.15 rows=2 width=36) (actual
time=0.009..0.011 rows=7 loops=1)
        Filter: ((s = 'a'::text) OR (s = 'b'::text))
        Rows Removed by Filter: 3

Here, it's loading the full result set into memory and doing a quick sort.
(I think that's what it's doing, at least. If that's not the case, let me
know.) That's not good.

What I'd really like Postgres to do is use the index to get a sorted list
of rows where s = 'a'. Then, use the index again to get a sorted list of
rows where s = 'b'. Then it seems like Postgres should be able to merge the
sorted lists into a single sorted result set in O(n) time and O(1) memory
using a single merge operation.

Am I doing something wrong here? Is there a way to get Postgres to not do a
quick sort here?

My concern is that my real table has a significant number of rows, and the
result set will not fit into memory. So instead of getting a quick sort,
I'll end up getting a slow, disk-based merge sort. I really need the bulk
of the sort operation to come off of the index so that time and memory are
small.

Thanks for any help on this issue.

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected]
  Subject: Re: Efficiently merging and sorting collections of sorted rows
  In-Reply-To: <CALck7q=QCruONy5WSbyiSrGUyrHRxWfaLcT=TkDNEyozOVomTA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox