public inbox for [email protected]  
help / color / mirror / Atom feed
Why does it sort rows after a nested loop that uses already-sorted indexes?
3+ messages / 2 participants
[nested] [flat]

* Why does it sort rows after a nested loop that uses already-sorted indexes?
@ 2024-04-18 11:00 negora <[email protected]>
  2024-04-18 14:53 ` Re: Why does it sort rows after a nested loop that uses already-sorted indexes? Tom Lane <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: negora @ 2024-04-18 11:00 UTC (permalink / raw)
  To: PostgreSQL - General <[email protected]>

Hi:

I've a question regarding nested loops and the order in which they 
return rows. Can you help me, please?

Suppose that I've two tables:

     - Table [sales_order]

         * Columns [id]
         * Index [sales_order_pkey] on [id]

     - Table [order_line]

         * Columns [id], [sales_order_id]
         * Index [order_line_ukey] on [sales_order_id], [id]

Then, I run the following query:

-----------------------------------------------------------

SELECT sales_order.id, order_line.id

FROM main.sales_order

JOIN main.order_line ON order_line.sales_order_id = sales_order.id

WHERE sales_order.customer_id = 2

ORDER BY sales_order.id, order_line.id;

-----------------------------------------------------------

The query planner decides to use the following nested loop:

-----------------------------------------------------------

Incremental Sort  (cost=26.90..16020.06 rows=144955 width=8)

   Sort Key: sales_order.id, order_line.id

   Presorted Key: sales_order.id

   ->  Nested Loop  (cost=0.70..4593.99 rows=144955 width=8)

         ->  Index Scan using sales_order_pkey on sales_order  
(cost=0.28..19.31 rows=79 width=4)

               Filter: (customer_id = 2)

         ->  Index Only Scan using order_line_ukey on order_line  
(cost=0.42..39.22 rows=1869 width=8)

               Index Cond: (sales_order_id = sales_order.id)

-----------------------------------------------------------

As you can see, the planner does detect that the outer loop returns the 
rows presorted by [sales_order.id]. However, it's unable to detect that 
the rows returned by the inner loop are also sorted by [sales_order.id] 
first, and then by [order_line.id].

Why is it? Is it because the planner is designed to always ignore the 
order of the inner loop, even although it could take advantage of it 
(for example, because the analysis time rarely is worth it)? Or is there 
something that I'm missing?

If I'm not mistaken, in this case both index scans seem to be done 
serially, in an N x M style, so I think the row order would be 
preserved, Right?

Thank you!

negora



^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Why does it sort rows after a nested loop that uses already-sorted indexes?
  2024-04-18 11:00 Why does it sort rows after a nested loop that uses already-sorted indexes? negora <[email protected]>
@ 2024-04-18 14:53 ` Tom Lane <[email protected]>
  2024-04-19 08:11   ` Re: Why does it sort rows after a nested loop that uses already-sorted indexes? negora <[email protected]>
  0 siblings, 1 reply; 3+ messages in thread

From: Tom Lane @ 2024-04-18 14:53 UTC (permalink / raw)
  To: negora <[email protected]>; +Cc: PostgreSQL - General <[email protected]>

negora <[email protected]> writes:
> As you can see, the planner does detect that the outer loop returns the 
> rows presorted by [sales_order.id]. However, it's unable to detect that 
> the rows returned by the inner loop are also sorted by [sales_order.id] 
> first, and then by [order_line.id].

That's a level of analysis that it doesn't do, and TBH I'm not even
entirely sure it's correct to assume that the output is sorted like
that.  At minimum you'd need an additional assumption that the
outer side's join key is unique, which is a factor that we don't
currently track when reasoning about ordering.

			regards, tom lane






^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Why does it sort rows after a nested loop that uses already-sorted indexes?
  2024-04-18 11:00 Why does it sort rows after a nested loop that uses already-sorted indexes? negora <[email protected]>
  2024-04-18 14:53 ` Re: Why does it sort rows after a nested loop that uses already-sorted indexes? Tom Lane <[email protected]>
@ 2024-04-19 08:11   ` negora <[email protected]>
  0 siblings, 0 replies; 3+ messages in thread

From: negora @ 2024-04-19 08:11 UTC (permalink / raw)
  To: Tom Lane <[email protected]>; +Cc: PostgreSQL - General <[email protected]>

 > That's a level of analysis that it doesn't do...

Great. I suspected that, but I needed a confirmation from a reliable 
source. Thank you!

 > ...and TBH I'm not even
 > entirely sure it's correct to assume that the output is sorted like
 > that.  At minimum you'd need an additional assumption that the
 > outer side's join key is unique, which is a factor that we don't
 > currently track when reasoning about ordering.

Ouch! I hadn't thought about that possibility! When I tried to mentally 
reproduce the nested loop, I always considered the values of the outer 
loop to be unique. I guess that was because, very often, I used unique 
indexes for my tests... But it doesn't have to be so, of course.

Best regards.



On 18/04/2024 16:53, Tom Lane wrote:
> negora <[email protected]> writes:
>> As you can see, the planner does detect that the outer loop returns the
>> rows presorted by [sales_order.id]. However, it's unable to detect that
>> the rows returned by the inner loop are also sorted by [sales_order.id]
>> first, and then by [order_line.id].
> 
> That's a level of analysis that it doesn't do, and TBH I'm not even
> entirely sure it's correct to assume that the output is sorted like
> that.  At minimum you'd need an additional assumption that the
> outer side's join key is unique, which is a factor that we don't
> currently track when reasoning about ordering.
> 
> 			regards, tom lane






^ permalink  raw  reply  [nested|flat] 3+ messages in thread


end of thread, other threads:[~2024-04-19 08:11 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-04-18 11:00 Why does it sort rows after a nested loop that uses already-sorted indexes? negora <[email protected]>
2024-04-18 14:53 ` Tom Lane <[email protected]>
2024-04-19 08:11   ` negora <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox