public inbox for [email protected]  
help / color / mirror / Atom feed
Using left joins instead of inner joins as an optimization
3+ messages / 3 participants
[nested] [flat]

* Using left joins instead of inner joins as an optimization
@ 2024-09-05 16:58  Xavier Solomon <[email protected]>
  0 siblings, 2 replies; 3+ messages in thread

From: Xavier Solomon @ 2024-09-05 16:58 UTC (permalink / raw)
  To: pgsql-general

Hi All!

I'm currently writing a view that joins many tables and I was wondering
how PostgreSQL optimizes
projections on such a view. In particular I was wondering if it is a
correct and valid optimization
technique to use left joins when they are equivalent to an inner join.

I have created a minimal example. Suppose we have two tables:

> create table a(a_id int primary key generated always as identity, a_data
text);
> create table b(b_id int primary key generated always as identity, a_id
int not null references a(a_id), b_data text);

Then the query
> explain select b_id from b natural left join a;
results in a `Seq Scan on b`. Whereas the query
> explain select b_id from b natural join a;
results in a join with sequential scans on both a and b.

I believe because b.a_id is not null and references a.a_id a left and an
inner join are exactly equivalent.

My questions are:
- Am I wrong that in such a situation a left and inner join are equivalent?
- Why does PostgreSQL not automatically optimize this?
- Is it a bad idea to use left joins to optimize this even if semantically
an inner join would be correct?

Thank you for your help!


^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Using left joins instead of inner joins as an optimization
@ 2024-09-06 13:24  David Rowley <[email protected]>
  parent: Xavier Solomon <[email protected]>
  1 sibling, 0 replies; 3+ messages in thread

From: David Rowley @ 2024-09-06 13:24 UTC (permalink / raw)
  To: Xavier Solomon <[email protected]>; +Cc: pgsql-general

On Fri, 6 Sept 2024 at 23:05, Xavier Solomon
<[email protected]> wrote:
> > create table a(a_id int primary key generated always as identity, a_data text);
> > create table b(b_id int primary key generated always as identity, a_id int not null references a(a_id), b_data text);
>
> Then the query
> > explain select b_id from b natural left join a;
> results in a `Seq Scan on b`. Whereas the query
> > explain select b_id from b natural join a;
> results in a join with sequential scans on both a and b.
>
> I believe because b.a_id is not null and references a.a_id a left and an inner join are exactly equivalent.

Mostly equivalent, but there are a few corner cases where they're not.

> My questions are:
> - Am I wrong that in such a situation a left and inner join are equivalent?

The foreign key triggers are deferred until at least the end of the
statement, so there are cases where the foreign key can be temporarily
violated.  For example, if the outer query is an UPDATE a SET a_id =
... and you have an AFTER UPDATE ON a trigger that runs your left join
query, a_id will be changed but the changes won't have been verified
(or cascaded) in the referencing table.

> - Why does PostgreSQL not automatically optimize this?

We would need to change the way foreign keys work or maybe at least
give the planner more context as to where the query it's planning is
coming from. I think it might be safe to do this optimisation if it's
a top-level query, but not if it's being run from a trigger.  Maybe
there are other cases which are safe too.

> - Is it a bad idea to use left joins to optimize this even if semantically an inner join would be correct?

If you only ever do it in top-level statements, then I think it's
safe. If not, you might get wrong results. For deferred foreign key
constraints, the window where the foreign key could be violated lasts
until the end of the transaction, so even top-level queries could see
wrong results if you use left join instead of inner.

David






^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: Using left joins instead of inner joins as an optimization
@ 2024-09-06 13:50  Greg Sabino Mullane <[email protected]>
  parent: Xavier Solomon <[email protected]>
  1 sibling, 0 replies; 3+ messages in thread

From: Greg Sabino Mullane @ 2024-09-06 13:50 UTC (permalink / raw)
  To: Xavier Solomon <[email protected]>; +Cc: pgsql-general

On Fri, Sep 6, 2024 at 7:05 AM Xavier Solomon <[email protected]>
wrote:

> > explain select b_id from b natural left join a;
> results in a `Seq Scan on b`. Whereas the query
> > explain select b_id from b natural join a;
> results in a join with sequential scans on both a and b.
>

I think your example is a little too contrived. Try explaining
select * from b natural left join a;
and you should see the plans become equivalent again.

I would expect a query that left joins but only pulls data from one table
to be not feasible in real life. Yes, in an ideal world the non-left join
would be smart enough to not even do the scan on a, but it's kind of a moot
point outside of odd select clauses.

- Is it a bad idea to use left joins to optimize this even if semantically
> an inner join would be correct?
>

Not at all - if it works for you, go ahead. But I'm dubious you will gain
much for queries that actually make use of the left join, at least for
relatively simply selects.

Cheers,
Greg


^ permalink  raw  reply  [nested|flat] 3+ messages in thread


end of thread, other threads:[~2024-09-06 13:50 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-09-05 16:58 Using left joins instead of inner joins as an optimization Xavier Solomon <[email protected]>
2024-09-06 13:24 ` David Rowley <[email protected]>
2024-09-06 13:50 ` Greg Sabino Mullane <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox