public inbox for [email protected]help / color / mirror / Atom feed
Re: BUG #19460: FULL JOIN rewriting issue on empty queries 10+ messages / 3 participants [nested] [flat]
* Re: BUG #19460: FULL JOIN rewriting issue on empty queries @ 2026-04-20 01:12 Richard Guo <[email protected]> 0 siblings, 1 reply; 10+ messages in thread From: Richard Guo @ 2026-04-20 01:12 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: [email protected]; [email protected]; Robert Haas <[email protected]> On Mon, Apr 20, 2026 at 6:10 AM Tom Lane <[email protected]> wrote: > This turns out to be because somebody long ago thought that outer join > removal could be lazy about how much of the planner's data structures > it needs to update. Specifically, when the lower LEFT OUTER JOIN > gets removed, we failed to remove the associated relids from the > left_relids and right_relids of the upper "ON rhs.id = lhs.id" clause, > and that blocks recognition of the applicability of a hash or merge > join, because clause_sides_match_join() fails. I came to the same conclusion. > The fix seems pretty trivial, as attached. (While I'm only certain > that we have to fix left_relids and right_relids, this discovery > makes it seem like it'd be pretty foolish not to fix all the relid > sets of a RestrictInfo.) I didn't make a regression test case yet, > but we need one since no existing test results change (!?). This fix LGTM. I think it'd be better to have a regression test case. How about this one: create table t (id int unique); explain (costs off) select t1.* from t t1 full join (select 1 as x from t t2 left join t t3 on t2.id = t3.id ) sub on t1.id = sub.x; ERROR: FULL JOIN is only supported with merge-joinable or hash-joinable join conditions > I'm feeling a tad nervous about pushing this into released branches. > It seems likely that it might enable quite a few join plans that were > previously not considered, and people tend not to like plan changes in > stable branches. However, (a) it's hard to argue that this isn't a > regression from pre-v16, and (b) since this change affects no existing > test, maybe the blast radius isn't as big as I fear. Fair points on both sides. I'd lean slightly toward back-patching this fix, mostly because of your points (a) and (b). Without a back-patch, users like François would need to adjust affected queries when upgrading from pre-v16 to v16–v18, which feels a bit unfortunate. - Richard ^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: BUG #19460: FULL JOIN rewriting issue on empty queries @ 2026-04-20 01:26 Tom Lane <[email protected]> parent: Richard Guo <[email protected]> 0 siblings, 1 reply; 10+ messages in thread From: Tom Lane @ 2026-04-20 01:26 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: [email protected]; [email protected]; Robert Haas <[email protected]> Richard Guo <[email protected]> writes: > On Mon, Apr 20, 2026 at 6:10 AM Tom Lane <[email protected]> wrote: >> This turns out to be because somebody long ago thought that outer join >> removal could be lazy about how much of the planner's data structures >> it needs to update. Specifically, when the lower LEFT OUTER JOIN >> gets removed, we failed to remove the associated relids from the >> left_relids and right_relids of the upper "ON rhs.id = lhs.id" clause, >> and that blocks recognition of the applicability of a hash or merge >> join, because clause_sides_match_join() fails. > I came to the same conclusion. Thanks for looking at it! There is a loose end still bothering me: if you remove the lower "WHERE t.id = ..." clause, or change it to be something other than an equality constraint on t.id, the bug doesn't manifest. The reason for that is un-obvious. I suppose it's somehow related to the code that tries to push equality-to-a-constant through outer join clauses, but that code shouldn't be able to produce any new clauses here, so why is there a visible effect? I'm too tired to look right now, and was planning to study it more tomorrow. But if you are interested in digging before that, feel free. regards, tom lane ^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: BUG #19460: FULL JOIN rewriting issue on empty queries @ 2026-04-20 02:17 Richard Guo <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 10+ messages in thread From: Richard Guo @ 2026-04-20 02:17 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: [email protected]; [email protected]; Robert Haas <[email protected]> On Mon, Apr 20, 2026 at 10:26 AM Tom Lane <[email protected]> wrote: > Thanks for looking at it! There is a loose end still bothering me: > if you remove the lower "WHERE t.id = ..." clause, or change it to be > something other than an equality constraint on t.id, the bug doesn't > manifest. The reason for that is un-obvious. The reason seems to be that the equality constraint is a restriction clause for the inner relation 't', and is needed to determine that the relation has a matching unique index and is therefore distinct. If we remove it, or change it to something that isn't mergejoinable, we won't be able to prove the inner side of the left join is distinct, and thus won't be able to remove that left join. I think the qual clause "sub.id = empty_source.id" might be confusing, because empty_source.id is constant NULL, and this clause would be simplified to constant NULL during const-folding. - Richard ^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: BUG #19460: FULL JOIN rewriting issue on empty queries @ 2026-04-20 02:32 Tom Lane <[email protected]> parent: Richard Guo <[email protected]> 0 siblings, 1 reply; 10+ messages in thread From: Tom Lane @ 2026-04-20 02:32 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: [email protected]; [email protected]; Robert Haas <[email protected]> Richard Guo <[email protected]> writes: > On Mon, Apr 20, 2026 at 10:26 AM Tom Lane <[email protected]> wrote: >> Thanks for looking at it! There is a loose end still bothering me: >> if you remove the lower "WHERE t.id = ..." clause, or change it to be >> something other than an equality constraint on t.id, the bug doesn't >> manifest. The reason for that is un-obvious. > The reason seems to be that the equality constraint is a restriction > clause for the inner relation 't', and is needed to determine that the > relation has a matching unique index and is therefore distinct. If we > remove it, or change it to something that isn't mergejoinable, we > won't be able to prove the inner side of the left join is distinct, > and thus won't be able to remove that left join. Hmm. The bug also goes away if "t" doesn't have a unique/pkey constraint, and I find that easy to understand: we can't apply outer join removal unless rel_supports_distinctness/rel_is_distinct_for succeed, so that this buggy code in remove_rel_from_restrictinfo is not reached. But that logic doesn't consider WHERE constraints AFAICS. So I think there is some other code path involved. It might turn out to not be all that interesting to run this to ground, but I want to do so because it might inform our estimate of the patch's blast radius. regards, tom lane ^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: BUG #19460: FULL JOIN rewriting issue on empty queries @ 2026-04-20 02:51 Richard Guo <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 10+ messages in thread From: Richard Guo @ 2026-04-20 02:51 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: [email protected]; [email protected]; Robert Haas <[email protected]> On Mon, Apr 20, 2026 at 11:32 AM Tom Lane <[email protected]> wrote: > Richard Guo <[email protected]> writes: > > On Mon, Apr 20, 2026 at 10:26 AM Tom Lane <[email protected]> wrote: > >> Thanks for looking at it! There is a loose end still bothering me: > >> if you remove the lower "WHERE t.id = ..." clause, or change it to be > >> something other than an equality constraint on t.id, the bug doesn't > >> manifest. The reason for that is un-obvious. > > The reason seems to be that the equality constraint is a restriction > > clause for the inner relation 't', and is needed to determine that the > > relation has a matching unique index and is therefore distinct. If we > > remove it, or change it to something that isn't mergejoinable, we > > won't be able to prove the inner side of the left join is distinct, > > and thus won't be able to remove that left join. > Hmm. The bug also goes away if "t" doesn't have a unique/pkey > constraint, and I find that easy to understand: we can't apply outer > join removal unless rel_supports_distinctness/rel_is_distinct_for > succeed, so that this buggy code in remove_rel_from_restrictinfo > is not reached. But that logic doesn't consider WHERE constraints > AFAICS. So I think there is some other code path involved. Hmm, relation_has_unique_index_for does consider the lower "WHERE t.id = ..." clause, as that clause is a restriction clause for "t", and relation_has_unique_index_for automatically adds any usable restriction clauses for the rel. - Richard ^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: BUG #19460: FULL JOIN rewriting issue on empty queries @ 2026-04-20 19:01 Tom Lane <[email protected]> parent: Richard Guo <[email protected]> 0 siblings, 1 reply; 10+ messages in thread From: Tom Lane @ 2026-04-20 19:01 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: [email protected]; [email protected]; Robert Haas <[email protected]> Richard Guo <[email protected]> writes: > On Mon, Apr 20, 2026 at 11:32 AM Tom Lane <[email protected]> wrote: >> Hmm. The bug also goes away if "t" doesn't have a unique/pkey >> constraint, and I find that easy to understand: we can't apply outer >> join removal unless rel_supports_distinctness/rel_is_distinct_for >> succeed, so that this buggy code in remove_rel_from_restrictinfo >> is not reached. But that logic doesn't consider WHERE constraints >> AFAICS. So I think there is some other code path involved. > Hmm, relation_has_unique_index_for does consider the lower "WHERE t.id > = ..." clause, as that clause is a restriction clause for "t", and > relation_has_unique_index_for automatically adds any usable > restriction clauses for the rel. Ah, I finally got it through my head that there are two distinct proof paths by which we might reach the conclusion that the lower left join is removable. I had been thinking that we were proving that from the combination of the "sub.id = empty_source.id" clause with the unique index on t.id. But we're not, in the query as-submitted, because we pull up the NULL::UUID constant and const-fold that clause to NULL. Instead, it's the lowest "WHERE t.id = ..." that is combined with the unique index to make the proof. So without an equality test there, we don't think the inner side is unique and don't do join removal, thus dodging the bug. The WHERE FALSE bit masks this omission because it causes us to reduce the outer join to a dummy relation anyway, later on. But if you take that out, you can see that join removal is not being performed. I thought it was worth memorializing these two variants in separate test queries, so I did that. The variant without the lowest WHERE has a non-null constant in the LOJ's left-hand side, so it's able to make the removal proof from the "sub.id = empty_source.id" clause. Pushed at cfcd57111 et al. Thanks again for the report! regards, tom lane ^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: BUG #19460: FULL JOIN rewriting issue on empty queries @ 2026-04-20 19:23 Tom Lane <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 10+ messages in thread From: Tom Lane @ 2026-04-20 19:23 UTC (permalink / raw) To: [email protected]; +Cc: Richard Guo <[email protected]>; [email protected]; Robert Haas <[email protected]> I wrote: > Pushed at cfcd57111 et al. Thanks again for the report! Hmm, skink seems unhappy with this. Looking... regards, tom lane ^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: BUG #19460: FULL JOIN rewriting issue on empty queries @ 2026-04-20 22:40 Tom Lane <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 10+ messages in thread From: Tom Lane @ 2026-04-20 22:40 UTC (permalink / raw) To: [email protected]; +Cc: Richard Guo <[email protected]>; [email protected]; Robert Haas <[email protected]> I wrote: >> Pushed at cfcd57111 et al. Thanks again for the report! > Hmm, skink seems unhappy with this. Looking... Ah: equivclass.c doesn't mind letting em->em_relids be an alias for the left_relids or right_relids of some source RestrictInfo. That's not problematic as long as those are all constants after construction of the EquivalenceClass, but when remove_rel_from_eclass is trying to change things, it's a big problem. This seems to do the trick to fix it, although I'm going to wait for a valgrind regression run to finish before deciding this is enough: diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c index bfb1af614c2..03056bdf3e0 100644 --- a/src/backend/optimizer/plan/analyzejoins.c +++ b/src/backend/optimizer/plan/analyzejoins.c @@ -783,6 +783,8 @@ remove_rel_from_eclass(EquivalenceClass *ec, int relid, int ojrelid) bms_is_member(ojrelid, cur_em->em_relids)) { Assert(!cur_em->em_is_const); + /* em_relids is likely to be shared with some RestrictInfo */ + cur_em->em_relids = bms_copy(cur_em->em_relids); cur_em->em_relids = bms_del_member(cur_em->em_relids, relid); cur_em->em_relids = bms_del_member(cur_em->em_relids, ojrelid); if (bms_is_empty(cur_em->em_relids)) This discovery may help explain why we'd seen so few trouble reports up to now. At least some RestrictInfos' left/right_relids would have indirectly gotten "fixed" by the above. BTW, the case that is crashing the regression tests is where the above bit reduces em_relids to empty, allowing bms_del_member to pfree it. Now, the source RestrictInfo's left/right_relids is pointing at garbage. The reason this didn't cause trouble before is that if em_relids becomes empty, we remove that EquivalenceMember altogether, and apparently that's enough to keep us from consulting the source RestrictInfo anymore. But the loop over ec_sources just below does see it, and now it is needing the left/right_relids to be valid. regards, tom lane ^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: BUG #19460: FULL JOIN rewriting issue on empty queries @ 2026-04-20 23:04 Richard Guo <[email protected]> parent: Tom Lane <[email protected]> 0 siblings, 1 reply; 10+ messages in thread From: Richard Guo @ 2026-04-20 23:04 UTC (permalink / raw) To: Tom Lane <[email protected]>; +Cc: [email protected]; [email protected]; Robert Haas <[email protected]> On Tue, Apr 21, 2026 at 7:40 AM Tom Lane <[email protected]> wrote: > Ah: equivclass.c doesn't mind letting em->em_relids be an alias > for the left_relids or right_relids of some source RestrictInfo. > That's not problematic as long as those are all constants after > construction of the EquivalenceClass, but when remove_rel_from_eclass > is trying to change things, it's a big problem. ha, I just came to the same conclusion. > This seems to do the trick to fix it, although I'm going to wait > for a valgrind regression run to finish before deciding this > is enough: This seems safe enough to me. LGTM. - Richard ^ permalink raw reply [nested|flat] 10+ messages in thread
* Re: BUG #19460: FULL JOIN rewriting issue on empty queries @ 2026-04-21 07:40 François Jehl <[email protected]> parent: Richard Guo <[email protected]> 0 siblings, 0 replies; 10+ messages in thread From: François Jehl @ 2026-04-21 07:40 UTC (permalink / raw) To: Richard Guo <[email protected]>; +Cc: Tom Lane <[email protected]>; [email protected]; Robert Haas <[email protected]> Thanks Tom and Richard for the quick diagnosis and fix, I should be the one saying thanks here! François On Tue, Apr 21, 2026 at 1:04 AM Richard Guo <[email protected]> wrote: > On Tue, Apr 21, 2026 at 7:40 AM Tom Lane <[email protected]> wrote: > > Ah: equivclass.c doesn't mind letting em->em_relids be an alias > > for the left_relids or right_relids of some source RestrictInfo. > > That's not problematic as long as those are all constants after > > construction of the EquivalenceClass, but when remove_rel_from_eclass > > is trying to change things, it's a big problem. > > ha, I just came to the same conclusion. > > > This seems to do the trick to fix it, although I'm going to wait > > for a valgrind regression run to finish before deciding this > > is enough: > > This seems safe enough to me. LGTM. > > - Richard > ^ permalink raw reply [nested|flat] 10+ messages in thread
end of thread, other threads:[~2026-04-21 07:40 UTC | newest] Thread overview: 10+ messages (download: mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2026-04-20 01:12 Re: BUG #19460: FULL JOIN rewriting issue on empty queries Richard Guo <[email protected]> 2026-04-20 01:26 ` Tom Lane <[email protected]> 2026-04-20 02:17 ` Richard Guo <[email protected]> 2026-04-20 02:32 ` Tom Lane <[email protected]> 2026-04-20 02:51 ` Richard Guo <[email protected]> 2026-04-20 19:01 ` Tom Lane <[email protected]> 2026-04-20 19:23 ` Tom Lane <[email protected]> 2026-04-20 22:40 ` Tom Lane <[email protected]> 2026-04-20 23:04 ` Richard Guo <[email protected]> 2026-04-21 07:40 ` François Jehl <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox