Re: Proposal: Recent mutated table tracking in memory

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Nadav Shatz <[email protected]>
To: Tatsuo Ishii <[email protected]>
Cc: [email protected]
Subject: Re: Proposal: Recent mutated table tracking in memory
Date: Wed, 20 May 2026 15:25:54 +0300
Message-ID: <CACeKOO2eUrfo_UDMFSEd=2y8zj8y93m38EzRCpg1HuizYBf3wA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CACeKOO3OYtS8dezQn4beE_A1aqc2-Uqqb08pQ+oZt_brk7aByQ@mail.gmail.com>
	<CACeKOO31db8_uiZj+6cKRZUBt+iDc82xhEwHeOh+7rHT6gf5mg@mail.gmail.com>
	<[email protected]>
	<[email protected]>

--0000000000005b422006523ee630
Content-Type: multipart/alternative; boundary="0000000000005b421f06523ee62e"

--0000000000005b421f06523ee62e
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hi Tatsuo,

Thanks for checking the V3, sorry for missing the test issue.

I reproduced the timeout locally. Found and fixed the root cause.

Root cause
----------

In CommandComplete.c, the autocommit write-tracking code was
gated only on session_context->is_in_transaction, not on the
cluster mode.

In native replication and snapshot isolation modes,
dml_adaptive() is never called (it lives inside
where_to_send_main_replica), so is_in_transaction is never set
to true even inside an explicit BEGIN/COMMIT block.  That meant
every DML in those modes was treated as autocommit by the
write-tracking code, triggering
pool_track_table_mutation_get_database_oid() =E2=80=94 which does a
relcache do_query =E2=80=94 while a transaction was actually in flight
on the backend connection.  The do_query conflicts with the
in-flight transaction and hangs the session.  Subsequent
shutdown then hangs in terminate_all_childrens / waitpid.

Fix
---

Gate the autocommit write-tracking in CommandComplete.c on
MAIN_REPLICA in addition to the existing checks.
dml_adaptive_global is only meaningful in streaming replication
mode anyway (the matching routing logic in
where_to_send_main_replica is already SR-only), so this just
makes the autocommit path consistent.

Also broadened the query cache bypass to all dml_adaptive*
modes.  The new helper pool_has_dml_adaptive_write_in_transaction()
checks the existing memqcache DML oid buffer (oidbufp via the
new pool_has_dml_table_oids()), which is populated for any DML
in any cluster mode and reset on transaction boundary.  This
fixes the original "SELECT returns stale 1 instead of 2 after
UPDATE" regression in streaming replication and avoids the same
class of bug in plain dml_adaptive too.

Verified
--------

- 006.memqcache with disable_load_balance_on_write =3D
  'dml_adaptive_global' appended in all three modes: PASS
- 043.track_table_mutation: PASS

Attached: v4-0001-Feature-load-balancing-control-by-table-tracking.patch

Thanks!

On Wed, May 20, 2026 at 7:28=E2=80=AFAM Tatsuo Ishii <[email protected]>=
 wrote:

> > Hi Nadav,
> >
> > Sorry, I missed your last email.
> > Will check & test tomorrow.
>
> I finally got a chance to test your v3 patch.
> Unfortunately the test failed with timeout again.
>
> testing 006.memqcache...timeout.
> out of 1 ok:0 failed:0 timeout:1
>
> From src/test/regression/log/006.memqcache:
>
> 2026-05-20 13:08:33.798: main pid 3562591: LOG:  stop request sent to
> pgpool (pid: 3561918). waiting for termination...
> .....2026-05-20 13:08:38.799: main pid 3562591: LOG:  stop request sent t=
o
> pgpool (pid: 3561918). waiting for termination...
> .....2026-05-20 13:08:43.801: main pid 3562591: LOG:  stop request sent t=
o
> pgpool (pid: 3561918). waiting for termination...
>
> It seems pgpool main process won't stop.
>
> Regards,
> --
> Tatsuo Ishii
> SRA OSS K.K.
> English: http://www.sraoss.co.jp/index_en/
> Japanese:http://www.sraoss.co.jp
>

--=20
Nadav Shatz
Tailor Brands | CTO

--0000000000005b421f06523ee62e
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

<div dir=3D"ltr">Hi Tatsuo,<br><br>Thanks for checking the V3, sorry for mi=
ssing the test issue.<div><br>I reproduced the timeout locally. Found and f=
ixed the root cause.<br><br>Root cause<br>----------<br><br>In CommandCompl=
ete.c, the autocommit write-tracking code was<br>gated only on session_cont=
ext-&gt;is_in_transaction, not on the<br>cluster mode.<br><br>In native rep=
lication and snapshot isolation modes,<br>dml_adaptive() is never called (i=
t lives inside<br>where_to_send_main_replica), so is_in_transaction is neve=
r set<br>to true even inside an explicit BEGIN/COMMIT block.=C2=A0 That mea=
nt<br>every DML in those modes was treated as autocommit by the<br>write-tr=
acking code, triggering<br>pool_track_table_mutation_get_database_oid() =E2=
=80=94 which does a<br>relcache do_query =E2=80=94 while a transaction was =
actually in flight<br>on the backend connection.=C2=A0 The do_query conflic=
ts with the<br>in-flight transaction and hangs the session.=C2=A0 Subsequen=
t<br>shutdown then hangs in terminate_all_childrens / waitpid.<br><br>Fix<b=
r>---<br><br>Gate the autocommit write-tracking in CommandComplete.c on<br>=
MAIN_REPLICA in addition to the existing checks.<br>dml_adaptive_global is =
only meaningful in streaming replication<br>mode anyway (the matching routi=
ng logic in<br>where_to_send_main_replica is already SR-only), so this just=
<br>makes the autocommit path consistent.<br><br>Also broadened the query c=
ache bypass to all dml_adaptive*<br>modes.=C2=A0 The new helper pool_has_dm=
l_adaptive_write_in_transaction()<br>checks the existing memqcache DML oid =
buffer (oidbufp via the<br>new pool_has_dml_table_oids()), which is populat=
ed for any DML<br>in any cluster mode and reset on transaction boundary.=C2=
=A0 This<br>fixes the original &quot;SELECT returns stale 1 instead of 2 af=
ter<br>UPDATE&quot; regression in streaming replication and avoids the same=
<br>class of bug in plain dml_adaptive too.<br><br>Verified<br>--------<br>=
<br>- 006.memqcache with disable_load_balance_on_write =3D<br>=C2=A0 &#39;d=
ml_adaptive_global&#39; appended in all three modes: PASS<br>- 043.track_ta=
ble_mutation: PASS<br><br>Attached: v4-0001-Feature-load-balancing-control-=
by-table-tracking.patch<br><br>Thanks!<div></div></div></div><br><div class=
=3D"gmail_quote gmail_quote_container"><div dir=3D"ltr" class=3D"gmail_attr=
">On Wed, May 20, 2026 at 7:28=E2=80=AFAM Tatsuo Ishii &lt;<a href=3D"mailt=
o:[email protected]">[email protected]</a>&gt; wrote:<br></div><block=
quote class=3D"gmail_quote" style=3D"margin:0px 0px 0px 0.8ex;border-left:1=
px solid rgb(204,204,204);padding-left:1ex">&gt; Hi Nadav,<br>
&gt; <br>
&gt; Sorry, I missed your last email.<br>
&gt; Will check &amp; test tomorrow.<br>
<br>
I finally got a chance to test your v3 patch.<br>
Unfortunately the test failed with timeout again.<br>
<br>
testing 006.memqcache...timeout.<br>
out of 1 ok:0 failed:0 timeout:1<br>
<br>

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Proposal: Recent mutated table tracking in memory
  In-Reply-To: <CACeKOO2eUrfo_UDMFSEd=2y8zj8y93m38EzRCpg1HuizYBf3wA@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox