public inbox for [email protected]  
help / color / mirror / Atom feed
From: Tatsuo Ishii <[email protected]>
To: [email protected]
Cc: [email protected]
Subject: Re: Proposal: Recent mutated table tracking in memory
Date: Mon, 18 May 2026 19:11:05 +0900 (JST)
Message-ID: <[email protected]> (raw)
In-Reply-To: <CACeKOO31db8_uiZj+6cKRZUBt+iDc82xhEwHeOh+7rHT6gf5mg@mail.gmail.com>
References: <[email protected]>
	<CACeKOO3OYtS8dezQn4beE_A1aqc2-Uqqb08pQ+oZt_brk7aByQ@mail.gmail.com>
	<CACeKOO31db8_uiZj+6cKRZUBt+iDc82xhEwHeOh+7rHT6gf5mg@mail.gmail.com>

Hi Nadav,

Sorry, I missed your last email.
Will check & test tomorrow.

Regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> Hi Tatsuo,
> 
> Any update on this proposal?
> 
> Thanks
> 
> On Thu, Apr 23, 2026 at 5:16 PM Nadav Shatz <[email protected]> wrote:
> 
>> Hi Tatsuo,
>>
>> Good catch on the 006.memqcache timeout. My previous fix had
>> wrong side effects -- setting writing_transaction for
>> dml_adaptive_global also changed routing behavior (it forced the
>> whole transaction to primary, effectively reducing the feature to
>> 'transaction' mode). That's what caused the hang.
>>
>> Fixed properly in v3: instead of touching writing_transaction,
>> added a memqcache-specific guard that checks whether the current
>> dml_adaptive* session has tracked writes in the current
>> transaction, and skips the cache fetch if so.
>>
>> Attached: v3-0001-Feature-load-balancing-control-by-table-tracking.patch
>>
>> Changes in v3 vs v2:
>>
>> - pool_set_writing_transaction() reverted to original behavior
>>   (dml_adaptive_global no longer sets writing_transaction, so
>>   routing stays per-table as intended).
>>
>> - Added new helper pool_has_dml_adaptive_write_in_transaction()
>>   in pool_session_context.c. Returns true when the current session
>>   is in dml_adaptive* mode, is inside an explicit transaction, and
>>   has already tracked at least one write (via
>>   transaction_temp_write_list).
>>
>> - The two memqcache fetch guards in pool_proto_modules.c
>>   (simple query at line 270, extended query at line 1028) now
>>   also call !pool_has_dml_adaptive_write_in_transaction().
>>   Autocommit writes in dml_adaptive_global are still handled by
>>   the existing pool_invalidate_query_cache() at COMMIT time --
>>   no change needed there.
>>
>> Verified locally by mutating 006.memqcache with
>> disable_load_balance_on_write = 'dml_adaptive_global' in the
>> streaming replication mode (the only mode where dml_adaptive
>> applies) and the jdbctest now correctly returns "2" instead of
>> the stale cached "1". Both 006.memqcache and 043.track_table_mutation
>> pass.
>>
>> Thanks!
>>
>> On Thu, Apr 23, 2026 at 11:14 AM Tatsuo Ishii <[email protected]>
>> wrote:
>>
>>> Hi Nadav,
>>>
>>> Unfortunately the mutated 006.memqcache failed (timeout).
>>>
>>> >> > memqcache bug fix
>>> >> > -----------------
>>> >> >
>>> >> > Good catch. The root cause: pool_set_writing_transaction() was
>>> >> > explicitly skipping dml_adaptive_global, so
>>> >> > pool_is_writing_transaction() always returned false in this mode.
>>> >> > The query cache fetch guard at pool_proto_modules.c:270
>>> >> > (!pool_is_writing_transaction()) then served stale cached results
>>> >> > after DML in the same transaction.
>>> >> >
>>> >> > Fix: pool_set_writing_transaction() now sets the flag for
>>> >> > dml_adaptive_global (only 'off' and 'dml_adaptive' skip it). This
>>> >> > ensures the query cache is properly bypassed after writes within
>>> >> > the same transaction.
>>>
>>> Regards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS K.K.
>>> English: http://www.sraoss.co.jp/index_en/
>>> Japanese:http://www.sraoss.co.jp
>>>
>>> > Hi Tatsuo,
>>> >
>>> > Rebased onto current master, renumbered the regression tests
>>> > (043/044 to avoid collision with the new 042.ssl_reload), and
>>> > combined everything into a single commit.
>>> >
>>> > Attached: v2-0001-Feature-load-balancing-control-by-table-tracking.patch
>>> >
>>> > Looking forward to your review.
>>> >
>>> >
>>> > On Sun, Apr 19, 2026 at 10:25 AM Tatsuo Ishii <[email protected]>
>>> wrote:
>>> >
>>> >> > Hi Tatsuo,
>>> >> >
>>> >> > hank you for the detailed review. Attached patch addresses all items.
>>> >>
>>> >> I guess the attached patch is on top of
>>> >> v1-0001-Feature-load-balancing-control-by-table-tracking.patch.  To
>>> >> apply v2-0001-address-review.patch, we need to apply
>>> >> v1-0001-Feature-load-balancing-control-by-table-tracking.patch first.
>>> >> Unfortunately due to recent commit, it does not apply anymore. Can you
>>> >> please provide v1 + v2 that are rebased against latest master branch?
>>> >> Also 042 regression test is already used by recent commit. Can you
>>> >> renumber 042.track_table_mutation and
>>> >> 043.track_table_mutation_watchdog to 043.track_table_mutation and
>>> >> 044.track_table_mutation_watchdog accordingly?
>>> >>
>>> >> Looking forward to seeing new patch.
>>> >>
>>> >> Regards,
>>> >> --
>>> >> Tatsuo Ishii
>>> >> SRA OSS K.K.
>>> >> English: http://www.sraoss.co.jp/index_en/
>>> >> Japanese:http://www.sraoss.co.jp
>>> >>
>>> >>
>>> >> > memqcache bug fix
>>> >> > -----------------
>>> >> >
>>> >> > Good catch. The root cause: pool_set_writing_transaction() was
>>> >> > explicitly skipping dml_adaptive_global, so
>>> >> > pool_is_writing_transaction() always returned false in this mode.
>>> >> > The query cache fetch guard at pool_proto_modules.c:270
>>> >> > (!pool_is_writing_transaction()) then served stale cached results
>>> >> > after DML in the same transaction.
>>> >> >
>>> >> > Fix: pool_set_writing_transaction() now sets the flag for
>>> >> > dml_adaptive_global (only 'off' and 'dml_adaptive' skip it). This
>>> >> > ensures the query cache is properly bypassed after writes within
>>> >> > the same transaction.
>>> >> >
>>> >> > Removed dead query parse cache code (~700 lines)
>>> >> > -------------------------------------------------
>>> >> >
>>> >> > You're right -- pool_track_table_mutation_get_cached_parse,
>>> >> > pool_track_table_mutation_cache_parse, and
>>> >> > pool_track_table_mutation_normalize_and_hash were never called.
>>> >> > These were leftover from an earlier design where we planned to
>>> >> > cache SQL parse results in shared memory. The feature ended up
>>> >> > using pgpool's existing parser directly, and this code was never
>>> >> > wired up.
>>> >> >
>>> >> > Removed: QueryParseCache and QueryParseEntry structs, all related
>>> >> > static functions, the TRACK_TABLE_MUTATION_QUERY_SEM semaphore,
>>> >> > and the track_table_mutation_query_buckets /
>>> >> > track_table_mutation_query_parse_cache_size configuration
>>> >> > parameters. This also reduces shared memory usage from ~6.4 MB
>>> >> > to ~80 KB with default settings.
>>> >> >
>>> >> > check_object_relationship_list scope
>>> >> > -------------------------------------
>>> >> >
>>> >> > You're correct -- dml_adaptive_global does not use
>>> >> > dml_adaptive_object_relationship_list. Changed
>>> >> > check_object_relationship_list() to check for DLBOW_DML_ADAPTIVE
>>> >> > only, not DLBOW_IS_DML_ADAPTIVE (which includes global).
>>> >> >
>>> >> > Documentation fixes
>>> >> > -------------------
>>> >> >
>>> >> > - Removed "(Lagless Replica Reads)" from section title and
>>> >> >   "lagless" language from description.
>>> >> >
>>> >> > - Described fallback behavior when neither
>>> >> >   replication_delay_source_cmd nor delay_threshold_by_time is
>>> >> >   configured (TTL stays at 100ms default minimum).
>>> >> >
>>> >> > - "query cache" references removed (the query parse cache is gone).
>>> >> >
>>> >> > - Added 128-table-per-SELECT limit to Limitations section
>>> >> >   (uses POOL_MAX_SELECT_OIDS).
>>> >> >
>>> >> > Code style fixes
>>> >> > ----------------
>>> >> >
>>> >> > - DLBOW_IS_DML_ADAPTIVE() calls no longer split across lines.
>>> >> >
>>> >> > - Split the long errmsg line in
>>> >> >   is_select_object_in_temp_write_list.
>>> >> >
>>> >> > - Removed redundant is_adaptive variable in
>>> >> >   is_select_object_in_temp_write_list (the check at function
>>> >> >   entry already guarantees it).
>>> >> >
>>> >> > Thanks!
>>> >> >
>>> >> > On Wed, Apr 15, 2026 at 1:43 AM Tatsuo Ishii <[email protected]>
>>> >> wrote:
>>> >> >
>>> >> >> Hi Nadav,
>>> >> >>
>>> >> >> > Hi Tatsuo,
>>> >> >> >
>>> >> >> > Looks good to me thanks!
>>> >> >> >
>>> >> >> > Please go ahead with your review. waiting to hear back from you.
>>> >> >>
>>> >> >> Here are the code review results.
>>> >> >>
>>> >> >> diff --git a/doc/src/sgml/loadbalance.sgml
>>> >> b/doc/src/sgml/loadbalance.sgml
>>> >> >> index 9e1e7b39b..7384ce81a 100644
>>> >> >> --- a/doc/src/sgml/loadbalance.sgml
>>> >> >> +++ b/doc/src/sgml/loadbalance.sgml
>>> >> >> :
>>> >> >> + <sect2 id="runtime-config-table-mutation-map">
>>> >> >> +  <title>Table Mutation Map Configuration (Lagless Replica
>>> >> Reads)</title>
>>> >> >>
>>> >> >> "(Lagless Replica Reads)" sounds like an advertisement to me. It
>>> >> >> should be removed.
>>> >> >>
>>> >> >> +  <para>
>>> >> >> +   These parameters configure the track table mutation feature,
>>> which
>>> >> is
>>> >> >> activated by setting
>>> >> >> +   <xref linkend="guc-disable-load-balance-on-write"> to
>>> >> >> <literal>dml_adaptive_global</literal>.
>>> >> >> +   The feature tracks recently written tables to prevent stale
>>> reads
>>> >> from
>>> >> >> replica nodes during
>>> >> >> +   replication lag, implementing the "lagless" architecture
>>> pattern for
>>> >> >> distributed systems
>>> >> >> +   with read replicas.
>>> >> >>
>>> >> >> I think the feature does not guarantee "lagless" anytime, in all
>>> cases.
>>> >> >>
>>> >> >> +  <para>
>>> >> >> +   This feature requires time-based replication delay monitoring.
>>> This
>>> >> >> can be provided by either
>>> >> >> +   <xref linkend="guc-replication-delay-source-cmd"> (external
>>> command
>>> >> >> mode) or by setting
>>> >> >> +   <xref linkend="guc-delay-threshold-by-time"> (which uses
>>> >> >> <literal>pg_stat_replication.replay_lag</literal>
>>> >> >> +   from PostgreSQL 10+). At least one of these must be configured
>>> for
>>> >> the
>>> >> >> TTL calculation to work.
>>> >> >>
>>> >> >> If one of these is not set, what happens? Error? Need to describe
>>> it.
>>> >> >>
>>> >> >> +  </para>
>>> >> >> +
>>> >> >> +  <warning>
>>> >> >> +   <para>
>>> >> >> +    Enabling <literal>dml_adaptive_global</literal> increases
>>> shared
>>> >> >> memory consumption. With default settings,
>>> >> >> +    the feature requires approximately 6.4 MB of shared memory
>>> (0.1 MB
>>> >> >> for table tracking + 6.3 MB for query cache).
>>> >> >>
>>> >> >> "query cache" should be "query parse cache".
>>> >> >>
>>> >> >> +    Memory usage scales with configuration parameters:
>>> >> >> +   </para>
>>> >> >> +   <itemizedlist>
>>> >> >> +    <listitem>
>>> >> >> +     <para>
>>> >> >> +      Table tracking: <literal>track_table_mutation_table_size * 40
>>> >> >> bytes</literal> (default: 2048 * 40 = ~80 KB)
>>> >> >> +     </para>
>>> >> >> +    </listitem>
>>> >> >> +    <listitem>
>>> >> >> +     <para>
>>> >> >> +      Query cache:
>>> >> <literal>track_table_mutation_query_parse_cache_size *
>>> >> >> 640 bytes</literal> (default: 10000 * 640 = ~6.3 MB)
>>> >> >>
>>> >> >> "query cache" should be "query parse cache".
>>> >> >>
>>> >> >> +   <title>Limitations</title>
>>> >> >>
>>> >> >> I think number of tables tacked in a SELECT is limited to 8. It
>>> should
>>> >> >> be mentioned.
>>> >> >>
>>> >> >> diff --git a/src/context/pool_query_context.c
>>> >> >> b/src/context/pool_query_context.c
>>> >> >> index a056ac596..0190d3673 100644
>>> >> >> --- a/src/context/pool_query_context.c
>>> >> >> +++ b/src/context/pool_query_context.c
>>> >> >> @@ -1828,15 +1829,23 @@ is_in_list(char *name, List *list)
>>> >> >>  static bool
>>> >> >>  is_select_object_in_temp_write_list(Node *node, void *context)
>>> >> >>  {
>>> >> >> -       if (node == NULL ||
>>> pool_config->disable_load_balance_on_write
>>> >> !=
>>> >> >> DLBOW_DML_ADAPTIVE)
>>> >> >> +       if (node == NULL ||
>>> >> >> +               !DLBOW_IS_DML_ADAPTIVE(
>>> >> >> +
>>> >> >> pool_config->disable_load_balance_on_write))
>>> >> >>
>>> >> >> You don't need to split the line.
>>> >> >>
>>> >> >> +               is_adaptive = DLBOW_IS_DML_ADAPTIVE(
>>> >> >> +
>>> >> >>              pool_config->disable_load_balance_on_write);
>>> >> >>
>>> >> >> You don't need to split the line.
>>> >> >>
>>> >> >> -               if (pool_config->disable_load_balance_on_write ==
>>> >> >> DLBOW_DML_ADAPTIVE && session_context->is_in_transaction)
>>> >> >> +               if (is_adaptive &&
>>> >> >> +                       session_context->is_in_transaction)
>>> >> >>                 {
>>> >> >>                         ereport(DEBUG1,
>>> >> >>
>>> >> >> (errmsg("is_select_object_in_temp_write_list: \"%s\", found relation
>>> >> >> \"%s\"", (char *) context, rgv->relname)));
>>> >> >> This line is too long. Please split.
>>> >> >>
>>> >> >> @@ -1880,7 +1889,13 @@ static char
>>> >> >> *get_associated_object_from_dml_adaptive_relations
>>> >> >>  void
>>> >> >>  check_object_relationship_list(char *name, bool is_func_name)
>>> >> >>  {
>>> >> >> -       if (pool_config->disable_load_balance_on_write ==
>>> >> >> DLBOW_DML_ADAPTIVE &&
>>> >> >> pool_config->parsed_dml_adaptive_object_relationship_list)
>>> >> >> +       bool            is_adaptive;
>>> >> >> +
>>> >> >> +       is_adaptive = DLBOW_IS_DML_ADAPTIVE(
>>> >> >> +
>>> >> >>      pool_config->disable_load_balance_on_write);
>>> >> >>
>>> >> >> I wrote in the commit message:
>>> >> >>
>>> >> >> modifications are only detected in the same transaction). Note,
>>> >> >> however, you cannot use dml_adaptive_object_relationship_list to
>>> track
>>> >> >> dependency among table and other objects.
>>> >> >>
>>> >> >> In my understanding the feature does not use
>>> >> >> dml_adaptive_object_relationship_list. If this is correct, why
>>> >> >> check_object_relationship_list() is called here in case
>>> >> >> dml_adaptive_global?  If the feature uses
>>> >> >> dml_adaptive_object_relationship_list, test cases should be
>>> included.
>>> >> >>
>>> >> >> diff --git a/src/utils/pool_track_table_mutation.c
>>> >> >> b/src/utils/pool_track_table_mutation.c
>>> >> >> new file mode 100644
>>> >> >> index 000000000..9be46b28f
>>> >> >> --- /dev/null
>>> >> >> +++ b/src/utils/pool_track_table_mutation.c
>>> >> >>
>>> >> >> It seems following functions are not used anywhere. I wonder if this
>>> >> >> feature actually use "query parse cache".
>>> >> >>
>>> >> >> pool_track_table_mutation_get_cached_parse
>>> >> >> pool_track_table_mutation_cache_parse
>>> >> >> pool_track_table_mutation_normalize_and_hash
>>> >> >>
>>> >> >> Besides the code review, I mutated one of regression tests to check
>>> >> >> whether the feature co exists with in the existing memory query
>>> cache
>>> >> >> feature. After attached patch applied, I ran 006.memqcache and got
>>> the
>>> >> >> following result.
>>> >> >>
>>> >> >> cd src/test/regression
>>> >> >> ./regress.sh 006
>>> >> >> creating pgpool-II temporary installation ...
>>> >> >> moving pgpool_setup to temporary installation path ...
>>> >> >> moving watchdog_setup to temporary installation path ...
>>> >> >> using pgpool-II at
>>> >> >>
>>> >>
>>> /home/t-ishii/work/Pgpool-II/current/pgpool2/src/test/regression/temp/installed
>>> >> >> *************************
>>> >> >> REGRESSION MODE          : install
>>> >> >> Pgpool-II version        : pgpool-II version 4.8devel
>>> (mitsukakeboshi)
>>> >> >> Pgpool-II install path   :
>>> >> >>
>>> >>
>>> /home/t-ishii/work/Pgpool-II/current/pgpool2/src/test/regression/temp/installed
>>> >> >> PostgreSQL bin           : /usr/local/pgsql/bin
>>> >> >> PostgreSQL Major version : 18
>>> >> >> pgbench                  : /usr/local/pgsql/bin/pgbench
>>> >> >> PostgreSQL jdbc          :
>>> >> >> /usr/local/pgsql/share/postgresql-9.2-1003.jdbc4.jar
>>> >> >> *************************
>>> >> >> testing 006.memqcache...failed.
>>> >> >> out of 1 ok:0 failed:1 timeout:0
>>> >> >>
>>> >> >> log/006.memqcache shows:
>>> >> >>
>>> >> >> ../expected.txt result.txt differ: char 1, line 1
>>> >> >>
>>> >> >> So I checked the test script and found the error was generated by a
>>> >> >> Java program test.
>>> >> >>
>>> >> >>         java jdbctest > result.txt 2>&1
>>> >> >>         cmp ../expected.txt result.txt
>>> >> >>         if [ $? != 0 ];then
>>> >> >>                 ./shutdownall
>>> >> >>                 exit 1
>>> >> >>         fi
>>> >> >>
>>> >> >> In jdbctest.java:
>>> >> >>
>>> >> >>                 /*
>>> >> >>                  * Cache test in an explicit transaction
>>> >> >>                  */
>>> >> >>                 conn.setAutoCommit(false);
>>> >> >>                 // execute DML. This should prevent SELECTs from
>>> using
>>> >> >> query cache in the transaction.
>>> >> >>                 sql = "UPDATE t1 SET i = 2;";
>>> >> >>                 pst = conn.createStatement();
>>> >> >>                 pst.executeUpdate(sql);
>>> >> >>                 pst.close();
>>> >> >>                 // should not use the cache and should return "2",
>>> >> rather
>>> >> >> than "1"
>>> >> >>                 prest = conn.prepareStatement("SELECT * FROM t1");
>>> >> >>                 rs = prest.executeQuery();
>>> >> >>
>>> >> >> The expected file (expected.txt) has "2" but the result file
>>> >> >> (testdir/result.txt) was "1".  This is the reason why the test
>>> >> >> failed. I wonder if there's something wrong with the feature when
>>> the
>>> >> >> query cache is enabled. Can you look into this?
>>> >> >>
>>> >> >> Regards,
>>> >> >> --
>>> >> >> Tatsuo Ishii
>>> >> >> SRA OSS K.K.
>>> >> >> English: http://www.sraoss.co.jp/index_en/
>>> >> >> Japanese:http://www.sraoss.co.jp
>>> >> >>
>>> >> >
>>> >> >
>>> >> > --
>>> >> > Nadav Shatz
>>> >> > Tailor Brands | CTO
>>> >>
>>> >
>>> >
>>> > --
>>> > Nadav Shatz
>>> > Tailor Brands | CTO
>>>
>>
>>
>> --
>> Nadav Shatz
>> Tailor Brands | CTO
>>
> 
> 
> -- 
> Nadav Shatz
> Tailor Brands | CTO


reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected]
  Subject: Re: Proposal: Recent mutated table tracking in memory
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox