On Thu, Aug 21, 2025 at 8:04 AM Tatsuo Ishii <ishii@postgresql.org> wrote:

Hi Nadav,

> Hi Tatsuo,
>
> Thank you for your reply, I agree with your approach. Better to get (1) out
> of the way first.
>
> As the simplest approach that would completely offload the
> responsibility for lag checking, we could set it to “file” and add
> another config parameter for the file path. Or pgpool could simply
> treat any value starting with “file:” as a file path.

My concern about the "file:" approach is a race condition: what if
pgpool reads the file while someone else is updating it? I also think
the command approach is more flexible and generic. For example, the
file approach can easily be simulated by setting the command to
"/usr/bin/cat path_to_the_file".
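
For anyone who does want the file-plus-cat setup, the race can be
avoided on the writer side: write to a temporary file in the same
directory, then rename it over the target, since a rename within one
filesystem is atomic on POSIX. Below is a minimal sketch in Python of
what an external updater might do. The function name and file layout
are hypothetical, and the one-line "0 20 10" format is the one
proposed earlier in this thread; nothing here is existing pgpool code.

```python
import os
import tempfile


def write_delays_atomically(path, delays_ms):
    """Write delays as one line (e.g. "0 20 10") and swap it into place.

    Hypothetical helper for an external updater; readers such as
    "cat path" see either the old file or the new one, never a mix.
    """
    line = " ".join(str(d) for d in delays_ms) + "\n"
    # The temp file must live on the same filesystem as the target,
    # otherwise the final rename is not atomic.
    dir_name = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, prefix=".delay-")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(line)
            f.flush()
            os.fsync(f.fileno())  # make sure the data is on disk first
        os.replace(tmp_path, path)  # atomic rename over the old file
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With this, the writer never exposes a half-written file, so the
reader side needs no locking.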

> Then the internal polling can just read the file on schedule. The entire
> updating mechanism will be left to the external service.

Internal polling is a little bit complicated and cannot easily be
changed to just read a file. It has two options: one checks the WAL
LSN difference, the other checks the replication delay in time. The
file approach would replace only the latter. I suggest leaving the
internal polling code as it is.

> Having this as a first step also opens up the door for other
> implementations.
>
> Another classic option would be calling an API endpoint. But that might
> come with a lot more bulk and security concerns.

I agree that calling an API could bring security concerns.

BTW, in the command approach, the command should be executed as
sr_check_user.

> I suggest I work on a patch for file support.
>
> What do you think?

For the reasons above I prefer the command approach, not the file
support.
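
To make the command approach a bit more concrete, here is a rough
sketch in Python (for illustration only; pgpool itself is written in
C) of running the external command and parsing output such as
"0 20 10" into per-node delays in milliseconds. The function names,
the timeout and the validation rules are my assumptions, not an agreed
design, and the sketch does not deal with running the command as
sr_check_user.

```python
import shlex
import subprocess


def parse_delays(output, num_nodes):
    """Parse "0 20 10" into [0, 20, 10]; one integer (ms) per node."""
    fields = output.split()
    if len(fields) != num_nodes:
        raise ValueError(
            f"expected {num_nodes} values, got {len(fields)}: {output!r}")
    delays = [int(field) for field in fields]  # ValueError if not integer
    if any(d < 0 for d in delays):
        raise ValueError(f"negative delay in {output!r}")
    return delays


def fetch_delays(command, num_nodes, timeout_sec=5.0):
    """Run the external command once and return the parsed delays.

    Assumed to be called once per sr_check_period. A non-zero exit
    status or a timeout raises, so the caller can keep the old values.
    """
    result = subprocess.run(
        shlex.split(command),  # no shell, arguments split like sh would
        capture_output=True,
        text=True,
        timeout=timeout_sec,
        check=True,
    )
    return parse_delays(result.stdout, num_nodes)
```

For example, fetch_delays("/usr/bin/cat path_to_the_file", 3) would
cover the file case with the same code path.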

> Nadav Shatz
> Tailor Brands | CTO
>
>
> On Wed, Aug 20, 2025 at 3:45 PM Tatsuo Ishii <ishii@postgresql.org> wrote:
>
>> Hi Nadav,
>>
>> Thank you for the answer.
>>
>> I think your proposal actually includes two orthogonal proposals.
>>
>> (1) "inject" replication delay value from external source (in your
>> case from Aurora).
>>
>> (2) per relation recent access based routing.
>>
>> I suggest implementing (1) first, then (2). This incremental approach
>> would be easier than implementing (1)+(2) at once.
>>
>> For (1) we could add a new pgpool.conf parameter, say
>> "replication_delay_source". If it is set to "builtin", then the
>> replication delay source is PostgreSQL, as it already is today. If
>> it is set to anything other than "builtin", then it is an external
>> command name (+ arguments) to be executed to import the replication
>> delay values. The command should return the replication delay values
>> as a string like "0 20 10", which means the replication delay values
>> of nodes 0, 1 and 2 in milliseconds (in this case, since node 0 is
>> the primary, its replication delay is 0). The command will be invoked
>> every sr_check_period.
>>
>> I am not sure if this actually works in Aurora. This is just a quick
>> idea.
>>
>> (2) would probably be much harder than (1), so we need more
>> discussion on it later.
>>
>> Best regards,
>> --
>> Tatsuo Ishii
>> SRA OSS K.K.
>> English: http://www.sraoss.co.jp/index_en/
>> Japanese:http://www.sraoss.co.jp
>>