Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1utMRP-005cR3-Qi for pgpool-hackers@arkaria.postgresql.org; Tue, 02 Sep 2025 08:32:37 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1utMRP-002lmJ-1c for pgpool-hackers@arkaria.postgresql.org; Tue, 02 Sep 2025 08:32:35 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1utMRO-002lmB-JG for pgpool-hackers@lists.postgresql.org; Tue, 02 Sep 2025 08:32:35 +0000 Received: from mail-yw1-x112d.google.com ([2607:f8b0:4864:20::112d]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1utMRM-00086o-1r for pgpool-hackers@lists.postgresql.org; Tue, 02 Sep 2025 08:32:33 +0000 Received: by mail-yw1-x112d.google.com with SMTP id 00721157ae682-72267c0505fso25211747b3.2 for ; Tue, 02 Sep 2025 01:32:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=tailorbrands.com; s=google; t=1756801952; x=1757406752; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=/u2xRn8JSP6NKeHr6f48bNMNhzZJchcJgsQWI1gEFBA=; b=NeMv5xR6XWNX/z507HE7DXtWc5X3aCF1nkROeravZ6+dBdKeUjYeOrkwuczoWF7/Xg HZWwR/yg3C7JKyxcizwUQowmgWjA3HuEf6+o38SVo7GVsFzZ5615/K9uSrno5dDHioES 5c5A4CO70wTJ1+1hOi4Ft4sPK9NqwLSUhaXId1UVOpp+NiquG9cPoj6EfYgYkAIAhmgX mt5yB+pfBAoJgaZj1AJuq27kFXJq5d1tFZWa52AHZ+6vbgJwqeek6rFQRJIYnEuIAhy0 TmGsU2kySMtO+eVvJ+nU07MiEiAn1Pm9PDGPttrAMhpvlZaVZshrEDV7TvifrFRimK7+ hxOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1756801952; x=1757406752; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=/u2xRn8JSP6NKeHr6f48bNMNhzZJchcJgsQWI1gEFBA=; b=pI565yhUZPpZkLXGHRe2Zt0mVop1a73hP0e3IzyvU+v4ec4tQg9KCrO9ASDJmHWMlg zHGyyUwoHHldD54m9KL63C7XaZeCn4U7MPd2xNd2HlEim5isEXCJ0142BY8nm+uWcN0H 3382TXQ8IFLW+A6l8DNQPg1mWPDZQLxdZUJsnH2RqMt3Kngvmb0i+IgOWOhSOE7zLB0Q hnq4R7OMvwKaEthZeiXrcbh77haU1FyN6Yge50svQ0TssThMk/yxkOlwiE/dMbt392du AysyVJCP5o7+tT0yUeH/dTW+DGAaTiYxdTJZgZskWDO/1dHh2SOA1271dsvjasZ+gAan /VNg== X-Gm-Message-State: AOJu0Yx8ZrMIggUPnhTu7was4LmampGAdBaaoZg2TOXncPZaS/hSqOYE 9MMK1qFCl+xcAu4XmWLlxBnPj51LdoSkWVmonmtd386IKTpjrRjYL24PeYU8sieyyRuSC69UoVj M4Vg7TdqbvPyRFd57o3MvwkVuZqi6kxG3xIZ3yTJ0ZaZ7acQQys2DRHsv7w== X-Gm-Gg: ASbGncuyXbuRz3IRFmfpNrYP6BpgcibrjtaHq1AQULxqkph5Bjho32ecZyHdXKJRgzB gzE2uq0zvWFi4NsqAOmj3A8hfb1DfXezgrCkHE660jpu3S9itfdhKSqWn+hXNXEiFQ3CByfBm3v VjScoqztSVNOjSCn/WazQWTCI7X9D1iDlcHC7Cre0w0xbt8UzOaqknEQtNz2QAa4M94L23EB/Ih qi7kg== X-Google-Smtp-Source: AGHT+IHjfD7IKNgK0qO+ro47YFm6e9yUGOdj+cYtt8DOE5r6W3EMW7tUxlV2J103SFTQnCmvA/cIVu91+AapSpX9Dyg= X-Received: by 2002:a05:690c:640a:b0:721:aca:b1e3 with SMTP id 00721157ae682-722763ba27bmr129264867b3.24.1756801949819; Tue, 02 Sep 2025 01:32:29 -0700 (PDT) MIME-Version: 1.0 References: <20250826.104155.1065705000208094250.ishii@postgresql.org> <20250902.074118.343005423574242823.ishii@postgresql.org> In-Reply-To: <20250902.074118.343005423574242823.ishii@postgresql.org> From: Nadav Shatz Date: Tue, 2 Sep 2025 11:32:18 +0300 X-Gm-Features: Ac12FXzH5cTvpNFh7gIYnkHBOfOR-OKOHIPk6A-MbagqiXsoReWzv4vYEF1i_Vs Message-ID: Subject: Re: Proposal: recent access based routing for primary-replica setups To: Tatsuo Ishii Cc: pgpool-hackers@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000290a4d063dcd54ec" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000290a4d063dcd54ec Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, All good. The usual way the Pgpool accesses the lag params is through the relevant tables in the DB. for aurora that isn't available. The numbers are available directly from AWS API calls tho. This solution will work with Aurora by circumventing this issue. What i mentioned as a concern is that since the command doesn't currently accept the actual DB instance list (primary/replicas) and their order it can't guarantee it'll return the lag values in the expected order. Except the primary being the first, how will the running command know the order in which pgpool has loaded the replicas into it's memory/configuration? Hope this makes more sense - if not let me know and i'll provide some examples. Thanks, On Tue, Sep 2, 2025 at 1:41=E2=80=AFAM Tatsuo Ishii = wrote: > Hi Nadav, > > Sorry for late reply. > > >> I haven=E2=80=99t tried it yet but the whole premise of having it run = a command > is > >> that it=E2=80=99s not dependent on the specific DB. As you mentioned e= arlier. > >> > >> The issue blocking the regular lag extraction from aurora is that it > >> doesn=E2=80=99t update the tables in the DB. It does have a CloudWatch= API to > get > >> the numbers tho. > > I am not familiar with CloudWatch API and am not sure I fully > understand you issue. What is your issue with CloudWatch API? Is it a > technical problem, or some cost issue (I guess CloudWatch is a paid > service)? > > >> Ordering here could get tricky since we couple the command with the > >> instance order. > >> > >> Maybe we can expand the command to receive some arguments as to instan= ce > >> order. > > Can you elaborate what "ordering" is? > > Best regards, > -- > Tatsuo Ishii > SRA OSS K.K. > English: http://www.sraoss.co.jp/index_en/ > Japanese:http://www.sraoss.co.jp > > > Hi Tatsuo, > > > > I don't want to rush at all - did you get a chance to look at what I > sent? > > Can I share more relevant information with you? > > > > What do you think? > > > > On Tue, Aug 26, 2025 at 9:54=E2=80=AFAM Nadav Shatz > wrote: > > > >> Hi Tatsuo, > >> > >> I haven=E2=80=99t tried it yet but the whole premise of having it run = a command > is > >> that it=E2=80=99s not dependent on the specific DB. As you mentioned e= arlier. > >> > >> The issue blocking the regular lag extraction from aurora is that it > >> doesn=E2=80=99t update the tables in the DB. It does have a CloudWatch= API to > get > >> the numbers tho. > >> > >> You can see the metric AuroraReplicaLag under > >> > >> > https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/metrics-refe= rence.html > >> > >> So if we have a simple command to either get it or have something else > >> update a file with the numbers based on it we=E2=80=99ll be fine. > >> > >> Ordering here could get tricky since we couple the command with the > >> instance order. > >> > >> Maybe we can expand the command to receive some arguments as to instan= ce > >> order. > >> > >> What do you think? > >> > >> > >> Nadav Shatz > >> Tailor Brands | CTO > >> > >> > >> On Tue, Aug 26, 2025 at 4:42=E2=80=AFAM Tatsuo Ishii > wrote: > >> > >>> Hi Nadav, > >>> > >>> Thank you for updating the patch. I will look into that. > >>> > >>> I have a question. Have you actually tried the patch with AWS Aurora? > >>> I am wondering how patched pgpool works with Aurora. I am asking > >>> because in the doc "8.5. Aurora Configuration Example": > >>> > >>> Set sr_check_period to 0 to disable streaming replication delay > >>> checking. This is because Aurora does not provide necessary function= s > >>> to check the replication delay. > >>> > >>> sr_check_period =3D 0 > >>> > >>> So streaming replication checking is disabled, and it means that your > >>> patch is also effectively disabled too. > >>> > >>> Best regards, > >>> -- > >>> Tatsuo Ishii > >>> SRA OSS K.K. > >>> English: http://www.sraoss.co.jp/index_en/ > >>> Japanese:http://www.sraoss.co.jp > >>> > >>> > Hi Tatsuo, > >>> > > >>> > Thank you for the notes - please find attached an updated version. > >>> > > >>> > What do you think? > >>> > > >>> > Thanks, > >>> > > >>> > On Mon, Aug 25, 2025 at 5:18=E2=80=AFAM Tatsuo Ishii > >>> wrote: > >>> > > >>> >> Hi Nadav, > >>> >> > >>> >> Thank you for the patch! > >>> >> > >>> >> I have one question. How do you provide a password > (sr_check_password) > >>> >> while executing replication_delay_source_cmd as sr_check_user? In = my > >>> >> understanding replication_delay_source_cmd is executed through su > >>> >> command in your patch. In this case su command tries to read the > >>> >> password from terminal. I don't see such a code in the patch. > >>> >> > >>> >> BTW, I start to think that executing replication_delay_source_cmd = as > >>> >> sr_check_user might not be a good idea. sr_check_user is a databas= e > >>> >> user, not OS user. In PostgreSQL they are not necessarily the > >>> >> same. Also doing su in pgpool process needs to be very carefully t= o > >>> >> avoid vulnerability. Probably we just execute it as pgpool OS user= ? > >>> >> > >>> >> Lastly when I apply the patches using git apply, there are some > >>> >> trailing space errors. > >>> >> > >>> >> $ git apply ~/external-lag-feature-implementation.patch > >>> >> /home/t-ishii/external-lag-feature-implementation.patch:314: > trailing > >>> >> whitespace. > >>> >> > >>> >> /home/t-ishii/external-lag-feature-implementation.patch:317: > trailing > >>> >> whitespace. > >>> >> > >>> >> /home/t-ishii/external-lag-feature-implementation.patch:318: > trailing > >>> >> whitespace. > >>> >> cmd_len =3D strlen(escaped_cmd) + > >>> >> /home/t-ishii/external-lag-feature-implementation.patch:320: > trailing > >>> >> whitespace. > >>> >> > >>> >> /home/t-ishii/external-lag-feature-implementation.patch:322: > trailing > >>> >> whitespace. > >>> >> snprintf(full_command, cmd_len, "su - %s -= c > >>> '%s'", > >>> >> warning: squelched 4 whitespace errors > >>> >> warning: 9 lines add whitespace errors. > >>> >> > >>> >> $ git apply ~/external-lag-feature-tests.patch > >>> >> /home/t-ishii/external-lag-feature-tests.patch:87: trailing > whitespace. > >>> >> - test_parsing.sh: Unit test for parsing logic > >>> >> /home/t-ishii/external-lag-feature-tests.patch:440: trailing > >>> whitespace. > >>> >> # Test 2: Float values > >>> >> warning: 2 lines add whitespace errors. > >>> >> > >>> >> Also I have some compilation errors after patching the source > >>> >> code. See attached compilation log. > >>> >> > >>> >> Best regards, > >>> >> -- > >>> >> Tatsuo Ishii > >>> >> SRA OSS K.K. > >>> >> English: http://www.sraoss.co.jp/index_en/ > >>> >> Japanese:http://www.sraoss.co.jp > >>> >> > >>> > > >>> > > >>> > -- > >>> > Nadav Shatz > >>> > Tailor Brands | CTO > >>> > >> > > > > -- > > Nadav Shatz > > Tailor Brands | CTO > --=20 Nadav Shatz Tailor Brands | CTO --000000000000290a4d063dcd54ec Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

All good.

= The usual way the Pgpool accesses the lag params is through the relevant ta= bles in the DB. for aurora that isn't available.
The numbers = are available directly from AWS API calls tho. This solution will work with= Aurora by circumventing this issue.

What i mentio= ned as a concern is that since the command doesn't currently accept the= actual=C2=A0DB instance list (primary/replicas) and their order it can'= ;t guarantee=C2=A0it'll return the lag values in the expected order.

Except the primary being the first, how will the run= ning command know the order in which pgpool has loaded the replicas into it= 's memory/configuration?

Hope this makes more = sense - if not let me know and i'll provide some examples.
Thanks,

On Tue, Sep 2, 2025 at 1:41= =E2=80=AFAM Tatsuo Ishii <ishii@= postgresql.org> wrote:
Hi Nadav,

Sorry for late reply.

>> I haven=E2=80=99t tried it yet but the whole premise of having it = run a command is
>> that it=E2=80=99s not dependent on the specific DB. As you mention= ed earlier.
>>
>> The issue blocking the regular lag extraction from aurora is that = it
>> doesn=E2=80=99t update the tables in the DB. It does have a CloudW= atch API to get
>> the numbers tho.

I am not familiar with CloudWatch API and am not sure I fully
understand you issue. What is your issue with CloudWatch API?=C2=A0 Is it a=
technical problem, or some cost issue (I guess CloudWatch is a paid
service)?

>> Ordering here could get tricky since we couple the command with th= e
>> instance order.
>>
>> Maybe we can expand the command to receive some arguments as to in= stance
>> order.

Can you elaborate what "ordering" is?

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

> Hi Tatsuo,
>
> I don't want to rush at all - did you get a chance to look at what= I sent?
> Can I share more relevant information with you?
>
> What do you think?
>
> On Tue, Aug 26, 2025 at 9:54=E2=80=AFAM Nadav Shatz <nadav@tailorbrands.com>= ; wrote:
>
>> Hi Tatsuo,
>>
>> I haven=E2=80=99t tried it yet but the whole premise of having it = run a command is
>> that it=E2=80=99s not dependent on the specific DB. As you mention= ed earlier.
>>
>> The issue blocking the regular lag extraction from aurora is that = it
>> doesn=E2=80=99t update the tables in the DB. It does have a CloudW= atch API to get
>> the numbers tho.
>>
>> You can see the metric AuroraReplicaLag under
>>
>> https://= docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide/metrics-reference.html=
>>
>> So if we have a simple command to either get it or have something = else
>> update a file with the numbers based on it we=E2=80=99ll be fine.<= br> >>
>> Ordering here could get tricky since we couple the command with th= e
>> instance order.
>>
>> Maybe we can expand the command to receive some arguments as to in= stance
>> order.
>>
>> What do you think?
>>
>>
>> Nadav Shatz
>> Tailor Brands | CTO
>>
>>
>> On Tue, Aug 26, 2025 at 4:42=E2=80=AFAM Tatsuo Ishii <ishii@postgresql.org= > wrote:
>>
>>> Hi Nadav,
>>>
>>> Thank you for updating the patch. I will look into that.
>>>
>>> I have a question. Have you actually tried the patch with AWS = Aurora?
>>> I am wondering how patched pgpool works with Aurora. I am aski= ng
>>> because in the doc "8.5. Aurora Configuration Example&quo= t;:
>>>
>>>=C2=A0 Set sr_check_period to 0 to disable streaming replicatio= n delay
>>>=C2=A0 checking. This is because Aurora does not provide necess= ary functions
>>>=C2=A0 to check the replication delay.
>>>
>>>=C2=A0 sr_check_period =3D 0
>>>
>>> So streaming replication checking is disabled, and it means th= at your
>>> patch is also effectively disabled too.
>>>
>>> Best regards,
>>> --
>>> Tatsuo Ishii
>>> SRA OSS K.K.
>>> English: http://www.sraoss.co.jp/index_en/
>>> Japanese:http://www.sraoss.co.jp
>>>
>>> > Hi Tatsuo,
>>> >
>>> > Thank you for the notes - please find attached an updated= version.
>>> >
>>> > What do you think?
>>> >
>>> > Thanks,
>>> >
>>> > On Mon, Aug 25, 2025 at 5:18=E2=80=AFAM Tatsuo Ishii <= ishii@postgresql.= org>
>>> wrote:
>>> >
>>> >> Hi Nadav,
>>> >>
>>> >> Thank you for the patch!
>>> >>
>>> >> I have one question. How do you provide a password (s= r_check_password)
>>> >> while executing replication_delay_source_cmd as sr_ch= eck_user? In my
>>> >> understanding replication_delay_source_cmd is execute= d through su
>>> >> command in your patch. In this case su command tries = to read the
>>> >> password from terminal. I don't see such a code i= n the patch.
>>> >>
>>> >> BTW, I start to think that executing replication_dela= y_source_cmd as
>>> >> sr_check_user might not be a good idea. sr_check_user= is a database
>>> >> user, not OS user. In PostgreSQL they are not necessa= rily the
>>> >> same. Also doing su in pgpool process needs to be ver= y carefully to
>>> >> avoid vulnerability. Probably we just execute it as p= gpool OS user?
>>> >>
>>> >> Lastly when I apply the patches using git apply, ther= e are some
>>> >> trailing space errors.
>>> >>
>>> >> $ git apply ~/external-lag-feature-implementation.pat= ch
>>> >> /home/t-ishii/external-lag-feature-implementation.pat= ch:314: trailing
>>> >> whitespace.
>>> >>
>>> >> /home/t-ishii/external-lag-feature-implementation.pat= ch:317: trailing
>>> >> whitespace.
>>> >>
>>> >> /home/t-ishii/external-lag-feature-implementation.pat= ch:318: trailing
>>> >> whitespace.
>>> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0cmd_len =3D strlen(escaped_cmd) +
>>> >> /home/t-ishii/external-lag-feature-implementation.pat= ch:320: trailing
>>> >> whitespace.
>>> >>
>>> >> /home/t-ishii/external-lag-feature-implementation.pat= ch:322: trailing
>>> >> whitespace.
>>> >>=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0snprintf(full_command, cmd_len, "= ;su - %s -c
>>> '%s'",
>>> >> warning: squelched 4 whitespace errors
>>> >> warning: 9 lines add whitespace errors.
>>> >>
>>> >> $ git apply ~/external-lag-feature-tests.patch
>>> >> /home/t-ishii/external-lag-feature-tests.patch:87: tr= ailing whitespace.
>>> >> - test_parsing.sh: Unit test for parsing logic
>>> >> /home/t-ishii/external-lag-feature-tests.patch:440: t= railing
>>> whitespace.
>>> >> # Test 2: Float values
>>> >> warning: 2 lines add whitespace errors.
>>> >>
>>> >> Also I have some compilation errors after patching th= e source
>>> >> code. See attached compilation log.
>>> >>
>>> >> Best regards,
>>> >> --
>>> >> Tatsuo Ishii
>>> >> SRA OSS K.K.
>>> >> English: http://www.sraoss.co.jp/index_en/
>>> >> Japanese:
http://www.sraoss.co.jp
>>> >>
>>> >
>>> >
>>> > --
>>> > Nadav Shatz
>>> > Tailor Brands | CTO
>>>
>>
>
> --
> Nadav Shatz
> Tailor Brands | CTO


--
Nadav Shatz
<= font color=3D"#000000">Tailor Brands=C2=A0| CTO
--000000000000290a4d063dcd54ec--