Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wFrM2-005dfw-27 for pgsql-hackers@arkaria.postgresql.org; Thu, 23 Apr 2026 10:32:19 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wFrM1-001Jxl-1U for pgsql-hackers@arkaria.postgresql.org; Thu, 23 Apr 2026 10:32:17 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wFrM0-001Jxc-35 for pgsql-hackers@lists.postgresql.org; Thu, 23 Apr 2026 10:32:17 +0000 Received: from mail-pj1-x102e.google.com ([2607:f8b0:4864:20::102e]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wFrLv-00000002OwC-3bfe for pgsql-hackers@lists.postgresql.org; Thu, 23 Apr 2026 10:32:16 +0000 Received: by mail-pj1-x102e.google.com with SMTP id 98e67ed59e1d1-3591cc98871so2967623a91.3 for ; Thu, 23 Apr 2026 03:32:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1776940331; x=1777545131; darn=lists.postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=3wCkQIG+gjCFCGt6/uTM6o2Zh3x5wUICzrfMuz6kC5Y=; b=EeqqfthYFSgnJhGi8wjkYDObBIKZRF1RuriJ+/K9j+cuw34S6AWVsi1xe+OVXfSxgg hEZTBWsCOxslKgG4B5oBtgPsx8kDcWVkWC+O4sicW2J5bmygbdWhBaeDNwLtd1f1BkMp xTwlYpLwAJ5bm+D4KzZRh8PaDntkg3GtA52F2mBmhWJqidx1YCZkd+XQtjeD6VA4rIyg rxcEnBhuWV7/oXtOBb0xxg3dqAMOe25Kk7SBdMOCV7aIqShk1XRMO6FabyEXfhoNdYtB Dy4EXcFt87XxQTfYKXuMol2tK98Xvi0hPFlOI2RrCiinTMUz0GScf3ranePO9hLzp5rn gkFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776940331; x=1777545131; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=3wCkQIG+gjCFCGt6/uTM6o2Zh3x5wUICzrfMuz6kC5Y=; b=k6yfLGp071E5j65sL8kb+BOyDFCcUaCwSURx7zJ0SDUej96wYPu3DMiboe1Vb+W0k3 Uj6FyljKai2CsyHzh4aDUOJMeTjrn+PVawjLCXA8utgp1GOK0ukgvAQdNQhpRdvEGMB8 db/LpryyLYtQ6Mvol5mdC7M/hzUAgnKTcs9k4IOdOcojhAT3r/tPY45ev4xfRC8n5OGW vLPk8UOZNknQrf2ZhyWR9z9vf7W8sRrXzXq7Sa+OcsiVZoyRg7TyvnuKYF1LE1/U6nyb sBA73PUuNG/K6gxKzqzLpVCP43u21CBuoPvfQmjGYC6azxOLHEVky0Tc9RaPwtya8LSW FY9g== X-Forwarded-Encrypted: i=1; AFNElJ9k+hMIvhOlda/zeW0Z6D/amRBQ0MXrdYLjGtXYtQY5uOJpp5zJE11BLLFTqZAZT9wCYMfkCvQUDfk2h73J@lists.postgresql.org X-Gm-Message-State: AOJu0YwA02jk9FhjzwW/MQyDzz8UQj/7VzOTI4DYjsbWWkGagf18eyI9 sS+nqTwja4AwIY4kDQ0x1Eh3Dxpy9nCiF7qqnbsRPISn0tSIPR11SNn4 X-Gm-Gg: AeBDieti4Dd6Ij9ATKWUQyeQmNxg2EkvsXT21XodqkUQIOzi5wq7kHjVEhJYPAd+/Gp /i22aWPQCPGRKjaPHGDZUeN8VnYGluuSOoF3RIvLzFZLQHIYNxShaBlHkyh4VWnZ2wGzTJrwOGn fCvS+P1eBOHI5MdAgMXwyXyZ3ob7/DBfj+mvKNM4TuujYihitlM4s7uvO9OCu/zjT0O2b3QosSS JrgRIPGpwidX3njONtXbg570q66Y7JstFkRb9pvqQjat8zmEN6UInu5koBTv0SNXVxBUXCoPV/N 6FsVpyh/tD/2dWfld7QvX8+7dKCluUeMscNFKTbFR4+IjpPyz1a0AFSQAZtV/tIlMzWsjQbm0WC 075lqTOYnMHjywq0lB5GSsh4NhvD8+el/gbTcKRk0MUHfxMX5xSa2sjd1KoNOu5pFFsDBIYd1aj WqG1N2o5Oqm70BuebcqgBHbhv3PrNO6wOz2PSiM2rrbw== X-Received: by 2002:a17:903:3d43:b0:2b0:5d60:7f3f with SMTP id d9443c01a7336-2b5f9ee90bdmr202524155ad.16.1776940330868; Thu, 23 Apr 2026 03:32:10 -0700 (PDT) Received: from smtpclient.apple ([45.32.121.103]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2b5fab20e80sm199745295ad.57.2026.04.23.03.32.08 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 23 Apr 2026 03:32:10 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.400.21\)) Subject: Re: PoC: Add condition variable support to WaitEventSetWait() From: Chao Li In-Reply-To: <256e0146-eeb4-443d-b2e0-7ee834ef330e@postgrespro.ru> Date: Thu, 23 Apr 2026 18:31:30 +0800 Cc: Xuneng Zhou , PostgreSQL Hackers Content-Transfer-Encoding: quoted-printable Message-Id: <3DEC425F-DB2A-4565-AAFC-2FA01B64A331@gmail.com> References: <63B9B6C7-BB04-4F6E-B3AC-6886DDC1C3FD@gmail.com> <00D3DE2F-10B8-4439-91B4-D1C24A98627A@gmail.com> <289B3400-74C9-4C1C-9882-2C6F3C515FE2@gmail.com> <256e0146-eeb4-443d-b2e0-7ee834ef330e@postgrespro.ru> To: Yura Sokolov X-Mailer: Apple Mail (2.3864.400.21) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Apr 23, 2026, at 17:52, Yura Sokolov = wrote: >=20 > 23.04.2026 11:15, Chao Li =D0=BF=D0=B8=D1=88=D0=B5=D1=82: >>=20 >>=20 >>> On Apr 22, 2026, at 21:13, Yura Sokolov = wrote: >>>=20 >>> 22.04.2026 05:58, Xuneng Zhou =D0=BF=D0=B8=D1=88=D0=B5=D1=82: >>>> Hi Yura, >>>>=20 >>>> On Wed, Apr 22, 2026 at 1:06=E2=80=AFAM Yura Sokolov = wrote: >>>>>=20 >>>>> 08.04.2026 11:22, Chao Li =D0=BF=D0=B8=D1=88=D0=B5=D1=82: >>>>>>> On Apr 8, 2026, at 11:50, Chao Li = wrote: >>>>>>>> On Apr 2, 2026, at 15:38, Chao Li = wrote: >>>>>>>>> On Mar 31, 2026, at 16:59, Chao Li = wrote: >>>>>>>>>> On Mar 31, 2026, at 15:28, Chao Li = wrote: >>>>>>>>>>=20 >>>>>>>>>> Hi, >>>>>>>>>>=20 >>>>>>>>>> There is an XXX comment in WalSndWait(): >>>>>>>>>> ``` >>>>>>>>>> * XXX: A desirable future improvement would be to add support = for CVs >>>>>>>>>> * into WaitEventSetWait(). >>>>>>>>>> ``` >>>>>>>>>>=20 >>>>>>>>>> I have been exploring a possible approach for that. This = patch is a PoC that adds ConditionVariable support to WaitEventSet. This = v1 is mainly intended to gather feedback on the design, so I have only = done some basic testing so far, such as a normal logical replication = workflow. >>>>>>>>>>=20 >>>>>>>>>> I=E2=80=99d like to highlight a few key points about the = design: >>>>>>>>>>=20 >>>>>>>>>> 1. In the current WalSndWait(), although it prepares to sleep = on a ConditionVariable, it does not actually check whether the CV has = been signaled. In this PoC, I kept that same behavior. However, I tried = to make the WaitEventSet support for CVs generic, so that if we want to = add actual signal checking in the future, that would be possible. >>>>>>>>>>=20 >>>>>>>>>> 2. To keep the design generic, this patch introduces a new = wait event type, WL_CONDITION_VARIABLE. A WL_CONDITION_VARIABLE event = occupies a position in the event array, similar to latch and socket = events. When a CV is signaled, the corresponding WL_CONDITION_VARIABLE = event is returned in occurred_events. >>>>>>>>>>=20 >>>>>>>>>> 3. The WaitEventSet APIs AddWaitEventToSet() and = ModifyWaitEvent() are extended to support CVs by adding one more = parameter =E2=80=9Ccv" to both APIs. The downside of this approach is = that all call sites of these two APIs need to be updated. I also = considered adding separate APIs for CVs, such as = AddWaitEventToSetForCV() and ModifyWaitEventForCV(), since CVs do not = rely on the kernel and it might therefore make sense to decouple them = from socket and latch handling. But for v1, I chose the more generic = approach. I=E2=80=99d be interested in hearing comments on this part of = the design. >>>>>>>>>>=20 >>>>>>>>>> 4. One important point is that this patch extends the = WaitEventSet abstraction, not the underlying kernel wait primitives. A = ConditionVariable is still a userspace/shared-memory concept, but with = this design it can participate in the same waiting framework as sockets = and latches. I think that is useful because it allows mixed waits to be = handled through one interface. >>>>>>>>>>=20 >>>>>>>>>> Here is the v1 patch. >>>>>>>>>>=20 >>>>>>>>>=20 >>>>>>>>> I just noticed that I missed checking in my last edit when = switching to the other branch, so attaching an updated v1. >>>>>>>>=20 >>>>>>>> PFA v2 - fixed a CI failure from contrib/postgres_fdw. >>>>>>>=20 >>>>>>> Fixed a CI test failure and rebased. >>>>>>=20 >>>>>> PFA v4 - try to fix a test failure on windows. >>>>>=20 >>>>> Good day, Chao Li. >>>>>=20 >>>>> ConditionalVariable works through the MyLatch. >>>>> And almost always there is single latch for one backend: >>>>> MyLatch =3D PGPROC->procLatch; >>>>>=20 >>>>> (Single exception I found is = XLogRecoveryCtl->recoveryWakeupLatch.) >>>>>=20 >>>>> And that is where ConditionVariablePrepareToSleep were used in = WalSndWait >>>>> but without check: FeBeWaitSet always waits for MyLatch. >>>>>=20 >>>>> Therefore, I don't clear get, how you suggest to simultaneously = wait >>>>> WL_LATCH_SET on MyLatch and WL_CONDITION_VARIABLE? >>>>> How you will distinguish which one was fired? >>>>>=20 >>>>> It looks to my, WL_LATCH_SET on MyLatch and WL_CONDITION_VARIABLE = had to be >>>>> mutual exclusive. At least unless ConditionVariable internal are = changed. >>>>>=20 >>>>> And check after waiting for conditional variable set had to be = placed into >>>>> WaitEventSetWaitBlock where all other such checks are. >>>>>=20 >>>>> All written above is just my humble opinion. >>>>> Excuse me if i'm too strict. >>>>>=20 >>>>=20 >>>> I think you're correct that ConditionVariableSignal() wakes the = target >>>> process by calling SetLatch(&proc->procLatch). So every CV signal = also >>>> sets MyLatch. The patch introduces ConditionVariableSignaled() to >>>> distinguish the two by checking CV wait-list membership (not the = latch >>>> state itself), so in principle the two sources are distinguishable. >>>=20 >>> Nope. >>> Patch may say if ConditionVariable was signalled. >>> But it cann't say if MyLatch were set as well without touching >>> ConditionVariable. >>> It is not better than current state in term of code clearness and = flawlessness. >>=20 >> Hi Yura, >>=20 >> Thank you for the review. This patch is marked as =E2=80=9CPoC=E2=80=9D= , so nothing has been finalized yet, and I am open to any comments and = suggestions. >>=20 >> In theory, ConditionVariable relies on MyLatch, and in most cases a = process has only one MyLatch, so your point that WL_LATCH_SET on MyLatch = and WL_CONDITION_VARIABLE should ideally be mutually exclusive makes = sense. >>=20 >> However, the current code does not really follow that principle. = Looking at the current code in WalSndWait(): >> ``` >> if (wait_event =3D=3D WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION) >> ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv); >> else if (MyWalSnd->kind =3D=3D REPLICATION_KIND_PHYSICAL) >> ConditionVariablePrepareToSleep(&WalSndCtl->wal_flush_cv); >> else if (MyWalSnd->kind =3D=3D REPLICATION_KIND_LOGICAL) >> ConditionVariablePrepareToSleep(&WalSndCtl->wal_replay_cv); >>=20 >> if (WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, wait_event) =3D=3D= 1 && >> (event.events & WL_POSTMASTER_DEATH)) >> { >> ConditionVariableCancelSleep(); >> proc_exit(1); >> } >>=20 >> ConditionVariableCancelSleep(); >> ``` >>=20 >> It calls ConditionVariablePrepareToSleep() on a specific CV, so = ideally only ConditionVariableSignal() on that specific CV should wake = the wait. >>=20 >> However, the current code actually relies on = WaitEventSetWait(FeBeWaitSet) waiting on MyLatch, where FeBeWaitSet = includes WL_LATCH_SET. So if someone calls SetLatch(the_proc_latch), the = wait will still wake up. The code does not check at all which CV was = actually signaled. >=20 > And looks like it doesn't need to. It seems to me, ConditionVariable = is > used just as convenient way to wake up many processes. Just as list of = latches. >=20 >> With this patch, we would be able to verify whether the CV was = signaled. The current PoC does not do that yet, because I simply = followed the existing behavior, but with the new WaitEventSetWait() = support, it would be easy to add such a check. >=20 > This check is just your function ConditionVariableSignaled . This = function > is only valuable addition of your patch. >=20 > But still: ConditionVariable has no value by itself. It is just a way = to > notify that some condition MAY BE changed. One had to recheck = condition it > waits any way, because ConditionVariable could be awaken spontanously. = If > you read comments in ConditionVariableBroadcast, you will see, it is > totally legal to signal "one more backend" if several broadcasts are > performed simultaneously. >=20 >> =46rom that perspective, I think the PoC is already better than the = current code. >=20 > I disagree. It is my personal opinion. >=20 >> As for your question, =E2=80=9CHow will you distinguish which one was = fired?=E2=80=9D: when FeBeWaitSet includes both WL_LATCH_SET and = WL_CONDITION_VARIABLE, and MyLatch is used for WL_LATCH_SET, then if the = CV is signaled, WL_LATCH_SET may also fire as a side effect. I think = that would be okay. Do you have a use case where that would lead to a = problem? >=20 > If you detect ConditionVariable was fired, how you will distinguish, = was > MyLatch set separately from ConditionVariable or not? > In current state of your patch there is no way. > So, WL_LATCH_SET | WL_CONDITIONAL_VARIABLE become meaningless. >=20 >> Again, this PoC version is still far from the final version. Any = discussion is very welcome. >=20 > So we discuss. >=20 > I repeat: ConditionVariable by itself is meaningless. It exists to = signal > about probably changed other condition. > Therefore, WL_LATCH_SET + ConditionVariableSignaled() is more than = enough, > imho. > I still don't see need in WL_CONDITION_VARIABLE. > And the place you did patch for is single and not representative. >=20 > If you find more places where it could be useful, then it will be = clearer > which way API should look like and which semantic it should implement. >=20 > imho. I could be wrong. >=20 > --=20 > regards > Yura Sokolov aka funny-falcon I don=E2=80=99t have an immediate plan to work on this patch. We can = wait for more voices. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/