Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wFpEa-005awy-1N for pgsql-hackers@arkaria.postgresql.org; Thu, 23 Apr 2026 08:16:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wFpEZ-000gyX-1S for pgsql-hackers@arkaria.postgresql.org; Thu, 23 Apr 2026 08:16:27 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wFpEZ-000gyP-09 for pgsql-hackers@lists.postgresql.org; Thu, 23 Apr 2026 08:16:27 +0000 Received: from mail-pj1-x102a.google.com ([2607:f8b0:4864:20::102a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wFpEU-00000002NyP-03pK for pgsql-hackers@lists.postgresql.org; Thu, 23 Apr 2026 08:16:26 +0000 Received: by mail-pj1-x102a.google.com with SMTP id 98e67ed59e1d1-35d99bae2ebso5781632a91.3 for ; Thu, 23 Apr 2026 01:16:22 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1776932180; x=1777536980; darn=lists.postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=5RV6J1MTRGRs8NyUDHCzzMiERWQO4G7vnJavX1j3wKo=; b=JJjjoJ4XUk+k/d1isQKY+ZzNIzY1CXBQp6ANIhjva21p6PhmolYonNK2Bpa+Lh3v+l f/lOXrNZTkKiMIUGPqpgxGQq6C4LPkIrsVCUlFj5TsVBmIPqo3Sz+QDmcJ9oL0YR5Npe Fg60FblBlCNHtUaTrdrxHPr8TRmlo96CyiaiCtStilnhef/0Hc8rDrww1+TYRQG8MYF7 OwDedcdS5g10t4uEnNklZFn+bJDxRkp9kEGZ4MFZYh+z6t4P6rnnCiGHI6+0Pk5uvY8Y /wNDktTq2C06QPuw6Xj23XnvaPsJLm1CaDegofSDjBlrVqlanjib1Y3tH5+OS7D+pBkg 2lvA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776932180; x=1777536980; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=5RV6J1MTRGRs8NyUDHCzzMiERWQO4G7vnJavX1j3wKo=; b=o9cz/PHolVPMCa3XAj3uGIRgbl3l3eOILyugf9lhbRtmDD4jc5oviqwlZoa3my2rtQ DnsCmDmV/uMOqxWZlMqrae3djyzVEtLj2CQaZFxbupoTuoiKHiZsplUapusluwQ69Eqw ZATj/daCVsl80WciStLVjjRUHh7Qm3Jzn1yhvkhYYSQjVdGWWMj7C1oqmPp1n3+/zrIR actJgCbTv0W8mKbb/DQA5fjamL6kf9cmQmnrLrsdo7F1wd8UzcVJkxDOyE023iTFUz0X eXEkzsMUvynyISYh6k1wGq5IIc5UayIqRzB40NS6F1dsqcHf5aeFejru+sftnRoQ9Krc Htbw== X-Forwarded-Encrypted: i=1; AFNElJ8EgZJbYcVxcm8PemSRXfZ0rGDXcYpulyxf9uWZTHz+wHemx0HlYLHndPGmKiSXM8fhBOYHIgBHN0z/ChSH@lists.postgresql.org X-Gm-Message-State: AOJu0YxIurz0zD0DNH6Jw8dBROBkVJJXaodYKFIu1oQMH8rpk3IBaNXh neNhqrNIqZOfXwngAG9dv6G38fS5kRXXdcQJL83RRB/9jGsQk7piDfOv X-Gm-Gg: AeBDietp49aIqNgRNyxfCAwce8pd7uhQd2wJVvlr1Gsp1NV+wHQAvvM5qd2wVOVxeeS 6+Y9V1q5T+YjYwxnRGxSMUQg8qhHCMhwY3PnaxZhfTEYEBrGBYN4xbHswxG8nU6xePfRJ4BkuzE 5rosClNBl84RO9r73Z47s07ErKRJAA1Uhwpb0dCKtqapvZ4syL5ild/Y/EpBJwgT2/7l9UfZd9X qaPoNN7b4IyK/MJtWJ9hom6mSvhEdiehrUBvXRUDall2FCjmZrcsC5TI6eiMvVwPYvqS6gMDW3h 7YhBXqOJPf8zT9yRNx6GLbOoH23IcF5Fzrpyz1adRGbtWkj39GhSf9EpuC0CerqOMz4uslgYccg KdniCsl/HN73bVPrrOCnfbsy9ZFqAkC/u5jd8K8qsJcNT7Qf+sBQKtkKupqseyyfuftTWE2Ke5X JHm2+BXU2nqnHX/C9tEQC5GD9F+nFik/whZPUaeQQiaQ== X-Received: by 2002:a17:902:7249:b0:2ae:b9cd:d2ce with SMTP id d9443c01a7336-2b5f9f63300mr209224965ad.27.1776932180415; Thu, 23 Apr 2026 01:16:20 -0700 (PDT) Received: from smtpclient.apple ([45.32.121.103]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-2b5fa9ff409sm236534155ad.14.2026.04.23.01.16.18 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 23 Apr 2026 01:16:19 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.400.21\)) Subject: Re: PoC: Add condition variable support to WaitEventSetWait() From: Chao Li In-Reply-To: Date: Thu, 23 Apr 2026 16:15:40 +0800 Cc: Xuneng Zhou , PostgreSQL Hackers Content-Transfer-Encoding: quoted-printable Message-Id: <289B3400-74C9-4C1C-9882-2C6F3C515FE2@gmail.com> References: <63B9B6C7-BB04-4F6E-B3AC-6886DDC1C3FD@gmail.com> <00D3DE2F-10B8-4439-91B4-D1C24A98627A@gmail.com> To: Yura Sokolov X-Mailer: Apple Mail (2.3864.400.21) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Apr 22, 2026, at 21:13, Yura Sokolov = wrote: >=20 > 22.04.2026 05:58, Xuneng Zhou =D0=BF=D0=B8=D1=88=D0=B5=D1=82: >> Hi Yura, >>=20 >> On Wed, Apr 22, 2026 at 1:06=E2=80=AFAM Yura Sokolov = wrote: >>>=20 >>> 08.04.2026 11:22, Chao Li =D0=BF=D0=B8=D1=88=D0=B5=D1=82: >>>>> On Apr 8, 2026, at 11:50, Chao Li wrote: >>>>>> On Apr 2, 2026, at 15:38, Chao Li wrote: >>>>>>> On Mar 31, 2026, at 16:59, Chao Li = wrote: >>>>>>>> On Mar 31, 2026, at 15:28, Chao Li = wrote: >>>>>>>>=20 >>>>>>>> Hi, >>>>>>>>=20 >>>>>>>> There is an XXX comment in WalSndWait(): >>>>>>>> ``` >>>>>>>> * XXX: A desirable future improvement would be to add support = for CVs >>>>>>>> * into WaitEventSetWait(). >>>>>>>> ``` >>>>>>>>=20 >>>>>>>> I have been exploring a possible approach for that. This patch = is a PoC that adds ConditionVariable support to WaitEventSet. This v1 is = mainly intended to gather feedback on the design, so I have only done = some basic testing so far, such as a normal logical replication = workflow. >>>>>>>>=20 >>>>>>>> I=E2=80=99d like to highlight a few key points about the = design: >>>>>>>>=20 >>>>>>>> 1. In the current WalSndWait(), although it prepares to sleep = on a ConditionVariable, it does not actually check whether the CV has = been signaled. In this PoC, I kept that same behavior. However, I tried = to make the WaitEventSet support for CVs generic, so that if we want to = add actual signal checking in the future, that would be possible. >>>>>>>>=20 >>>>>>>> 2. To keep the design generic, this patch introduces a new wait = event type, WL_CONDITION_VARIABLE. A WL_CONDITION_VARIABLE event = occupies a position in the event array, similar to latch and socket = events. When a CV is signaled, the corresponding WL_CONDITION_VARIABLE = event is returned in occurred_events. >>>>>>>>=20 >>>>>>>> 3. The WaitEventSet APIs AddWaitEventToSet() and = ModifyWaitEvent() are extended to support CVs by adding one more = parameter =E2=80=9Ccv" to both APIs. The downside of this approach is = that all call sites of these two APIs need to be updated. I also = considered adding separate APIs for CVs, such as = AddWaitEventToSetForCV() and ModifyWaitEventForCV(), since CVs do not = rely on the kernel and it might therefore make sense to decouple them = from socket and latch handling. But for v1, I chose the more generic = approach. I=E2=80=99d be interested in hearing comments on this part of = the design. >>>>>>>>=20 >>>>>>>> 4. One important point is that this patch extends the = WaitEventSet abstraction, not the underlying kernel wait primitives. A = ConditionVariable is still a userspace/shared-memory concept, but with = this design it can participate in the same waiting framework as sockets = and latches. I think that is useful because it allows mixed waits to be = handled through one interface. >>>>>>>>=20 >>>>>>>> Here is the v1 patch. >>>>>>>>=20 >>>>>>>=20 >>>>>>> I just noticed that I missed checking in my last edit when = switching to the other branch, so attaching an updated v1. >>>>>>=20 >>>>>> PFA v2 - fixed a CI failure from contrib/postgres_fdw. >>>>>=20 >>>>> Fixed a CI test failure and rebased. >>>>=20 >>>> PFA v4 - try to fix a test failure on windows. >>>=20 >>> Good day, Chao Li. >>>=20 >>> ConditionalVariable works through the MyLatch. >>> And almost always there is single latch for one backend: >>> MyLatch =3D PGPROC->procLatch; >>>=20 >>> (Single exception I found is XLogRecoveryCtl->recoveryWakeupLatch.) >>>=20 >>> And that is where ConditionVariablePrepareToSleep were used in = WalSndWait >>> but without check: FeBeWaitSet always waits for MyLatch. >>>=20 >>> Therefore, I don't clear get, how you suggest to simultaneously wait >>> WL_LATCH_SET on MyLatch and WL_CONDITION_VARIABLE? >>> How you will distinguish which one was fired? >>>=20 >>> It looks to my, WL_LATCH_SET on MyLatch and WL_CONDITION_VARIABLE = had to be >>> mutual exclusive. At least unless ConditionVariable internal are = changed. >>>=20 >>> And check after waiting for conditional variable set had to be = placed into >>> WaitEventSetWaitBlock where all other such checks are. >>>=20 >>> All written above is just my humble opinion. >>> Excuse me if i'm too strict. >>>=20 >>=20 >> I think you're correct that ConditionVariableSignal() wakes the = target >> process by calling SetLatch(&proc->procLatch). So every CV signal = also >> sets MyLatch. The patch introduces ConditionVariableSignaled() to >> distinguish the two by checking CV wait-list membership (not the = latch >> state itself), so in principle the two sources are distinguishable. >=20 > Nope. > Patch may say if ConditionVariable was signalled. > But it cann't say if MyLatch were set as well without touching > ConditionVariable. > It is not better than current state in term of code clearness and = flawlessness. Hi Yura, Thank you for the review. This patch is marked as =E2=80=9CPoC=E2=80=9D, = so nothing has been finalized yet, and I am open to any comments and = suggestions. In theory, ConditionVariable relies on MyLatch, and in most cases a = process has only one MyLatch, so your point that WL_LATCH_SET on MyLatch = and WL_CONDITION_VARIABLE should ideally be mutually exclusive makes = sense. However, the current code does not really follow that principle. Looking = at the current code in WalSndWait(): ``` if (wait_event =3D=3D WAIT_EVENT_WAIT_FOR_STANDBY_CONFIRMATION) = ConditionVariablePrepareToSleep(&WalSndCtl->wal_confirm_rcv_cv); else if (MyWalSnd->kind =3D=3D REPLICATION_KIND_PHYSICAL) = ConditionVariablePrepareToSleep(&WalSndCtl->wal_flush_cv); else if (MyWalSnd->kind =3D=3D REPLICATION_KIND_LOGICAL) = ConditionVariablePrepareToSleep(&WalSndCtl->wal_replay_cv); if (WaitEventSetWait(FeBeWaitSet, timeout, &event, 1, = wait_event) =3D=3D 1 && (event.events & WL_POSTMASTER_DEATH)) { ConditionVariableCancelSleep(); proc_exit(1); } ConditionVariableCancelSleep(); ``` It calls ConditionVariablePrepareToSleep() on a specific CV, so ideally = only ConditionVariableSignal() on that specific CV should wake the wait. However, the current code actually relies on = WaitEventSetWait(FeBeWaitSet) waiting on MyLatch, where FeBeWaitSet = includes WL_LATCH_SET. So if someone calls SetLatch(the_proc_latch), the = wait will still wake up. The code does not check at all which CV was = actually signaled. With this patch, we would be able to verify whether the CV was signaled. = The current PoC does not do that yet, because I simply followed the = existing behavior, but with the new WaitEventSetWait() support, it would = be easy to add such a check. =46rom that perspective, I think the PoC is = already better than the current code. As for your question, =E2=80=9CHow will you distinguish which one was = fired?=E2=80=9D: when FeBeWaitSet includes both WL_LATCH_SET and = WL_CONDITION_VARIABLE, and MyLatch is used for WL_LATCH_SET, then if the = CV is signaled, WL_LATCH_SET may also fire as a side effect. I think = that would be okay. Do you have a use case where that would lead to a = problem? Again, this PoC version is still far from the final version. Any = discussion is very welcome. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/