Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wFNnA-0052XK-2y for pgsql-hackers@arkaria.postgresql.org; Wed, 22 Apr 2026 02:58:21 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wFNn9-00BNsm-0R for pgsql-hackers@arkaria.postgresql.org; Wed, 22 Apr 2026 02:58:19 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wFNn8-00BNsc-28 for pgsql-hackers@lists.postgresql.org; Wed, 22 Apr 2026 02:58:18 +0000 Received: from mail-ed1-x52e.google.com ([2a00:1450:4864:20::52e]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wFNn6-00000002BYt-177s for pgsql-hackers@lists.postgresql.org; Wed, 22 Apr 2026 02:58:17 +0000 Received: by mail-ed1-x52e.google.com with SMTP id 4fb4d7f45d1cf-672645dbfeaso5372785a12.0 for ; Tue, 21 Apr 2026 19:58:16 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1776826694; cv=none; d=google.com; s=arc-20240605; b=FbVSelBqD6nngGy5sWUwzGyiiU1MqMHZ8G1FJd2US7zaLsGQapOOe2+f1L1TdXYlqM 4weVGz2jkcneITt+jX6NTCscz3lXaGOW1IsMUHIxhW//p71U+ZEJucEjc2ur43DTB+Ok sunU/nEx6trdJC1VUfxnu5dOyOpSzkvBmT5zRCzL7ripPGGthov/1Wtgi3EI3FrLWugV BQj1m8KhuQ2tcqhuco2vf+1cqHgtKFxsu+vL27586aw1NJqJvQYm2xE2yHP2cxeKv9R/ SwkpoTAlebMR35xBHvkGKuQN/vEJ8f0r0KW/WCcWojvQBWtPXv5CU7cCNIrsHmdRnJeZ SLZw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=AH5eQmwimPJhW49Fg5kLldrwyn/j8zY8kaxj7DGC9p0=; fh=fCxbqbwla69Szw6jlvePJjit9loJ+QNhGLNni2sJV4k=; b=HYFuatTQ3n4egLVWMFBOquc5kauNVnd0d1kW++6Z17zU1KOsh/gawMrmfYDG0E1pt4 QIctrAs0oVM2cl4Y/Igoc5x803+uJRGxfy/QURTyogGzTaA0TIiYH1u0LqtcVY5MIUYb 5tCYvzXX7OP9cYCfW9ZKzEiKyEm6Z6DvCP/udW5by58P2NEutulLeb2jbYe4X1fBGGAa BJKhcHSiHhkYqJrA0fE07285wNUK5koetW1dYZlQUqqKgdOKcqpQI9FTV3cK4vDXdvBX 1LZHZOrSzdRjgRTpB1vnbYfpQbzW0dAQZhec7PxjoP23HoVoXnl8pC8dgl3LOgPcdymM x7+w==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1776826694; x=1777431494; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=AH5eQmwimPJhW49Fg5kLldrwyn/j8zY8kaxj7DGC9p0=; b=XuM+yNNdN3JeTPd2XLWy2YH6uh670nK/pESUpVJkGNnphuTTfWNQSRaUTz5gRsgIWw rUDSHJAFHCofaK9DAJqxlB+dsY2WyTW2vZ8XIdARoZ7odOTvMh3lQChMtrkmz0Bvx1Fr gvp2R+h4K7uMcp0oZUQELa1D5xk1o/uQ19LbLCW+KyhReJHVVj831jLolD1bIjvuTzme kBNneDUq+CeSYSaobtosoyWuwI7/n61ez3clQS44uvQSjF7aZTijneByZ115qV3jGiu6 ozYhTkEWbzOFOFMCqVhSr9uZENEndlfW/oyJIExJSuzyZ8aiYn79KyWPDpJ0bCq016Ok 28Og== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1776826694; x=1777431494; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=AH5eQmwimPJhW49Fg5kLldrwyn/j8zY8kaxj7DGC9p0=; b=Z3T84cyg1HymNOHwFE4GVZUWaqszQVtyqjuNp3qYsev+rvxPmMDp7tYrYES91t3hmh IMreKIFTuBeWTYZlfxw4IH/s+rpDlrAL4w5GEvD00FNVrYEtpXTEbZOFgh88fihhCOiY nPSzqXN4zokpznyJz3uA+6qqE9pU4qTpmFZYJOdAGTfO8L8Zq2QCssTko9b0GCMIJoBB le0mme+SISao3SyZvXE1bMaCOvDRjVhrmBkDMifESedJ+F9htzbUW2046BYIonh9OPeR 3xeVwXy3v/SXHUV6A/lsQSxeXleK2oThFgFGvAkC4jluRD7iiCEB40YZiLQUYniH9jPW HgdQ== X-Forwarded-Encrypted: i=1; AFNElJ+pkYJCgUpvdYHRMlZBywhl3fNjgaZWkQvL9h7LOWBmkuoLKspsc557RcQao+RkvyvPRrLQoakS+9Mbwjrw@lists.postgresql.org X-Gm-Message-State: AOJu0YzGPZvDJ0hBf70ztdaQmv1RLpMLsEofMlBCvei1peFIMPbVjX+g lgY3XXFcYjuwyDJPiOnWgy2tNX1q7NrLiA8Af6LCmamXdfOCdQJYampitc2Vt8GG8lcrt4LZGPK 4sRjzhSDVzBDvbp/uBtTnDbK5LjUHAhE= X-Gm-Gg: AeBDieum/I0CzXY2Snd17R/f3OL+qlrLhdA5d4+ggNEGax3FykaKi1sSY7M9jxv3dql HiWOIQY01Mdgssn70l7/4DkD/dnmjIeewtWDHYaz8jeFTP3ctnmM4RuHoOydYdxJDrfHmDKvXbH nOs8LcAf3lLH1PpO3L77SXBN4LIzQjKDZsHhaS1cK8XeCcZyo+9dqO91kUiz45U/ZK6gEQ60u1E ic33l+j3l2Y6WjK8WsFN2twqvz9ivvLKJ7vW6NZEgpRzoamNWJ06CoSRe+Iv2fXrnOWqw3mSTlC i/xLQVomraFipBR+1CS0AELsl1FL1SCWD/tQQ9BSKa//gy513qvEwN9NDAHPLoMdXD584n/+Oid lxAE8w5ZdXDqmDpTEvg== X-Received: by 2002:aa7:cd4d:0:b0:672:a242:5887 with SMTP id 4fb4d7f45d1cf-672bfd9db45mr6029072a12.12.1776826693352; Tue, 21 Apr 2026 19:58:13 -0700 (PDT) MIME-Version: 1.0 References: <63B9B6C7-BB04-4F6E-B3AC-6886DDC1C3FD@gmail.com> <00D3DE2F-10B8-4439-91B4-D1C24A98627A@gmail.com> In-Reply-To: From: Xuneng Zhou Date: Wed, 22 Apr 2026 10:58:01 +0800 X-Gm-Features: AQROBzB4buQu-wrziNzMwI8lO6cSpAEtTxTuu1LRYtxVqW6S_XUbmZ8CbmarUqs Message-ID: Subject: Re: PoC: Add condition variable support to WaitEventSetWait() To: Yura Sokolov Cc: Chao Li , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hi Yura, On Wed, Apr 22, 2026 at 1:06=E2=80=AFAM Yura Sokolov wrote: > > 08.04.2026 11:22, Chao Li =D0=BF=D0=B8=D1=88=D0=B5=D1=82: > >> On Apr 8, 2026, at 11:50, Chao Li wrote: > >>> On Apr 2, 2026, at 15:38, Chao Li wrote: > >>>> On Mar 31, 2026, at 16:59, Chao Li wrote: > >>>>> On Mar 31, 2026, at 15:28, Chao Li wrote: > >>>>> > >>>>> Hi, > >>>>> > >>>>> There is an XXX comment in WalSndWait(): > >>>>> ``` > >>>>> * XXX: A desirable future improvement would be to add support for = CVs > >>>>> * into WaitEventSetWait(). > >>>>> ``` > >>>>> > >>>>> I have been exploring a possible approach for that. This patch is a= PoC that adds ConditionVariable support to WaitEventSet. This v1 is mainly= intended to gather feedback on the design, so I have only done some basic = testing so far, such as a normal logical replication workflow. > >>>>> > >>>>> I=E2=80=99d like to highlight a few key points about the design: > >>>>> > >>>>> 1. In the current WalSndWait(), although it prepares to sleep on a = ConditionVariable, it does not actually check whether the CV has been signa= led. In this PoC, I kept that same behavior. However, I tried to make the W= aitEventSet support for CVs generic, so that if we want to add actual signa= l checking in the future, that would be possible. > >>>>> > >>>>> 2. To keep the design generic, this patch introduces a new wait eve= nt type, WL_CONDITION_VARIABLE. A WL_CONDITION_VARIABLE event occupies a po= sition in the event array, similar to latch and socket events. When a CV is= signaled, the corresponding WL_CONDITION_VARIABLE event is returned in occ= urred_events. > >>>>> > >>>>> 3. The WaitEventSet APIs AddWaitEventToSet() and ModifyWaitEvent() = are extended to support CVs by adding one more parameter =E2=80=9Ccv" to bo= th APIs. The downside of this approach is that all call sites of these two = APIs need to be updated. I also considered adding separate APIs for CVs, su= ch as AddWaitEventToSetForCV() and ModifyWaitEventForCV(), since CVs do not= rely on the kernel and it might therefore make sense to decouple them from= socket and latch handling. But for v1, I chose the more generic approach. = I=E2=80=99d be interested in hearing comments on this part of the design. > >>>>> > >>>>> 4. One important point is that this patch extends the WaitEventSet = abstraction, not the underlying kernel wait primitives. A ConditionVariable= is still a userspace/shared-memory concept, but with this design it can pa= rticipate in the same waiting framework as sockets and latches. I think tha= t is useful because it allows mixed waits to be handled through one interfa= ce. > >>>>> > >>>>> Here is the v1 patch. > >>>>> > >>>> > >>>> I just noticed that I missed checking in my last edit when switching= to the other branch, so attaching an updated v1. > >>> > >>> PFA v2 - fixed a CI failure from contrib/postgres_fdw. > >> > >> Fixed a CI test failure and rebased. > > > > PFA v4 - try to fix a test failure on windows. > > Good day, Chao Li. > > ConditionalVariable works through the MyLatch. > And almost always there is single latch for one backend: > MyLatch =3D PGPROC->procLatch; > > (Single exception I found is XLogRecoveryCtl->recoveryWakeupLatch.) > > And that is where ConditionVariablePrepareToSleep were used in WalSndWait > but without check: FeBeWaitSet always waits for MyLatch. > > Therefore, I don't clear get, how you suggest to simultaneously wait > WL_LATCH_SET on MyLatch and WL_CONDITION_VARIABLE? > How you will distinguish which one was fired? > > It looks to my, WL_LATCH_SET on MyLatch and WL_CONDITION_VARIABLE had to = be > mutual exclusive. At least unless ConditionVariable internal are changed. > > And check after waiting for conditional variable set had to be placed int= o > WaitEventSetWaitBlock where all other such checks are. > > All written above is just my humble opinion. > Excuse me if i'm too strict. > I think you're correct that ConditionVariableSignal() wakes the target process by calling SetLatch(&proc->procLatch). So every CV signal also sets MyLatch. The patch introduces ConditionVariableSignaled() to distinguish the two by checking CV wait-list membership (not the latch state itself), so in principle the two sources are distinguishable. In practice, however, the current patch seems not reliably report WL_CONDITION_VARIABLE. Inside WaitEventSetWait(), the latch check runs before the new CV check. When a CV signal fires, both conditions become true simultaneously: MyLatch is set AND the proc has been removed from the CV wait list. With nevents=3D1 (which is what WalSndWait passes), the latch event fills the one-slot output buffer and the loop breaks before reaching the CV check: /* existing latch check =E2=80=94 runs first */ if (set->latch && set->latch->is_set) { ... returned_events++; if (returned_events =3D=3D nevents) break; /* CV check below is never reached */ } /* new CV check =E2=80=94 unreachable when buffer is full */ if (set->cv) cv_signaled =3D ConditionVariableSignaled(set->cv); So for the converted caller, WL_CONDITION_VARIABLE is not reliably observable. In the common case where the latch readiness is returned first, the one-slot buffer is filled by WL_LATCH_SET and the CV check is skipped. -- Best, Xuneng