From: Alexander Korotkov
Date: Thu, 8 Jan 2026 16:19:08 +0200
Subject: Re: Implement waiting for wal lsn replay: reloaded
To: Xuneng Zhou
Cc: Andres Freund, Thomas Munro, Álvaro Herrera, Chao Li, pgsql-hackers, Michael Paquier, jian he, Tomas Vondra, Yura Sokolov
References: <202601011659.ikh4ku4p3ovb@alvherre.pgsql>

On Wed, Jan 7, 2026 at 6:08 AM Xuneng Zhou wrote:
> On Wed, Jan 7, 2026 at 8:32 AM Andres Freund wrote:
> > On 2026-01-06 18:42:59 +1300, Thomas Munro wrote:
> > > Could this be causing the recent flapping failures on CI/macOS in
> > > recovery/031_recovery_conflict? I didn't have time to dig personally
> > > but f30848cb looks relevant:
> > >
> > > Waiting for replication conn standby's replay_lsn to pass 0/03467F58 on primary
> > > error running SQL: 'psql::1: ERROR: canceling statement due to
> > > conflict with recovery
> > > DETAIL: User was or might have been using tablespace that must be dropped.'
> > > while running 'psql --no-psqlrc --no-align --tuples-only --quiet
> > > --dbname port=25195
> > > host=/var/folders/g9/7rkt8rt1241bwwhd3_s8ndp40000gn/T/LqcCJnsueI
> > > dbname='postgres' --file - --variable ON_ERROR_STOP=1' with sql 'WAIT
> > > FOR LSN '0/03467F58' WITH (MODE 'standby_replay', timeout '180s',
> > > no_throw);' at /Users/admin/pgsql/src/test/perl/PostgreSQL/Test/Cluster.pm
> > > line 2300.
> > >
> > > https://cirrus-ci.com/task/5771274900733952
> > >
> > > The master branch in time-descending order, macOS tasks only:
> > >
> > >      task_id      | substring |  status
> > > ------------------+-----------+-----------
> > >  6460882231754752 | c970bdc0  | FAILED
> > >  5771274900733952 | 6ca8506e  | FAILED
> > >  6217757068361728 | 63ed3bc7  | FAILED
> > >  5980650261446656 | ae283736  | FAILED
> > >  6585898394976256 | 5f13999a  | COMPLETED
> > >  4527474786172928 | 7f9acc9b  | COMPLETED
> > >  4826100842364928 | e8d4e94a  | COMPLETED
> > >  4540563027918848 | b9ee5f2d  | FAILED
> > >  6358528648019968 | c5af141c  | FAILED
> > >  5998005284765696 | e212a0f8  | COMPLETED
> > >  6488580526178304 | b85d5dc0  | FAILED
> > >  5034091344560128 | 7dc95cc3  | ABORTED
> > >  5688692477526016 | bb048e31  | COMPLETED
> > >  5481187977723904 | d351063e  | COMPLETED
> > >  5101831568752640 | f30848cb  | COMPLETED <-- the change
> > >  6395317408497664 | 3f33b63d  | COMPLETED
> > >  6741325208354816 | 877ae5db  | COMPLETED
> > >  4594007789010944 | de746e0d  | COMPLETED
> > >  6497208998035456 | 461b8cc9  | COMPLETED
> > >
> > The failure rates of this are very high - the majority of the CI runs on the
> > postgres/postgres repos failed since the change went in. Which then also means
> > cfbot has a very high spurious failure rate. I think we need to revert this
> > change until the problem has been verified as fixed.
>
> This specific failure can be reproduced with this patch v1.
>
> I guess the potential race condition is: when
> wait_for_replay_catchup() runs WAIT FOR LSN on the standby, if a
> tablespace conflict fires during that wait, the WAIT FOR LSN session
> is killed even though it doesn't use the tablespace.
>
> In my test, the failure won't occur after applying the v2 patch.

I see, you were right. This is not related to the MyProc->xmin.
ResolveRecoveryConflictWithTablespace() calls
GetConflictingVirtualXIDs(InvalidTransactionId, InvalidOid).
That would kill the WAIT FOR LSN query regardless of its xmin. I guess
your patch is the only way to go. It's clumsy to wrap the WAIT FOR LSN
call in a retry loop, but it would still consume fewer resources than
polling.

------
Regards,
Alexander Korotkov
Supabase
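
For illustration only, the retry idea discussed in this message might look roughly like the sketch below. It is not the patch from this thread (which lives in the Perl test framework and is not shown here). The names `QueryError`, `run_wait` and `wait_for_lsn_with_retry` are hypothetical, and `run_wait` is assumed to be a caller-supplied function that issues WAIT FOR LSN on the standby and raises `QueryError` with the server's message when the statement is cancelled.

```python
import time


class QueryError(Exception):
    """Hypothetical error raised when the SQL statement fails."""


def wait_for_lsn_with_retry(run_wait, timeout_s=180.0, clock=time.monotonic):
    """Call run_wait(remaining_seconds) until it returns or the deadline passes.

    run_wait is a hypothetical callable that runs WAIT FOR LSN on the standby.
    Only cancellations caused by a recovery conflict are retried.
    """
    deadline = clock() + timeout_s
    attempts = 0
    while True:
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"gave up after {attempts} attempts")
        attempts += 1
        try:
            # Pass the remaining budget so retries share one overall timeout.
            return run_wait(remaining)
        except QueryError as err:
            # Any other failure is a real error and is re-raised unchanged.
            if "conflict with recovery" not in str(err):
                raise
```

Each attempt still blocks inside the server until the LSN is replayed, so the loop only spins when a conflict actually cancels the session, which is why it should stay cheaper than polling.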