From: Amit Kapila
Date: Thu, 18 Jun 2026 08:49:53 +0530
Subject: Re: Fix race in ReplicationSlotRelease for ephemeral slots
To: Xuneng Zhou
Cc: Fujii Masao, "Zhijie Hou (Fujitsu)", Srinath Reddy Sadipiralla, PostgreSQL Hackers

On Wed, Jun 17, 2026 at 12:59 PM Xuneng Zhou wrote:
>
> On Tue, Jun 16, 2026 at 8:46 PM Fujii Masao wrote:
> >
> > On Fri, Jun 12, 2026 at 7:54 PM Amit Kapila wrote:
> > > I feel even if there is an argument to do such a refactoring, it can
> > > be done separately. We can push forward with 0001 and then do more
> > > discussion for 0002, if required. I can take care of 0001 unless
> > > Fujii-San wishes to take care of it?
> >
> > Yeah, please feel free to work on 0001.
> >
> > Regarding 0002, since the race is very rare and non-fatal, I'm okay
> > with accepting the risk rather than adding more refactoring just to
> > avoid it.
> >
> > I'm a bit tempted to add a source comment explaining the risk and
> > why we accept it, though, so other developers can understand
> > the tradeoff. For example:
> >
> > diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c
> > index 05637344363..ca49f20e7d9 100644
> > --- a/src/backend/replication/logical/slotsync.c
> > +++ b/src/backend/replication/logical/slotsync.c
> > @@ -560,6 +560,12 @@ drop_local_obsolete_slots(List *remote_slot_list)
> >               * the same shared memory as that of 'local_slot'. Thus check if
> >               * local_slot is still the synced one before performing the actual
> >               * drop.
> > +             *
> > +             * Because local_slot still points to a reusable slot-array entry,
> > +             * fields such as name or database OID could already be stale here.
> > +             * That could cause an incorrect cleanup decision for this cycle or
> > +             * briefly lock an unrelated database. We accept that risk because
> > +             * this race is rare and non-fatal.
> >               */
> >              SpinLockAcquire(&local_slot->mutex);
> >              synced_slot = local_slot->in_use && local_slot->data.synced;
>
> Thanks for suggesting the comment! It helps to clarify the situation
> and the trade-off we made here. I tweaked it a bit and added it to the
> patches prepared by Zhijie.
>

+ *
+ * We cannot close this window by holding
+ * ReplicationSlotControlLock while taking the database lock,
+ * because the database-drop path holds the database lock and then
+ * scans replication slots.

The database-drop path acquires ReplicationSlotControlLock in shared
mode, so I am not sure the above is completely correct. Here you are
going in the direction of trying to defend that no easy solution
exists, which needs more thought. Fujii-San's proposal was better, but
there too we may need to be a bit more specific about "That could cause
an incorrect cleanup decision ..."; otherwise, it makes the comment
unclear.
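
For readers following along, here is a minimal, self-contained sketch
(not PostgreSQL code) of the re-check pattern in the quoted hunk: read
in_use and synced under the entry's lock before acting on a pointer
into a reusable array. DemoSlot, demo_lock, demo_unlock,
demo_slot_init, demo_slot_reuse and demo_slot_is_still_synced are
hypothetical names invented for illustration, and a C11 atomic_flag
stands in for the real spinlock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for one reusable slot-array entry. */
typedef struct DemoSlot
{
    atomic_flag mutex;      /* toy spinlock; an assumption, not slock_t */
    bool        in_use;
    bool        synced;
    char        name[64];
} DemoSlot;

static void
demo_lock(DemoSlot *slot)
{
    while (atomic_flag_test_and_set(&slot->mutex))
        ;                   /* spin until the flag was previously clear */
}

static void
demo_unlock(DemoSlot *slot)
{
    atomic_flag_clear(&slot->mutex);
}

/* Put the entry into a known state; must run before any other call. */
static void
demo_slot_init(DemoSlot *slot, const char *name, bool synced)
{
    assert(slot != NULL && name != NULL);
    atomic_flag_clear(&slot->mutex);
    slot->in_use = true;
    slot->synced = synced;
    snprintf(slot->name, sizeof(slot->name), "%s", name);
}

/* Simulate another process reusing the entry for a non-synced slot. */
static void
demo_slot_reuse(DemoSlot *slot, const char *new_name)
{
    assert(new_name != NULL);
    demo_lock(slot);
    slot->in_use = true;
    slot->synced = false;
    snprintf(slot->name, sizeof(slot->name), "%s", new_name);
    demo_unlock(slot);
}

/* Snapshot both flags under the lock, mirroring the quoted hunk. */
static bool
demo_slot_is_still_synced(DemoSlot *slot)
{
    bool        still_synced;

    demo_lock(slot);
    still_synced = slot->in_use && slot->synced;
    demo_unlock(slot);
    return still_synced;
}
```

The check is only a snapshot: once demo_unlock returns, the entry can
be reused again, so anything read from it afterwards (such as the name)
may already be stale. That is the window the proposed comment
describes.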

I am planning to commit and backpatch the fix for the first problem
based on what Hou-San has shared (v2-*), then we can discuss how to
improve the existing comments and if we agree on something, that can
be a HEAD-only patch.

-- 
With Regards,
Amit Kapila.