Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vJVcy-009H5J-1H for pgsql-general@arkaria.postgresql.org; Thu, 13 Nov 2025 11:36:35 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vJVcv-000t2p-0n for pgsql-general@arkaria.postgresql.org; Thu, 13 Nov 2025 11:36:33 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vJVbT-000oHr-20 for pgsql-general@lists.postgresql.org; Thu, 13 Nov 2025 11:35:03 +0000 Received: from mail-yw1-x1130.google.com ([2607:f8b0:4864:20::1130]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vJUnK-007W8H-0X for pgsql-general@lists.postgresql.org; Thu, 13 Nov 2025 10:43:16 +0000 Received: by mail-yw1-x1130.google.com with SMTP id 00721157ae682-78665368a5cso6230997b3.3 for ; Thu, 13 Nov 2025 02:43:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1763030592; x=1763635392; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=qlTRkg8dKETJ2T2H+tPYd4PVrnLiXCYJCDWdhFawdhU=; b=f76xEwscieJvDgGvggIk0WiluF0zRc+ZVVusIxtmAIWSoJG7Gvme5uGhlXggXmtkZt yU+HGbZD0EXsnzQUfom/qgFBjNZyPeT+jlXCpOKoEagEedofOQCjlYD7/BRly7m2yXJi kVtpfFREcOGahZ6Lthr40FDX4fvvNbUgDzIodwjU+clJRYwUZ3ELBe5llG0VMP3K+REW rToOCl1BXWuSvuDpPK/YhZqHsCVvJM7bgVWHgu5fqH29KLmeXGo6Dn1xGC6w5jmNq20I kUJxhoA+3b1BJyYIh281EUXohMUD/J0qMN1LbHqnaNdr49xebP8alIkOlc8QxTjFKxTo ipRQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1763030592; x=1763635392; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=qlTRkg8dKETJ2T2H+tPYd4PVrnLiXCYJCDWdhFawdhU=; b=lHJG/Es7NnOLN3o/B0+YWpCrQ/EPrFjbUuAtxcWejnFB5p5Sr7/7iPDsY8V5VX6xP0 rFo9PWxk5tRL6d8v6dZEGnGzBLaCuolBt/LRG7Rs1qDkdMV6H1chJfMAw5KOZXaUqXXj Bd7e55Elw4rbQumWMmL+CpoTC14Q3sZX8Y4FmXSAjKJODr+M1bp4xYPmhPZuEH6pYz8B JL7rDTita+gkFNawxGKKo/YmcEP/Y3e902RmSYLN9YHxFyKDlrk8YOPkAUauLMH7gT6m EBnFWGyBy/t5D/pxkNzUrJf+sZuHtQAXnz1URhX1+BAZ7kWSxh1gWJrmnslfczH/cFBD cuIw== X-Gm-Message-State: AOJu0Ywx2Q7h8/j3Nu8PI4y4AXq5CJ9+KSOohyNUXhyRuzVBRFc5GPuU Mxur+g2ZUVW7SRu4MlEIJCAvKdbhx3FOO4lTQtcMRiSJ16cBX/LQR9zi+JcoQ2lYDTq9EkVF55B IDhJPitCylK5JheIYU7xkDUVyLVzq7qRplBSP X-Gm-Gg: ASbGncsw5BP2J5uKEFCbrkMU2buq5w8+Jk/Sg8sW8b+xPojKZvCg359+DQiPOb+7e3o c7LX+yBuwWlvWnzqqPfQwaDmdAU4eocV05kdd/cqRSHkmSkLR/+6/zoZGVJxrQZfuknRd5ZPRRG kHfxfz9m/3i8wcNBjwy2VkcadFGsK6CisQYN5N8tV+lPJPDuboD91ZggVEItWQ8A4PljA9kdEuT pbT4LjCrnt+2A1xagEv2Xl+PnDUUGLdjpuqCxuO4M+V4wqkvqhBn2l3QEJuEaDR98XEYvSAw8UI Zkk+LI1aZg== X-Google-Smtp-Source: AGHT+IHv9ds9urFtdFcnXC7hTq/1q7FqsXQlOlgLSM4oVVmmdcAU/Yylq7v+FH8in1v7+vBVODoUhZRHXAkylzS/pbE= X-Received: by 2002:a05:690c:4988:b0:787:e9bc:f9ea with SMTP id 00721157ae682-788136464dbmr47290737b3.26.1763030591624; Thu, 13 Nov 2025 02:43:11 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Bernice Southey Date: Thu, 13 Nov 2025 10:42:35 +0000 X-Gm-Features: AWmQ_bmM_FBGOJjB6OcWFNax4IA9ZSk82x-oLB7wNjOSPVb4-_hn3uVEkaOm5uY Message-ID: Subject: Re: Is this expected concurrency behaviour for EvalPlanQual and ctid? To: pgsql-general@lists.postgresql.org Cc: pg@sophiebits.com Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Here's a much simpler example. This one works when I turn off enable_tidscan, so I'm very hopeful Sophie's patch applies. It's very similar to the example bug, but loses an update instead of gaining one. It MSTM that both are because the ctid value wasn't being rechecked in EvalPlanQual. --session 1 CREATE TABLE t(p BOOL, q BOOL); INSERT INTO t DEFAULT VALUES; SELECT pg_advisory_lock(1); --session 2 SELECT pg_advisory_lock_shared(1); SELECT pg_advisory_unlock_shared(1); WITH lock_t AS (SELECT ctid FROM t FOR UPDATE) UPDATE t SET p =3D TRUE FROM lock_t l WHERE t.ctid =3D l.ctid RETURNING p, q, t.ctid; --session 3 SELECT pg_advisory_lock_shared(1); SELECT pg_advisory_unlock_shared(1); WITH lock_t AS (SELECT ctid FROM t FOR UPDATE) UPDATE t SET q =3D TRUE FROM lock_t l WHERE t.ctid =3D l.ctid RETURNING p, q, t.ctid; --session 1 SELECT pg_advisory_unlock(1); Only one of the updates succeeds. This leads me to conclude that for concurrency and ctid: safe: - deleting a record, because one of the deletes will succeed - updating a record with skip lock, because the lock won't ever be rech= ecked unsafe (currently): - updating a record without skip lock The documentation for UPDATE and DELETE both include a FOR UPDATE with ctid example, so I think this isn't expected behaviour, but will be fixed in the next release. Thanks Sophie! Thanks, Bernice On Wed, Nov 12, 2025 at 5:12=E2=80=AFPM Bernice Southey wrote: > > Hi all, > > I've run into a concurrency issue and am hoping someone here with more > internal PostgreSQL knowledge than me can explain why. > > My scenario is keeping a sync table up to date with the latest message > by recipient key. If I use ctid, all the updated records are dropped > by the concurrent query, after the first query releases the locks. If > I use a standard column it works as expected. > > I'm hopeful that the recently accepted patch [1] from Sophie is the > same issue, but I'm not knowledgeable enough to tell. I tried setting > enable_tidscan to 0 based on the discussion, but it made no > difference, so I thought I best ask. If this behaviour is as expected, > I'd really like to understand why, so I can better reason about ctid > in concurrent updates. > > To test, I created 10 sync records and 200 messages for each of them. > I processed 1000 messages at a time concurrently in two different > sessions. I duplicated the test for the ctid case and my key column > (k) case. The only difference is the final where clause for the > update. The output should be each key synced to 200. > > --setup > CREATE TABLE msg(k INT, m INT); > CREATE TABLE msg2(k INT, m INT); > CREATE TABLE sync(k INT PRIMARY KEY, m INT DEFAULT 0); > CREATE TABLE sync2(k INT PRIMARY KEY, m INT DEFAULT 0); > INSERT INTO sync(k) SELECT i FROM generate_series(1, 10) i; > INSERT INTO sync2(k) SELECT i FROM generate_series(1, 10) i; > INSERT INTO msg SELECT k, i FROM sync, generate_series(1, 200) i > ORDER BY random(); > INSERT INTO msg2 SELECT k, i FROM sync, generate_series(1, 200) i > ORDER BY random(); > SELECT pg_advisory_lock(1); > > --run in two new sessions > SELECT pg_advisory_lock_shared(1); > SELECT pg_advisory_unlock_shared(1); > WITH > --by ctid > lock_msg AS ( > SELECT ctid FROM msg FOR UPDATE SKIP LOCKED LIMIT 1000), > delete_msg AS ( > DELETE FROM msg m USING lock_msg l WHERE m.ctid =3D l.ctid > RETURNING k, m), > msg_max AS ( > SELECT DISTINCT ON (k) k, m FROM delete_msg ORDER BY k, m DESC), > lock_sync AS ( > SELECT s.ctid, k, m.m from msg_max m join sync s USING (k) > ORDER BY k FOR UPDATE OF s), > update_sync AS ( > UPDATE sync s set m =3D l.m FROM lock_sync l > WHERE s.ctid =3D l.ctid AND l.m > s.m --by ctid > RETURNING s.k, s.m, s.ctid), > --by k > lock_msg2 AS ( > SELECT ctid FROM msg2 FOR UPDATE SKIP LOCKED LIMIT 1000), > delete_msg2 AS ( > DELETE FROM msg2 m USING lock_msg2 l WHERE m.ctid =3D l.ctid > RETURNING k, m), > msg_max2 AS ( > SELECT DISTINCT ON (k) k, m FROM delete_msg2 ORDER BY k, m DESC), > lock_sync2 AS ( > SELECT s.ctid, k, m.m FROM msg_max2 m join sync2 s USING (k) > ORDER BY k FOR UPDATE OF s), > update_sync2 AS ( > UPDATE sync2 s SET m =3D l.m FROM lock_sync2 l > WHERE s.k =3D l.k AND l.m > s.m --by k > RETURNING s.k, s.m, s.ctid) > --results > SELECT 'ctid', * FROM update_sync UNION ALL > SELECT 'k', * FROM update_sync2; > > --run last in original session > SELECT pg_advisory_unlock(1); > > All the records in both msg tables are deleted. In the sync tables, > 'k' is updated as expected across the two sessions, but 'ctid' dropped > all the records in the second session. > > Session 1 > ctid 1 199 (0,11) > ctid 2 200 (0,12) > ctid 3 200 (0,13) > ctid 4 199 (0,14) > ctid 5 200 (0,15) > ctid 6 200 (0,16) > ctid 7 199 (0,17) > ctid 8 200 (0,18) > ctid 9 200 (0,19) > ctid 10 200 (0,20) > k 1 200 (0,11) > k 2 195 (0,12) > k 3 200 (0,13) > k 4 200 (0,14) > k 5 199 (0,15) > k 6 199 (0,16) > k 7 196 (0,17) > k 8 194 (0,18) > k 9 200 (0,19) > k 10 196 (0,20) > > Session 2 > k 2 200 (0,21) > k 5 200 (0,22) > k 6 200 (0,23) > k 7 200 (0,24) > k 8 200 (0,25) > k 10 200 (0,26) > > Thanks, > Bernice > > 1 [https://www.postgresql.org/message-id/flat/4a6268ff-3340-453a-9bf5-c98= d51a6f729%40app.fastmail.com]