From: Antonin Houska
To: Mihail Nikalayeu
Cc: Alvaro Herrera, Pg Hackers, Robert Treat
Subject: Re: Adding REPACK [concurrently]
Date: Thu, 15 Jan 2026 17:36:59 +0100
Message-ID: <35686.1768495019@localhost>

Mihail Nikalayeu wrote:

> Also, there are some crashes of stress tests for v30 (for both single snapshot and multiple snapshot versions).
>
> ---------------------
>
> Looks like something is leaking, but not sure.
>
> https://cirrus-ci.com/task/5577209672368128?logs=test_world#L277 (multiple snapshots)
> https://cirrus-ci.com/task/6439044873191424 (without multiple snapshots)

As the test runs pgbench with --client=30 and the default value of
max_worker_processes is 8, I'm not sure this is a leak. After I increased
this parameter, I could not see the error anymore.

> This one showed something goes wrong, the sum of the table is broken. It may be 0 because non-MVCC safe, but I checked the logs:
>
> 2026-01-12 18:41:11.656 UTC client backend[76247] 007_repack_concurrently.pl LOG: statement: SELECT (490588) / 0;

I agree that this is due to the missing MVCC safety feature. I commented
out that check in the script for now.

Besides that, I saw some deadlocks.
I think this was due to the fact that multiple rows are updated per
transaction, and that the keys are random, so it can happen that two
transactions try to update the same rows in a different order. I increased
the number of rows in the test table to 10000 and don't see the deadlocks
anymore.

> backend[54349] 007_repack_concurrently.pl ERROR: could not create unique index "tbl_pkey_repacknew"
> 2026-01-12 18:41:12.477 UTC client backend[54349] 007_repack_concurrently.pl DETAIL: Key (i)=(942) is duplicated.
> 2026-01-12 18:41:12.477 UTC client backend[54349] 007_repack_concurrently.pl STATEMENT: REPACK (CONCURRENTLY) tbl;

This is tricky. I could reproduce the problem on my FreeBSD box a few
times, never on Linux (no idea if the OS makes the difference since the HW
is also quite different, but CI also seemed to fail more often on FreeBSD).

Something seems to be wrong with UPDATE, but I'm failing to understand how
it could relate to REPACK. This is an example of a duplicate value, i=6118:

SELECT i, j, xmin, xmax, ctid FROM tbl WHERE i=6118;

  i   |   j    |  xmin  |  xmax  |  ctid
------+--------+--------+--------+---------
 6118 | 445435 | 102317 | 103702 | (1,216)
 6118 | 391135 | 103702 |      0 | (56,62)

According to the log, xid=102317 is the transaction used by REPACK and
xid=103702 is one of the test's transactions. pageinspect shows that the
old version has not only HEAP_XMIN_COMMITTED in t_infomask, but also
HEAP_XMAX_INVALID.

So far I could not reproduce the duplicates with the REPACK (CONCURRENTLY)
command commented out in the test script, but that does not prove much
(even with REPACK, not every run fails). Also I noticed that REPACK
incorrectly sets cmin/cmax to 1 instead of 0. That needs to be fixed, but I
have no idea why this bug should cause exactly this weird behavior.
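For readers who want to check the t_infomask bits mentioned above, here is a
minimal sketch of a decoder. The bit values are the ones defined in
src/include/access/htup_details.h; the helper name decode_infomask and the
sample value are made up for illustration and are not taken from the failing
run.

```python
# Visibility-related t_infomask bits, as defined in
# src/include/access/htup_details.h.
INFOMASK_FLAGS = {
    0x0100: "HEAP_XMIN_COMMITTED",
    0x0200: "HEAP_XMIN_INVALID",
    0x0400: "HEAP_XMAX_COMMITTED",
    0x0800: "HEAP_XMAX_INVALID",
    0x1000: "HEAP_XMAX_IS_MULTI",
    0x2000: "HEAP_UPDATED",
}


def decode_infomask(mask):
    """Return the names of the known flags set in mask, lowest bit first."""
    return [name for bit, name in sorted(INFOMASK_FLAGS.items()) if mask & bit]


# Hypothetical value: only the two flags named in the text, not the real
# t_infomask of tuple (1,216).
sample = 0x0100 | 0x0800
print(decode_infomask(sample))
```

The point of interest is the combination: a nonzero xmax (103702) whose
transaction produced a visible new version would normally be expected to end
up hinted as HEAP_XMAX_COMMITTED, not HEAP_XMAX_INVALID. On the server side,
pageinspect's heap_tuple_infomask_flags() should give the same decoding.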
I even added quite a few logging messages to reveal where in the code the
HEAP_XMAX_INVALID flag is set for a particular ctid, but after a failure I
could not find the message for the problematic tuples.

Ideas are appreciated.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com