From: Antonin Houska
cc: Mihail Nikalayeu, Alvaro Herrera, Pg Hackers, Robert Treat
Subject: Re: Adding REPACK [concurrently]
In-reply-to: <3901.1769412880@localhost>
Comments: In-reply-to Antonin Houska message dated "Mon, 26 Jan 2026 08:34:40 +0100."
Date: Tue, 27 Jan 2026 11:57:36 +0100
Message-ID: <88003.1769511456@localhost>

Antonin Houska wrote:

> Mihail Nikalayeu wrote:
>
> > PART 2:
> >
> > I have continued working with stress tests. This time I added your WIP patch to fix the LR\CLOG race.
> >
> > I made the following configs:
> > 1) just REPACK CONCURRENTLY - ok
> > 2) + relcheckxmin (see PART1) - ok
> > 3) + worker - ok
> > 4) + multiple snapshots - broken in multiple ways.
> >
> > You may see example of run here - https://cirrus-ci.com/build/6359048020295680
> >
> > Some examples:
> >
> > 1) 'pgbench: error: client 11 script 0 aborted in command 20 query 0: ERROR: could not read blocks 0..0 in file "base/5/16414": read only 0
> > of 8192 bytes
> > 2) at /home/postgres/postgres/contrib/amcheck/t/008_repack_concurrently.pl line 51.
> > [15:36:37.204] # 'pgbench: error: client 5 script 0 aborted in command 28 query 0: ERROR: division by zero
> > 3) 'pgbench: error: client 12 script 0 aborted in command 6 query 0: ERROR: cache lookup failed for relation 17400
>
> Thanks, I'll check these.

PROC_IN_VACUUM shouldn't be used, for the same reason StartupDecodingContext() avoids setting PROC_IN_LOGICAL_DECODING within a transaction. I've removed it and the tests pass for me. The "cache lookup failed" error in particular is almost certainly related. Please let me know if you still get the other errors (except for 2, which is probably due to the MVCC-unsafe behavior, as discussed earlier).

The 0006 part needs more work (definitely beyond PG 19). For now, I've summarized the problem in the code this way:

+ * As there is no snapshot, our xmin should be invalid now.
+ *
+ * TODO xid can still be valid. We can mark our transaction with the
+ * PROC_IN_VACUUM flag, but at the same time we need to make sure that
+ * anything we write is ignored by VACUUM: since our xid is >= xmin of
+ * our replication slot, the slot does not help. Other transactions
+ * might use their RecentXmin to check if our xact is still running
+ * (see TransactionIdIsInProgress) before they check CLOG. By using
+ * PROC_IN_VACUUM we'd let their RecentXmin skip our xid. Thus our
+ * xact would appear not running anymore, but not yet marked committed
+ * in CLOG either, therefore aborted: it's o.k. for VACUUM to clean up
+ * tuples written by aborted transactions.
+ *
+ * Perhaps we can add a new field 'relisvalid' to pg_class and
+ * something similar to pg_index and make sure that neither queries nor
+ * VACUUM can use tables / indexes which do not have this flag set
+ * (the existing pg_index(indisvalid) field probably should not
+ * control whether VACUUM is allowed or not). Then we can do the
+ * catalog changes in separate transactions. Only the transaction that
+ * copies the heap would then use the PROC_IN_VACUUM flag.
+ * However, even then it would probably be appropriate to do regular
+ * (MVCC-safe) rewriting, i.e. avoid setting the xid of the rewriting
+ * transaction in the tuple headers.

Thanks for your testing!

--
Antonin Houska
Web: https://www.cybertec-postgresql.com

[Attachment: v32-0001-Add-REPACK-command.patch]