From: Robert Treat
Date: Fri, 10 Apr 2026 12:14:59 -0400
Subject: Re: Adding REPACK [concurrently]
To: Andres Freund
Cc: Antonin Houska, Amit Kapila, Mihail Nikalayeu, Alvaro Herrera,
    Srinath Reddy Sadipiralla, Matthias van de Meent, Pg Hackers

On Thu, Apr 9, 2026 at 10:20 AM Andres Freund wrote:
> On 2026-04-09 10:43:14 +0200, Antonin Houska wrote:
> > What Andres proposed (AFAIU) should help to avoid this problem because
> > REPACK's request for AEL would get in front of the VACUUM's request for SUEL
> > in the queue.
>
> Note that that already happens today.
>
> This works today (without the error triggering patch):
>
> S1: REPACK starts
> S2: LOCK TABLE / VACUUM / ... starts waiting
> S1: REPACK tries to get AEL
> S1: REPACK's lock requests get reordered in the wait queue to be before S2 and
>     just gets the lock
> S1: REPACK finishes
> S2: lock acquisition completes.
>
> That's because we do already have this "jumping the wait queue" logic, which I
> had forgotten about.
>

You know, I was wondering how this wasn't already a problem for
pg_repack/pg_squeeze, and I guess this explains it :-P

>
> What does *not* work is this:
>
> S1: REPACK starts
> S2: BEGIN; SELECT 1 FROM table LIMIT 1;
> S2: LOCK TABLE / VACUUM / ... starts waiting
> S1: REPACK tries to get AEL
> S1: lock is not granted, can't be reordered to be before S2, because S2 holds
>     conflicting lock, deadlock detector triggers
> S2: lock acquisition completes
>
> But with my proposal to properly teach the deadlock detector about assuming
> there's a wait edge for the eventual lock upgrade by S1, the first example
> would still work, because the lock upgrade would not be considered a hard
> cycle, and the second example would have S2 error out.
>

In the above, S2 will error out anyway if you try to run a VACUUM
(VACUUM cannot run inside a transaction block), but the point still
stands that calling an explicit LOCK or similar could lead to this
issue. In the current repack world, we document the need for lock
escalation at the end of the repacking and caution that doing things
like DDL or explicit LOCKing could cause trouble, so don't do that.
What you're proposing above would be an improvement though, IMHO.

>
> > Anti-wraparound (failsafe) VACUUM is a bit different case [1] (i.e. it should
> > possibly have higher priority than REPACK), but I think this prioritization
> > should be implemented in other way than just letting it get in the way of
> > REPACK (at the time REPACK is nearly finished).
>
> Yea, it makes no sense to interrupt the long running repack, given that the
> new relation will have much less stuff for vacuum to do.
>

We might be talking about 2 different scenarios. In the case where we
are at the point of lock escalation, you would probably want the
repack to get priority over a waiting vacuum, even a failsafe vacuum.
But outside of that scenario, we can't know that the repack is the
better option (and statistically it probably isn't). A repack that is
still actively copying rows might also need to rebuild a large number
of indexes (or just one really expensive index), which could take
significantly longer than a failsafe vacuum would need to ensure
wraparound avoidance. I don't think we'd go as far as saying the
failsafe vacuum should cancel the repack, but I think ideally we'd
like the failsafe vacuum not to be canceled either. Leaving it waiting
would increase the likelihood that a DBA or monitoring picks up on the
situation, and if REPACK fails for some reason, the failsafe vacuum
could start working immediately without having to jump through any
additional hoops.


Robert Treat
https://xzilla.net
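
For readers following the thread, the two quoted scenarios can be
sketched with a toy model of the "jumping the wait queue" rule. This
is only an illustration, not PostgreSQL's lock manager: the ToyLock
class, the run() helper, and the three-mode conflict table are all
hypothetical, and it assumes REPACK holds SUEL while copying and that
S2's explicit lock request is for AEL.

```python
# Toy model only: three lock modes and a simplified conflict table.
# AS = ACCESS SHARE, SUEL = SHARE UPDATE EXCLUSIVE, AEL = ACCESS EXCLUSIVE.
CONFLICTS = {
    "AS": {"AEL"},
    "SUEL": {"SUEL", "AEL"},
    "AEL": {"AS", "SUEL", "AEL"},
}


def conflicts(mode, other_modes):
    """True if `mode` conflicts with any mode in `other_modes`."""
    return any(other in CONFLICTS[mode] for other in other_modes)


class ToyLock:
    """Hypothetical single-table lock with an ordered wait queue."""

    def __init__(self):
        self.held = {}    # session -> set of granted modes
        self.queue = []   # waiting (session, mode) requests, in order

    def _held_by_others(self, session):
        return [m for s, modes in self.held.items() if s != session for m in modes]

    def _grant(self, session, mode):
        self.held.setdefault(session, set()).add(mode)
        return "granted"

    def request(self, session, mode):
        others = self._held_by_others(session)
        waiting = [m for _, m in self.queue]
        # Fast path: nothing held or waiting conflicts with the request.
        if not conflicts(mode, others) and not conflicts(mode, waiting):
            return self._grant(session, mode)

        mine = self.held.get(session, set())
        for pos, (waiter, wanted) in enumerate(self.queue):
            if not conflicts(wanted, mine):
                continue
            # This waiter is blocked by a lock we already hold.
            if conflicts(mode, self.held.get(waiter, set())):
                # We would also have to wait for that waiter: a cycle.
                return "deadlock"
            ahead = [m for _, m in self.queue[:pos]]
            if not conflicts(mode, others) and not conflicts(mode, ahead):
                return self._grant(session, mode)  # jump the queue
            self.queue.insert(pos, (session, mode))
            return "waiting"

        self.queue.append((session, mode))
        return "waiting"


def run(s2_reads_first):
    """Replay one of the two scenarios and return each request's outcome."""
    lock = ToyLock()
    steps = [lock.request("S1", "SUEL")]        # REPACK starts (assumed SUEL)
    if s2_reads_first:
        steps.append(lock.request("S2", "AS"))  # BEGIN; SELECT ... in S2
    steps.append(lock.request("S2", "AEL"))     # LOCK TABLE starts waiting
    steps.append(lock.request("S1", "AEL"))     # REPACK tries to get AEL
    return steps


print(run(False))  # first scenario: S1's upgrade is granted ahead of S2
print(run(True))   # second scenario: S1's upgrade reports a deadlock
```

In the toy model the requester that detects the cycle (S1) is the one
that gets the "deadlock" result, which matches the current behaviour
described in the quoted text; the proposal in the thread would instead
have S2 error out in the second scenario.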