From: Antonin Houska
To: Andres Freund
Cc: Mihail Nikalayeu, Amit Kapila, Alvaro Herrera,
    Srinath Reddy Sadipiralla, Matthias van de Meent, Pg Hackers,
    Robert Treat
Subject: Re: Adding REPACK [concurrently]
Date: Wed, 15 Apr 2026 16:50:11 +0200
Message-ID: <25514.1776264611@localhost>

Andres Freund wrote:

> On 2026-04-14 15:37:56 +0200, Antonin Houska wrote:
> > Andres Freund wrote:
> >
> > > On 2026-04-12 15:31:20 +0200, Mihail Nikalayeu wrote:
> > > > Instead of cancelling the backend entered the deadlock detector - it
> > > > cancel some another (nearest hard edge) until it is possible to get the
> > > > lock (either by
> > > > reordering or directly).
> > >
> > > I don't think that's as good. The problem is that that way you're only
> > > detecting the deadlocks once they have materialized (i.e. once repack actually
> > > does the lock upgrade), rather than cancelling when we know that the problem
> > > starts.
> >
> > This is my hack that tries to do that.
>
> I still think this needs to be in the deadlock detector. The lock cycle just
> needs to be a bit more complicated for a hack in JoinWaitQueue not to work.
> There's no guarantee that the wait that triggers the deadlock is actually on
> the relation being repacked.

OK, I see. I thought of a "hypothetical graph" that would include the
to-be-granted lock, but the major issue is that it will not work correctly
unless we lock all of the lock manager's partition LWLocks, as we do in
CheckDeadLock():

	for (i = 0; i < NUM_LOCK_PARTITIONS; i++)
		LWLockAcquire(LockHashPartitionLockByIndex(i), LW_EXCLUSIVE);

And obviously, doing this each time we want to insert a lock into the queue
would be bad for performance. It's even mentioned in storage/lmgr/README
that the current approach is optimistic, so I think a major rework would be
needed if we wanted to entirely avoid waits that lead to deadlock.

The approach proposed by Mihail [1] seems the least problematic to me, and
something like that occurred to me when I first thought about the problem.
However, when we wake up the other processes in order to run the deadlock
detection, they should do that immediately. I have no good idea about the
implementation at the moment, since the latch can be set for unrelated
reasons. (Besides that, I have some more questions about this patch, which I
can post separately.)

[1] https://www.postgresql.org/message-id/CADzfLwURKVNQ%2B%2BDpi7bjoGfj-8pchDQEVex3eWBx0NCYn6TbDQ%40mail.gmail.com

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com
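
Illustration of the "hypothetical graph" idea mentioned above: a minimal,
standalone sketch that asks whether adding a not-yet-queued wait edge to a
wait-for graph would close a cycle. This is not PostgreSQL code. WaitGraph,
MAX_PROCS, graph_add_wait() and would_deadlock() are made-up names, the graph
is a plain adjacency matrix rather than the lock manager's wait queues, and
the sketch assumes the whole graph can be read consistently, which is exactly
what would require holding every partition lock in the real system.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PROCS 8				/* hypothetical limit, illustration only */

/* waits_for[a][b] is true when process a waits for a lock held by b. */
typedef struct WaitGraph
{
	bool		waits_for[MAX_PROCS][MAX_PROCS];
} WaitGraph;

static void
graph_init(WaitGraph *g)
{
	memset(g, 0, sizeof(*g));
}

/* Record an existing wait: 'waiter' is blocked by 'holder'. */
static void
graph_add_wait(WaitGraph *g, int waiter, int holder)
{
	assert(waiter >= 0 && waiter < MAX_PROCS);
	assert(holder >= 0 && holder < MAX_PROCS);
	g->waits_for[waiter][holder] = true;
}

/* Depth-first search: is 'target' reachable from 'from' via wait edges? */
static bool
reaches(const WaitGraph *g, int from, int target, bool *visited)
{
	if (from == target)
		return true;
	if (visited[from])
		return false;
	visited[from] = true;

	for (int next = 0; next < MAX_PROCS; next++)
	{
		if (g->waits_for[from][next] && reaches(g, next, target, visited))
			return true;
	}
	return false;
}

/*
 * Would the hypothetical edge waiter -> holder close a cycle?  It would if
 * 'holder' already waits, directly or transitively, for 'waiter'.
 */
static bool
would_deadlock(const WaitGraph *g, int waiter, int holder)
{
	bool		visited[MAX_PROCS] = {false};

	return reaches(g, holder, waiter, visited);
}
```

For example, with the existing waits 1 -> 2 and 2 -> 3, would_deadlock(&g, 3, 1)
reports a cycle before process 3 is ever queued, while would_deadlock(&g, 1, 3)
does not. The check itself is cheap; the cost discussed above comes from
obtaining a consistent view of the graph on every queue insertion.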