From: Tender Wang
Date: Thu, 22 Feb 2024 11:49:29 +0800
Subject: Re: BUG #18354: Aborted transaction aborted during cleanup when temp_file_limit exceeded
To: Kyotaro Horiguchi
Cc: exclusion@gmail.com, pgsql-bugs@lists.postgresql.org
In-Reply-To: <20240222.114600.1019580904613326727.horikyota.ntt@gmail.com>
References: <18354-140864c09686b5a6@postgresql.org> <20240222.114600.1019580904613326727.horikyota.ntt@gmail.com>

When the first error was reported, we entered AbortTransaction(), and proc->xid was set to InvalidTransactionId in ProcArrayEndTransactionInternal().
After AbortTransaction() finished, we entered CleanupTransaction(), where the error was reported again. So we entered AbortTransaction() a second time, because
the transaction block state was still TBLOCK_STARTED. By then MyProc->xid was already InvalidTransactionId, so the assertion failed.
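
To make that sequence concrete, here is a minimal toy model in C. It is only a sketch, not the real xact.c or procarray.c code: every Toy*/toy_* name is hypothetical, the failed assertion is recorded in a flag instead of crashing, and it assumes the transaction had already been assigned an xid before the first error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for PostgreSQL's types; all names here are hypothetical. */
typedef unsigned int ToyXid;
#define TOY_INVALID_XID 0u

typedef enum { TOY_TBLOCK_STARTED, TOY_TBLOCK_DEFAULT } ToyBlockState;

typedef struct {
    ToyBlockState block_state;  /* models the transaction block state */
    ToyXid xid;                 /* models MyProc->xid */
    int abort_calls;            /* how many times the abort path ran */
    bool assert_would_fail;     /* set where the real assertion would trip */
} ToyBackend;

/* Models the abort path: it expects a valid xid and then clears it. */
void toy_abort_transaction(ToyBackend *b)
{
    b->abort_calls++;
    if (b->xid == TOY_INVALID_XID)
        b->assert_would_fail = true;  /* second entry: xid already cleared */
    b->xid = TOY_INVALID_XID;
}

/*
 * Models the cleanup path. Returns false when cleanup itself "raises an
 * error" (here: the temp file flush exceeding the limit); in that case the
 * block state has not been advanced yet.
 */
bool toy_cleanup_transaction(ToyBackend *b, bool cleanup_raises)
{
    if (cleanup_raises)
        return false;
    b->block_state = TOY_TBLOCK_DEFAULT;
    return true;
}

/* Models error recovery: abort, then clean up; retry once if cleanup fails. */
void toy_handle_error(ToyBackend *b, bool cleanup_raises)
{
    assert(b != NULL);
    for (int attempt = 0;
         attempt < 2 && b->block_state == TOY_TBLOCK_STARTED;
         attempt++)
    {
        toy_abort_transaction(b);
        if (toy_cleanup_transaction(b, cleanup_raises))
            break;
    }
}
```

With cleanup_raises set to false the abort path runs once and the state advances. With it set to true the abort path runs a second time while the state is still TOY_TBLOCK_STARTED, and that second run finds the xid already cleared.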

Changing the behavior of tuplestore can work for this issue, but I'm not sure whether this change will affect other components that depend on BufFile (like nodeMaterial).

On Thu, Feb 22, 2024 at 10:46, Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
At Wed, 21 Feb 2024 12:00:01 +0000, PG Bug reporting form <noreply@postgresql.org> wrote in
> triggers two errors, a warning, and an assertion failure:
> ERROR:  temporary file size exceeds temp_file_limit (100kB)
> WARNING:  AbortTransaction while in ABORT state
> ERROR:  temporary file size exceeds temp_file_limit (100kB)
> server closed the connection unexpectedly
...
> For the second error:
> 2024-02-21 11:40:07.001 UTC|law|regression|65d5e116.13cfcb|ERROR:  temporary file size exceeds temp_file_limit (100kB)
> 2024-02-21 11:40:07.001 UTC|law|regression|65d5e116.13cfcb|BACKTRACE:
> FileWrite at fd.c:2183:5
> BufFileDumpBuffer at buffile.c:537:18
> BufFileFlush at buffile.c:723:3
> BufFileClose at buffile.c:419:9
> tuplestore_end at tuplestore.c:459:5
> MemoryContextSwitchTo at palloc.h:142:23
>  (inlined by) PortalDrop at portalmem.c:587:3
> AtCleanup_Portals at portalmem.c:907:3

Therefore, BufFileClose should not flush the content during error
handling. In the first place, tuplestore doesn't need to flush the
underlying files in _end and _clear. In this case, I would choose to
change the general behavior of tuplestore. The attached PoC patch
fixes the issue for me. It introduces a new "extended" function to
control flushing, avoiding the addition of an unnatural parameter to
BufFileClose. I suspect that it is usable in some other places, but I
haven't checked that.
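
The general shape of such an "extended" close function can be sketched with a toy buffered file in C. This is not the attached PoC patch and not the real buffile.c API: ToyBufFile, toy_buffile_close_ext and the other toy_* names are hypothetical, and an ERROR is modeled as a false return value instead of ereport().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_BUF_SIZE 64

/* Toy buffered temp file; all names are hypothetical. */
typedef struct {
    char   buf[TOY_BUF_SIZE];  /* dirty bytes not yet written out */
    size_t nbuffered;          /* number of dirty bytes in buf */
    size_t written;            /* bytes already written "to disk" */
    size_t limit;              /* models temp_file_limit, in bytes */
    bool   open;
} ToyBufFile;

void toy_buffile_init(ToyBufFile *f, size_t limit)
{
    assert(f != NULL);
    memset(f, 0, sizeof(*f));
    f->limit = limit;
    f->open = true;
}

/* Buffers data in memory; fails only if the toy buffer is too small. */
bool toy_buffile_append(ToyBufFile *f, const char *data, size_t len)
{
    if (len > TOY_BUF_SIZE - f->nbuffered)
        return false;
    memcpy(f->buf + f->nbuffered, data, len);
    f->nbuffered += len;
    return true;
}

/* Writes out dirty bytes; fails if that would exceed the size limit. */
bool toy_buffile_flush(ToyBufFile *f)
{
    if (f->written + f->nbuffered > f->limit)
        return false;  /* models "temporary file size exceeds temp_file_limit" */
    f->written += f->nbuffered;
    f->nbuffered = 0;
    return true;
}

/*
 * Extended close: the caller decides whether dirty data is flushed first.
 * With flush == false the buffered bytes are discarded, so closing cannot
 * hit the size limit.
 */
bool toy_buffile_close_ext(ToyBufFile *f, bool flush)
{
    if (flush && f->nbuffered > 0 && !toy_buffile_flush(f))
        return false;  /* the error escapes before the file is released */
    f->nbuffered = 0;
    f->open = false;
    return true;
}

/* Plain close keeps its signature and the old flush-on-close behavior. */
bool toy_buffile_close(ToyBufFile *f)
{
    return toy_buffile_close_ext(f, true);
}
```

In this sketch a caller that is discarding its data anyway, such as a tuplestore being ended or cleared, would call toy_buffile_close_ext(f, false), while other callers keep using toy_buffile_close() unchanged.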

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center


--
Tender Wang
OpenPie:  https://en.openpie.com/