Received: from malur.postgresql.org ([217.196.149.56])
	by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.96)
	(envelope-from )
	id 1w5vbJ-003loQ-04
	for pgsql-bugs@arkaria.postgresql.org;
	Fri, 27 Mar 2026 01:03:01 +0000
Received: from localhost ([127.0.0.1] helo=malur.postgresql.org)
	by malur.postgresql.org with esmtp (Exim 4.96)
	(envelope-from )
	id 1w5vbG-006uuv-0R
	for pgsql-bugs@arkaria.postgresql.org;
	Fri, 27 Mar 2026 01:02:58 +0000
Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29])
	by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.96)
	(envelope-from )
	id 1w5vbF-006uun-2r
	for pgsql-bugs@lists.postgresql.org;
	Fri, 27 Mar 2026 01:02:58 +0000
Received: from sss.pgh.pa.us ([68.162.161.243])
	by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384
	(Exim 4.98.2)
	(envelope-from )
	id 1w5vbD-00000001O7h-2IpL
	for pgsql-bugs@lists.postgresql.org;
	Fri, 27 Mar 2026 01:02:58 +0000
Received: from sss1.sss.pgh.pa.us (localhost [127.0.0.1])
	by sss.pgh.pa.us (8.15.2/8.15.2) with ESMTP id 62R12pcF1106027;
	Thu, 26 Mar 2026 21:02:51 -0400
From: Tom Lane
To: kuzmin.db4@gmail.com
cc: pgsql-bugs@lists.postgresql.org, David Rowley
Subject: Re: BUG #19438: segfault with temp_file_limit inside cursor
In-reply-to: <19438-9d37b179c56d43aa@postgresql.org>
References: <19438-9d37b179c56d43aa@postgresql.org>
Comments: In-reply-to PG Bug reporting form
	message dated "Wed, 25 Mar 2026 13:27:49 -0000"
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----- =_aaaaaaaaaa0"
Content-ID: <1105830.1774573242.0@sss.pgh.pa.us>
Date: Thu, 26 Mar 2026 21:02:51 -0400
Message-ID: <1106026.1774573371@sss.pgh.pa.us>
Precedence: bulk

------- =_aaaaaaaaaa0
Content-Type: text/plain; charset="us-ascii"
Content-ID: <1105830.1774573242.1@sss.pgh.pa.us>

PG Bug reporting form writes:
> I
> experimented with setting temp_file_limit within a cursor and discovered a
> segmentation fault under certain circumstances.
> The issue exist in the current minors of 14 and 15 (14.22 and 15.17), but I
> was unable to reproduce it in version 16 or higher.
> To reproduce, simply run the following code.
> begin;
> declare cur1 cursor for select c, c c2 from generate_series(0, 1000000)
> x(c) order by c;
> \o /dev/null
> fetch all from cur1;
> set temp_file_limit TO '1MB';
> fetch backward all from cur1;
> rollback ;

Many thanks for the report! I confirm your results that this fails
in v14 and v15 but not later branches. However, I'm quite mystified
why v16 and v17 don't fail. The attached patch fixes it in v15,
and I think we need to apply it to all branches.

What is happening is that the last FETCH is trying to fill the
holdStore of the Portal holding the FETCH execution, and we soon
run out of work_mem and start dumping the tuples into a temp file.
While doing that, we run up against the temp_file_limit and fd.c
throws an error. This leaves the Portal's holdStore in a corrupted
state, as a result of the oversight described and fixed in the
attached patch: we've already deleted some tuples from its in-memory
array, but the tuplestore's state doesn't reflect that. Then during
transaction abort we must clean up the tuplestore (since it's part
of a long-lived data structure), and tuplestore_end therefore tries
to delete all the tuples in the in-memory array. Double free.
Kaboom.

At least, that's what happens in v15 and (probably) all prior
branches for a long way back. v18 and later fortuitously avoid the
failure because they got rid of tuplestore_end's retail tuple
deletion loop in favor of a memory context deletion (cf 590b045c3).
v16 and v17 *should* fail, but somehow they don't, and I don't
understand why not.
I bisected it and determined that the failures stop with

c6e0fe1f2a08505544c410f613839664eea9eb21 is the first new commit
commit c6e0fe1f2a08505544c410f613839664eea9eb21
Author: David Rowley
Date:   Mon Aug 29 17:15:00 2022 +1200

    Improve performance of and reduce overheads of memory management

which makes no sense whatsoever. Somehow, we are not crashing on a
double free with the new memory chunk header infrastructure.
David, have you any idea why not?

Even though no failure manifests with this example in v16+,
we are clearly at risk by leaving corrupted tuplestore state
behind, so I think the attached has to go into all branches.

			regards, tom lane

------- =_aaaaaaaaaa0
Content-Type: text/x-diff; name="0001-fix-tuplestore-corruption-15.patch"; charset="us-ascii"
Content-ID: <1105830.1774573242.2@sss.pgh.pa.us>
Content-Description: 0001-fix-tuplestore-corruption-15.patch

diff --git a/src/backend/utils/sort/tuplestore.c b/src/backend/utils/sort/tuplestore.c
index f605ece721e..f12e8d23a9c 100644
--- a/src/backend/utils/sort/tuplestore.c
+++ b/src/backend/utils/sort/tuplestore.c
@@ -1221,6 +1221,17 @@ dumptuples(Tuplestorestate *state)
 		if (i >= state->memtupcount)
 			break;
 		WRITETUP(state, state->memtuples[i]);
+
+		/*
+		 * Increase memtupdeleted to track the fact that we just deleted that
+		 * tuple.  Think not to remove this on the grounds that we'll reset
+		 * memtupdeleted to zero below.  We might not get there, if some later
+		 * WRITETUP fails (e.g. due to overrunning temp_file_limit).  If so,
+		 * we'd error out leaving an effectively-corrupt tuplestore, which
+		 * would be quite bad if it's a persistent data structure such as a
+		 * Portal's holdStore.
+		 */
+		state->memtupdeleted++;
 	}
 	state->memtupdeleted = 0;
 	state->memtupcount = 0;

------- =_aaaaaaaaaa0--
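
The failure mode described in the message, and the effect of the one-line
fix, can be illustrated with a small stand-alone model. This is only a
sketch under stated assumptions, not PostgreSQL code: every toy_* name is
hypothetical, the simulated write budget stands in for temp_file_limit, a
boolean return value stands in for the error that fd.c throws, and a
per-slot counter records "frees" so that a double free can be counted
instead of actually corrupting memory. Only the field names memtupdeleted
and memtupcount are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TUPLES 8

/* Toy stand-in for the tuplestore state (hypothetical, heavily reduced). */
typedef struct ToyStore
{
	int		memtupdeleted;			/* slots before this index are already released */
	int		memtupcount;			/* slots before this index have been filled */
	int		free_calls[MAX_TUPLES];	/* how many times each slot was "freed" */
	int		write_budget;			/* successful writes left before the simulated limit */
} ToyStore;

static void
toy_init(ToyStore *st, int ntuples, int write_budget)
{
	assert(ntuples <= MAX_TUPLES);
	st->memtupdeleted = 0;
	st->memtupcount = ntuples;
	st->write_budget = write_budget;
	for (int i = 0; i < MAX_TUPLES; i++)
		st->free_calls[i] = 0;
}

/*
 * Stand-in for WRITETUP: write the tuple to the temp file, then release its
 * memory.  Returns false to simulate the error raised when the limit is hit;
 * in that case the tuple has not been released.
 */
static bool
toy_writetup(ToyStore *st, int slot)
{
	if (st->write_budget <= 0)
		return false;
	st->write_budget--;
	st->free_calls[slot]++;
	return true;
}

/* Model of the dump loop; with_fix selects the patched bookkeeping. */
static bool
toy_dumptuples(ToyStore *st, bool with_fix)
{
	for (int i = st->memtupdeleted; i < st->memtupcount; i++)
	{
		if (!toy_writetup(st, i))
			return false;		/* error exit: the resets below never run */
		if (with_fix)
			st->memtupdeleted++;	/* record that slot i is already gone */
	}
	st->memtupdeleted = 0;
	st->memtupcount = 0;
	return true;
}

/* Model of a cleanup loop that frees every tuple still believed to be in memory. */
static void
toy_end(ToyStore *st)
{
	for (int i = st->memtupdeleted; i < st->memtupcount; i++)
		st->free_calls[i]++;
}

/*
 * Fill five slots, let the dump fail after two successful writes, run the
 * cleanup, and return how many slots were freed more than once.
 */
int
toy_count_double_frees(bool with_fix)
{
	ToyStore	st;
	int			doubles = 0;

	toy_init(&st, 5, 2);
	if (!toy_dumptuples(&st, with_fix))
		toy_end(&st);			/* abort path cleans up the store */
	for (int i = 0; i < MAX_TUPLES; i++)
	{
		if (st.free_calls[i] > 1)
			doubles++;
	}
	return doubles;
}
```

In this model, toy_count_double_frees(false) returns 2, because the two
slots released before the simulated failure are released again by the
cleanup loop, while toy_count_double_frees(true) returns 0, because the
cleanup loop starts at the advanced memtupdeleted. The model says nothing
about why v16 and v17 fail to crash; it only shows the bookkeeping gap.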