Date: Thu, 30 Apr 2026 19:15:09 +0900 (JST)
Message-Id: <20260430.191509.901412343921235299.ishii@postgresql.org>
To: pgpool-hackers@lists.postgresql.org
Subject: 120.memory_leak_extended_memqcache fails on master
From: Tatsuo Ishii
Mime-Version: 1.0
Content-Type: Multipart/Mixed; boundary="--Next_Part(Thu_Apr_30_19_15_09_2026_242)--"
Content-Transfer-Encoding: 7bit

----Next_Part(Thu_Apr_30_19_15_09_2026_242)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Summary:

- Recent 120.memory_leak_extended_memqcache regression test failures on the master branch do not mean that we have memory leaks in the query cache and/or extended query processing. They are false positives, because the test script is not prepared for a child process going away due to a FATAL error.

- The FATAL error is caused by an oversight in commit a350afd70 "Feature: send sync message only to necessary backends.".

- Proposed fix attached.

Recently the 120.memory_leak_extended_memqcache regression test has been failing occasionally on master. So I dug into the buildfarm log and found the following error. Note that in this case the load balance node is 1. If it is 0, the test succeeds. The test runs pgbench, and the log below was emitted at the end of the benchmark. When pgbench finishes, pgpool executes the queries registered in reset_query_list. "DISCARD ALL" comes from it.

2026-04-29 01:00:58.287: pgbench pid 62605: LOG: DB node id: 1 backend pid: 62625 statement: Sync
2026-04-29 01:00:58.287: pgbench pid 62605: LOG: DB node id: 0 backend pid: 62624 statement: DISCARD ALL
2026-04-29 01:00:58.287: pgbench pid 62605: LOG: DB node id: 1 backend pid: 62625 statement: DISCARD ALL
2026-04-29 01:00:58.288: pgbench pid 62605: LOG: pool_send_and_wait: Error or notice message from backend: DB node id: 0 backend pid: 62624 statement: "DISCARD ALL" message: "DISCARD ALL cannot run inside a transaction block"
2026-04-29 01:00:58.288: pgbench pid 62605: WARNING: packet kind of backend 1 ['C'] does not match with main/majority nodes packet kind ['E']
2026-04-29 01:00:58.288: pgbench pid 62605: FATAL: failed to read kind from backend
2026-04-29 01:00:58.288: pgbench pid 62605: DETAIL: kind mismatch among backends. Possible last query was: "DISCARD ALL" kind details are:0[E: DISCARD ALL cannot run inside a transaction block] 1[C]
2026-04-29 01:00:58.288: pgbench pid 62605: HINT: check data consistency among db nodes

The FATAL error terminated child process 62605, which caused the test failure.

From the 120 test's test.sh:

	# run pgbench for a while
	$PGBENCH -M extended -S -T 30 test

	after_size=`ps l $pid|tail -1|awk '{print $7}'`
	delta=`expr $after_size - $init_size`

	echo "initial process size: $init_size after size: $after_size delta: $delta"

	test $delta -eq 0

	if [ $? != 0 ];then
	    echo "memory leak in $delta KB in mode:$mode"
	    ./shutdownall
	    exit 1
	fi

and from src/test/regression/log/120.memory_leak_extended_memqcache:

expr: non-integer argument
initial process size: 156900 after size: VSZ delta:
./test.sh: line 59: test: -eq: unary operator expected
memory leak in KB in mode:s

Since process 62605 no longer exists, the ps command prints only its header line, so after_size becomes "VSZ" and delta is empty. The "test $delta -eq 0" command then fails, which makes the script believe that there's a memory leak. Probably we should fix test.sh here, but I think the FATAL error should be fixed in the first place.
Once we fix it, there's no need to fix test.sh. So I think we can leave test.sh as it is now.

The reason why DISCARD ALL failed on node 0 is that, according to the log, DISCARD ALL was executed inside a transaction block. However, the queries executed by pgbench do not use explicit transactions. I was puzzled, so I looked into the PostgreSQL log on node 0. Grepping for the related process (62624) in the PostgreSQL log shows this:

62624 2026-04-29 01:00:28.288 UTC LOG: execute pgpool62605/pgpool62605: SELECT count(*) FROM pg_catalog.pg_class AS c, pg_catalog.pg_namespace AS n WHERE c.relname = 'pgbench_accounts' AND c.relnamespace = n.oid AND n.nspname ~ '^pg_temp_'
62624 2026-04-29 01:00:29.068 UTC LOG: execute pgpool62605/pgpool62605: SELECT COALESCE(pg_catalog.to_regclass('"pgbench_accounts"')::oid, 0)
62624 2026-04-29 01:00:58.287 UTC LOG: statement: DISCARD ALL
62624 2026-04-29 01:00:58.287 UTC ERROR: DISCARD ALL cannot run inside a transaction block
62624 2026-04-29 01:00:58.287 UTC STATEMENT: DISCARD ALL

Those two execute commands were issued by do_query, which is responsible for getting information for the relation cache module. When pgpool runs in extended query mode and outside an explicit transaction, do_query does not issue a sync message because it could break the unnamed portal. So process 62624 on node 0 stays in extended query mode without receiving a sync. In this case PostgreSQL keeps an implicit transaction open until a sync message is sent.

Why did pgpool not send a sync to node 0? Commit a350afd70 makes pgpool send sync messages only to the necessary backend nodes. Queries coming from clients are tracked for this purpose. However, it overlooked the case where do_query generates internal queries. I think we should track such queries too, so that when the client sends a sync message, a sync message is also issued to the server that do_query sent its queries to.
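
To illustrate the tracking idea, here is a minimal sketch in C. It is a toy model under my own assumptions, not pgpool source code and not the proposed fix itself: ToySession, TOY_MAX_BACKENDS, note_client_query, note_internal_query and build_sync_targets are all hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_BACKENDS 4	/* hypothetical constant for this sketch */

/* Toy per-session state (hypothetical, not a pgpool structure). */
typedef struct
{
	bool		client_sent[TOY_MAX_BACKENDS];	/* client query forwarded */
	bool		internal_sent[TOY_MAX_BACKENDS];	/* internal query, no sync yet */
} ToySession;

/* Record that a client's extended query was forwarded to "node". */
static void
note_client_query(ToySession *s, int node)
{
	assert(node >= 0 && node < TOY_MAX_BACKENDS);
	s->client_sent[node] = true;
}

/*
 * Record that an internal query was sent to "node". Only the case outside
 * an explicit transaction is tracked, because that is the case in which
 * the internal query is not followed by its own sync.
 */
static void
note_internal_query(ToySession *s, int node, bool in_explicit_tx)
{
	assert(node >= 0 && node < TOY_MAX_BACKENDS);
	if (!in_explicit_tx)
		s->internal_sent[node] = true;
}

/*
 * Called when the client sends a sync. Fills "targets" with the nodes that
 * need a sync, clears the tracking state, and returns the number of targets.
 */
static int
build_sync_targets(ToySession *s, bool targets[TOY_MAX_BACKENDS])
{
	int			count = 0;

	for (int i = 0; i < TOY_MAX_BACKENDS; i++)
	{
		targets[i] = s->client_sent[i] || s->internal_sent[i];
		if (targets[i])
			count++;
		s->client_sent[i] = false;
		s->internal_sent[i] = false;
	}
	return count;
}
```

For example, if a client query is noted on node 1 (the load balance node) and an internal query is noted on node 0 outside an explicit transaction, build_sync_targets() marks both node 0 and node 1, so node 0 would not be left inside an implicit transaction.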

For this purpose I added a new member "bool pending_sync_map[MAX_NUM_BACKENDS]" to session_context. The entry for a node id is set when do_query sends an extended query to that backend. Note that it is set only when do_query is called outside an explicit transaction. In add_sync_pending_message, which is called when the client sends a sync message, we check the map and set sync_map so that the subsequent call to SimpleForwardToBackend will send a sync message to the backend node.

The attached patch implements this.

Regards,
-- 
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

----Next_Part(Thu_Apr_30_19_15_09_2026_242)--
Content-Type: Text/X-Patch; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="v1-0001-Fix-to-send-sync-message-to-necessary-backend-nod.patch"