Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w2ltV-000abz-1l for pgsql-hackers@arkaria.postgresql.org; Wed, 18 Mar 2026 08:04:45 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1w2lna-008XyD-1p for pgsql-hackers@arkaria.postgresql.org; Wed, 18 Mar 2026 07:58:38 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1w2lna-008Xy5-0v for pgsql-hackers@lists.postgresql.org; Wed, 18 Mar 2026 07:58:38 +0000 Received: from mail-pg1-x536.google.com ([2607:f8b0:4864:20::536]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1w2lnX-00000000tVf-2N1O for pgsql-hackers@lists.postgresql.org; Wed, 18 Mar 2026 07:58:38 +0000 Received: by mail-pg1-x536.google.com with SMTP id 41be03b00d2f7-c73f107789cso1224464a12.1 for ; Wed, 18 Mar 2026 00:58:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1773820713; x=1774425513; darn=lists.postgresql.org; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=G8IMrcLOjXeSe7kPB4yzV3GcfZke06ePqjbih+lwANU=; b=REEBtDsF3JghlGnm/Zy93GDv6C9TTyRuyCHRxMbWJ6aqerDUv4Fskzx54f2i631Dmg iNNJKsI2PPm8AIIRNsM+nyTGW8jbuqvdyy3Ls48TlMNfE4yxuHGTZ5VFhlFLOIPwarZ6 a44VJ4htilx6h+zc6LGsjCec/iYIIqYKTPnbMrinNkRd7SLnxduyu/MCVhGQTBEIQVrV KzEM1LKK+0IGaeTxUs47dVquoiPRqAe3QM2nfEcmBb/YbNOMtXeYBq6KTRr77764chtg 1iX+tj61BiHDmN673E/YlGlOUhJT7XOAqFxRsshWX2cw1wS1Tou+aXzom+RVLLTT49LA 9Dow== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1773820713; x=1774425513; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=G8IMrcLOjXeSe7kPB4yzV3GcfZke06ePqjbih+lwANU=; b=Qzgrc+OaEgvApdq+11B56VVer3lzb1iEd2w6QXTVQbfGyPFk0hNnofo0Nv9Qyy23lG 3drwYc8NbFvAPQk1UjtTuvJDkRt3FlP0DXmtafXA0tsSjAh9yacK/TuBUgyn2jLFzRlv Cc3YJ6BNFgly6MnprYGPAyR4tZLEN++QVfBvyIvhHm/MEpi4MdEnvvFQ1nGEpKswcmMX C8gmkJKWfNc90DDnsXHWwGfi0+1CqIh/84xdXEbOQ6nyn5uMMrfr7k+MQaMp0DCyprf5 4a/1tKTic/doL9IOXDc8EkOAfvLucJUGwmL3ygSDN8WM1vpVeK0wuqofiRA6xkkPu2sd sNew== X-Gm-Message-State: AOJu0Yy85ikTbV9hGDvCH2voe0ce0rxw0iKg7g7t8z19wJMGY++hGwAX KqVOtsV+dvRNCxFyO5Dsw8ZJtpkA2XbZq1UtLMDXL7bTaMMDSqsyzAMG X-Gm-Gg: ATEYQzw5Wj5ay3dLyWjyUIJmlZh/tUicw4kBN9tTETOCOB6MYbPnLxscwnOm0jK0gQ9 WsECst+st/2pjL8+vHHCb0rIOMdK8r/mPm6j0Y1eWEx96kmQCygBLRJtWNWDUHZ2+jaUCObThk7 eebG/gj6cOxmARS7HdEJxYGO/iLQawcbsfEefykjbWqEtwXup+BA6oSP16Ojeo0E+wFwzcb/rYs iJIMQarIYJu24hJ/5k6VlvXjyGYVLeIAbT0zufxfuwp3x/V4e2rc2Pzfx6qbjqwe15ZRWMJnn1X iZ7Up7uD4qBpVId5rmJbhWtb0CEotbAhNIPCTzhrr/O47Ukp7kpq3ptk54LSWny3E8QK9TuZChR /tUSb/+MiuG8TWUF0bT80aWW1uaYWc0qfRJ/Xei/zesoHhXUHa1jMRiS2JjlxS3pWoWZJ2SHvuR UrQmNfPktwy2U4CAYSuE4R47anD/swNmKl57P/5nYK7Q== X-Received: by 2002:a05:6a21:778a:b0:39b:a3d8:d81c with SMTP id adf61e73a8af0-39ba3d8e864mr799351637.61.1773820713426; Wed, 18 Mar 2026 00:58:33 -0700 (PDT) Received: from smtpclient.apple ([45.32.121.103]) by smtp.gmail.com with ESMTPSA id 41be03b00d2f7-c741e583626sm1562231a12.30.2026.03.18.00.58.31 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 18 Mar 2026 00:58:32 -0700 (PDT) Content-Type: text/plain; charset=us-ascii Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3864.400.21\)) Subject: Re: Avoiding memory leakage in jsonpath evaluation From: Chao Li In-Reply-To: Date: Wed, 18 Mar 2026 15:57:53 +0800 Cc: pgsql-hackers@lists.postgresql.org Content-Transfer-Encoding: quoted-printable Message-Id: References: <569394.1773783211@sss.pgh.pa.us> To: Tom Lane X-Mailer: Apple Mail (2.3864.400.21) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk > On Mar 18, 2026, at 15:52, Chao Li wrote: >=20 >=20 >=20 >> On Mar 18, 2026, at 05:33, Tom Lane wrote: >>=20 >> I got an off-list report that a query like this consumes >> an unreasonable amount of memory: >>=20 >> SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM = generate_series(1,10000) i), >> '$[*] ? (@ < $)'); >>=20 >> For me, that eats about 6GB by the time it's done executing. >> If that doesn't seem like a lot to you, just add another zero to the >> generate_series call, and then it'll be more like 600GB, because the >> leakage is O(N^2). >>=20 >> Admittedly, this isn't an especially useful query: its runtime is >> also O(N^2), because that path expression basically requires us to >> compare every element of the input JSON array to every other element. >> But it's not cool that it leaks so much memory while at it. >>=20 >> I poked into this and found that the leakage is entirely composed of >> "JsonValueList"s that are built during path evaluation and then just >> left to rot until the end of jsonb_path_query(). We can fix it by >> being careful to free those lists on the way out of each jsonpath >> evaluation function that creates one. However, just doing that would >> mean adding pfree overhead on top of palloc overhead, so I went a bit >> further and reimplemented JsonValueList to be more compact and = cheaper >> to allocate/free. The attached seems to be a bit faster than the >> existing code as well as not leaking so much memory. See the draft >> commit message for more details. >>=20 >> regards, tom lane >=20 > This patch looks like a big win. It not only saves memory, but also = makes the query much faster. >=20 > I tested the query on my MacBook M4, increasing the iteration count = from 10000 to 50000. >=20 > Current master (3b4c2b9db25): > ``` > evantest=3D# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM = generate_series(1,50000) i), '$[*] ? (@ < $)'); > Time: 208581.771 ms (03:28.582) > evantest=3D# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM = generate_series(1,50000) i), '$[*] ? (@ < $)'); > Time: 217269.595 ms (03:37.270) > ``` >=20 > With the patch: > ``` > evantest=3D# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM = generate_series(1,50000) i), '$[*] ? (@ < $)'); > Time: 18674.580 ms (00:18.675) > evantest=3D# SELECT jsonb_path_query((SELECT jsonb_agg(i) FROM = generate_series(1,50000) i), '$[*] ? (@ < $)'); > Time: 18889.329 ms (00:18.889) > ``` >=20 > My observations were: >=20 > * Before the patch, the backend process memory usage fluctuated = between roughly 50GB and 145GB, while CPU usage stayed around 30%. > * With the patch, the backend process memory usage stayed stable at = around 30MB, while CPU usage stayed around 100%. >=20 > After reviewing the patch, I thought JsonValueListLength() might be = worth optimizing, since it is O(n). I tried adding an ntotal_items field = to JsonValueList to track the total number of items, similar to the last = pointer that is only meaningful in the base chunk. But that did not help = in my test, and I realized JsonValueListLength() is not on the hottest = path, so I dropped that idea. >=20 > =46rom the MacOS Instruments tool, the most expensive parts seem to be = fillJsonbValue, JsonbIteratorNext, cmp_var_common, and cmp_numerics. But = those look like separate topics. >=20 > Overall, this looks like a solid patch. >=20 Forgot to mention that, to run the tests, I turned off debug and = assertion, and compiled with -O2. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/