Message-ID: <gh-pgjdbc-pgjdbc-3483-c2599012897@github.com>
From: "james-johnston-thumbtack (@james-johnston-thumbtack)" <noreply+james-johnston-thumbtack@github.com>
To: "pgjdbc/pgjdbc" <noreply+pgjdbc-pgjdbc@github.com>
Date: Fri, 17 Jan 2025 19:01:51 +0000
Subject: Re: [pgjdbc/pgjdbc] issue #3483: Support adaptive fetching without enforcing memory limits, and/or have a separate buffer size for it
In-Reply-To: <gh-pgjdbc-pgjdbc-3483@github.com>
References: <gh-pgjdbc-pgjdbc-3483@github.com>
List-Id: <gh-pgjdbc-pgjdbc.github.com>
X-GitHub-Author-Login: james-johnston-thumbtack
X-GitHub-Comment-Id: 2599012897
X-GitHub-Comment-Type: issue_comment
X-GitHub-Issue: 3483
X-GitHub-Repo: pgjdbc/pgjdbc
X-GitHub-Type: comment
X-GitHub-Url: https://github.com/pgjdbc/pgjdbc/issues/3483#issuecomment-2599012897
Content-Type: text/plain; charset=utf-8

> Even a single json can exceed the limits, so I am not sure we can do much from the client side to make it 100% safe.

Indeed - a single very large JSON could even exceed all physical memory/disk on the client machine.  Not much you can do about that.

But, if the buffer/adaptive fetching configuration was more flexible, then the user would have more control to make it _safer_ (even though it cannot be theoretically proved to be 100% safe).  I could at least tune the adaptive fetching to give some extra safety margin so that the adaptive fetching doesn't so aggressively try to come so close to hard memory limits (whether imposed by pgJDBC in the hard buffer limit discussed in this issue - last row in my table, or imposed by the JVM itself if pgJDBC's buffer limit was disabled - second row in my table).

> I would be nice if the backend could support "fetch at most N rows and stop the fetch after M bytes sent".

I think this would also be a fantastic idea and a good improvement to PostgreSQL, although like Dave, I would imagine it would require changes to the server so you can tell it when to stop sending data.

I would add that one would also need to think about what to do if a single row size exceeds `M`.  In order to make progress, we would need to exceed `M` just for that one row (alternatively, the backend could return an error).  For example, suppose you have a 50 byte limit, and the rows to be retrieved are 20, 20, and 70 bytes.  You first fetch the first two rows for 40 bytes.  Then the second fetch gets the 70 byte row, which does exceed the configured 50 byte limit.  (Or, an error is returned by the server on the second fetch).  Realistically I would imagine the choice could be the user's, e.g. the server will exceed the limit for one row, but the client can still error out if a hard limit was set on the client with `maxResultBuffer`.  (It's kind of like in Kafka clients, where they will temporarily exceed the configured limit in order to make progress if there is a single large message - see [max.partition.fetch.bytes](https://kafka.apache.org/documentation/#consumerconfigs_max.partition.fetch.bytes) as an example in the consumer where they will exceed the fetch size for large messages.  Although do note they also still have a configured max message size, so there is still an upper bound.)

One would also consider that someone may want to only limit by byte size, and not by row count.  For example, maybe you pass `0` rows to [Execute](https://www.postgresql.org/docs/current/protocol-message-formats.html#PROTOCOL-MESSAGE-FORMATS-EXECUTE) but then still request a byte size limit using the proposed mechanism - in this case, PG would fetch as many rows as possible that fit within the byte size limit.