[pgjdbc/pgjdbc] issue #3483: Support adaptive fetching without enforcing memory limits, and/or have a separate buffer size for it - james-johnston-thumbtack (@james-johnston-thumbtack)

pgjdbc/pgjdbc GitHub issues and pull requests (mirror)  
help / color / mirror / Atom feed

From: james-johnston-thumbtack (@james-johnston-thumbtack) <[email protected]>
To: pgjdbc/pgjdbc <[email protected]>
Subject: [pgjdbc/pgjdbc] issue #3483: Support adaptive fetching without enforcing memory limits, and/or have a separate buffer size for it
Date: Fri, 17 Jan 2025 09:08:04 +0000
Message-ID: <[email protected]> (raw)

The `maxResultBuffer` configuration parameter effectively performs two roles:
- **Hard memory buffer limit**: Enforce a hard maximum on the result set memory buffer: if the server returns more rows than can fit in the buffer, then an exception is thrown as implemented at https://github.com/pgjdbc/pgjdbc/blob/032d0e225a91e866d7dae680ebb784507392e803/pgjdbc/src/main/java/...
- **Adaptive caching memory target**: Act as a target for the adaptive fetching to try to aim for - it is used in the calculation of the next fetch size by dividing buffer limit by max row size seen so far, as implemented at https://github.com/pgjdbc/pgjdbc/blob/032d0e225a91e866d7dae680ebb784507392e803/pgjdbc/src/main/java/...

Unfortunately, I don't see a way to decouple these behaviors.  Thus, there is a possible failure mode: suppose the first few batches of rows have small row sizes.  But then another batch arrives with larger rows such that _the average row size is larger than the maximum row size seen so far_.  In this case, the new batch won't fit in the buffer, and the exception would be thrown.  A hypothetical example data set of where this might fail is if there is a JSON column, rows are sorted by creation time, older rows have empty JSON, but then the application starts filling out a fat JSON blob in every newer row.  Due to this, I'm a little reluctant to turn it on in production (or invest significant time testing), because I know we have some data sets like this and I'm not optimistic.

If these behaviors could be independently configured and decoupled, I think it would be quite useful.  We can imagine there could be a number of different possibilities when decoupled:

| Hard memory buffer limit | Adaptive fetching memory target | Comment |
|-|-|-|
| Disabled | Disabled | Default behavior of driver when nothing configured (fixed fetch size and no buffer limit) |
| Disabled | Enabled | Adaptive fetching tries to hit a memory target, but a temporary excursion won't encounter any hard memory limit enforced by the driver.  No way to configure this today. |
| Enabled | Disabled | Fixed fetch size (no adaptive fetching), and a hard buffer limit enforced resulting in exception |
| Enabled | Enabled, and equal to hard memory buffer limit | Adaptive fetching, and hard memory limit enforced by driver: this is the status quo and only way to enable adaptive fetching today, which is more vulnerable to the above-described problem |
| Enabled | Enabled, and smaller than hard memory buffer limit | Adaptive fetching is enabled, and tries to hit a buffer size target smaller than the enforced limit.  For example, I could configure a 100 MB hard limit, and a 60 MB adaptive fetching target.  If the average row size in a fetch exceeds the maximum row size so far, there's still a 40 MB margin of safety before exceptions would be thrown. |

The final possibility in this list I think is the most interesting, but maybe other users would like other options in this table to tune and customize this buffer and fetching behavior.  I could see the option of enabling adaptive fetching and disabling the hard memory limit to also be of interest.

(I do know of `adaptiveFetchMaximum`, but that value is not... adaptive... the appropriate max value to avoid encountering the buffer limit is hard to generically set in advance if the row size is not known.)

(One other final, somewhat unrelated thought.... I have wondered why the ResultSet buffering is really necessary in the first place... couldn't the `ResultSet.next` function stream the rows directly from the TCP socket without significant buffering?  This would avoid the whole issue of buffer size management in the first place.)

view thread (20+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: github://pgjdbc/pgjdbc
  Cc: [email protected], [email protected]
  Subject: Re: [pgjdbc/pgjdbc] issue #3483: Support adaptive fetching without enforcing memory limits, and/or have a separate buffer size for it
  In-Reply-To: <<[email protected]>>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox