Re: [pgjdbc/pgjdbc] issue #194: PgJDBC can experience client/server deadlocks during batch execution

pgjdbc/pgjdbc GitHub issues and pull requests (mirror)  

From: ringerc (@ringerc) <[email protected]>
To: pgjdbc/pgjdbc <[email protected]>
Subject: Re: [pgjdbc/pgjdbc] issue #194: PgJDBC can experience client/server deadlocks during batch execution
Date: Thu, 02 Oct 2014 13:53:25 +0000
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>

An alternative to completely changing the data exchange mechanism is to instead get PgJDBC to manage its _send buffer_ properly. PgJDBC currently ignores its own send buffer. Instead it tries to manage the server's send buffer by estimating how much reply data its pipelined queries will generate. This is backwards.

The only buffer PgJDBC can completely control is its own send buffer. So what we really need to do is avoid blocking on a write to that buffer when we know there's already a query response pending. (If there's no pending query it's fine to block: the server will keep consuming our input, even if there's an error.)

# Using non-blocking reads/writes with java.nio streams?

Java doesn't expose any API to query the available space in the TCP send buffer, and there's no portable way to query it from the underlying platform. You'd need platform-specific hacks such as the Linux `SIOCOUTQ` ioctl.

In java.nio (since Java 1.4) we can put a `SocketChannel` into non-blocking mode. What we can't do is keep driving it through a stream. A stream made by wrapping such a channel with the `Channels` class doesn't throw only when a write would block. Its write methods throw [`IllegalBlockingModeException`](http://docs.oracle.com/javase/7/docs/api/java/nio/channels/IllegalBlockingModeException.html) on every call while the channel is in non-blocking mode. So we'd have to call `SocketChannel.write(ByteBuffer)` directly instead. That call writes as many bytes as currently fit in the send buffer (possibly none) and returns the count.

We could then write whole messages until a write comes up short, then switch to consuming input from the socket. However, non-blocking reads are a problem with SSL. So we might just block on the read side and get deadlocked there: the server waits for us to send more data, and we wait for the server to send more data.

We could guarantee that it's safe to read from the receive stream by writing a `Sync` message, which forces the server to send more data. However, after a short write we can't guarantee there's enough space left in the send buffer for that. We'd also first have to finish writing the partially sent message. Writing the `Sync` message might just come up short too, leaving us unsure whether it's safe to read from the input channel.
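The difference between the stream wrapper and the channel API is easy to see on a loopback connection whose far end never reads. That far end stands in for a backend that has stopped consuming our input. The class and method names below are made up for illustration, and only standard java.nio calls are used.

- `streamWriteRejected()` shows that a stream from `Channels.newOutputStream` throws `IllegalBlockingModeException` even though nothing has been sent yet.
- `bytesAcceptedBeforeFull()` shows `SocketChannel.write` accepting data until the buffers fill, then returning 0.

The exact byte count depends on the OS and its TCP buffer tuning.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.IllegalBlockingModeException;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class NonBlockingWriteDemo {

    /** True if a stream wrapping a non-blocking channel rejects a write outright. */
    static boolean streamWriteRejected() throws IOException {
        try (ServerSocketChannel server = listen();
             SocketChannel client = SocketChannel.open(server.getLocalAddress());
             SocketChannel peer = server.accept()) {
            client.configureBlocking(false);
            OutputStream out = Channels.newOutputStream(client);
            try {
                // The send buffer is empty, yet the stream still refuses the write.
                out.write(new byte[] {1});
                return false;
            } catch (IllegalBlockingModeException e) {
                return true;
            }
        }
    }

    /** Bytes a non-blocking channel accepts before write() returns 0, when the peer never reads. */
    static long bytesAcceptedBeforeFull() throws IOException {
        try (ServerSocketChannel server = listen();
             SocketChannel client = SocketChannel.open(server.getLocalAddress());
             SocketChannel peer = server.accept()) { // accepted, but never read from
            client.configureBlocking(false);
            ByteBuffer chunk = ByteBuffer.allocate(64 * 1024);
            long total = 0;
            while (true) {
                chunk.clear();
                // Writes only what currently fits: a whole chunk, part of one, or 0 once full.
                int n = client.write(chunk);
                if (n == 0) {
                    return total;
                }
                total += n;
            }
        }
    }

    private static ServerSocketChannel listen() throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
        return server;
    }

    public static void main(String[] args) throws IOException {
        System.out.println("stream write rejected: " + streamWriteRejected());
        System.out.println("bytes accepted before write() returned 0: " + bytesAcceptedBeforeFull());
    }
}
```

Note that the second method never learns how much space is left before it writes. It only finds out afterwards, from a short or zero return value.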

Even if we solved the SSL issue and got a guaranteed non-blocking input stream too, we'd have to muck around with a control loop. That loop would use `select()` to wait until the socket is readable or writable, then pipeline more data. This is complicated by the fact that the socket might still be writable, just not for the whole message we want to send. Partially written messages would therefore have to be tracked and resumed later. So doing this with a non-blocking approach would require a pretty major change to the driver.
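The kind of control loop this would need looks roughly like the sketch below. It is a simplified illustration under stated assumptions, not PgJDBC code:

- `SelectPump` is a hypothetical name.
- A blocking echo server on loopback stands in for the backend. Like the backend, it stops reading while its own writes are blocked.
- Real protocol messages and SSL are left out entirely.

Note how a partially written message has to stay at the head of the queue until the socket becomes writable again.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.util.ArrayDeque;
import java.util.Deque;

public class SelectPump {

    /** Writes all queued messages while draining replies; returns reply bytes read. */
    static long pump(SocketChannel ch, Deque<ByteBuffer> outgoing, long expectedReplyBytes)
            throws IOException {
        ch.configureBlocking(false);
        ByteBuffer in = ByteBuffer.allocate(64 * 1024);
        long received = 0;
        try (Selector selector = Selector.open()) {
            SelectionKey key = ch.register(selector, SelectionKey.OP_READ | SelectionKey.OP_WRITE);
            while (received < expectedReplyBytes) {
                if (selector.select() == 0) {
                    continue;
                }
                selector.selectedKeys().clear();
                if (key.isWritable()) {
                    // The socket may take only part of the head message; the rest
                    // stays at the head of the queue until it is writable again.
                    while (!outgoing.isEmpty()) {
                        ByteBuffer head = outgoing.peekFirst();
                        ch.write(head);
                        if (head.hasRemaining()) {
                            break;
                        }
                        outgoing.pollFirst();
                    }
                    if (outgoing.isEmpty()) {
                        key.interestOps(SelectionKey.OP_READ); // nothing left to send
                    }
                }
                if (key.isReadable()) {
                    in.clear();
                    int n = ch.read(in);
                    if (n < 0) {
                        throw new IOException("peer closed the connection early");
                    }
                    received += n;
                }
            }
        }
        return received;
    }

    /** Sends messages * messageSize bytes to a local echo server; returns bytes echoed back. */
    static long run(int messages, int messageSize) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 1, loopback)) {
            // Blocking echo server: it stops reading whenever its own write blocks.
            Thread echo = new Thread(() -> {
                try (Socket s = listener.accept()) {
                    InputStream in = s.getInputStream();
                    OutputStream out = s.getOutputStream();
                    byte[] buf = new byte[8192];
                    for (int n; (n = in.read(buf)) > 0; ) {
                        out.write(buf, 0, n);
                    }
                } catch (IOException e) {
                    // Client went away; nothing more to do in this demo.
                }
            });
            echo.setDaemon(true);
            echo.start();

            byte[] payload = new byte[messageSize];
            Deque<ByteBuffer> outgoing = new ArrayDeque<>();
            for (int i = 0; i < messages; i++) {
                outgoing.add(ByteBuffer.wrap(payload)); // shared bytes, independent position
            }
            try (SocketChannel ch = SocketChannel.open(
                    new InetSocketAddress(loopback, listener.getLocalPort()))) {
                return pump(ch, outgoing, (long) messages * messageSize);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("echoed bytes: " + run(4000, 5000));
    }
}
```

Replies are drained whenever the socket is readable, so the echo server's writes never stay blocked and the loop completes however much data is queued. With SSL, this loop would also have to drive a `javax.net.ssl.SSLEngine` by hand, because `SSLSocket` only offers the blocking stream API. That is part of why the approach is so invasive.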

# Writing up to the send buffer size, then syncing and flushing

Instead, we can avoid blocking on the socket by never writing more than the send buffer can hold between `Sync` messages. Suppose one or more queries are already pipelined and the next write could fill the send buffer. Before making that write, PgJDBC sends a `Sync`, flushes, and consumes input up to the `ReadyForQuery` that answers it. At that point it knows for certain that two things hold:

- Its own send buffer is empty.
- The server's send buffer is empty, or nearly so. Asynchronous messages such as notifications may still be sent after the `Sync`.

It then knows it's safe to write up to the buffer size again before switching back to consuming input.
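A minimal sketch of that rule as a planning step follows. It assumes the encoded size of each query's messages is known up front. `SubBatchPlanner` and `plan` are hypothetical names, not PgJDBC API. The 5-byte `Sync` size comes from the v3 frontend/backend protocol: a type byte `'S'` followed by an Int32 length of 4. A real driver would more likely track bytes as it writes, but the boundary condition is the same.

```java
import java.util.ArrayList;
import java.util.List;

public class SubBatchPlanner {
    /** A v3 protocol Sync message: the type byte 'S' plus an Int32 length of 4. */
    static final int SYNC_MESSAGE_SIZE = 5;

    /**
     * Splits a batch into sub-batches, given each query's encoded size in bytes,
     * and returns the number of queries in each sub-batch. A sub-batch plus its
     * trailing Sync never exceeds sendBufferSize, except that a query too big to
     * fit by itself gets a sub-batch of its own: with no other query pending,
     * it is fine to block while writing it.
     */
    static List<Integer> plan(int[] querySizes, int sendBufferSize) {
        List<Integer> subBatchCounts = new ArrayList<>();
        int count = 0;
        long bytes = 0;
        for (int size : querySizes) {
            if (count > 0 && bytes + size + SYNC_MESSAGE_SIZE > sendBufferSize) {
                // Writing this query could fill the send buffer while replies are
                // pending: end the sub-batch here (Sync, flush, read to ReadyForQuery).
                subBatchCounts.add(count);
                count = 0;
                bytes = 0;
            }
            count++;
            bytes += size;
        }
        if (count > 0) {
            subBatchCounts.add(count);
        }
        return subBatchCounts;
    }

    public static void main(String[] args) {
        System.out.println(plan(new int[] {400, 400, 400}, 1000));
    }
}
```

For example, with a 1000-byte send buffer, three 400-byte queries split into sub-batches of 2 and 1. Two queries plus a `Sync` take 805 bytes, and a third query would push that past the limit.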

This is deadlock-proof, but it greatly limits the number of big queries that PgJDBC can pipeline in a batch. Currently PgJDBC assumes a 250-byte reply per query and a 64000-byte budget for the server's replies. On that basis it pipelines 64000 / 250 = 256 queries before it needs to sync and consume input.

Suppose we instead use the real send buffer size on a typical system. We can determine it by poking around in the driver's guts reflectively, e.g.:

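```
// Reflective hack against the driver's private internals at the time of writing;
// these class and field names are not a supported API and may change.
import java.lang.reflect.Field;
import java.net.Socket;

import org.postgresql.core.PGStream;
import org.postgresql.core.ProtocolConnection;
import org.postgresql.jdbc2.AbstractJdbc2Connection;

// ... inside a method declared "throws Exception", where conn is the PgJDBC Connection itself (not a pool proxy):

// We must use AbstractJdbc2Connection directly, as that's the declaring class
Field pgProtoConnField = AbstractJdbc2Connection.class.getDeclaredField("protoConnection");
pgProtoConnField.setAccessible(true);
ProtocolConnection pc = (ProtocolConnection) pgProtoConnField.get(conn);

// The concrete ProtocolConnection implementation holds the PGStream that wraps the socket
Field pgstreamField = pc.getClass().getDeclaredField("pgStream");
pgstreamField.setAccessible(true);

PGStream pgs = (PGStream) pgstreamField.get(pc);
Socket s = pgs.getSocket();
System.err.println("PgJDBC send buffer size is: " + s.getSendBufferSize());
```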

On my system the default is `1313280` bytes, about 1.25 MiB. That's the most we could queue up in a sub-batch with total safety, unless we have non-blocking writes and reads or separate reader/writer threads. How many queries that allows depends on their size:

- At 100 KiB of data per query, it's only 12 queries.
- At a more reasonable 5 KiB per query, it's 256 queries, the same as the current limit.
- At around 500 bytes per query, it's about 2,600, much more than our current limit.
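The arithmetic above can be checked with a few lines of Java. `SubBatchCapacity` is a hypothetical helper. It reserves 5 bytes per sub-batch for the trailing `Sync` message and uses integer division, since only whole queries can be queued.

```java
public class SubBatchCapacity {
    /** The current heuristic: an assumed reply size per query and a reply-buffer budget. */
    static final int ASSUMED_REPLY_BYTES = 250;
    static final int ASSUMED_REPLY_BUFFER_BYTES = 64000;

    /** Size of the Sync message that ends each sub-batch (type byte + Int32 length). */
    static final int SYNC_MESSAGE_SIZE = 5;

    /** Queries pipelined before a sync under the current reply-size heuristic. */
    static int currentLimit() {
        return ASSUMED_REPLY_BUFFER_BYTES / ASSUMED_REPLY_BYTES;
    }

    /** Whole queries of querySize bytes that fit, with one Sync, in the send buffer. */
    static int sendBufferLimit(int sendBufferBytes, int querySize) {
        return (sendBufferBytes - SYNC_MESSAGE_SIZE) / querySize;
    }

    public static void main(String[] args) {
        int sendBuffer = 1313280; // the value reported above; other systems will differ
        System.out.println("current heuristic: " + currentLimit() + " queries");
        for (int querySize : new int[] {100 * 1024, 5 * 1024, 500}) {
            System.out.println(querySize + " bytes/query: "
                    + sendBufferLimit(sendBuffer, querySize) + " queries per sub-batch");
        }
    }
}
```

On Linux, socket(7) notes that the kernel doubles a requested `SO_SNDBUF` value to allow space for bookkeeping overhead, and it reports the doubled value. The payload that actually fits is therefore likely somewhat smaller than the reported size, so leaving a margin would be prudent.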

That's still a lot of sanely sized queries, and bigger queries are less affected by round-trip costs anyway. So we should consider changing the deadlock prevention logic: instead of attempting to control the server's send buffer, PgJDBC would control its own. That's much safer. It would also let us batch prepared statements that request generated keys, whose replies don't fit a fixed per-query reply estimate.
