From: "zkorhone (@zkorhone)"
To: "pgjdbc/pgjdbc"
Date: Mon, 20 Nov 2023 11:52:19 +0000
Subject: Re: [pgjdbc/pgjdbc] issue #2963: BufferedOutputStream in PGStream should be replaced to something more efficient
X-GitHub-Author-Login: zkorhone
X-GitHub-Comment-Id: 1818909689
X-GitHub-Comment-Type: issue_comment
X-GitHub-Edited-At: 2023-11-20T14:37:20Z
X-GitHub-Issue: 2963
X-GitHub-Repo: pgjdbc/pgjdbc
X-GitHub-Type: comment
X-GitHub-Url: https://github.com/pgjdbc/pgjdbc/issues/2963#issuecomment-1818909689
Content-Type: text/plain; charset=utf-8

I ran some microbenchmarks for you:

| Test | Stream chain | Total time (ms) | Printed value |
|---|---|---|---|
| No threads | OutputStream | 150.25 | -56 |
| No threads | BufferedOutputStream(OutputStream) | 155.564 | -56 |
| No threads | FilteringOutputStream(BufferedOutputStream(OutputStream)) | 3919.915 | -56 |
| Threads | OutputStream | 22.315 | -56 |
| Threads | BufferedOutputStream(OutputStream) | 21.27 | -56 |
| Threads | FilteringOutputStream(BufferedOutputStream(OutputStream)) | 626.437 | -56 |
| Threads+Pool | OutputStream | 21.532 | -56 |
| Threads+Pool | BufferedOutputStream(OutputStream) | 27.802 | -56 |
| Threads+Pool | FilteringOutputStream(BufferedOutputStream(OutputStream)) | 613.602 | -56 |

In the results above:

* No threads - no threads were used.
* Threads - a thread pool sized to half the number of available cores was used.
* Threads+Pool - a thread pool sized to half the number of available cores was used, plus resource pooling.
* OutputStream - data is written directly to the target OutputStream.
* BufferedOutputStream(OutputStream) - data is written via a BufferedOutputStream to the target OutputStream.
* FilteringOutputStream(BufferedOutputStream(OutputStream)) - data is written via a FilteringOutputStream to a BufferedOutputStream and finally to the target OutputStream.

Note: resource pooling is my guess at how the Postgres driver performs when connection pooling is used.
I simulated resource pooling because, in theory, it could affect how HotSpot optimizes locks (lock elision). There's no real way to guarantee that my simulation is correct.

Each reported time is the total execution time of the test. I have 16 CPU cores, so the threaded tests ran on 8 threads. This is why the threaded tests take roughly 1/6 to 1/7 of the single-threaded time (the ideal for 8 threads would be 1/8).

Based on these results, I'd suggest replacing the FilteringOutputStream layer with a custom OutputStream.

I also tried a custom version of BufferedOutputStream that doesn't use locks. There are some gains (< 1%), but I wouldn't say they are significant enough to warrant a custom implementation.

[microbenchmark.java.txt](https://github.com/pgjdbc/pgjdbc/files/13413623/microbenchmark.java.txt)

```java
import java.io.IOException;
import java.io.OutputStream;

// Declared as a static nested class in the benchmark; shown top-level here so the imports compile.
// No locking: each instance must be confined to a single thread.
class NoLockBufferedOutputStream extends OutputStream {
    private final OutputStream dst;
    private final byte[] buffer;
    private int length; // number of buffered bytes not yet written to dst

    public NoLockBufferedOutputStream(OutputStream dst) {
        this.dst = dst;
        this.buffer = new byte[8192];
        this.length = 0;
    }

    @Override
    public void write(int b) throws IOException {
        // A previous array write may have left the buffer exactly full.
        if (length == buffer.length) {
            flushBuffer();
        }
        buffer[length++] = (byte) b;
        if (length >= buffer.length) {
            flushBuffer();
        }
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        int capacityAfter = buffer.length - length - len;
        if (capacityAfter < 0) {
            // Fill the remaining space, then flush the full buffer.
            int toCopy = buffer.length - length;
            appendToBuffer(b, off, toCopy);
            flushBuffer();
            off += toCopy;
            len -= toCopy;
            if (len >= buffer.length) {
                // The remainder is at least a full buffer: write it directly.
                dst.write(b, off, len);
            } else {
                appendToBuffer(b, off, len);
            }
        } else {
            appendToBuffer(b, off, len);
        }
    }

    private void appendToBuffer(byte[] src, int off, int toCopy) {
        System.arraycopy(src, off, buffer, length, toCopy);
        length += toCopy;
    }

    private void flushBuffer() throws IOException {
        if (length == 0) {
            return; // nothing buffered, skip the zero-length write
        }
        try {
            dst.write(buffer, 0, length);
        } finally {
            length = 0; // buffered bytes are discarded even if the write fails
        }
    }

    @Override
    public void flush() throws IOException {
        flushBuffer();
        dst.flush();
    }

    @Override
    public void close() throws IOException {
        // Write out any buffered bytes before closing, as BufferedOutputStream does.
        try {
            flush();
        } finally {
            dst.close();
        }
    }
}
```
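A likely explanation for the FilteringOutputStream numbers, assuming that label refers to `java.io.FilterOutputStream` (or a subclass that does not override the array `write`): its `write(byte[], int, int)` is documented to call `write(int)` once per byte, so every bulk write reaches the `BufferedOutputStream` underneath as a series of single-byte calls, each of which takes that stream's lock. The sketch below counts the calls that arrive below the filter. `CountingOutputStream` is a hypothetical stand-in for the `BufferedOutputStream` layer, and `BulkFilterOutputStream` is a hypothetical example of the kind of custom OutputStream suggested above, which forwards array writes in one call.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class FilterWriteDemo {

    /** Hypothetical stand-in for the BufferedOutputStream layer: counts calls per write overload. */
    static final class CountingOutputStream extends OutputStream {
        int singleByteWrites;
        int bulkWrites;
        final ByteArrayOutputStream sink = new ByteArrayOutputStream();

        @Override
        public void write(int b) {
            singleByteWrites++;
            sink.write(b);
        }

        @Override
        public void write(byte[] b, int off, int len) {
            bulkWrites++;
            sink.write(b, off, len);
        }
    }

    /** Hypothetical custom filter: forwards array writes to the wrapped stream in one call. */
    static final class BulkFilterOutputStream extends FilterOutputStream {
        BulkFilterOutputStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            out.write(b, off, len); // one call instead of len calls to write(int)
        }
    }

    /**
     * Writes {@code size} bytes through either a plain FilterOutputStream or a
     * BulkFilterOutputStream and returns {singleByteWrites, bulkWrites, bytesReceived}
     * as seen by the stream underneath the filter.
     */
    public static int[] countWrites(boolean bulkFilter, int size) {
        CountingOutputStream counter = new CountingOutputStream();
        OutputStream stream = bulkFilter
                ? new BulkFilterOutputStream(counter)
                : new FilterOutputStream(counter);
        byte[] message = new byte[size];
        Arrays.fill(message, (byte) 'x');
        try {
            stream.write(message, 0, message.length);
            stream.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {counter.singleByteWrites, counter.bulkWrites, counter.sink.size()};
    }

    public static void main(String[] args) {
        int[] plain = countWrites(false, 1000);
        int[] bulk = countWrites(true, 1000);
        System.out.println("FilterOutputStream:     " + plain[0] + " single-byte writes, " + plain[1] + " bulk writes");
        System.out.println("BulkFilterOutputStream: " + bulk[0] + " single-byte writes, " + bulk[1] + " bulk writes");
    }
}
```

Running `main` should report 1000 single-byte writes and no bulk writes for the plain `FilterOutputStream`, versus a single bulk write for the overriding subclass. Both deliver the same 1000 bytes; only the number of calls (and therefore lock acquisitions in a real `BufferedOutputStream`) differs.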