Re: IO related waits - Adrian Klaver

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Adrian Klaver <[email protected]>
To: veem v <[email protected]>
To: Greg Sabino Mullane <[email protected]>
Cc: Christophe Pettus <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: IO related waits
Date: Tue, 17 Sep 2024 16:37:28 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAB+=1TV-b8=RGKVFN0qj5Nxdgbj-JYCNh4ZqFjxxmQPpFD6=PQ@mail.gmail.com>
References: <CAB+=1TWZNvMhVthJ2iKs_Q4qBzMw-v_oaSz7HbFE_P_qC5jMFA@mail.gmail.com>
	<[email protected]>
	<CAB+=1TWdRd2sBw7-vXCovH_VHLANh+aSaU-WyJ2m8tL4TkF=8g@mail.gmail.com>
	<[email protected]>
	<CAB+=1TV-b8=RGKVFN0qj5Nxdgbj-JYCNh4ZqFjxxmQPpFD6=PQ@mail.gmail.com>

On 9/17/24 12:34, veem v wrote:
> 
> On Tue, 17 Sept 2024 at 21:24, Adrian Klaver <[email protected] 
> <mailto:[email protected]>> wrote:
> 
> 
>     Which means you need to on Flink end:
> 
>     1) Use Flink async I/O .
> 
>     2) Find a client that supports async or fake it by using multiple
>     synchronous clients.
> 
>     On Postgres end there is this:
> 
>     https://www.postgresql.org/docs/current/wal-async-commit.html
>     <https://www.postgresql.org/docs/current/wal-async-commit.html;
> 
>     That will return a success signal to the client quicker if
>     synchronous_commit is set to off. Though the point of the Flink async
>     I/O is not to wait for the response before moving on, so I am not sure
>     how much synchronous_commit = off would help.
> 
> 
>   Got it. So it means their suggestion was to set the asynch_io at flink 
> level but not DB level, so that the application will not wait for the 
> commit response from the database. But in that case , won't it overload 
> the DB with more and more requests if database will keep doing the 
> commit ( with synchronous_commit=ON)  and waiting for getting the 
> response back from its storage for the WAL's to be flushed to the disk, 
> while the application will not wait for its response back(for those 
> inserts) and keep flooding the database with more and more incoming 
> Insert requests?

My point is this is a multi-layer cake with layers:

1) Flink asycnc io

2) Database client async/sync

3) Postgres sync status.

That is a lot of moving parts and determining whether it is suitable is 
going to require rigorous testing over a representative data load.


See more below.

> 
> Additionally as I mentioned before, we see that from "pg_stat_database" 
> from the column "xact_commit" , it's almost matching with the sum of 
> "tup_inserted", "tup_updated", "tup_deleted" column. And also we 
> verified in pg_stats_statements the  "calls" column is same as the 
> "rows" column for the INSERT queries, so it means also we are inserting 
> exactly same number of rows as the number of DB calls, so doesn't it 
> suggest that we are doing row by row operations/dmls.
> 
> Also after seeing above and asking application team to do the batch 
> commit ,we are still seeing the similar figures from pg_stat_database 
> and pg_stat_statements, so does it mean that we are looking into wrong 
> stats? or the application code change has not been done accurately? and 
> we see even when no inserts are running from the application side, we do 
> see "xact_commit" keep increasing along with "tup_fetched" , why so?
> 
> Finally we see in postgres here, even if we just write a DML statement 
> it does commit that by default, until we explicitly put it in a 
> "begin... end" block. Can that be the difference between how a "commit" 
> gets handled in postgres vs other databases?

It does if autocommit is set in the client, that is common to other 
databases also:

https://dev.mysql.com/doc/refman/8.4/en/commit.html

https://docs.oracle.com/en/database/oracle/developer-tools-for-vscode/getting-started/disabling-and-...

https://learn.microsoft.com/en-us/sql/t-sql/statements/set-implicit-transactions-transact-sql?view=s...

You probably need to take a closer look at the client/driver you are 
using and the code that interacting with it.

In fact I would say you need to review the entire data transfer process 
to see if there are performance gains that can be obtained without 
adding an entirely new async component.

> 
> 

-- 
Adrian Klaver
[email protected]

view thread (2+ messages)

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: IO related waits
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox