public inbox for [email protected]help / color / mirror / Atom feed
IO related waits 7+ messages / 4 participants [nested] [flat]
* IO related waits @ 2024-09-16 20:24 veem v <[email protected]> 0 siblings, 2 replies; 7+ messages in thread From: veem v @ 2024-09-16 20:24 UTC (permalink / raw) To: pgsql-general <[email protected]> Hi, One of our application using RDS postgres. In one of our streaming applications(using flink) which processes 100's of millions of transactions each day, we are using row by row transaction processing for inserting data into the postgres database and commit is performed for each row. We are seeing heavy IO:XactSynch wait events during the data load and also high overall response time. Architecture team is suggesting to enable asynch io if possible, so that the streaming client will not wait for the commit confirmation from the database. So I want to understand , how asynch io can be enabled and if any downsides of doing this? Regards Veem ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: IO related waits @ 2024-09-16 20:28 Christophe Pettus <[email protected]> parent: veem v <[email protected]> 1 sibling, 0 replies; 7+ messages in thread From: Christophe Pettus @ 2024-09-16 20:28 UTC (permalink / raw) To: veem v <[email protected]>; +Cc: pgsql-general <[email protected]> > On Sep 16, 2024, at 13:24, veem v <[email protected]> wrote: > Architecture team is suggesting to enable asynch io if possible, so that the streaming client will not wait for the commit confirmation from the database. So I want to understand , how asynch io can be enabled and if any downsides of doing this? "Async I/O" has a specific meaning that's not currently applicable to PostgreSQL. What is available is "synchronous_commit". This setting is by default on. When it's on, each commit waits until the associated WAL information has been flushed to disk, and then returns. If it is turned off, the commit returns more or less immediately, and the WAL flush happens asynchronously from the commit. The upside is that the session can proceed without waiting for the WAL flush. The downside is that on a server crash, some transactions may not have been fully committed to the database, and will be missing when the database restarts. The database won't be corrupted (as in, you try to use it and get errors), but it will be "earlier in time" than the application might expect. It's pretty common to turn it off for high-ingestion-rate situations, especially where the application can detect and replay missing transactions on a crash. ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: IO related waits @ 2024-09-16 22:11 Adrian Klaver <[email protected]> parent: veem v <[email protected]> 1 sibling, 1 reply; 7+ messages in thread From: Adrian Klaver @ 2024-09-16 22:11 UTC (permalink / raw) To: veem v <[email protected]>; pgsql-general <[email protected]> On 9/16/24 13:24, veem v wrote: > Hi, > One of our application using RDS postgres. In one of our streaming > applications(using flink) which processes 100's of millions of > transactions each day, we are using row by row transaction processing > for inserting data into the postgres database and commit is performed > for each row. We are seeing heavy IO:XactSynch wait events during the > data load and also high overall response time. > > Architecture team is suggesting to enable asynch io if possible, so that > the streaming client will not wait for the commit confirmation from the > database. So I want to understand , how asynch io can be enabled and if > any downsides of doing this? Are you referring to this?: https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/ If not then you will need to be more specific. > > Regards > Veem -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: IO related waits @ 2024-09-17 03:55 veem v <[email protected]> parent: Adrian Klaver <[email protected]> 0 siblings, 2 replies; 7+ messages in thread From: veem v @ 2024-09-17 03:55 UTC (permalink / raw) To: Adrian Klaver <[email protected]>; Christophe Pettus <[email protected]>; +Cc: pgsql-general <[email protected]> On Tue, 17 Sept 2024 at 03:41, Adrian Klaver <[email protected]> wrote: > > Are you referring to this?: > > > https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/ > > If not then you will need to be more specific. > > Yes, I was referring to this one. So what can be the caveats in this approach, considering transactions meant to be ACID compliant as financial transactions.Additionally I was not aware of the parameter "synchronous_commit" in DB side which will mimic the synchronous commit. Would both of these mimic the same asynchronous behaviour and achieves the same, which means the client data load throughput will increase because the DB will not wait for those data to be written to the WAL and give a confirmation back to the client and also the client will not wait for the DB to give a confirmation back on the data to be persisted in the DB or not?. Also, as in the backend the flushing of the WAL to the disk has to happen anyway(just that it will be delayed now), so can this method cause contention in the database storage side if the speed in which the data gets ingested from the client is not getting written to the disk , and if it can someway impact the data consistency for the read queries? ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: IO related waits @ 2024-09-17 13:13 Greg Sabino Mullane <[email protected]> parent: veem v <[email protected]> 1 sibling, 1 reply; 7+ messages in thread From: Greg Sabino Mullane @ 2024-09-17 13:13 UTC (permalink / raw) To: veem v <[email protected]>; +Cc: Adrian Klaver <[email protected]>; Christophe Pettus <[email protected]>; pgsql-general <[email protected]> On Mon, Sep 16, 2024 at 11:56 PM veem v <[email protected]> wrote: > So what can be the caveats in this approach, considering transactions > meant to be ACID compliant as financial transactions. > Financial transactions need to be handled with care. Only you know your business requirements, but as Christophe pointed out, disabling synchronous commit means your application may think a particular transaction has completed when it has not. Usually that's a big red flag for financial applications. we are using row by row transaction processing for inserting data into the > postgres database and commit is performed for each row. This is a better place to optimize. Batch many rows per transaction. Remove unused indexes. flushing of the WAL to the disk has to happen anyway(just that it will be > delayed now), so can this method cause contention in the database storage > side if the speed in which the data gets ingested from the client is not > getting written to the disk , and if it can someway impact the data > consistency for the read queries? > Not quite clear what you are asking here re data consistency. The data will always be consistent, even if synchronous_commit is disabled. The only danger window is on a server crash. (Keep in mind that RDS is not Postgres, so take tuning recommendations and advice with a grain of salt.) Cheers, Greg ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: IO related waits @ 2024-09-17 15:54 veem v <[email protected]> parent: Greg Sabino Mullane <[email protected]> 0 siblings, 0 replies; 7+ messages in thread From: veem v @ 2024-09-17 15:54 UTC (permalink / raw) To: Greg Sabino Mullane <[email protected]>; +Cc: Adrian Klaver <[email protected]>; Christophe Pettus <[email protected]>; pgsql-general <[email protected]> On Tue, 17 Sept 2024 at 18:43, Greg Sabino Mullane <[email protected]> wrote: > > This is a better place to optimize. Batch many rows per transaction. > Remove unused indexes. > > flushing of the WAL to the disk has to happen anyway(just that it will be >> delayed now), so can this method cause contention in the database storage >> side if the speed in which the data gets ingested from the client is not >> getting written to the disk , and if it can someway impact the data >> consistency for the read queries? >> > > Not quite clear what you are asking here re data consistency. The data > will always be consistent, even if synchronous_commit is disabled. The only > danger window is on a server crash. > > (Keep in mind that RDS is not Postgres, so take tuning recommendations and > advice with a grain of salt.) > > > Thank you Greg. Yes, our Java application was doing row by row commit and we saw that from pg_stat_database from the column "xact_commit" which was closely the same as the sum of tup_inserted, tup_updated, tup_deleted column. And also we verified in pg_stats_statements the number against the "calls" column were matching to the "rows" column for the INSERT queries, so it means also we are inserting exactly same number of rows as the number of DB calls, so it also suggest that we are doing row by row operations/dmls. And we then asked the application tema to make the inserts in batches, but still seeing those figures in these above two views are not changing much the number "xact_commit" is staying almost same and also even the "calls" and the "rows" column in pg_stats_statements also staying almost same. So does it mean that the application code is somehow still doing the same row by row processing or we are doing something wrong in the above analysis? And another thing we noticed , even after the data load finished , even then the "xact_commit" was keep increasing along with "tup_fetched", so does it mean that its doing some kind of implicit commit even for the fetch type queries which must be "select" queries i believe? Also not sure if its expected, but here in postgres i have seen unless we put a code within begin and end block , it's default gets committed even we just run it on the console , it doesn't ask for a explicit commit/rollback ,so not sure if that is someway playing a role here. Regards Veem > > ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: IO related waits @ 2024-09-17 15:54 Adrian Klaver <[email protected]> parent: veem v <[email protected]> 1 sibling, 0 replies; 7+ messages in thread From: Adrian Klaver @ 2024-09-17 15:54 UTC (permalink / raw) To: veem v <[email protected]>; Christophe Pettus <[email protected]>; +Cc: pgsql-general <[email protected]> On 9/16/24 20:55, veem v wrote: > > > On Tue, 17 Sept 2024 at 03:41, Adrian Klaver <[email protected] > <mailto:[email protected]>> wrote: > > > Are you referring to this?: > > https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/ <https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/datastream/operators/asyncio/; > > If not then you will need to be more specific. > > > Yes, I was referring to this one. So what can be the caveats in this > approach, considering transactions meant to be ACID compliant as > financial transactions.Additionally I was not aware of the parameter > "synchronous_commit" in DB side which will mimic the synchronous commit. > > Would both of these mimic the same asynchronous behaviour and achieves > the same, which means the client data load throughput will increase > because the DB will not wait for those data to be written to the WAL and > give a confirmation back to the client and also the client will not wait > for the DB to give a confirmation back on the data to be persisted in > the DB or not?. Also, as in the backend the flushing of the WAL to the > disk has to happen anyway(just that it will be delayed now), so can this > method cause contention in the database storage side if the speed in > which the data gets ingested from the client is not getting written to > the disk , and if it can someway impact the data consistency for the > read queries? This is not something that I am that familiar with. I suspect though this is more complicated then you think. From the link above: " Prerequisites # As illustrated in the section above, implementing proper asynchronous I/O to a database (or key/value store) requires a client to that database that supports asynchronous requests. Many popular databases offer such a client. In the absence of such a client, one can try and turn a synchronous client into a limited concurrent client by creating multiple clients and handling the synchronous calls with a thread pool. However, this approach is usually less efficient than a proper asynchronous client. " Which means you need to on Flink end: 1) Use Flink async I/O . 2) Find a client that supports async or fake it by using multiple synchronous clients. On Postgres end there is this: https://www.postgresql.org/docs/current/wal-async-commit.html That will return a success signal to the client quicker if synchronous_commit is set to off. Though the point of the Flink async I/O is not to wait for the response before moving on, so I am not sure how much synchronous_commit = off would help. -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 7+ messages in thread
end of thread, other threads:[~2024-09-17 15:54 UTC | newest] Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-09-16 20:24 IO related waits veem v <[email protected]> 2024-09-16 20:28 ` Christophe Pettus <[email protected]> 2024-09-16 22:11 ` Adrian Klaver <[email protected]> 2024-09-17 03:55 ` veem v <[email protected]> 2024-09-17 13:13 ` Greg Sabino Mullane <[email protected]> 2024-09-17 15:54 ` veem v <[email protected]> 2024-09-17 15:54 ` Adrian Klaver <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox