Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ssavX-009S8Q-8I for pgsql-general@arkaria.postgresql.org; Mon, 23 Sep 2024 04:44:00 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ssavW-0014Wr-8E for pgsql-general@arkaria.postgresql.org; Mon, 23 Sep 2024 04:43:58 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ssavV-0014Wi-1M for pgsql-general@lists.postgresql.org; Mon, 23 Sep 2024 04:43:57 +0000 Received: from omr000.pc5.atmailcloud.com ([103.150.252.0]) by makus.postgresql.org with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ssavR-000aLC-CY for pgsql-general@lists.postgresql.org; Mon, 23 Sep 2024 04:43:55 +0000 DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=tpg.com.au; s=202309; h=MIME-Version:Content-Type:Date:To:From:Subject:Message-ID; bh=t3vI2hj3hru8W4wPvHsxYRMA2uXfMdNiXPCJv8XvEU0=; b=NjqPPHkSlJhfAKp3aLytUsmAmb 0stkjowHX/Yr9wRxU2ZLa+ccfuf+qXRb6E6ezY60oxC1iy18KADLiclYDPGgf6Q+co0sruC1oUytS A6gxgNNAD/i1LVesM21yIKB0DQyR/jMxTr7oHbfvNbgjzzECQWvhotO5vF79sxBCbylDkMANpIuER +/1eAZKgR0BOLB17+BCCM2fKhc62SDLER7A0MulaGahzEkuJcf5V/QMKVHRKYLVPgsswfoh2Jg30j 5GyfCONOAE99m/kMmXB12hdOQQxplB38mE0Dzqly4pKnELJN0J+Q7P1cQppiUIH0ixSp5jqh/vKIs cM9kB+wg==; Received: from CMR-KAKADU02.i-065a418d7a6cc0c37 by OMR.i-011229ae50fa0cd71 with esmtps (envelope-from ) id 1ssavN-000000000QF-1mJC; Mon, 23 Sep 2024 04:43:49 +0000 Received: from [203.214.58.171] (helo=[192.168.1.101]) by CMR-KAKADU02.i-065a418d7a6cc0c37 with esmtpsa (envelope-from ) id 1ssavN-000000000uQ-023C; Mon, 23 Sep 2024 04:43:49 +0000 Message-ID: Subject: Re: IO related waits From: rob stone To: Adrian Klaver , veem v , pgsql-general@lists.postgresql.org Date: Mon, 23 Sep 2024 14:43:46 +1000 In-Reply-To: <55697e68-ee97-4571-9b71-c91a9373d40b@aklaver.com> References: <586d988d-978b-4b08-84b4-ef3ebb8736a8@aklaver.com> <4cfe720e-4283-43fc-8d72-8fa11e2f741a@aklaver.com> <20240920221718.ctq5whqfyqlrjwr2@hjp.at> <55697e68-ee97-4571-9b71-c91a9373d40b@aklaver.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.54.0-1 MIME-Version: 1.0 X-Atmail-Id: floriparob@tpg.com.au X-atmailcloud-spam-action: no action X-Cm-Analysis: v=2.4 cv=Uq5lNfwB c=1 sm=1 tr=0 ts=66f0f205 a=ukEzZ6GA2DQNk4vPvkOeLw==:117 a=ukEzZ6GA2DQNk4vPvkOeLw==:17 a=IkcTkHD0fZMA:10 a=EaEq8P2WXUwA:10 a=FVG_Qj_OW4wA:10 a=x7bEGLp0ZPQA:10 a=Hig9X_WtUzFAWa6L_sMA:9 a=QEXdDO2ut3YA:10 X-Cm-Envelope: MS4xfLzT30pZAmBN28j1ohPR4uOf0rfB2RA++tG4Vyo0w6fE1BcpfGUhXedY+zOTwRmpGnKsUWI+gX0Xt17bn+VnoXLZ18AqVUgrHeKrpOwW5BFJ5Ridt6MC p50hlOyPqbnyAnMvbU1cfVwMhE7iug5+Q6gFfHzG1hY4vFOVTe14vyg7Q4NTpb/guur1Z71/0tNogQ== X-atmailcloud-route: unknown List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hello, On Sat, 2024-09-21 at 10:20 -0700, Adrian Klaver wrote: > On 9/21/24 02:36, veem v wrote: > >=20 > >=20 >=20 > >=20 > > Actually here the application is using kafka and=C2=A0 flink stream and > > is=20 > > using one of existing code in=C2=A0which it was doing row by row=C2=A0c= ommit > > which=20 > > is now changed to Batch commit i.e. the commit point is shifted > > from row=20 > > by row to batch now. There are multiple sessions spawning=C2=A0at the > > same=20 > > time to process the incoming messages 24/7. And also as described > > in=20 > > another ongoing thread and also we saw in the past we did not see > > much=20 > > difference between "batch commit" and "Batch insert" performance. > > We=20 > > only realized the row by row commit is having worse=C2=A0performance. >=20 > The bottom line is that to solve this a cost is going to have to be > paid=20 > somewhere. Previously=C2=A0 it was done with autocommit in the form of > slow=20 > insert performance. You improved the speed of the inserts by wrapping > multiple inserts in transactions and that led you to this problem, > where=20 > open transactions across sessions is leading to deadlock issues due > to=20 > the same id being inserted in concurrent open sessions. Per my and > Greg=20 > Sabino Mullane comments the solution is going to need planning. Right > now you are playing a round of Whack-a-Mole by making ad-hoc changes > of=20 > portions of the process without reference to the process as a > whole.At=20 > some point the parallel duplicate ids(records) need to be > straightened=20 > out into a serial application of data. You and the other people > involved=20 > need to come up with a coherent view of the process as whole with a > goal=20 > to achieving that. Then you can start planning on where that cost is=20 > best paid: 1) In the data before the streaming. 2) In the streaming=20 > process itself 3) In the database or 4) Spread out across 1-4. >=20 You are using Apache's flink to pump data into your database. It seems to me that you have this occurring:- pipelineA --> sessionA pipelineB --> sessionB etc. You haven't said if the SQL code doing the inserts is coming out of flink or if it is your own code. If it is your own code make sure you are handling SQLException events properly.=20 If there is a flink mailing list or user group, you'll probably find more help with other flink users. Follow Adrian's advice.=20 HTH, Rob