From: KK CHN
Date: Sun, 6 Oct 2024 18:56:11 +0530
Subject: Re: CLOSE_WAIT pileup and Application Timeout
To: Adrian Klaver
Cc: pgsql-general

On Fri, Oct 4, 2024 at 9:17 PM Adrian Klaver wrote:

> On 10/3/24 21:29, KK CHN wrote:
> > List,
> >
> > I am facing a network issue (TCP/IP connections not closing).
> >
> > We run an Android tablet application that updates the status of a
> > vehicle fleet. The app is installed on each of around 1000 vehicles
> > and reports to a vehicle tracking application server based on Java
> > and WildFly, with a PostgreSQL 16 backend.
> >
> > A running vehicle may disconnect, or be unable to send its location
> > data, in areas where mobile data coverage is absent or the signal
> > is weak.
> >
> > Every week or so, the server on which the backend application runs
> > shows connection timeouts and is unable to track the vehicles any
> > further.
> >
> > When we restart the WildFly server the application returns to
> > normal. The issue then repeats after a week or two.
>
> Seems the issue is in the application server. What is not clear to me is
> whether the connection timeout you refer to is from the mobile devices
> to the application or the application to the Postgres server?

It's from the mobile devices to the application server. When I restart
the application server, everything goes back to normal, but after a
period of time it cripples again. At that point, netstat on the
application VM shows lots of connections in the CLOSE_WAIT state, as
indicated.

> I'm
> guessing the latter as I would expect the mobile devices to drop
> connections more often than weekly.

Yes, mobile devices may drop connections at any time when they reach an
area where signal strength is poor (e.g. underground parking, or other
areas where mobile data coverage is poor).

The topology is: mobile devices connect and update their location via
the application VM, and the data finally lands in the PostgreSQL VM.

The application server and the database server are separate virtual
machines. It is mostly the application server that hangs, not the
database VM, since other applications update the database VM without any
issue. The DB VM handles all the writes from those other applications,
but they are different applications, not the fleet management one.

> > On the server machine, when this bottleneck occurs, I see a lot of
> > TCP/IP connections in CLOSE_WAIT (3000 to 5000) when the server
> > backend becomes unresponsive.
>
> Again not clear, are you referring to the application or the Postgres
> database running on the server?
>
> > What is the root cause of this issue? Is it due to the Android
> > application being unable to send the CLOSE_WAIT ACK because of poor
> > mobile data connectivity?
> >
> > If so, how do people address this issue, and what may be a fix?
> >
> > Any directions or reference material are most welcome.
> >
> > Thank you,
> > Krishane
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com


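
Editorial note on the CLOSE_WAIT counts discussed in this thread: in TCP, a socket sits in CLOSE_WAIT after the remote peer has closed its side and until the local application closes the socket, so the count is worth tracking on the application VM over time. A minimal sketch of tallying states from netstat output follows. It assumes Linux-style `netstat -ant` output in which the state is the last column of each TCP row; the function name `count_tcp_states` and the sample text are hypothetical and do not come from the thread.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Count TCP connections per state in `netstat -ant`-style text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Connection rows start with "tcp" or "tcp6"; for those rows the
        # state (e.g. ESTABLISHED, CLOSE_WAIT) is assumed to be the last column.
        if len(fields) >= 6 and fields[0].startswith("tcp"):
            counts[fields[-1]] += 1
    return counts


# Hypothetical sample input, not real data from the thread.
SAMPLE = """\
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 0.0.0.0:8080            0.0.0.0:*               LISTEN
tcp        1      0 10.0.0.5:8080           10.0.1.17:51512         CLOSE_WAIT
tcp        1      0 10.0.0.5:8080           10.0.1.18:40022         CLOSE_WAIT
tcp        0      0 10.0.0.5:8080           10.0.1.19:40100         ESTABLISHED
"""

if __name__ == "__main__":
    # Print each state with its count, most frequent first.
    for state, n in count_tcp_states(SAMPLE).most_common():
        print(state, n)
```

On the application VM, the real input would be the text produced by `netstat -ant`; running the tally periodically would show whether CLOSE_WAIT grows steadily between WildFly restarts.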