Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tEGwR-006RXp-UH for pgsql-general@arkaria.postgresql.org; Thu, 21 Nov 2024 23:50:32 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tEGwP-0031rD-SQ for pgsql-general@arkaria.postgresql.org; Thu, 21 Nov 2024 23:50:29 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tEGwP-0031r3-Cb for pgsql-general@lists.postgresql.org; Thu, 21 Nov 2024 23:50:29 +0000 Received: from mail-oa1-x31.google.com ([2001:4860:4864:20::31]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tEGwM-003BVf-Ul for pgsql-general@postgresql.org; Thu, 21 Nov 2024 23:50:28 +0000 Received: by mail-oa1-x31.google.com with SMTP id 586e51a60fabf-296c184ed6bso981454fac.0 for ; Thu, 21 Nov 2024 15:50:26 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1732233025; x=1732837825; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=SuW9J9FynAGHz/rNUAUixTFpw+3jxLLahpWGuuB4Hvw=; b=LagELQ60O+7O2r12rfOsJVUa2pooenpQHKW7w8dMmgCEshD5+YqIBpiAPWIWZFUFT/ Vy8Vry1WzFONY8b2ZxIxmKUc88wjjFPPFlFjPSI8cIzvUcOIIT3zmi6mBzDqb/vwfquf Ms6JXhDFjvGncU/9wODxDCDmqWvVfA+j+mpPrug//uQXfoWNOB9oZ2AvEIUhP0IE/OH4 hyBHMqfEQYsbamY1Yzl/nHs8bfkNgTE3USzdAygVayn70qogHsxGwgBmL9lsgBojPLcn yTPczDMTNoPYXcfRpNSrwmzwVLWtPDvskClpuzanfegYjbQHGkl7NAjplTGbSMC0zG3w MtSA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732233025; x=1732837825; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=SuW9J9FynAGHz/rNUAUixTFpw+3jxLLahpWGuuB4Hvw=; b=HutYqol3kJN+0DCh7w5umxxraddghUDZFaZf1nNDFMRuI4XGh+MY+qslgixI52dSId ay60X30gVEaCgxmtGMDOn2tCThz/1hpZKUDv27cb90ejL8UPR1ZIgtbNJwT1ZTx/Bn0g eyLyeAs8y/j8yqYIjrCLGjOMSqpbkpBiZuXgvU+UEzJLSmAGhBVdQxUxjn+JgM+m1+lT OdYlMy+gXEteDIVuXmlsdAEf/lKFq5iBY8FnRa+hXdFpi1bKa6YipKM4dTRqWZW7JplF F3Ikny5fmmWK0gIACHZakE3Qld6JdHEBw2wO08V56KiNSZnh9Atgi+lgTywP9IrcNYmh 2lEg== X-Gm-Message-State: AOJu0Ywk4M1zxe22/E4MWscGBPJrVx/mRlnJ1D7W/YBs91y2ZKeT2BL1 L93Vpds2Q5vAo78pZ4bvQL+uWFcIE6WFobVaf1yMJFsEDqkSACGXH4TyySFZACw6EDqeiZnLwA1 MQCK0NCz5oXaAoEUW9LfsXw9B6qZl/mRw X-Gm-Gg: ASbGnctzAQR8DqB8hoi8p8TUiiVtT/BCeJcQzTEiJH9Vimj3g9eHP9w3dsbUd9x3QqM 35ee5Ik0WmIRSoRwcCGb9EdhZhHqIbOw= X-Google-Smtp-Source: AGHT+IG3tcNYX1o6v12Hg711O1UUPIvHooVL+aYj8OPcUH+9YsDOCc9tAWWMscNoqBT/jbeArHwIHleDngTD51L4YBk= X-Received: by 2002:a05:6870:5d88:b0:296:d7f1:4203 with SMTP id 586e51a60fabf-29720c77e7dmr893416fac.22.1732233025077; Thu, 21 Nov 2024 15:50:25 -0800 (PST) MIME-Version: 1.0 References: <6a157286-5948-48ac-bc50-7bb4e65ee5e5@aklaver.com> <38725031-5e80-46ee-b403-a92107b24c39@aklaver.com> <1e2db3f5-00f4-4ec2-800f-46136c2fad59@aklaver.com> <1270021b-c5b5-4ca8-b48d-7dfb163bb6f7@aklaver.com> In-Reply-To: <1270021b-c5b5-4ca8-b48d-7dfb163bb6f7@aklaver.com> From: Steeve Boulanger Date: Thu, 21 Nov 2024 17:50:13 -0600 Message-ID: Subject: Re: Database stats ( pg_stat_database.stats_reset ) get reset on daily basis - why? To: Adrian Klaver Cc: pgsql-general Content-Type: multipart/alternative; boundary="00000000000020c882062774eef4" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000020c882062774eef4 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable > 1) Do the 77 share some trait the other 80 don't. No pattern found yet .. but still verifying a few things > 2) Do the OS system logs reveal anything? Nothing found in syslog > 3) What was happening in the databases just prior to the time the stats reset? Here's an example (log extracts) for a stats reset occurrence: select datname, stats_reset, now()-stats_reset as since_reset from pg_stat_database where ( now()-stats_reset ) < interval '1 day' order by 3 limit 1; datname | stats_reset | since_reset ----------------+-------------------------------+----------------- MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304 <--LOGS--> 2024-11-21 13:48:34.324 UTC pid=3D[322035][2] db=3D[MyDB] usr=3D[user1] client=3D[host1] app=3D[[unknown]]LOG: connection authorized: user=3Duser1 database=3DMyDB applicatio n_name=3Dapp1 <..> <.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..> 2024-11-21 13:48:34.336 UTC pid=3D[322035][3] db=3D[MyDB] usr=3D[user1] client=3D[host1] app=3D[app1]LOG: duration: 1.071 ms parse : SEL= ECT <..> <--LOGS--> As you can see from above, the stats for MyDB were reset at ".332" . The only logs before/after for the db was the connection (at .324), and then the parse (at .336). NB: I also checked the logs at ".333" in case there would have been a rounding up, but nothing relevant was found. With that said, I only verified one occurence - tomorrow I'll check a few more just to validate. > 4) Do you have external tools accessing these databases? We have internal micro-services accessing the databases, as well as a monitoring tool (Netdata), and some of the Devs use pgAdmin. I discarded the scenario where someone would inadvertently do a "pg_stat_reset" via pgAdmin, just because a lot of databases have their stats reset within a short period of time. On the other hand, Netdata does connect to most (if not all) databases frequently by its nature - so as a test, I stopped the Netdata service today to see if tomorrow we're still seeing the stats reset or not. I can report back tomorrow on this. > 5) Is the cluster directly open to the world? No. It's an on-premise installation. Only local applications can connect to it. -Steeve On Thu, Nov 21, 2024 at 4:32=E2=80=AFPM Adrian Klaver wrote: > On 11/21/24 13:31, Steeve Boulanger wrote: > > > All I can think to do is look at the logs around the stats_reset > times > > > for the databases and see if there is anything relevant. > > > > That was already done, but nothing relevant was found unfortunately. > > Unless it was not recognized as relevant. Since for the time being I am > eliminating magic as the cause, something concrete is causing this and > it should be leaving a trace. In your post you had this affecting 77 out > of 157 databases in the cluster. > > 1) Do the 77 share some trait the other 80 don't. > > 2) Do the OS system logs reveal anything? > > 3) What was happening in the databases just prior to the time the stats > reset? > > 4) Do you have external tools accessing these databases? > > 5) Is the cluster directly open to the world? > > > > > -Steeve > > > > On Thu, Nov 21, 2024 at 3:12=E2=80=AFPM Adrian Klaver > > wrote: > > > > On 11/21/24 12:57, Steeve Boulanger wrote: > > > > > > > Please reply to list also. > > > > > > My apologies - I thought I did a "Reply all", but apparently not= . > > I'm a > > > little bit of a noob with email distrib lists. > > > > > > > 1) What is log_min_error_statement set to? > > > > > > name | setting | pending_restart > > > -------------------------+---------+----------------- > > > log_min_error_statement | error | f > > > > > > > 2) Did you reload the server when changing?: > > > > > > yes - pg_reload_conf() > > > > All I can think to do is look at the logs around the stats_reset > times > > for the databases and see if there is anything relevant. > > > > > > > > -Steeve > > > > > > -- > > Adrian Klaver > > adrian.klaver@aklaver.com > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com > > --00000000000020c882062774eef4 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
>=C2=A0 1) Do the 77 share some trait the other 80 don't.

No= pattern found yet .. but still verifying a few things

> 2) Do th= e OS system logs reveal anything?

Nothing found in= syslog=C2=A0

> 3) What was happening in the databases just prior= to the time the stats
reset?

Here's an example (log extracts= ) for a stats reset occurrence:=C2=A0

select datname, stats_reset, now()-stats_reset as since_rese= t
from pg_stat_database
where ( now()-stats_reset ) < interval = 9;1 day'
order by 3 =C2=A0limit 1;


=C2=A0 =C2=A0 datname =C2=A0 =C2=A0 | =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0stats_reset =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0| =C2=A0 since_reset
--= --------------+-------------------------------+-----------------
=C2=A0<= font color=3D"#ff0000">MyDB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | = 2024-11-21 13:48:34.332785+00 | 00:00:22= .266304

<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=3D[322035][2] =C2=A0db=3D[= MyDB] usr=3D[user1] client=3D[host1] app=3D[[unknown]]LOG: =C2=A0connection authorized: user=3Duser1 database=3D= MyDB applicatio
n_name=3Dapp1 <..>

<.. no calls at "2024-11-21 13:48:34.= 332" - WHY?? ..>
<= br>2024-11-21 13:48:34.336 UTC pid=3D[322035][3] =C2=A0db=3D[M= yDB] usr=3D[user1] client=3D[host1] app=3D[app1]LOG: =C2=A0duration:= 1.071 ms =C2=A0parse <unnamed>: SELEC= T <..>
<--L= OGS-->

As you can see from above, the st= ats for MyDB were reset at=C2=A0 ".332" . The only logs before/af= ter for the db was the connection (at .324), and then the parse (at .336).= =C2=A0 NB: I also checked the logs at ".333" in case there would = have been a rounding up, but nothing relevant was found. With that said, I = only verified one occurence - tomorrow I'll check a few more just to va= lidate.=C2=A0


> 4) Do you have external too= ls accessing these databases?

We have internal micro-services access= ing the databases, as well as a monitoring tool (Netdata), and some of the = Devs use pgAdmin. I discarded the scenario where someone would inadvertentl= y do a "pg_stat_reset" via pgAdmin, just because a lot of databas= es have their stats reset within a short period of time.=C2=A0
On the other hand, Netdata does connect to most (if not all) d= atabases frequently by its nature - so as a test, I stopped the Netdata ser= vice today to see if tomorrow we're still seeing the stats reset or not= . I can report back tomorrow on this.=C2=A0

> 5) Is the cl= uster directly open to the world?

No. It's an = on-premise installation. Only local applications can connect to it.=C2=A0


-Steeve

On Thu, Nov 21, 2024 at= 4:32=E2=80=AFPM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 11/21/24 13:31, Steeve Boulanger wrote: >=C2=A0 > All I can think to do is look at the logs=C2=A0 around the = stats_reset times
>=C2=A0 > for the databases and see if there is anything relevant. >
> That was already done, but nothing relevant was found unfortunately.
Unless it was not recognized as relevant. Since for the time being I am eliminating magic as the cause, something concrete is causing this and
it should be leaving a trace. In your post you had this affecting 77 out of 157 databases in the cluster.

1) Do the 77 share some trait the other 80 don't.

2) Do the OS system logs reveal anything?

3) What was happening in the databases just prior to the time the stats reset?

4) Do you have external tools accessing these databases?

5) Is the cluster directly open to the world?

>
> -Steeve
>
> On Thu, Nov 21, 2024 at 3:12=E2=80=AFPM Adrian Klaver <adrian.klaver@aklaver.co= m
> <mailto:adrian.klaver@aklaver.com>> wrote:
>
>=C2=A0 =C2=A0 =C2=A0On 11/21/24 12:57, Steeve Boulanger wrote:
>=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 > Please reply to list also.
>=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 > My apologies - I thought I did a "Reply = all", but apparently not.
>=C2=A0 =C2=A0 =C2=A0I'm a
>=C2=A0 =C2=A0 =C2=A0 > little bit of a noob with email distrib lists= .
>=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 > 1) What is log_min_error_statement= set to?
>=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 name= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 | setting | pending_restart
>=C2=A0 =C2=A0 =C2=A0 > -------------------------+---------+---------= --------
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 =C2=A0log_min_error_statement | error = =C2=A0 | f
>=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 >=C2=A0 > 2) Did you reload the server when = changing?:
>=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 > yes - pg_reload_conf()
>
>=C2=A0 =C2=A0 =C2=A0All I can think to do is look at the logs=C2=A0 aro= und the stats_reset times
>=C2=A0 =C2=A0 =C2=A0for the databases and see if there is anything rele= vant.
>
>=C2=A0 =C2=A0 =C2=A0 >
>=C2=A0 =C2=A0 =C2=A0 > -Steeve
>
>
>=C2=A0 =C2=A0 =C2=A0--
>=C2=A0 =C2=A0 =C2=A0Adrian Klaver
>=C2=A0 =C2=A0 =C2=A0adrian.klaver@aklaver.com <mailto:adrian.klaver@aklaver.com><= br> >

--
Adrian Klaver
adrian.klave= r@aklaver.com

--00000000000020c882062774eef4--