Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tDkFl-002tjN-4v for pgsql-general@arkaria.postgresql.org; Wed, 20 Nov 2024 12:56:17 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tDkFj-005GOo-9V for pgsql-general@arkaria.postgresql.org; Wed, 20 Nov 2024 12:56:15 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tDkFi-005GOf-UQ for pgsql-general@lists.postgresql.org; Wed, 20 Nov 2024 12:56:14 +0000 Received: from cloud.gatewaynet.com ([185.90.37.94]) by magus.postgresql.org with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tDkFe-002vLM-KB for pgsql-general@lists.postgresql.org; Wed, 20 Nov 2024 12:56:13 +0000 Content-Type: multipart/alternative; boundary="------------SfgJW6fjZgt0mQWTzrEsO39h" Message-ID: Date: Wed, 20 Nov 2024 14:56:05 +0200 MIME-Version: 1.0 Subject: Re: Suddenly all queries moved to seq scan To: pgsql-general@lists.postgresql.org References: Content-Language: en-US From: Achilleas Mantzios - cloud In-Reply-To: List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk This is a multi-part message in MIME format. --------------SfgJW6fjZgt0mQWTzrEsO39h Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 11/20/24 12:50, Sreejith P wrote: > Hi, > > We are using PostgresQL 10 in our production database. We have around > 890 req /s request on peak time. > > We have 1 primary and 4 slave databases as well in the same postgres > cluster. > > 2 days back we applied some patches in the primary server and > restarted. We didn't do anything on the secondary server. > > Next day, After 18 hours all our queries from secondary servers > started taking too much time.  queries were working in 2 sec started > taking 80 seconds. Almost all queries behaved the same way. > > After half an hour of outage we restarted all db servers and system > back to normal. > > Still we are not able to understand the root case. We couldn't find > any error log or fatal errors.  During the incident, in  one of the > read server disks was full. We couldn't see any replication lag or > query cancellation due to replication. > > please help Still you say seq scan, how do you know that? You run explain (buffers, analyze, settings) ? If this is indeed the case, then you need to check your indexes. Check when they were last used/scanned and if they are usable. Did the disk error had any of your PostgreSQL tablespaces / tables/indexes  ? > > Regards > Sreejith > > > > > > *Solutions for Care Anywhere* > > ------------------------------------------------------------------------ > *dWise HealthCare IT Solutions Pvt. Ltd.*| www.lifetrenz.com > > *Disclaimer*: The information and attachments contained in this email > are intended for exclusive use of the addressee(s) and may contain > confidential or privileged information. If you are not the intended > recipient, please notify the sender immediately and destroy all copies > of this message and any attachments. The views expressed in this email > are, unless otherwise stated, those of the author and not those of > dWise HealthCare IT Solutions or its management. --------------SfgJW6fjZgt0mQWTzrEsO39h Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: 8bit

On 11/20/24 12:50, Sreejith P wrote:

Hi,

We are using PostgresQL 10 in our production database.  We have around 890 req /s request on peak time.

We have 1 primary and 4 slave databases as well in the same postgres cluster. 

2 days back we applied some patches in the primary server and restarted. We didn't do anything on the secondary server.

Next day, After 18 hours all our queries from secondary servers started taking too much time.  queries were working in 2 sec started taking 80 seconds. Almost all queries behaved the same way.

After half an hour of outage we restarted all db servers and system back to normal.

Still we are not able to understand the root case. We couldn't find any error log or fatal errors.  During the incident, in  one of the read server disks was full. We couldn't see any replication lag or query cancellation due to replication.

please help
Still you say seq scan, how do you know that? You run explain (buffers, analyze, settings) ? If this is indeed the case, then you need to check your indexes. Check when they were last used/scanned and if they are usable. Did the disk error had any of your PostgreSQL tablespaces / tables/indexes  ?

Regards
Sreejith





 

Solutions for Care Anywhere


dWise HealthCare IT Solutions Pvt. Ltd. | www.lifetrenz.com
Disclaimer: The information and attachments contained in this email are intended for exclusive use of the addressee(s) and may contain confidential or privileged information. If you are not the intended recipient, please notify the sender immediately and destroy all copies of this message and any attachments. The views expressed in this email are, unless otherwise stated, those of the author and not those of dWise HealthCare IT Solutions or its management.
--------------SfgJW6fjZgt0mQWTzrEsO39h--