public inbox for [email protected]  
help / color / mirror / Atom feed
From: Roman Khapov <[email protected]>
To: Andrey Borodin <[email protected]>
Cc: [email protected] <[email protected]>
Subject: [PATCH v2] Re: Cancel problems of query to pg_stat_statements
Date: Mon, 12 May 2025 15:48:29 +0500
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>

Hi! Thanks for the review!

Its true that qtext_load_file() can be called with acquired lock in pg_stat_statements_internal()…
So I perform INTERRUPTS_PENDING_CONDITION in v2 patch and call CHECK_FOR_INTERRUPTS later, after cycle ended and lock released.






> On 28 Apr 2025, at 11:16, Andrey Borodin <[email protected]> wrote:
> 
> 
> 
>> On 24 Apr 2025, at 22:49, Roman Khapov <[email protected]> wrote:
>> 
>> Hi!
>> Recently we faced a problem in out production psql installation, which was that we had to cancel all requests to the db, including performance monitoring requests, that uses ps_stat_statements. But we could not cancel the request in usual way, and had to kill -9 the pg process of it.
> 
> Interesting problem, thanks for raising it!
> 
>> We've noticed that the the query execution stuck on PGSS_TEXT_FILE file reading in function qtext_load_file, which doesn't have CHECK_FOR_INTERRUPTS in the read cycle. In addition to our case with large PGSS_TEXT_FILE (and maybe the problems with virtual disk i/o) that can explain uncancellable pg_stat_statements queries.
> 
> I'm afraid it might be not so easy to add CHECK_FOR_INTERRUPTS there. Most probably you was holding a LWLockAcquire(pgss->lock, LW_SHARED) somewhere (do you have a backtrace?), which prevent interrupts anyway.
> 
> Thanks!
> 
> 
> Best regards, Andrey Borodin.



Attachments:

  [application/octet-stream] v2-cancelable-qtext_load_file.patch (4.4K, 2-v2-cancelable-qtext_load_file.patch)
  download | inline diff:
From f642c28124e1bd57764bd12296253b78fc9c083c Mon Sep 17 00:00:00 2001
From: rkhapov <[email protected]>
Date: Mon, 12 May 2025 15:29:17 +0500
Subject: [PATCH v2] pg_stat_statements.c: cancelable qtext_load_file

In the case of a large PGSS_TEXT_FILE, the work time of the qtext_load_file
function will be quite long, and the query to the pg_stat_statements table
will not be cancellable, as there is no CHECK_FOR_INTERRUPT in the function.

Also, the amount of bytes read can reach 1 GB, which leads to a slow read
system call that does not allow cancellation of the query. Testing the speed
of sequential read using fio with different block sizes shows that there is
no significant difference between 16 MB blocks and 1 GB blocks.

Therefore, this patch changes the maximum read value from 1 GB to 16 MB and
adds INTERRUPTS_PENDING_CONDITION() check in the read loop of qtext_load_file to make it cancellable.
For now, only statement execution is cancellable (fail_on_interrupt is true only for calls from pg_stat_statements_internal)

Signed-off-by: rkhapov <[email protected]>
---
 .../pg_stat_statements/pg_stat_statements.c   | 29 ++++++++++++++-----
 1 file changed, 22 insertions(+), 7 deletions(-)

diff --git a/contrib/pg_stat_statements/pg_stat_statements.c b/contrib/pg_stat_statements/pg_stat_statements.c
index 9778407cba3..587d49f4330 100644
--- a/contrib/pg_stat_statements/pg_stat_statements.c
+++ b/contrib/pg_stat_statements/pg_stat_statements.c
@@ -365,7 +365,7 @@ static pgssEntry *entry_alloc(pgssHashKey *key, Size query_offset, int query_len
 static void entry_dealloc(void);
 static bool qtext_store(const char *query, int query_len,
 						Size *query_offset, int *gc_count);
-static char *qtext_load_file(Size *buffer_size);
+static char *qtext_load_file(Size *buffer_size, bool fail_on_interrupts);
 static char *qtext_fetch(Size query_offset, int query_len,
 						 char *buffer, Size buffer_size);
 static bool need_gc_qtexts(void);
@@ -767,7 +767,7 @@ pgss_shmem_shutdown(int code, Datum arg)
 	if (fwrite(&num_entries, sizeof(int32), 1, file) != 1)
 		goto error;
 
-	qbuffer = qtext_load_file(&qbuffer_size);
+	qbuffer = qtext_load_file(&qbuffer_size, false /* fail_on_interrupts */ );
 	if (qbuffer == NULL)
 		goto error;
 
@@ -1767,7 +1767,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 
 		/* No point in loading file now if there are active writers */
 		if (n_writers == 0)
-			qbuffer = qtext_load_file(&qbuffer_size);
+			qbuffer = qtext_load_file(&qbuffer_size, true /* fail_on_interrupts */ );
 	}
 
 	/*
@@ -1800,7 +1800,7 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 			pgss->gc_count != gc_count)
 		{
 			free(qbuffer);
-			qbuffer = qtext_load_file(&qbuffer_size);
+			qbuffer = qtext_load_file(&qbuffer_size, true /* fail_on_interrupts */ );
 		}
 	}
 
@@ -1819,6 +1819,12 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 		memset(values, 0, sizeof(values));
 		memset(nulls, 0, sizeof(nulls));
 
+		/* Can't process interrupts here - pgss-lock is acquired */
+		if (INTERRUPTS_PENDING_CONDITION())
+		{
+			break;
+		}
+
 		values[i++] = ObjectIdGetDatum(entry->key.userid);
 		values[i++] = ObjectIdGetDatum(entry->key.dbid);
 		if (api_version >= PGSS_V1_9)
@@ -2014,6 +2020,8 @@ pg_stat_statements_internal(FunctionCallInfo fcinfo,
 
 	LWLockRelease(pgss->lock);
 
+	CHECK_FOR_INTERRUPTS();
+
 	free(qbuffer);
 }
 
@@ -2312,7 +2320,7 @@ error:
  * the caller is responsible for verifying that the result is sane.
  */
 static char *
-qtext_load_file(Size *buffer_size)
+qtext_load_file(Size *buffer_size, bool fail_on_interrupts)
 {
 	char	   *buf;
 	int			fd;
@@ -2365,7 +2373,14 @@ qtext_load_file(Size *buffer_size)
 	nread = 0;
 	while (nread < stat.st_size)
 	{
-		int			toread = Min(1024 * 1024 * 1024, stat.st_size - nread);
+		int			toread = Min(32 * 1024 * 1024, stat.st_size - nread);
+
+		if (fail_on_interrupts && INTERRUPTS_PENDING_CONDITION())
+		{
+			free(buf);
+			CloseTransientFile(fd);
+			return NULL;
+		}
 
 		/*
 		 * If we get a short read and errno doesn't get set, the reason is
@@ -2502,7 +2517,7 @@ gc_qtexts(void)
 	 * file is only going to get bigger; hoping for a future non-OOM result is
 	 * risky and can easily lead to complete denial of service.
 	 */
-	qbuffer = qtext_load_file(&qbuffer_size);
+	qbuffer = qtext_load_file(&qbuffer_size, false /* fail_on_interrupts */ );
 	if (qbuffer == NULL)
 		goto gc_fail;
 
-- 
2.39.5 (Apple Git-154)



view thread (4+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected]
  Subject: Re: [PATCH v2] Re: Cancel problems of query to pg_stat_statements
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox