[PATCH] Fix loose polling in 019_replslot

public inbox for [email protected]  
help / color / mirror / Atom feed

[PATCH] Fix loose polling in 019_replslot_limit.pl test
4+ messages / 2 participants
[nested] [flat]

* [PATCH] Fix loose polling in 019_replslot_limit.pl test
@ 2026-06-06 20:32 Zakariyah Ali <[email protected]>
  2026-06-08 04:18 ` Re: [PATCH] Fix loose polling in 019_replslot_limit.pl test Kyotaro Horiguchi <[email protected]>
  2026-06-19 13:52 ` Re: [PATCH] Fix loose polling in 019_replslot_limit.pl test Zakariyah Ali <[email protected]>
  2026-06-20 17:49 ` [PATCH v2] Standardize log polling using wait_for_log() across recovery tests Zakariyah Ali <[email protected]>
  0 siblings, 3 replies; 4+ messages in thread

From: Zakariyah Ali @ 2026-06-06 20:32 UTC (permalink / raw)
  To: [email protected]; +Cc: Zakariyah Ali <[email protected]>

Signed-off-by: Zakariyah Ali <[email protected]>
---
 src/test/recovery/t/019_replslot_limit.pl | 78 +++++------------------
 1 file changed, 16 insertions(+), 62 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index a412faf51c6..3fdce739965 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -186,18 +186,9 @@ $node_primary->advance_wal(7);
 $node_primary->safe_psql('postgres',
 	'ALTER SYSTEM RESET max_wal_size; SELECT pg_reload_conf()');
 $node_primary->safe_psql('postgres', "CHECKPOINT;");
-my $invalidated = 0;
-for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
-{
-	if ($node_primary->log_contains(
-			'invalidating obsolete replication slot "rep1"', $logstart))
-	{
-		$invalidated = 1;
-		last;
-	}
-	usleep(100_000);
-}
-ok($invalidated, 'check that slot invalidation has been logged');
+ok( $node_primary->wait_for_log(
+		'invalidating obsolete replication slot "rep1"', $logstart),
+	'check that slot invalidation has been logged');
 
 $result = $node_primary->safe_psql(
 	'postgres',
@@ -208,17 +199,8 @@ is($result, "rep1|f|t|lost|",
 	'check that the slot became inactive and the state "lost" persists');
 
 # Wait until current checkpoint ends
-my $checkpoint_ended = 0;
-for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
-{
-	if ($node_primary->log_contains("checkpoint complete: ", $logstart))
-	{
-		$checkpoint_ended = 1;
-		last;
-	}
-	usleep(100_000);
-}
-ok($checkpoint_ended, 'waited for checkpoint to end');
+ok( $node_primary->wait_for_log("checkpoint complete: ", $logstart),
+	'waited for checkpoint to end');
 
 # The invalidated slot shouldn't keep the old-segment horizon back;
 # see bug #17103: https://postgr.es/m/[email protected]
@@ -238,18 +220,10 @@ is($oldestseg, $redoseg, "check that segments have been removed");
 $logstart = -s $node_standby->logfile;
 $node_standby->start;
 
-my $failed = 0;
-for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
-{
-	if ($node_standby->log_contains(
-			"This replication slot has been invalidated due to \"wal_removed\".",
-			$logstart))
-	{
-		$failed = 1;
-		last;
-	}
-	usleep(100_000);
-}
+my $failed =
+  $node_standby->wait_for_log(
+	"This replication slot has been invalidated due to \"wal_removed\".",
+	$logstart);
 ok($failed, 'check that replication has been broken');
 
 $node_primary->stop;
@@ -374,20 +348,10 @@ $logstart = -s $node_primary3->logfile;
 kill 'STOP', $senderpid, $receiverpid;
 $node_primary3->advance_wal(2);
 
-my $msg_logged = 0;
-my $max_attempts = $PostgreSQL::Test::Utils::timeout_default;
-while ($max_attempts-- >= 0)
-{
-	if ($node_primary3->log_contains(
-			"terminating process $senderpid to release replication slot \"rep3\"",
-			$logstart))
-	{
-		$msg_logged = 1;
-		last;
-	}
-	sleep 1;
-}
-ok($msg_logged, "walsender termination logged");
+ok( $node_primary3->wait_for_log(
+		"terminating process $senderpid to release replication slot \"rep3\"",
+		$logstart),
+	"walsender termination logged");
 
 # Now let the walsender continue; slot should be killed now.
 # (Must not let walreceiver run yet; otherwise the standby could start another
@@ -398,19 +362,9 @@ $node_primary3->poll_query_until('postgres',
 	"lost")
   or die "timed out waiting for slot to be lost";
 
-$msg_logged = 0;
-$max_attempts = $PostgreSQL::Test::Utils::timeout_default;
-while ($max_attempts-- >= 0)
-{
-	if ($node_primary3->log_contains(
-			'invalidating obsolete replication slot "rep3"', $logstart))
-	{
-		$msg_logged = 1;
-		last;
-	}
-	sleep 1;
-}
-ok($msg_logged, "slot invalidation logged");
+ok( $node_primary3->wait_for_log(
+		'invalidating obsolete replication slot "rep3"', $logstart),
+	"slot invalidation logged");
 
 # Now let the walreceiver continue, so that the node can be stopped cleanly
 kill 'CONT', $receiverpid;
-- 
2.43.0







^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: [PATCH] Fix loose polling in 019_replslot_limit.pl test
  2026-06-06 20:32 [PATCH] Fix loose polling in 019_replslot_limit.pl test Zakariyah Ali <[email protected]>
@ 2026-06-08 04:18 ` Kyotaro Horiguchi <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: Kyotaro Horiguchi @ 2026-06-08 04:18 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

Hello.

At Sat,  6 Jun 2026 21:32:22 +0100, Zakariyah Ali <[email protected]> wrote in 
> -git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
> index a412faf51c6..3fdce739965 100644
> --- a/src/test/recovery/t/019_replslot_limit.pl
> +++ b/src/test/recovery/t/019_replslot_limit.pl
> @@ -186,18 +186,9 @@ $node_primary->advanc

(Some explanation of the motivation would be helpful.)

These are test scripts, so unless there is some functional issue with
the existing code, I'm not sure we should actively replace existing
implementations with wait_for_log(). In other words, I would normally
expect this kind of change to happen only when the surrounding code is
being modified for some functional reason.

As for this patch specifically, there are still other places using
open-coded log searches (for example, 033_replay_tsp_drops.pl). If we
decide to make this kind of change, I think it would make more sense
to update similar cases together.

Regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: [PATCH] Fix loose polling in 019_replslot_limit.pl test
  2026-06-06 20:32 [PATCH] Fix loose polling in 019_replslot_limit.pl test Zakariyah Ali <[email protected]>
@ 2026-06-19 13:52 ` Zakariyah Ali <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: Zakariyah Ali @ 2026-06-19 13:52 UTC (permalink / raw)
  To: Kyotaro Horiguchi <[email protected]>; +Cc: pgsql-hackers

Hi Kyotaro,

Thank you for the review and the helpful feedback.

You are absolutely right that I should have included the motivation. The original intent was to reduce code duplication and use the existing `wait_for_log()` helper for better consistency in the test suite. 

I understand your concern about not touching functioning test code without a strong functional reason. Also, I agree with your point about inconsistency, leaving some tests using `wait_for_log()` while others (like `033_replay_tsp_drops.pl`) use open-coded log searches makes the codebase less uniform.

To address this, I've decided to take the broader approach. I am currently reviewing the rest of the test suite to find similar open-coded usages. I will prepare and submit a v2 patch that standardizes all these instances to use `wait_for_log()` together.

Thanks again for pointing me in the right direction. I'll follow up with the updated patch shortly.

Best Regards,
Zakariyah Ali.

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* [PATCH v2] Standardize log polling using wait_for_log() across recovery tests
  2026-06-06 20:32 [PATCH] Fix loose polling in 019_replslot_limit.pl test Zakariyah Ali <[email protected]>
@ 2026-06-20 17:49 ` Zakariyah Ali <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: Zakariyah Ali @ 2026-06-20 17:49 UTC (permalink / raw)
  To: Kyotaro Horiguchi <[email protected]>; +Cc: pgsql-hackers; Zakariyah Ali <[email protected]>

This patch replaces open-coded log wait loops with the existing wait_for_log() helper in the recovery test suite, specifically updating 019_replslot_limit.pl, 033_replay_tsp_drops.pl, and 041_checkpoint_at_promote.pl. This removes duplicated polling logic and ensures a consistent approach to waiting for specific log lines during tests.

Signed-off-by: Zakariyah Ali <[email protected]>
---
 src/test/recovery/t/019_replslot_limit.pl     | 78 ++++---------------
 src/test/recovery/t/033_replay_tsp_drops.pl   | 16 ++--
 .../recovery/t/041_checkpoint_at_promote.pl   | 13 +---
 3 files changed, 23 insertions(+), 84 deletions(-)

diff --git a/src/test/recovery/t/019_replslot_limit.pl b/src/test/recovery/t/019_replslot_limit.pl
index a412faf51c6..3fdce739965 100644
--- a/src/test/recovery/t/019_replslot_limit.pl
+++ b/src/test/recovery/t/019_replslot_limit.pl
@@ -186,18 +186,9 @@ $node_primary->advance_wal(7);
 $node_primary->safe_psql('postgres',
 	'ALTER SYSTEM RESET max_wal_size; SELECT pg_reload_conf()');
 $node_primary->safe_psql('postgres', "CHECKPOINT;");
-my $invalidated = 0;
-for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
-{
-	if ($node_primary->log_contains(
-			'invalidating obsolete replication slot "rep1"', $logstart))
-	{
-		$invalidated = 1;
-		last;
-	}
-	usleep(100_000);
-}
-ok($invalidated, 'check that slot invalidation has been logged');
+ok( $node_primary->wait_for_log(
+		'invalidating obsolete replication slot "rep1"', $logstart),
+	'check that slot invalidation has been logged');
 
 $result = $node_primary->safe_psql(
 	'postgres',
@@ -208,17 +199,8 @@ is($result, "rep1|f|t|lost|",
 	'check that the slot became inactive and the state "lost" persists');
 
 # Wait until current checkpoint ends
-my $checkpoint_ended = 0;
-for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
-{
-	if ($node_primary->log_contains("checkpoint complete: ", $logstart))
-	{
-		$checkpoint_ended = 1;
-		last;
-	}
-	usleep(100_000);
-}
-ok($checkpoint_ended, 'waited for checkpoint to end');
+ok( $node_primary->wait_for_log("checkpoint complete: ", $logstart),
+	'waited for checkpoint to end');
 
 # The invalidated slot shouldn't keep the old-segment horizon back;
 # see bug #17103: https://postgr.es/m/[email protected]
@@ -238,18 +220,10 @@ is($oldestseg, $redoseg, "check that segments have been removed");
 $logstart = -s $node_standby->logfile;
 $node_standby->start;
 
-my $failed = 0;
-for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
-{
-	if ($node_standby->log_contains(
-			"This replication slot has been invalidated due to \"wal_removed\".",
-			$logstart))
-	{
-		$failed = 1;
-		last;
-	}
-	usleep(100_000);
-}
+my $failed =
+  $node_standby->wait_for_log(
+	"This replication slot has been invalidated due to \"wal_removed\".",
+	$logstart);
 ok($failed, 'check that replication has been broken');
 
 $node_primary->stop;
@@ -374,20 +348,10 @@ $logstart = -s $node_primary3->logfile;
 kill 'STOP', $senderpid, $receiverpid;
 $node_primary3->advance_wal(2);
 
-my $msg_logged = 0;
-my $max_attempts = $PostgreSQL::Test::Utils::timeout_default;
-while ($max_attempts-- >= 0)
-{
-	if ($node_primary3->log_contains(
-			"terminating process $senderpid to release replication slot \"rep3\"",
-			$logstart))
-	{
-		$msg_logged = 1;
-		last;
-	}
-	sleep 1;
-}
-ok($msg_logged, "walsender termination logged");
+ok( $node_primary3->wait_for_log(
+		"terminating process $senderpid to release replication slot \"rep3\"",
+		$logstart),
+	"walsender termination logged");
 
 # Now let the walsender continue; slot should be killed now.
 # (Must not let walreceiver run yet; otherwise the standby could start another
@@ -398,19 +362,9 @@ $node_primary3->poll_query_until('postgres',
 	"lost")
   or die "timed out waiting for slot to be lost";
 
-$msg_logged = 0;
-$max_attempts = $PostgreSQL::Test::Utils::timeout_default;
-while ($max_attempts-- >= 0)
-{
-	if ($node_primary3->log_contains(
-			'invalidating obsolete replication slot "rep3"', $logstart))
-	{
-		$msg_logged = 1;
-		last;
-	}
-	sleep 1;
-}
-ok($msg_logged, "slot invalidation logged");
+ok( $node_primary3->wait_for_log(
+		'invalidating obsolete replication slot "rep3"', $logstart),
+	"slot invalidation logged");
 
 # Now let the walreceiver continue, so that the node can be stopped cleanly
 kill 'CONT', $receiverpid;
diff --git a/src/test/recovery/t/033_replay_tsp_drops.pl b/src/test/recovery/t/033_replay_tsp_drops.pl
index 93c89550511..f9e5c990bcb 100644
--- a/src/test/recovery/t/033_replay_tsp_drops.pl
+++ b/src/test/recovery/t/033_replay_tsp_drops.pl
@@ -130,16 +130,10 @@ $node_primary->safe_psql(
 # Standby should fail and should not silently skip replaying the wal
 # In this test, PANIC turns into WARNING by allow_in_place_tablespaces.
 # Check the log messages instead of confirming standby failure.
-my $max_attempts = $PostgreSQL::Test::Utils::timeout_default * 10;
-while ($max_attempts-- >= 0)
-{
-	last
-	  if (
-		$node_standby->log_contains(
-			qr!WARNING: ( [A-Z0-9]+:)? creating missing directory: pg_tblspc/!,
-			$logstart));
-	usleep(100_000);
-}
-ok($max_attempts > 0, "invalid directory creation is detected");
+ok(
+	$node_standby->wait_for_log(
+		qr!WARNING: ( [A-Z0-9]+:)? creating missing directory: pg_tblspc/!,
+		$logstart),
+	"invalid directory creation is detected");
 
 done_testing();
diff --git a/src/test/recovery/t/041_checkpoint_at_promote.pl b/src/test/recovery/t/041_checkpoint_at_promote.pl
index d0783fef9ae..932647e35d5 100644
--- a/src/test/recovery/t/041_checkpoint_at_promote.pl
+++ b/src/test/recovery/t/041_checkpoint_at_promote.pl
@@ -107,17 +107,8 @@ $node_standby->safe_psql('postgres',
 
 # Wait until the previous restart point completes on the newly-promoted
 # standby, checking the logs for that.
-my $checkpoint_complete = 0;
-foreach my $i (0 .. 10 * $PostgreSQL::Test::Utils::timeout_default)
-{
-	if ($node_standby->log_contains("restartpoint complete", $logstart))
-	{
-		$checkpoint_complete = 1;
-		last;
-	}
-	usleep(100_000);
-}
-is($checkpoint_complete, 1, 'restart point has completed');
+ok($node_standby->wait_for_log("restartpoint complete", $logstart),
+	'restart point has completed');
 
 # Kill with SIGKILL, forcing all the backends to restart.
 my $psql_timeout = IPC::Run::timer(3600);
-- 
2.43.0







^ permalink  raw  reply  [nested|flat] 4+ messages in thread

end of thread, other threads:[~2026-06-20 17:49 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-06-06 20:32 [PATCH] Fix loose polling in 019_replslot_limit.pl test Zakariyah Ali <[email protected]>
2026-06-08 04:18 ` Kyotaro Horiguchi <[email protected]>
2026-06-19 13:52 ` Zakariyah Ali <[email protected]>
2026-06-20 17:49 ` [PATCH v2] Standardize log polling using wait_for_log() across recovery tests Zakariyah Ali <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox