Re[2]: FATAL: could not send data to WAL stream: lost synchronization with server: got message type "0", length 892351284


From: Дмитрий <[email protected]>
To: Adrian Klaver <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re[2]: FATAL: could not send data to WAL stream: lost synchronization with server: got message type "0", length 892351284
Date: Tue, 28 Jan 2025 16:09:07 +0300
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<[email protected]>


Colleagues confirmed that the problem is with the network between data centers. Thank you!
Sunday, 26 January 2025, 20:33 +03:00, from Adrian Klaver [email protected]:

>On 1/26/25 03:29, Дмитрий wrote:
> "How was it shut down, on purpose or a hardware/software issue?"
> - I reboot the receiver every 2 minutes on purpose. I determined this 
> time empirically, because replication breaks down approximately every 
> minute and a half. The reboot helps to advance the receiver.
>
> "Also do you have corresponding logs from primary?"
> - Attached to this message.
>
> "Unless, is there cascading replication going on?"
> - No, this is replication from the leader. The leader has its two 
> replicas and they are all in one data center. And the problematic 
> replica is needed to migrate to another data center.
>
> "Was that a manual intervention?"
> - Yes, reboot on schedule, every two minutes.
>
> "Is that what is shown above or have you restarted since the above and
> the server is running?"
> - Sometimes replication works without problems for several hours. But 
> when a breakdown occurs, rebooting every two minutes helps to catch up 
> with this replica.
>
>1) It would make life easier if the log line prefix timestamp were set to 
>the same precision on primary and standby. As of now it looks like the 
>primary uses %t (timestamp without milliseconds) and the standby uses %m 
>(timestamp with milliseconds).
>
>2) From the logs.
>
>Primary:
>
>2025-01-26 12:21:27 MSK [656]: [11-1] app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 STATEMENT:  START_REPLICATION SLOT "slot_migration_to_rcod" 106B6/52000000 TIMELINE 61
>
>2025-01-26 12:21:27 MSK [656]: [12-1] app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 LOG: disconnection: session time: 0:01:05.329 user=replicator database= host=192.168.5.1 port=58380
>
>
>Standby:
>
>2025-01-26 12:21:27.113 MSK [10824] FATAL:  could not send data to WAL stream: lost synchronization with server: got message type "0", length 825373235
>
>
>Do you know what is issuing START_REPLICATION SLOT?
>
>
> Another interesting point. In addition to this replication, there are 
> two more, to the same data center. One replication had the same problem, 
> but a one-time restart helped to solve the problem, the replication is 
> still working normally. And the second replication does not have such 
> problems, it has been working since its launch, more than a month ago.
>
> --
>
>
>
>-- 
>Adrian Klaver
>[email protected]
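
A side note on the numbers in the error text. In the PostgreSQL wire protocol a message starts with a one-byte type followed by a four-byte big-endian length, so the reported type and length can be packed back into the five raw bytes the client read. The sketch below does that for the two values quoted in this thread; the helper name decode_header is made up for illustration. It does not identify the cause. It only shows that the bytes are printable ASCII digits, which is what one would expect if other data ended up where a message header was expected.

```python
import struct


def decode_header(msg_type: str, length: int) -> str:
    """Rebuild the 5 header bytes (type + int32 length) and show them as text.

    decode_header is a hypothetical helper, not part of PostgreSQL or libpq.
    """
    # 1-byte message type followed by a signed 32-bit big-endian length
    raw = msg_type.encode("ascii") + struct.pack(">i", length)
    # Non-ASCII bytes (none for these inputs) would show as replacement chars
    return raw.decode("ascii", errors="replace")


# Value from the subject line
print(decode_header("0", 892351284))  # prints 05034
# Value from the standby log of 2025-01-26 12:21:27.113
print(decode_header("0", 825373235))  # prints 01223
```

Both lengths decode to four ASCII digits ("5034" and "1223"), and the message type "0" is itself an ASCII digit.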
