From: Дмитрий <[email protected]>
To: Adrian Klaver <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re[2]: FATAL: could not send data to WAL stream: lost synchronization with server: got message type "0", length 892351284
Date: Tue, 28 Jan 2025 16:09:07 +0300
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<[email protected]>
Colleagues confirmed that the problem is with the network between data centers. Thank you!
Sunday, January 26, 2025, 20:33 +03:00, from Adrian Klaver [email protected] :
>On 1/26/25 03:29, Дмитрий wrote:
> "How was it shut down, on purpose or a hardware/software issue?"
> - I reboot the receiver every 2 minutes on purpose. I chose this
> interval empirically, because replication breaks down approximately every
> minute and a half. The reboot lets the receiver advance.
>
> "Also do you have corresponding logs from primary?"
> - Attached to this message.
>
> "Unless, is there cascading replication going on?"
> - No, this is replication from the leader. The leader has two
> replicas, and they are all in one data center. The problematic
> replica is needed to migrate to another data center.
>
> "Was that a manual intervention?"
> - Yes, a scheduled reboot every two minutes.
>
> "Is that what is shown above or have you restarted since the above and
> the server is running?"
> - Sometimes replication works without problems for several hours. But
> when a breakdown occurs, rebooting every two minutes helps this replica
> catch up.
>1) It would make life easier if the log line prefix timestamp were
>set to the same precision on primary and standby. As of now it looks like
>the primary has %t (Time stamp without milliseconds) and the standby has
>%m (Time stamp with milliseconds)
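Until log_line_prefix uses the same escape on both servers, the two logs can still be lined up by truncating both timestamps to whole seconds. The following is a minimal sketch, assuming the timestamp is the first field of the prefix (as in the excerpts quoted here) and that both servers log in the same time zone, so the zone abbreviation ("MSK") is captured but not interpreted. The helper names `parse_prefix_timestamp` and `same_second` are hypothetical.

```python
import re
from datetime import datetime

# Leading timestamp as written by %t ("2025-01-26 12:21:27 MSK") or
# %m ("2025-01-26 12:21:27.113 MSK"): optional 3-digit milliseconds, then a zone name.
_TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:\.(\d{3}))? (\S+)")


def parse_prefix_timestamp(line: str) -> tuple[datetime, str]:
    """Return (naive timestamp, zone abbreviation) from the start of a log line.

    Hypothetical helper: assumes both servers log in the same time zone,
    so the abbreviation is returned as-is and not converted.
    """
    m = _TS.match(line)
    if m is None:
        raise ValueError(f"no %t/%m timestamp at start of: {line!r}")
    ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
    if m.group(2) is not None:
        # %m adds milliseconds; store them as microseconds.
        ts = ts.replace(microsecond=int(m.group(2)) * 1000)
    return ts, m.group(3)


def same_second(primary_line: str, standby_line: str) -> bool:
    """True if both lines fall in the same wall-clock second (the %t precision)."""
    p, _ = parse_prefix_timestamp(primary_line)
    s, _ = parse_prefix_timestamp(standby_line)
    return p.replace(microsecond=0) == s.replace(microsecond=0)
```

Truncating to seconds is the only comparison that is fair to a %t log; setting %m on the primary as well would allow ordering events within the same second.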
>
>2) From the logs.
>
>Primary:
>
>2025-01-26 12:21:27 MSK [656]: [11-1]
>app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 STATEMENT:
> START_REPLICATION SLOT "slot_migration_to_rcod" 106B6/52000000 TIMELINE 61
>
>2025-01-26 12:21:27 MSK [656]: [12-1]
>app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 LOG:
>disconnection: session time: 0:01:05.329 user=replicator database=
>host=192.168.5.1 port=58380
>
>
>Standby:
>
>2025-01-26 12:21:27.113 MSK [10824] FATAL: could not send data to WAL
>stream: lost synchronization with server: got message type "0", length
>825373235
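In the PostgreSQL wire protocol each backend message starts with a 1-byte type code followed by a 4-byte big-endian length, so the implausible "length" in this error can be re-encoded to see which bytes the receiver actually read. A minimal sketch (the helper name `decode_bogus_header` is hypothetical): both lengths reported in this thread decode to ASCII digits, i.e. the receiver was reading what looks like text where a protocol header belonged. That is consistent with, though not proof of, the stream being corrupted in transit, which matches the network diagnosis at the top of this thread.

```python
def decode_bogus_header(msg_type: str, length: int) -> bytes:
    """Rebuild the 5 raw bytes behind a 'lost synchronization' error.

    Layout assumed from the protocol: 1-byte type code, then an Int32
    length in network (big-endian) byte order.
    """
    return msg_type.encode("ascii") + length.to_bytes(4, "big")


# Lengths from the standby log above and from the Subject line.
for length in (825373235, 892351284):
    print(length, decode_bogus_header("0", length))
```

Here the two values come out as b'01223' and b'05034': digit characters rather than a valid type byte and size.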
>
>
>Do you know what is issuing START_REPLICATION SLOT?
>
>
> Another interesting point: in addition to this replication, there are
> two more to the same data center. One of them had the same problem,
> but a single restart solved it, and that replication is still
> working normally. The other has had no such problems and has been
> working since it was launched, more than a month ago.
>
>--
>Adrian Klaver
>[email protected]