From: Adrian Klaver <[email protected]>
To: Дмитрий <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: FATAL: could not send data to WAL stream: lost synchronization with server: got message type "0", length 892351284
Date: Sun, 26 Jan 2025 09:33:06 -0800
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
On 1/26/25 03:29, Дмитрий wrote:
> "How was it shut down, on purpose or a hardware/software issue?"
> - I reboot the receiver every 2 minutes on purpose. I determined this
> time empirically, because replication breaks down approximately every
> minute and a half. The reboot helps to advance the receiver.
>
> "Also do you have corresponding logs from primary?"
> - Attached to this message.
>
> "Unless, is there cascading replication going on?"
> - No, this is replication from the leader. The leader has its two
> replicas and they are all in one data center. And the problematic
> replica is needed to migrate to another data center.
>
> "Was that a manual intervention?"
> - Yes, reboot on schedule, every two minutes.
>
> "Is that what is shown above or have you restarted since the above and
> the server is running?"
> - Sometimes replication works without problems for several hours. But
> when a breakdown occurs, rebooting every two minutes helps to catch up
> with this replica.
1) It would make life easier if the log_line_prefix timestamp were set to
the same precision on the primary and the standby. Right now it looks like
the primary uses %t (timestamp without milliseconds) and the standby uses
%m (timestamp with milliseconds).
2) From the logs.
Primary:
2025-01-26 12:21:27 MSK [656]: [11-1]
app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 STATEMENT:
START_REPLICATION SLOT "slot_migration_to_rcod" 106B6/52000000 TIMELINE 61
2025-01-26 12:21:27 MSK [656]: [12-1]
app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 LOG:
disconnection: session time: 0:01:05.329 user=replicator database=
host=192.168.5.1 port=58380
Standby:
2025-01-26 12:21:27.113 MSK [10824] FATAL: could not send data to WAL
stream: lost synchronization with server: got message type "0", length
825373235
Do you know what is issuing the START_REPLICATION SLOT command?
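A rough sketch of both points, to be run in psql on the primary. The
prefix string is an assumption inferred from the primary log lines quoted
above, not the actual configured value, so check it with SHOW first. The
slot name is taken from the START_REPLICATION line in the log. If the
setting is managed in postgresql.conf or by external tooling, change it
there instead of using ALTER SYSTEM.

```sql
-- Point 1: switch the primary's timestamp escape from %t to %m so both
-- servers log with millisecond precision.
-- ASSUMPTION: the rest of the prefix is guessed from the quoted log
-- lines; adjust it to match the real setting.
SHOW log_line_prefix;  -- inspect the current value before changing it
ALTER SYSTEM SET log_line_prefix =
    '%m [%p]: [%l-1] app=%a,user=%u,db=%d,client=%h ';
SELECT pg_reload_conf();  -- log_line_prefix takes effect on reload

-- Point 2: see which backend, if any, currently holds the slot and
-- which client it belongs to.
SELECT s.slot_name, s.active, s.active_pid,
       r.application_name, r.client_addr, r.state
FROM pg_replication_slots AS s
LEFT JOIN pg_stat_replication AS r ON r.pid = s.active_pid
WHERE s.slot_name = 'slot_migration_to_rcod';
```

If active is false, or the application_name and client_addr are not the
ones you expect for the migration replica, that would be worth reporting
back to the list.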
>
> Another interesting point. In addition to this replication, there are
> two more, to the same data center. One replication had the same problem,
> but a one-time restart helped to solve the problem, the replication is
> still working normally. And the second replication does not have such
> problems, it has been working since its launch, more than a month ago.
>
> --
>
--
Adrian Klaver
[email protected]