Date: Sun, 26 Jan 2025 09:33:06 -0800
Subject: Re: FATAL: could not send data to WAL stream: lost synchronization with server: got message type "0", length 892351284
To:
    Дмитрий
Cc: pgsql-general
From: Adrian Klaver

On 1/26/25 03:29, Дмитрий wrote:
> "How was it shut down, on purpose or a hardware/software issue?"
> - I reboot the receiver every 2 minutes on purpose. I determined this
> time empirically, because replication breaks down approximately every
> minute and a half. The reboot helps to advance the receiver.
>
> "Also do you have corresponding logs from primary?"
> - Attached to this message.
>
> "Unless, is there cascading replication going on?"
> - No, this is replication from the leader. The leader has its two
> replicas and they are all in one data center. And the problematic
> replica is needed to migrate to another data center.
>
> "Was that a manual intervention?"
> - Yes, reboot on schedule, every two minutes.
>
> "Is that what is shown above or have you restarted since the above and
> the server is running?"
> - Sometimes replication works without problems for several hours. But
> when a breakdown occurs, rebooting every two minutes helps to catch up
> with this replica.

1) It would make life easier if the log line entry prefix timestamp was
set to same precision on primary and standby. As of now it looks like
the primary has %t (Time stamp without milliseconds) and the standby has
%m (Time stamp with milliseconds)

2) From the logs.

Primary:

2025-01-26 12:21:27 MSK [656]: [11-1] app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 STATEMENT: START_REPLICATION SLOT "slot_migration_to_rcod" 106B6/52000000 TIMELINE 61
2025-01-26 12:21:27 MSK [656]: [12-1] app=v-host-n1,user=replicator,db=[unknown],client=192.168.5.1 LOG: disconnection: session time: 0:01:05.329 user=replicator database= host=192.168.5.1 port=58380

Standby:

2025-01-26 12:21:27.113 MSK [10824] FATAL: could not send data to WAL stream: lost synchronization with server: got message type "0", length 825373235

Do you know what is issuing the START_REPLICATION SLOT command?

>
> Another interesting point. In addition to this replication, there are
> two more, to the same data center. One replication had the same problem,
> but a one-time restart helped to solve the problem, the replication is
> still working normally. And the second replication does not have such
> problems, it has been working since its launch, more than a month ago.
>
> --

-- 
Adrian Klaver
adrian.klaver@aklaver.com
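
A note on the two length values quoted in this thread (892351284 in the subject, 825373235 in the standby log). In the PostgreSQL wire protocol each message starts with a one-byte type followed by a four-byte length in network byte order, so the reported type and length can be packed back into the five bytes the receiver read at that point. The following is a minimal sketch in Python; `decode_header` is a hypothetical helper name used only for illustration, not part of any PostgreSQL tool.

```python
import struct

def decode_header(msg_type, length):
    """Rebuild the 5 header bytes (type + big-endian int32 length)
    from the values printed in the error, and render them as ASCII."""
    raw = msg_type.encode("ascii") + struct.pack("!I", length)
    # Show printable ASCII as-is and anything else as "."
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return raw, text

# The two (type, length) pairs reported in this thread.
for length in (892351284, 825373235):
    raw, text = decode_header("0", length)
    print(length, raw.hex(), repr(text))
```

In both cases all five bytes are printable ASCII digits. That may suggest the receiver was reading ordinary data where it expected a message header, but it is only an observation about the numbers and not a diagnosis of the cause.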