Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1shbtq-00521v-2U for pgsql-admin@arkaria.postgresql.org; Fri, 23 Aug 2024 21:32:50 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1shbtm-00HTek-W2 for pgsql-admin@arkaria.postgresql.org; Fri, 23 Aug 2024 21:32:47 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1shbqW-00HOXT-CT for pgsql-admin@lists.postgresql.org; Fri, 23 Aug 2024 21:29:25 +0000 Received: from mail-lj1-x22c.google.com ([2a00:1450:4864:20::22c]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1shbqT-0017YQ-Vr for pgsql-admin@lists.postgresql.org; Fri, 23 Aug 2024 21:29:23 +0000 Received: by mail-lj1-x22c.google.com with SMTP id 38308e7fff4ca-2f40a1a2c1aso18537861fa.3 for ; Fri, 23 Aug 2024 14:29:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=seawardmoon.com; s=google; t=1724448560; x=1725053360; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=nBAJWPKjv6UCaro8B2NUyuwu5TtzOxU7465OZV6CRdc=; b=aTKCm+eaFVdaEPBTTQm9Rd5YqmDGq7kaeUpsYKJ7zE416UTeXYbn8YHgmCr2KWM93o QEvq3zP/3bkEDIoEto2Q87RwxkPr9Dl117cq+dk0pDfTq0iKevxbYRERtLohQH+klZac vtO4cXiqUhULVNfKJaYqw4L/9s+g7SFq/4g8WjeD+vw8F5+4qoLNa1W7xmLbjrhptN4B pY5FJEOoMO1/l9orlN3r+CTKlhNdbNEQ5IVVjat3phaGlS0kbjDgA368msGi77u3yH+0 k599cdyjPmALSxoUhLmgD0TRFCxBwH6JavfBmAbEbd13VRXjRHZVbjAK2t59iRzDMi9c EYLw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724448560; x=1725053360; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=nBAJWPKjv6UCaro8B2NUyuwu5TtzOxU7465OZV6CRdc=; b=guWXhElMp/QVuGZEbZ1mI/SLK3f5QkgpRyF0T8obZ2ESASeyMF2qNvf7TDXPjE50g5 2r0EM5Y/3rDRoDD3nkNXTlp5wWt7lpVDFviNhqGaWZ9i9TIr17WSZJCW+teytmqjloU1 BuWIxdN/uY98AVkrDy7za1r8hwmhhVuVHdyknDXKfvXf9GaJpbaR+s7yovT09UXP7Nbq FGHMk19mfILeuZmUc+142lzN1cQkILXm2LuzSuw9URWJ2ScAyzvi9zAHIQLJ5xRX6eoI tJQmcIHzHLbVCwHE11JbvwrNSpgFKGhK/IAYpqWSsBP2CN9ZenoIDsDfQIlmhHwjy3GT hWww== X-Gm-Message-State: AOJu0YyfVDYI8NxSApG2gz34Lx7cLO9VuzyuNs8wmQNr1Jdjd+k6YsOz V0JImy9RSLxgBy1pl/6Kudvrl9wGCh6rBdWd8MtgRaLrHST7TPW1kcSTIOD88tnx+q3A76MU/gB uRH1UK6HNvPRoVfu0+fuF9CNGeOvznBm1KYkIbIN3hnJhUCNWNpM= X-Google-Smtp-Source: AGHT+IHw4SMex03a1bDqaxEbEnNsfNzA2yniQSt3iseDFazJflilSMu1/5ASbdaByY+1bdjQgXrnvLFXOgtwqv+QmHY= X-Received: by 2002:a2e:b16d:0:b0:2ee:8555:4742 with SMTP id 38308e7fff4ca-2f4f577b884mr15976141fa.27.1724448558645; Fri, 23 Aug 2024 14:29:18 -0700 (PDT) MIME-Version: 1.0 From: Dennis White Date: Fri, 23 Aug 2024 17:29:07 -0400 Message-ID: Subject: Logical replication stopped suddenly claiming wal_status lost when max_slot_wal_keep_size was unlimited To: Pgsql-admin Content-Type: multipart/alternative; boundary="000000000000c5a684062060772f" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000c5a684062060772f Content-Type: text/plain; charset="UTF-8" After running continuously for perhaps a year or more, my project's logical replication stopped on our test DB this morning claiming wal was lost due to size limits when there aren't any limits. The system is running Centos7 and I was planning on moving to Rhel8 and 14.12 today, but so much for that. Is this a bug that was fixed in a later release of 14? Is there some other setting that must be set to get the wal retained? Here are the details: Version: PostgreSQL 14.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit Log entries: (log entries that followed the last listed just continued to say the slot was invalid) 2024-08-23 03:07:45.926 UTC [1121] LOG: starting logical decoding for slot "track_subscription" 2024-08-23 03:07:45.926 UTC [1121] DETAIL: Streaming transactions committing after AB17/4A0C9F40, reading WAL from AB17/46D98068. 2024-08-23 03:07:45.926 UTC [1121] STATEMENT: START_REPLICATION SLOT "track_subscription" LOGICAL AB17/554088B0 (proto_version '2', publication_names '"track_ingestion"') 2024-08-23 03:07:45.926 UTC [1121] LOG: logical decoding found consistent point at AB17/46D98068 2024-08-23 03:07:45.926 UTC [1121] DETAIL: There are no running transactions. 2024-08-23 03:07:45.926 UTC [1121] STATEMENT: START_REPLICATION SLOT "track_subscription" LOGICAL AB17/554088B0 (proto_version '2', publication_names '"track_ingestion"') 2024-08-23 03:08:17.161 UTC [48799] LOG: terminating process 1121 to release replication slot "track_subscription" 2024-08-23 03:08:17.161 UTC [1121] FATAL: terminating connection due to administrator command 2024-08-23 03:08:17.161 UTC [1121] CONTEXT: slot "track_subscription", output plugin "pgoutput", in the change callback, associated LSN AB17/663138F0 2024-08-23 03:08:17.161 UTC [1121] STATEMENT: START_REPLICATION SLOT "track_subscription" LOGICAL AB17/554088B0 (proto_version '2', publication_names '"track_ingestion"') 2024-08-23 03:08:17.190 UTC [1121] LOG: disconnection: session time: 0:00:33.502 user=sysrep database=trackdb host=postgresqldb03.s2a.nrl.navy.mil.31.250.132.in-addr.arpa port=36840 2024-08-23 03:08:17.195 UTC [48799] LOG: invalidating slot "track_subscription" because its restart_lsn AB17/4D0E3320 exceeds max_slot_wal_keep_size trackdb=# select * from pg_replication_slots; slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase --------------------+----------+-----------+--------+------- ---+-----------+--------+------------+------+--------------+ -------------+---------------------+------------+---------------+----------- track_subscription | pgoutput | logical | 16386 | trackdb | f | f | | | 130568429 | | AB17/554088B0 | lost | | f (1 row) show max_slot_wal_keep_size; max_slot_wal_keep_size ------------------------ -1 (1 row) Thanks, Dennis --000000000000c5a684062060772f Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

A= fter running continuously for perhaps a year or more, my project's logi= cal replication stopped on our test DB this morning claiming wal was lost d= ue to size limits when there aren't any limits.

The system is running Centos7 and I was planning on moving to Rhel8 = and 14.12 today, but so much for that.


Is this a bug that was fixed in a later release of= 14?

Is there some other setting that must be set= to get the wal retained?


Here are the details:

Version:

PostgreSQL 14.7 on x86_64-pc-linux-gnu, compile= d by gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44), 64-bit


Log entries: (log entries th= at followed the last listed just continued to say the slot was invalid)
=

2024-08-23 03:07:45.926 UTC [1121] LOG:=C2=A0 st= arting logical decoding for slot "track_subscription"

2024-08-23 03:07:45.926 UTC [1121] DETAIL:=C2=A0 Streaming transactions committing=20 after AB17/4A0C9F40, reading WAL from AB17/46D98068.

2024-08-23 03:07:45.926 UTC [1121] STATEMENT:=C2=A0 START_REPLICATION SLOT=20 "track_subscription" LOGICAL AB17/554088B0 (proto_version '2&= #39;,=20 publication_names '"track_ingestion"')

<= p class=3D"MsoNormal">2024-08-23 03:07:45.926 UTC [1121] LOG:=C2=A0 logical= decoding found consistent point at AB17/46D98068

2024-08-23 03:07:45.926 UTC [1121] DETAIL:=C2=A0 There are n= o running transactions.

2024-08-23 03:07:45.926 UTC [1121] STATEMENT:=C2=A0 START_REPLICATION SLOT=20 "track_subscription" LOGICAL AB17/554088B0 (proto_version '2&= #39;,=20 publication_names '"track_ingestion"')

<= p class=3D"MsoNormal">2024-08-23 03:08:17.161 UTC [48799] LOG:=C2=A0 termin= ating process 1121 to release replication slot "track_subscription&quo= t;

2024-08-23 03:08:17.161 UTC [112= 1] FATAL:=C2=A0 terminating connection due to administrator command<= u>

2024-08-23 03:08:17.161 UTC [1121] CONTEXT:=C2=A0 slot "track_subscription"= , output=20 plugin "pgoutput", in the change callback, associated LSN AB17/66= 3138F0

2024-08-23 03:08:17.161 UTC [1121] STATEMENT:=C2=A0 START_REPLICATION SLOT=20 "track_subscription" LOGICAL AB17/554088B0 (proto_version '2&= #39;,=20 publication_names '"track_ingestion"')

<= p class=3D"MsoNormal">2024-08-23 03:08:17.190 UTC [1121] LOG:=C2=A0 disconnection: session time: 0:00:33.50= 2=20 user=3Dsysrep database=3Dtrackdb host=3Dpostgresqldb03.s2a.nrl.navy.mil.31.= 250.132.in-addr.arpa port=3D36840

2= 024-08-23 03:08:17.195 UTC [48799] LOG:=C2=A0 invalidating slot "track_subscrip= tion"=20 because its restart_lsn AB17/4D0E3320 exceeds max_slot_wal_keep_size=


trackdb=3D# select * from pg_replicatio= n_slots;

=C2=A0=C2=A0=C2=A0=C2=A0 slot_name=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0 plugin=C2=A0 | slot_type |= datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn |=20 confirmed_flush_lsn | wal_status | safe_wal_size | two_phase

-----------------= ---+----------+-----------+---= -----+----------+-----------+-= -------+------------+------+--= ------------+-------------+---= ------------------+-----------= -+---------------+-----------<= /span>

track_subscription | pgoutput | logical=C2=A0=C2=A0 |=C2=A0 16386 | trackd= b=C2=A0 |=20 f=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | f=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0 |=C2=A0=C2=A0 =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0|=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0 130568429 |=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 |=20 AB17/554088B0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | lost=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0 |=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 | f

(1 row)=

=C2=A0

show max_slot_wal_keep_size;

max_slot_wal_keep_size

= ------------------------

-1=

(1 row)


Thanks,


Dennis

--000000000000c5a684062060772f--