Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wVmsz-002Bu1-3D for pgsql-hackers@arkaria.postgresql.org; Sat, 06 Jun 2026 09:00:10 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wVmsx-00FjoK-2Q for pgsql-hackers@arkaria.postgresql.org; Sat, 06 Jun 2026 09:00:07 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wVmsx-00FjoC-0w for pgsql-hackers@lists.postgresql.org; Sat, 06 Jun 2026 09:00:07 +0000 Received: from mail-wm1-x32b.google.com ([2a00:1450:4864:20::32b]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wVmsv-00000001N7W-1FOR for pgsql-hackers@postgresql.org; Sat, 06 Jun 2026 09:00:06 +0000 Received: by mail-wm1-x32b.google.com with SMTP id 5b1f17b1804b1-490b211ee6aso19778075e9.3 for ; Sat, 06 Jun 2026 02:00:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20251104; t=1780736402; x=1781341202; darn=postgresql.org; h=content-transfer-encoding:to:subject:from:content-language :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=LLg/JiKjrKg6azckVDqp/loWU0jf+NYOeLy2dnPx4ig=; b=DX1LQ2saDi481ck6FxQDL0nnWLWAkvVT3RlXL30F8L3GphnZ0eGUcphxb8Q3twsclt shO1aP7BKLXtmacJ66bj7qEOClXwmwltOXm85giWPUhtlnaDwVL1JE9CAovTW5rl55Or swPXN8V/BgQ7jzFNTN8vDyyl6jKmYLo9eqOwjhet89R6+1e3+G4kJr3fUf1VlbSSFmZl cV6fdZjrS1EZnUsRTfIyRPiBATPm0EXRhBLzFpLdxxjw6S99IwqZ/WpqE0SWVaazciRY ImiHON8k7YorvgegiWtAWWxbge+zW65p1bvivDjhUP8QHYxdUI+cTEWaWYu8Fa2EIhlb g0cw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780736402; x=1781341202; h=content-transfer-encoding:to:subject:from:content-language :user-agent:mime-version:date:message-id:x-gm-gg:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=LLg/JiKjrKg6azckVDqp/loWU0jf+NYOeLy2dnPx4ig=; b=qURdIJgHsjOvP1KBlirkGul2+TpxIpIol2jMFif1JYKHPYwDYGdm94q1wsqq2tawe8 lXtYKCl+xbwhEko1mqdupwo/JZ2EJxy3veHdyhzAstBF2MVGU0VFeMLW0i82p9C5LrGS 6IZUzhB7g3E5o2ARxU4Dw5LcJQS4x70V+aY7ILQNwCRPcEqNJvEeWJSnB8BdXy6F9z+Y VWlMtR9Xh62r00rGZdlqdzeP7q6GBaydV25BmfaKfJji0OVjr5P9Z40qJ/UP6WgLhH/g j+AxkednhVj07OheJNqbhQ4SvEpA9wv+KozaS+hz036qJnAllBquzRKYP8aela+m16r1 H9eg== X-Gm-Message-State: AOJu0YxPKzFupkB9feReRsClI/o9hbwPt0eAzla0tRFWYK5Ol9HkcFPc 6hYR41QCoWtldPg1qnDWO9UD0LAKPxazK609LKOkMGk6FhOUsTzHGVS16pqNvA== X-Gm-Gg: Acq92OFgTnB3DjohvbEeAhcEvXQ7SYqYKO6GZWntXijw9SmqoDYzDhu71TCTiGgrEBU HUUMDbyzvKf3vnjwDPheuYp0HyhagZNGRTS5T7s/K44QkOzH0FRXadgo9an+wwxmXBo5cED3vlx MuskcuXt8UWzSArEAAgbx3FzT+bhc9BPRLBqAgHRHqr670tJOLTyuvk5ftkRQeXsRg9tD8pFXdg FB9bsjyZuGV55uY4Y90oEdRV7NY/2a2JTXJY1q9WY7blthY5afEPYfayvsuTbZPPXs7iAaiTYJD xhav/sBFOqHj7k5/a0/Qtuslw0txTsiEbkxvxmbliTohlWELH2XgI7tFjnq8mnUwUnNUMD1JkxL YAx9x+r5rgqTlmQZ+g5KKf2WkB9SAS+OAY7V1tgQhLzZZRuCUoQFVWCekhZezLVB6dQbP2crDrg ycqCRZCHZKS0RuGJejGvQFVf9gXs0moqo91/I= X-Received: by 2002:a05:600c:620c:b0:490:b28d:a6f9 with SMTP id 5b1f17b1804b1-490c256f1f1mr116780325e9.8.1780736402037; Sat, 06 Jun 2026 02:00:02 -0700 (PDT) Received: from [192.168.0.50] ([89.149.68.143]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-490bc39eb04sm220604285e9.6.2026.06.06.02.00.01 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sat, 06 Jun 2026 02:00:01 -0700 (PDT) Message-ID: <7daef094-abf3-4672-bc23-3df4763b16a3@gmail.com> Date: Sat, 6 Jun 2026 12:00:00 +0300 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Content-Language: en-US From: Alexander Lakhin Subject: t/035_standby_logical_decoding.pl might fail on attempt to read wrong timeline To: pgsql-hackers Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Hello hackers, I'd like to present a recent failure of 035_standby_logical_decoding on buildfarm [1]: ### Promoting node "standby" # Running: pg_ctl --pgdata /home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/data/t_035_standby_logical_decoding_standby_data/pgdata --log /home/bf/bf-build/scorpion/HEAD/pgsql.build/testrun/recovery/035_standby_logical_decoding/log/035_standby_logical_decoding_standby.log promote waiting for server to promote.... done server promoted Waiting for replication conn cascading_standby's replay_lsn to pass 0/0648A290 on standby done [10:40:16.996](0.446s) ok 84 - got expected output from SQL decoding session on promoted standby [10:40:16.997](0.001s) # pump_until: process terminated unexpectedly when searching for "(?^s:^.*COMMIT.*COMMIT$)" with stream: "" [10:40:16.997](0.000s) not ok 85 - got 2 COMMIT from pg_recvlogical output [10:40:16.997](0.000s) #   Failed test 'got 2 COMMIT from pg_recvlogical output' #   at /home/bf/bf-build/scorpion/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 1036. [10:40:16.997](0.000s) not ok 86 - got same expected output from pg_recvlogical decoding session [10:40:16.997](0.000s) #   Failed test 'got same expected output from pg_recvlogical decoding session' #   at /home/bf/bf-build/scorpion/HEAD/pgsql/src/test/recovery/t/035_standby_logical_decoding.pl line 1040. [10:40:16.998](0.000s) #          got: '' #     expected: 'BEGIN # table public.decoding_test: INSERT: x[integer]:1 y[text]:'1' # table public.decoding_test: INSERT: x[integer]:2 y[text]:'2' # table public.decoding_test: INSERT: x[integer]:3 y[text]:'3' # table public.decoding_test: INSERT: x[integer]:4 y[text]:'4' # COMMIT # BEGIN # table public.decoding_test: INSERT: x[integer]:5 y[text]:'5' # table public.decoding_test: INSERT: x[integer]:6 y[text]:'6' # table public.decoding_test: INSERT: x[integer]:7 y[text]:'7' # COMMIT' [10:40:17.009](0.011s) ok 87 - got expected output from SQL decoding session on cascading standby [10:40:17.009](0.000s) ok 88 - got 2 COMMIT from pg_recvlogical output 035_standby_logical_decoding_standby.log contains: 2026-06-03 10:40:16.745 CEST [3955206][startup][29/0:0] LOG: received promote request ... 2026-06-03 10:40:16.784 CEST [3955206][startup][29/0:0] LOG: archive recovery complete 2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] LOG: starting logical decoding for slot "promotion_activeslot" 2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] DETAIL: Streaming transactions committing after 0/06487A20, reading WAL from 0/064879E8. 2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] STATEMENT:  START_REPLICATION SLOT "promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1') 2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] ERROR: requested WAL segment 000000010000000000000006 has already been removed 2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] STATEMENT:  START_REPLICATION SLOT "promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1') 2026-06-03 10:40:16.862 CEST [3963805][walsender][25/0:0] LOG: released logical replication slot "promotion_activeslot" 2026-06-03 10:40:16.862 CEST [3963805][walsender][:0] LOG: disconnection: session time: 0:00:00.212 user=bf database=testdb host=[local] ... That is, walsender requested WAL segment for timeline 1, while in a successful run, it reads WAL for timeline 2. I've managed to reproduce this failure with: --- a/src/backend/replication/walsender.c +++ b/src/backend/replication/walsender.c @@ -1502,2 +1502,3 @@ StartLogicalReplication(StartReplicationCmd *cmd)      ReplicationSlotAcquire(cmd->slotname, true, true); +pg_usleep(200000); Being executed in a loop, the test fails for me within 100 iterations (within 10 on a slower VM): ITERATION 23 # +++ tap check in src/test/recovery +++ t/035_standby_logical_decoding.pl .. 84/? # pump_until: process terminated unexpectedly when searching for "(?^s:^.*COMMIT.*COMMIT$)" with stream: "" #   Failed test 'got 2 COMMIT from pg_recvlogical output' #   at t/035_standby_logical_decoding.pl line 1038. #   Failed test 'got same expected output from pg_recvlogical decoding session' #   at t/035_standby_logical_decoding.pl line 1042. #          got: '' #     expected: 'BEGIN # table public.decoding_test: INSERT: x[integer]:1 y[text]:'1' ... As far as I can see, the timeline is chosen in logical_read_xlog_page() depending on the recovery state:         am_cascading_walsender = RecoveryInProgress();         if (am_cascading_walsender)                 GetXLogReplayRecPtr(&currTLI);         else                 currTLI = GetWALInsertionTimeLine(); With diagnostic logging added here, a failed run shows: 2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl LOG:  starting logical decoding for slot "promotion_activeslot" 2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl DETAIL:  Streaming transactions committing after 0/0647C7E8, reading WAL from 0/0647C7B0. 2026-06-06 07:23:01.558 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT:  START_REPLICATION SLOT "promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1') 2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl LOG:  !!!logical_read_xlog_page| am_cascading_walsender: 1, currTLI: 1 2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT:  START_REPLICATION SLOT "promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1') 2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl LOG:  !!!WalSndSegmentOpen| nextSegNo: 6, path: pg_wal/000000010000000000000006 2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl STATEMENT:  START_REPLICATION SLOT "promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1') 2026-06-06 07:23:01.559 UTC walsender[165403] 035_standby_logical_decoding.pl ERROR:  requested WAL segment 000000010000000000000006 has already been removed In a successful run, am_cascading_walsender/RecoveryInProgress() == false and currTLI == 2: 2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG:  starting logical decoding for slot "promotion_activeslot" 2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl DETAIL:  Streaming transactions committing after 0/0647C7E8, reading WAL from 0/0647C7B0. 2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl STATEMENT:  START_REPLICATION SLOT "promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1') 2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG:  !!!logical_read_xlog_page| am_cascading_walsender: 0, currTLI: 2 2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl STATEMENT:  START_REPLICATION SLOT "promotion_activeslot" LOGICAL 0/00000000 ("include-xids" '0', "skip-empty-xacts" '1') 2026-06-06 07:28:38.498 UTC walsender[169751] 035_standby_logical_decoding.pl LOG:  !!!WalSndSegmentOpen| nextSegNo: 6, path: pg_wal/000000020000000000000006 Reproduced at the commit fcd77d532, which added the test. [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-06-03%2008%3A31%3A43 Best regards, Alexander