Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vkqLh-002rgd-0o for pgsql-hackers@arkaria.postgresql.org; Tue, 27 Jan 2026 21:11:45 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vkqLd-00G7DQ-2Y for pgsql-hackers@arkaria.postgresql.org; Tue, 27 Jan 2026 21:11:42 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vkqLd-00G7DI-1T for pgsql-hackers@lists.postgresql.org; Tue, 27 Jan 2026 21:11:41 +0000 Received: from mail-ej1-x629.google.com ([2a00:1450:4864:20::629]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vkqLb-00000000lkc-1U1F for pgsql-hackers@lists.postgresql.org; Tue, 27 Jan 2026 21:11:41 +0000 Received: by mail-ej1-x629.google.com with SMTP id a640c23a62f3a-b885e8c6700so805072866b.0 for ; Tue, 27 Jan 2026 13:11:39 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1769548299; cv=none; d=google.com; s=arc-20240605; b=DCU2+KnBF6Iwl8JxTA2g6DEgGkxJ+/kDWKQOADlpJ9YikL8NIr0C3Yai4wjCQY1Yh2 /lYHpqB3UMAng1spQQ4x/b2H5uwsAZkU4m/uVwrbMnilu5OjL8dEVzWyrJjAg52jfexC mpolw4tEwxJMRG20lM4O6192J5GIwFRWNhi36StQhOBwKNuS2ME8HN3jS0aflDD6+1i7 z+8Oym4u0+IjjdZrDDUu0rWOChwWo2TNlNT/iz4T8JQUD1p+sazvK1ZRUqiGsS8NOum1 Bu/vlio304pybIzBlrrSbec7mCpKhqMAqU0w1wBPZHBIQfMG+acpmaVOIM4iNlqdanTi HrVw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=28ndtS2GXjY31YGVWbk/x7702QdsemPIJGFl2dzPzMc=; fh=QfBnQ6gz5hlEVAnFWmBv/PSZVZUF6Co+Di04W0XyApA=; b=CoNfLRQ9Uc5ecD0esmoJp0czDTrRtTyJmijBVJl/Ah08kMa5OGAcIBiQpShzHNpxyH JDf61d/u29BcHE38ZTNC2GqfX43U49aVkPDiZWFgJkVuVnpU0XNSTT3CzWDZDUR82TAa JlWtFPFYNzlSVCR7ntFxSLSrMLWOXTFYEEMsXUcyTOAZg3wDFE87KX87VvjGgBzxjfRt hLMI2FmATZisAMcemeAOz+v3emtiHlsV9H1njJTVttR+DhsqO/f9BtNoVzWRxmI2JPtI J4Iz3UETxzCAsu7EXHtVCFaI+/ilSaGjK9LhQqxr2mu+QgXO+deDaCUTrTQsszdbRVkN yIiQ==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1769548299; x=1770153099; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=28ndtS2GXjY31YGVWbk/x7702QdsemPIJGFl2dzPzMc=; b=N1tR8T2vZSoBHGvO3GqIDvcSfZCaNEGgymS+gE5R1WU9Hu9Mju1SEdYWw3UzIygFNg o4HZYlkBl4KFZnJyWRiFb/bWan3hHzILHC8LwMmMbKlif1ecm9auK5/jEF46sWZDeZmm h+wFgKORZGWnRDnLqHODYLO5uZtWgT1Hcj2aNN2j/hvf4nMAqIPiKeqdaquuPQTSOH4P bznw3Ra88tsRh6Xy9xeTlcj6Uqp80r/33/PohdTrx/UgtullWey84pMBfPaZhsHGBwRd AtiRkfEq2sqGQ2MwxPd1kZAmH4bkg1gxd4qgis1lX1R7r2oizmQPyIED+unvjbS5LlAH FqFg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769548299; x=1770153099; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=28ndtS2GXjY31YGVWbk/x7702QdsemPIJGFl2dzPzMc=; b=kt0DbsjAAGFDmv7GtqF1NF/LrdhwpmR2qAaGDG19R3AFHW6/BqY4alI3Lno4tuIVEO ZEewOSaaYLIq8GmDeea07UBqWZWKzeCukoB7sPA8BoIP8UBD4NqPJxNFBvVDAv62WHMK NsNSXECFNDAFcQp9sg+fQerlm8wQMbzOFXIdp4aOHxkHWi1/Fz3O9K+wrG4UohIncL2S TQ7ckUtJncv7agCWNU6LPurYf0fZ+jc4JKYEi86f5L6RzaEydtYr234DiYiajrjdzJ1g x5mN2DBBXLtFwHT3BBCfAuxHhrir/9p11pmjnqQw0wZh9/bNGZ4ejOCUNddxdeiwwThZ DZCg== X-Forwarded-Encrypted: i=1; AJvYcCX6oNCl+mtz9wz36zj80R3beFlp8Ati34rHTgo1F0oLDMMmszhxodOKhg0TuGnt73d0q5Dh4rC8Vw2gCP1M@lists.postgresql.org X-Gm-Message-State: AOJu0Yyfm6cwCw/HwbJNCqDym8+6VifJJnGS76MzqfmB6EcNnGDPy2dS 093s2e5I79WzNlDBw+XxA0ejPHPnKuX9tMIyI8oizdcG0ookBAm+syeu9OOe+vq1LFX5XydA8DR QMnyJXP5yXWQyZHGaBPT3c/CrIqXEYVQ= X-Gm-Gg: AZuq6aIed7KEX3Ba91A1hCWn7WCA7dS6cxUOkLHHeDMene0Vl4Vx+A2EmbfCchJpw+b iMBiVVYurSgxt5DCvhyEx2O/ZuUuIP21S4OD3Bk9BatHBxNmpCCHlU2cbUXOtbhB/pJVbQQGtjF A3Ip2UknWO4wf99vKw/WlweojqNFrvxsC7LIgmAivJI1/3kxmNeEn68U6EIV/MlCPn2914zIDmS wEtzk/fDXja/EVTgea3gkz3BNurs9unb2O5hjJhfUgBy//VsWbR3S7s236VypmReGNJK6ANTXyQ ljRyc3Ldj4zDwTk8ILLqlno9OSs= X-Received: by 2002:a17:907:9446:b0:b87:206a:a23b with SMTP id a640c23a62f3a-b8dab420129mr253073866b.34.1769548298404; Tue, 27 Jan 2026 13:11:38 -0800 (PST) MIME-Version: 1.0 References: <731ADE6F-01C5-4996-BAEE-5851DFC3F502@gmail.com> In-Reply-To: From: Robert Haas Date: Tue, 27 Jan 2026 16:11:26 -0500 X-Gm-Features: AZwV_QgHBFslB_5lVMhkVrQpLEU2-pG1mYWfSwHJ4WVAdtwToH2s-R03j1HGQoY Message-ID: Subject: Re: pg_waldump: support decoding of WAL inside tarfile To: Amul Sul Cc: Chao Li , Jakub Wartak , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Jan 27, 2026 at 7:07=E2=80=AFAM Amul Sul wrote: > In the attached version, I am using the WAL segment name as the hash > key, which is much more straightforward. I have rewritten > read_archive_wal_page(), and it looks much cleaner than before. The > logic to discard irrelevant WAL files is still within > get_archive_wal_entry. I added an explanation for setting cur_wal to > NULL, which is now handled in the separate function I mentioned > previously. > > Kindly have a look at the attached version; let me know if you are > still not happy with the current approach for filtering/discarding > irrelevant WAL segments. It isn't much different from the previous > version, but I have tried to keep it in a separate routine for better > code readability, with comments to make it easier to understand. I > also added a comment for ArchivedWALFile. I feel like the division of labor between get_archive_wal_entry() and read_archive_wal_page() is odd. I noticed this in the last version, too, and it still seems to be the case. get_archive_wal_entry() first calls ArchivedWAL_lookup(). If that finds an entry, it just returns. If it doesn't, it loops until an entry for the requested file shows up and then returns it. Then control returns to read_archive_wal_page() which loops some more until we have all the data we need for the requested file. But it seems odd to me to have two separate loops here. I think that the first loop is going to call read_archive_file() until we find the beginning of the file that we care about and then the second one is going to call read_archive_file() some more until we have read enough of it to satisfy the request. It feels odd to me to do it that way, as if we told somebody to first wait until 9 o'clock and then wait another 30 minutes, instead of just telling them to wait until 9:30. I realize it's not quite the same thing, because apart from calling read_archive_file(), the two loops do different things, but I still think it looks odd. + /* + * Ignore if the timeline is different or the current segment is not + * the desired one. + */ + XLogFromFileName(entry->fname, &curSegTimeline, &curSegNo, WalSegSz); + if (privateInfo->timeline !=3D curSegTimeline || + privateInfo->startSegNo > curSegNo || + privateInfo->endSegNo < curSegNo || + segno > curSegNo) + { + free_archive_wal_entry(entry->fname, privateInfo); + continue; + } The comment doesn't match the code. If it did, the test would be (privateInfo->timeline !=3D curSegTimeline || segno !=3D curSegno). But instead the segno test is > rather than !=3D, and the checks against startSegNo and endSegNo aren't explained at all. I think I understand why the segno test uses > rather than !=3D, but it's the point of the comment to explain things like that, rather than leaving the reader to guess. And I don't know why we also need to test startSegNo and endSegNo. I also wonder what the point is of doing XLogFromFileName() on the fname provided by the caller and then again on entry->fname. Couldn't you just compare the strings? Again, the division of labor is really odd here. It's the job of astreamer_waldump_content() to skip things that aren't WAL files at all, but it's the job of get_archive_wal_entry() to skip things that are WAL files but not the one we want. I disagree with putting those checks in completely separate parts of the code. +static bool +member_is_wal_file(astreamer_waldump *mystreamer, astreamer_member *member= , + char **fname) +{ + int pathlen; + char *filename; + + /* We are only interested in normal files. */ + if (member->is_directory || member->is_link) + return false; + + pathlen =3D strlen(member->pathname); + if (pathlen < XLOG_FNAME_LEN) + return false; + + /* WAL files from the top-level or pg_wal directory will be decoded */ + if (pathlen > XLOG_FNAME_LEN && + strncmp(member->pathname, XLOGDIR, strlen(XLOGDIR)) !=3D 0) + return false; + + /* WAL file could be with full path */ + filename =3D member->pathname + (pathlen - XLOG_FNAME_LEN); + if (!IsXLogFileName(filename)) + return false; + + *fname =3D pnstrdup(filename, XLOG_FNAME_LEN); + + return true; +} I don't think this code is fully correct. Note that pg_verifybackup's member_verify_header() prepends "./" and then applies canonicalize_path(). I think the same considerations apply here. Consider: [rhaas pgdata]$ (cd pg_wal; tar -cf ../pg_wal1.tar 00*) [rhaas pgdata]$ (cd pg_wal; tar -cf ../pg_wal2.tar .) [rhaas pgdata]$ tar tf pg_wal1.tar 00000001000000000000002C 00000001000000000000002D 00000001000000000000002E 00000001000000000000002F 000000010000000000000030 000000010000000000000031 000000010000000000000032 000000010000000000000033 [rhaas pgdata]$ tar tf pg_wal2.tar ./ ./00000001000000000000002C ./00000001000000000000002D ./00000001000000000000002E ./archive_status/ ./000000010000000000000031 ./000000010000000000000030 ./summaries/ ./00000001000000000000002F ./000000010000000000000032 ./000000010000000000000033 -- Robert Haas EDB: http://www.enterprisedb.com