Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vki72-001CKY-14 for pgsql-hackers@arkaria.postgresql.org; Tue, 27 Jan 2026 12:24:05 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vki71-00DjrD-1Y for pgsql-hackers@arkaria.postgresql.org; Tue, 27 Jan 2026 12:24:03 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vki71-00Djr4-0R for pgsql-hackers@lists.postgresql.org; Tue, 27 Jan 2026 12:24:03 +0000 Received: from mail-pj1-x1032.google.com ([2607:f8b0:4864:20::1032]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vki6z-00000000hf0-050p for pgsql-hackers@lists.postgresql.org; Tue, 27 Jan 2026 12:24:03 +0000 Received: by mail-pj1-x1032.google.com with SMTP id 98e67ed59e1d1-352e2c59264so3832709a91.0 for ; Tue, 27 Jan 2026 04:24:00 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1769516639; cv=none; d=google.com; s=arc-20240605; b=N/5JvAt/15j6Bk+DEc+aMjK+6fLb4KbTd7+RuiIQtxNhedQwjolbgRkndtta7VQOM5 UmMm3Tqo0mVNvrc46uR3u0TfnC9ozHBBkOPcvcUUixLmAtBriHj7IRNOXTojg36MglDZ +zN36ABDq+ejnbMJKAib1ZdlbQ2+zQMSGRByRIgVzLE/MLCAnpb60PgUyfW7Da34dWDd EQMNhuuO6/r2E52ysEUOnDVBF9Vy+7gql5cT7t8DB2NKc0udgezflOy4lCqHdItwqAN/ GWimiFIRhYKXNXOtYlZ9Vog2kmO3nmY0hFEos+RLh0ueVvSju1GK9+vSg85wCkW0wo6I 7Urg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=4i5qmK6HnzJoeq8K4KjFMH8fkJLO+YETHWDRAKL6iM8=; fh=XoNkUZfu8YXp+asCbYdYbyqxWbTGnQz/ctBydlFLIL4=; b=kj1ZRrfs2He4Jvm3BQHqwUbfgyu2fuK8doGYA9q5KZMb7FGyF2S0JswedlQMNgesYJ E9JopW5KsBqzw4vXkiAkJ38uz37eDChfvidSmd5oj6k4ifDi88izwMVumqVDApScqJE7 wy7hTgauKrEO6WOBIoHvRK3XJmPNK5zWxmTqfimdwVKPAiKu9RtrbbVP+Zkf0fAC7ywf +0vd4hiH53H95sU/aZy+5gny5Yt6PMON9e2or07rnN/cA9HOgO/oOKpE+j1O7E0HE7ZW 1vnyPS95/FEAADvt2FaKlVM2UNZ/8Hdu5VMeRq+m4GyGKmj8hdiosFplnyD9/xtMLu0a 4K+g==; darn=lists.postgresql.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1769516639; x=1770121439; darn=lists.postgresql.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=4i5qmK6HnzJoeq8K4KjFMH8fkJLO+YETHWDRAKL6iM8=; b=ejwXkTLMUmMBWMY8qn0JsGUV+A4/13YQ1APFibogf+aO57WyU9asFnk07NSzFJoNa4 9E/vlKiqLQI4822clJRawEnszebnNZc6sVkDGEqT2x7J7toWUX8wc+ouTHWn6ZECsL/Z R32UfLvuug3nH+YYpGkpKGHkIUNEV7rmdpnYkspfDOMJYK8nZ4+YsOodBhZA3WGTuUk4 dTSNxgx7xgS+t1AkeYGSjDazISxS8Z1qnrb85kd0FX/3a7y/2JPcHVjIlY0gBbx4ri2o VUwT6k9o6fh3Rk8sXi91Aqs9t3NG0Fht6r+TycL1bnnnHr1sZxFgCvXsi9Ceb2VXKTm6 +olQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769516639; x=1770121439; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=4i5qmK6HnzJoeq8K4KjFMH8fkJLO+YETHWDRAKL6iM8=; b=MM/ZFdASPwgHuvMQaBzQRQ1OjMMIp/iIR2MudIMOXg03MsuqNe1SjLszCvUxXrWXxt gr5MCbCMcVqt4+rF+pXb2oG4Rat4/HQ7H/PIMk4/MpP3sTaqLs5aiFqPzwwMMDAe0YIb ZZ/bUxxGJ6pXtk0StvKT8zN/ILYQjJQii+q0QBdKE2OiYC49c3UJmGpBxLBvvQ7YQYdv EN2JCPcANfnK1rYHP1fUHcZGJ6Ku5eYWdTX/LyqO42ZBvHjqAFIUlyyJ6LVz4Z63vG1c zmP42IUQ0sOpkDLv7btnK3sPK6ETNY7EV20F6V0OWUmFO0BiOYq+HzOrXhWWC0BK6onR o4xA== X-Forwarded-Encrypted: i=1; AJvYcCWrh5jweG9UwzWRL9RJLVKlKy+Y6itf/lP0MrTggq05zOX31jg76bbVCDQVzrMeIvpgjELVBM1GWYuNHgyw@lists.postgresql.org X-Gm-Message-State: AOJu0YyxlsIpS4tMOj7YI4KiwMgRntVC4EG4MtDIgxvVkKbZYo2nQyKF WowTorky6Rg05Gdzhh4vBxCz0xqonZHTI2L8/L+LxXOTo3jBP03UgrZnb4BUAX5CQK8hmPyXTm9 WFye54OwGeTNJFXrCqe0xyzdFFrPpd79CLQ== X-Gm-Gg: AZuq6aJuUh5ufbArpiq7MDMHXsxCPHkN2ARf/UfFTar8Kp7y0NDwBlAwmC4bZCnOGvm EtZmek3ka7PSkxUT4Ao4ohqowPlsuVV/A93edo4WywZSV+SQkr3QNL87O08+6kYqsX/6LUTua/p q+E09jb3fdlwoN1n/D70p0A8ijaow4iaUmB3sbETjPslV6NFUR8Invh5iQ4JltNH67PA3LZ6Pjx yRpEgg7ZtahWozIPE2mzAS6jkJQCylHoKEpOHCNsc6F4xw3n32msNh2MtuIpwIHmpmvlqjGgw== X-Received: by 2002:a17:90b:3a87:b0:353:3934:1449 with SMTP id 98e67ed59e1d1-353feceebbdmr1750056a91.12.1769516638461; Tue, 27 Jan 2026 04:23:58 -0800 (PST) MIME-Version: 1.0 References: <731ADE6F-01C5-4996-BAEE-5851DFC3F502@gmail.com> In-Reply-To: From: Amul Sul Date: Tue, 27 Jan 2026 17:53:19 +0530 X-Gm-Features: AZwV_QhLrCk4xGFBTTlgelHvZimyHR4k0YGNtNp7OWM8WxAVmIL4ggIXfDlqBGA Message-ID: Subject: Re: pg_waldump: support decoding of WAL inside tarfile To: Robert Haas Cc: Chao Li , Jakub Wartak , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Sat, Jan 17, 2026 at 2:08=E2=80=AFAM Robert Haas = wrote: > > On Fri, Jan 2, 2026 at 7:30=E2=80=AFAM Amul Sul wrote= : > > Attached is the rebased version, includes some code comment improvement= s. > Here are my replies to the remaining review comments. An updated version of the patch was posted in the previous email [1]. > Regarding 0001, I don't think that passing the WAL segment size around > in a global variable is a good practice. It makes it hard to tell > whether the variable is guaranteed to have been set properly at all > the places where it is used. Of course, this patch set does not > introduce that problem, but I think it could avoid making it worse by > adding WalSegSz to XLogDumpPrivate instead of exporting the global > variable everywhere. That struct seems to be available everyplace > where we need the value. Actually, though, I'm inclined to remove the > file-level global as well. Patch for that attached. I'm not against > global variables in general, but I'm against global variables that are > just there to avoid passing the right stuff around via function > argument lists. > Yes, it looks good. Added to the v11 patch set. > 0002 claims to be just a refactoring but if > (unlikely(private->endptr_reached)) return -1 is net new code, so I > object. I realize this contradicts the review comment from Chao Li, > but the principal that refactoring shouldn't change the logic is more > important than whatever value we might get out of adding this sanity > check here -- and I bet if we look into it we'll find that there are > lots of other places in PostgreSQL that would also need to be changed > if we wanted such a sanity check everywhere that we have code like > this. I'd also like to point out that the commit message isn't really > adequate to understand why this helps achieve the overall goal of the > patch set. > Understood, removed it. > 0003 looks functionally correct but notice that you've stolen some but > not all of the tests from the "# various ways of specifying WAL range" > section without copying the comment or explaining in the commit > message why you've moved some tests but not others. > The excluded tests provide WAL files as input to pg_waldump, which is incompatible with tar archive input. I have updated the commit message to reflect this. > Regarding 0004: > > Instead of making ArchivedWAL_HTAB a global variable, can we just > store a pointer to it in XLogDumpPrivate? (I would change the name to > something else, something all lower case.) Or maybe the pointer should > go someplace else, but I'm skeptical of a global variable here for the > same reasons as 0001. > Done. > I think ArchivedWALFile would be a better name than ArchivedWALEntry, > and cur_wal_file or cur_file would be a better name than cur_wal. > Done. > I'm extremely happy with the way that the hash table looks and the new > logic in init_archive_reader() to make sure that we don't need to > reread the beginning of the WAL but can reuse the data already read. > That seems like a huge improvement to me. > > + if (recptr >=3D endPtr) > + { > + /* Discard the buffered data */ > + resetStringInfo(&entry->buf); > + len =3D 0; > + > + /* > + * Push back the partial page data for the > current page to the > + * buffer, ensuring it remains full page > available for re-reading > + * if requested. > + */ > + if (p > readBuff) > + { > + Assert((count - nbytes) > 0); > + appendBinaryStringInfo(&entry->buf, > readBuff, count - nbytes); > + } > + } > > It feels pretty inelegant to me to pull data back into &entry->buf > from readBuff here. I think what you should do is just do this test > once, before entering the while (nbytes > 0) loop. Then you'll always > have p =3D=3D readBuff, so the nested if statement is no longer required. > Also, the way you have it, if this triggers for the first loop > iteration, it will trigger for every subsquent loop iteration, which > seems wasteful. > I'm not sure I follow the logic. How would this trigger a copy from readBuff back into &entry->buf on every iteration? I believe that this is a rare case: it should only happen when we reach the end of the buffer with less than the requested 8kB available. In that scenario, we copy the remaining bytes, reset the buffer, and then -- since (p > readBuff) -- perform the pull-back before invoking the archive stream for the next segment. Am I missing something in the loop condition? I replied to the rest of the review comments previously, but I=E2=80=99m no= t sure whether replying in multiple emails is the correct approach. Perhaps, since the patch is taking a bit longer, I wanted to keep the conversation moving. Regards, Amul 1] http://postgr.es/m/CAAJ_b97mWgAkj7CqdRsof4N_J-gxqjKh79b0Aw3JRo0PoZZCJg@m= ail.gmail.com