public inbox for [email protected]  
help / color / mirror / Atom feed
gistGetFakeLSN() can return incorrect LSNs
3+ messages / 2 participants
[nested] [flat]

* gistGetFakeLSN() can return incorrect LSNs
@ 2026-03-05 17:10 Andres Freund <[email protected]>
  2026-03-05 19:27 ` Re: gistGetFakeLSN() can return incorrect LSNs Andres Freund <[email protected]>
  2026-03-13 11:33 ` Re: gistGetFakeLSN() can return incorrect LSNs Fujii Masao <[email protected]>
  0 siblings, 2 replies; 3+ messages in thread

From: Andres Freund @ 2026-03-05 17:10 UTC (permalink / raw)
  To: pgsql-hackers; Noah Misch <[email protected]>; +Cc: Tomas Vondra <[email protected]>; Peter Geoghegan <[email protected]>

Hi,

Tomas encountered a crash with the index prefetching patchset. One of the
patches included therein is a generalization of the gistGetFakeLSN()
mechanism, which is then used by other indexes as well.  That triggered an
occasional, hard to locally reproduce, ERROR or PANIC in CI, about

  ERROR: xlog flush request 0/01BD2018 is not satisfied --- flushed only to 0/01BD2000


A bunch of debugging later, it turns out that this is a pre-existing issue.

XLogRecPtr
gistGetFakeLSN(Relation rel)
{
...
    else if (RelationIsPermanent(rel))
    {
        /*
         * WAL-logging on this relation will start after commit, so its LSNs
         * must be distinct numbers smaller than the LSN at the next commit.
         * Emit a dummy WAL record if insert-LSN hasn't advanced after the
         * last call.
         */
        static XLogRecPtr lastlsn = InvalidXLogRecPtr;
        XLogRecPtr  currlsn = GetXLogInsertRecPtr();

        /* Shouldn't be called for WAL-logging relations */
        Assert(!RelationNeedsWAL(rel));

        /* No need for an actual record if we already have a distinct LSN */
        if (XLogRecPtrIsValid(lastlsn) && lastlsn == currlsn)
            currlsn = gistXLogAssignLSN();
        lastlsn = currlsn;
        XLogFlush(currlsn);

        return currlsn;
    }


The problem is that GetXLogInsertRecPtr() returns the start of the next
record. That's *most* of the time the same as what XLogInsert() would return
(i.e. one byte past the end of the last record), but not reliably so: If the
last record ends directly at a page boundary, XLogInsert() returns an LSN
directly to the start of the page, but GetXLogInsertRecPtr() will return an
LSN that points to just after the xlog page header.

If you look at the error from above, that's exactly what's happening - the
flush is only up to 0/01BD2000 (0x2000 is 8192, i.e. an 8kB boundary). But the
flush request is to 0/01BD2018, where 0x18 is the size of XLogPageHeaderData.

To be safe, this code would need to use a version of GetXLogInsertRecPtr()
that does use XLogBytePosToEndRecPtr() instead of XLogBytePosToRecPtr().


It's probably not easy to trigger this outside of aggressive test scenarios,
due to needing to avoid the gistXLogAssingLSN() path and having to encounter a
"reason" to flush the buffer immediately after.


However, if I put an XLogFlush() into gistGetFakeLSN() and use
wal_level=minimal, it's a lot easier.


It looks like this was introduced in

commit c6b92041d38
Author: Noah Misch <[email protected]>
Date:   2020-04-04 12:25:34 -0700

    Skip WAL for new relfilenodes, under wal_level=minimal.


Greetings,

Andres Freund





^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: gistGetFakeLSN() can return incorrect LSNs
  2026-03-05 17:10 gistGetFakeLSN() can return incorrect LSNs Andres Freund <[email protected]>
@ 2026-03-05 19:27 ` Andres Freund <[email protected]>
  1 sibling, 0 replies; 3+ messages in thread

From: Andres Freund @ 2026-03-05 19:27 UTC (permalink / raw)
  To: Andrey Borodin <[email protected]>; +Cc: pgsql-hackers; Noah Misch <[email protected]>; Tomas Vondra <[email protected]>; Peter Geoghegan <[email protected]>

Hi,

On 2026-03-05 23:26:30 +0500, Andrey Borodin wrote:
> Interesting bug. Your analysis seems correct to me.
> 
> > On 5 Mar 2026, at 22:10, Andres Freund <[email protected]> wrote:
> > 
> > To be safe, this code would need to use a version of GetXLogInsertRecPtr()
> > that does use XLogBytePosToEndRecPtr() instead of XLogBytePosToRecPtr().
> 
> Can't we just take Insert->CurrBytePos without XLogBytePosToEndRecPtr()?
> Is there a point in alignment before the page header?

No, that'd be a completely bogus LSN, as CurrBytePos does not include any
space for page headers, to make the the very contended spinlock'ed section in
ReserveXLogInsertLocation() cheaper:

	/*
	 * The duration the spinlock needs to be held is minimized by minimizing
	 * the calculations that have to be done while holding the lock. The
	 * current tip of reserved WAL is kept in CurrBytePos, as a byte position
	 * that only counts "usable" bytes in WAL, that is, it excludes all WAL
	 * page headers. The mapping between "usable" byte positions and physical
	 * positions (XLogRecPtrs) can be done outside the locked region, and
	 * because the usable byte position doesn't include any headers, reserving
	 * X bytes from WAL is almost as simple as "CurrBytePos += X".
	 */
	SpinLockAcquire(&Insert->insertpos_lck);


And we rely on the page LSNs to be correct for the content of newly created
permanent relations with wal_level=minimal, so you can't just return arbitrary
other values.

Greetings,

Andres Freund





^ permalink  raw  reply  [nested|flat] 3+ messages in thread

* Re: gistGetFakeLSN() can return incorrect LSNs
  2026-03-05 17:10 gistGetFakeLSN() can return incorrect LSNs Andres Freund <[email protected]>
@ 2026-03-13 11:33 ` Fujii Masao <[email protected]>
  1 sibling, 0 replies; 3+ messages in thread

From: Fujii Masao @ 2026-03-13 11:33 UTC (permalink / raw)
  To: Tomas Vondra <[email protected]>; +Cc: Andres Freund <[email protected]>; Noah Misch <[email protected]>; pgsql-hackers; Tomas Vondra <[email protected]>; Peter Geoghegan <[email protected]>

On Fri, Mar 13, 2026 at 8:12 PM Tomas Vondra <[email protected]> wrote:
>
> On 3/5/26 21:25, Andres Freund wrote:
> > Hi,
> >
> > On 2026-03-05 10:52:03 -0800, Noah Misch wrote:
> >> On Thu, Mar 05, 2026 at 12:10:11PM -0500, Andres Freund wrote:
> >>> Tomas encountered a crash with the index prefetching patchset. One of the
> >>> patches included therein is a generalization of the gistGetFakeLSN()
> >>> mechanism, which is then used by other indexes as well.  That triggered an
> >>> occasional, hard to locally reproduce, ERROR or PANIC in CI, about
> >>>
> >>>   ERROR: xlog flush request 0/01BD2018 is not satisfied --- flushed only to 0/01BD2000
> >>
> >>> To be safe, this code would need to use a version of GetXLogInsertRecPtr()
> >>> that does use XLogBytePosToEndRecPtr() instead of XLogBytePosToRecPtr().
> >>
> >> I agree.  Thanks for diagnosing it.  Feel free to move forward with that
> >> strategy, or let me know if you'd like me to do it.
> >
> > I'd appreciate if you could do it.
> >
>
> Here's a fix for master (and backpatching). It introduces a new function
> GetXLogInsertEndRecPtr() and then uses that in gistGetFakeLSN(). I still
> need to test this a bit more, but it did fix the issue in our dev branch
> (where we saw regular failures). So I'm 99% sure it's fine.
>
> After writing the fix I had the idea to grep for GetXLogInsertRecPtr
> calls that might have similar issue (being passed to XLogFlush), and
> sure enough walsender does this:
>
>     XLogFlush(GetXLogInsertRecPtr());
>
> Which AFAICS has the same issue, right? Funnily enough, this is a very
> new call, from 2026/03/06. Before 6eedb2a5fd88 walsender might flush not
> far enough, now it may be flushing too far ;-) AFAIK it should call the
> same GetXLogInsertEndRecPtr() once we have it.

Yes, this is already being discussed at [1], and Anthonin has also proposed
a patch there that adds GetXLogInsertEndRecPtr().

Regards,

[1] https://postgr.es/m/vzguaguldbcyfbyuq76qj7hx5qdr5kmh67gqkncyb2yhsygrdt@dfhcpteqifux


-- 
Fujii Masao





^ permalink  raw  reply  [nested|flat] 3+ messages in thread


end of thread, other threads:[~2026-03-13 11:33 UTC | newest]

Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2026-03-05 17:10 gistGetFakeLSN() can return incorrect LSNs Andres Freund <[email protected]>
2026-03-05 19:27 ` Andres Freund <[email protected]>
2026-03-13 11:33 ` Fujii Masao <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox