Re: Adding REPACK [concurrently]


From: Álvaro Herrera <[email protected]>
To: David Klika <[email protected]>
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Cc: [email protected]
Subject: Re: Adding REPACK [concurrently]
Date: Thu, 4 Dec 2025 16:43:44 +0100
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>

Hello David,

Thanks for your interest in this.

On 2025-Dec-04, David Klika wrote:

> Let's consider a large table where 80% of blocks are fine (filled enough by
> live tuples). The table could be scanned from the beginning (left side) to
> identify "not enough filled" blocks and also from the end (right side) to
> process live tuples by moving them to the blocks identified by the left side
> scan. The work is over when both scans reach the same position.
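
To make the quoted two-scan idea concrete, here is a toy single-threaded
model of it.  It is only a sketch under loud assumptions: a page is a
Python list of live tuples, each page holds at most `capacity` tuples,
and nobody else touches the table while it runs, so visibility and
indexes are ignored entirely.  `compact` and `CAPACITY` are hypothetical
names, not anything in PostgreSQL.

```python
# Toy model of the two-scan compaction from the quoted proposal.
# Assumptions (not PostgreSQL code): a page is a list of live tuples, each page
# holds at most `capacity` tuples, and no other session reads or writes the
# table while this runs, so MVCC visibility and indexes are ignored.

CAPACITY = 4  # hypothetical tuples-per-page limit


def compact(pages: list[list[str]], capacity: int = CAPACITY) -> int:
    """Move tuples from the end of the table into free slots near the start.

    Returns how many leading pages are still in use; every page from that
    index onward is empty and could be truncated away.
    """
    if not pages:
        return 0
    left, right = 0, len(pages) - 1
    while left < right:
        if len(pages[left]) >= capacity:
            left += 1   # left scan: this page is filled enough, skip it
        elif not pages[right]:
            right -= 1  # right scan: this page has been emptied
        else:
            # move one live tuple from the right-hand page to the left-hand page
            pages[left].append(pages[right].pop())
    # Here left == right: pages before it are full, pages after it are empty.
    return left + 1 if pages[left] else left


pages = [["a"], ["b", "c", "d", "e"], ["f"], ["g", "h"]]
print(compact(pages), pages)  # 2 [['a', 'h', 'g', 'f'], ['b', 'c', 'd', 'e'], [], []]
```

The easy part is exactly what this toy shows; everything below is about
what it leaves out: concurrent readers and the indexes.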

If you only have a small number of pages that have this problem, then
you don't actually need to do anything -- the pages will be marked free
by regular vacuuming, and future inserts or updates can make use of
those pages.  It's not a problem to have a small number of empty pages
for some time.

So if you're trying to do this, the number of problematic pages must be
large.

Now, the issue with what you propose is that, at every moment, exactly
one of the old tuples or the new tuples must be visible to concurrent
transactions.  If at any point they are both visible, or neither is
visible, then you have potentially corrupted the results of a query
that is halfway through scanning the table.
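
The visibility requirement can be illustrated with a heavily simplified
MVCC model.  This is an assumption-laden sketch, not PostgreSQL's actual
visibility check (which also deals with in-progress transactions,
command IDs, hint bits and more): a snapshot is just the set of committed
transaction IDs it can see.  `Version`, `visible` and `copies_seen` are
hypothetical names.  Doing the move in one transaction keeps exactly one
copy visible; splitting it across two transactions lets some snapshot see
two copies or none.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Version:
    """One physical copy of a row (simplified; not PostgreSQL's real tuple header)."""
    value: str
    xmin: int                   # transaction that created this copy
    xmax: Optional[int] = None  # transaction that deleted it, if any


def visible(v: Version, snapshot: frozenset[int]) -> bool:
    """Simplified rule: created by a txn the snapshot sees, and not deleted by one."""
    return v.xmin in snapshot and (v.xmax is None or v.xmax not in snapshot)


def copies_seen(versions: list[Version], snapshot: frozenset[int]) -> int:
    """How many copies of the row a query using this snapshot would return."""
    return sum(visible(v, snapshot) for v in versions)


# Safe move: transaction 10 both creates the new copy and deletes the old one.
old = Version("row", xmin=1)
new = Version("row", xmin=10)
old.xmax = 10
for snap in (frozenset({1}), frozenset({1, 10})):
    assert copies_seen([old, new], snap) == 1  # one copy before and after commit

# Unsafe move: insert in txn 20, delete in txn 21.
old2 = Version("row2", xmin=1, xmax=21)
new2 = Version("row2", xmin=20)
print(copies_seen([old2, new2], frozenset({1, 20})))  # 2: row duplicated

# Unsafe move, other order: delete in txn 30, insert in txn 31.
old3 = Version("row3", xmin=1, xmax=30)
new3 = Version("row3", xmin=31)
print(copies_seen([old3, new3], frozenset({1, 30})))  # 0: row vanished
```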

The other point is that you need to keep indexes updated.  That is, you
need to make the indexes point to both the old and the new tuples until
you remove the old tuples from the table, and only then remove those
index pointers.  This process bloats the indexes, which is not
insignificant, considering that the number of tuples to process is
large.  If there are several indexes, this makes your process take even
longer.
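
A rough illustration of that index bookkeeping, under simplifying
assumptions: each index is modeled as a dict from a hypothetical row key
to a list of heap pointers, and a pointer is a (block, offset) pair.
`relocate` and `Tid` are made-up names for this sketch.  With K indexes
and N moved tuples, the indexes temporarily carry N*K extra entries, so
moving every row of a table doubles the index entry count at the peak.

```python
# Hypothetical heap pointer: (block number, offset within block).
Tid = tuple[int, int]


def relocate(indexes: list[dict[str, list[Tid]]],
             moves: dict[str, tuple[Tid, Tid]]) -> int:
    """Repoint every index at moved tuples; return the peak number of entries.

    Phase 1 adds a pointer to each new copy while the old pointer still
    exists, since readers may still need either copy.  Phase 2 runs only
    after the old heap tuples are gone and drops the stale pointers.
    """
    def total() -> int:
        return sum(len(tids) for idx in indexes for tids in idx.values())

    for idx in indexes:
        for key, (_old, new) in moves.items():
            idx[key].append(new)   # phase 1: index points to old AND new
    peak = total()
    # ... old heap tuples would be removed from the table here ...
    for idx in indexes:
        for key, (old, _new) in moves.items():
            idx[key].remove(old)   # phase 2: drop pointers to removed tuples
    return peak


N, K = 1000, 3  # moved tuples, number of indexes
moves = {f"k{i}": ((500 + i // 4, i % 4), (i // 4, i % 4)) for i in range(N)}
indexes = [{key: [old] for key, (old, _new) in moves.items()} for _ in range(K)]
print(relocate(indexes, moves))  # 6000: twice the 3000 entries we started with
```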

You can fix the concurrency problem by holding a lock on the table that
ensures nobody is reading the table until you've finished.  But we don't
want to have to hold such a lock for long!  And we already established
that the number of pages to check is large, which means you're going to
work for a long time.

So, I'm not really sure that it's practical to implement what you
suggest.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
