Re: Adding REPACK [concurrently]

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Antonin Houska <[email protected]>
To: Alvaro Herrera <[email protected]>
Cc: Mihail Nikalayeu <[email protected]>
Cc: Pg Hackers <[email protected]>
Cc: Robert Treat <[email protected]>
Subject: Re: Adding REPACK [concurrently]
Date: Fri, 13 Mar 2026 09:11:38 +0100
Message-ID: <8349.1773389498@localhost> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>

Alvaro Herrera <[email protected]> wrote:

> I offer the following rather trivial fixup diffs, which I think should
> be mostly be for 0002.

Thanks, just a few comments:

* 0002 - I didn't add the 'repack_' prefix because the function is static, but
  realize now that it might be useful from the code browsing perspective.

* 0008 - It's possible during query execution, when you know exactly which
  attributes you need to fetch from the tuple. However in store_change(), we
  don't know which attributes are external (indirect) without deforming them
  all.

> Other trivial things I'd like to change, if you don't mind,

> - the name pgoutput_repack sounds wrong to me.  I would rather say
>   rpck_output, repack_output, repack_plugin, ... or something.  I don't
>   understand where the suffix "output" comes from in the name
>   "pgoutput", but it sounds like arbitrary nonsense to me.

No strong preference here, except that I don't like "rpck_...". How about
pgoutput/repack.c? I think I tried to put the plugin into the existing
"pgoutput" directory at some point, but don't remember why I eventually
rejected that approach.

> - I would rename the routines in that file to also have the name
>   "repack", probably as prefixes.

ok

> One thing we need and is rather not trivial, is to go through the table
> AM interface rather than calling heapam.c routines directly.  I'm
> working on this now and will present a patch later.

It occurred to me too, at least because copy_table_data() calls
table_relation_copy_for_cluster() rather than
heapam_relation_copy_for_cluster(). Thanks for working on it.

> Another thing I noticed while going over the code was that we seem to
> spill whole tuples even for things like the old tuple of an UPDATE and
> for DELETE, but unless I misunderstand how this works, these shouldn't
> be necessary: we just need the replica identity so that we can locate
> the tuple to operate on.  Especially for tuples that contain large
> toasted attributes, this is likely important.

I think we don't need to care, as both heap_update() and heap_delete() call
ExtractReplicaIdentity(). That returns a real tuple, but it only contains the
attributes constituting the replica identity. The other ones are set to
NULL.

> It may make sense to use the TupleTableSlot interface rather than
> HeapTuple for everything.  I'm not really sure about this though.

Isn't this part of the adoption of table AM? For example, table_tuple_insert()
accepts tuple slot rather than heap tuple.

-- 
Antonin Houska
Web: https://www.cybertec-postgresql.com

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Adding REPACK [concurrently]
  In-Reply-To: <8349.1773389498@localhost>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox