Logical replication, need to reclaim big disk space

public inbox for [email protected]  
help / color / mirror / Atom feed

Logical replication, need to reclaim big disk space
2+ messages / 2 participants
[nested] [flat]

* Logical replication, need to reclaim big disk space
@ 2025-05-16 15:45 Moreno Andreo <[email protected]>
  2025-05-16 19:33 ` Re: Logical replication, need to reclaim big disk space Achilleas Mantzios <[email protected]>
  0 siblings, 1 reply; 2+ messages in thread

From: Moreno Andreo @ 2025-05-16 15:45 UTC (permalink / raw)
  To: pgsql-general

Hi,
     we are moving our old binary data approach, moving them from bytea 
fields in a table to external storage (making database smaller and 
related operations faster and smarter).
In short, we have a job that runs in background and copies data from the 
table to an external file and then sets the bytea field to NULL.
(UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = <uuid>)

This results, at the end of the operations, to a table that's less than 
one tenth in size.
We have a multi-tenant architecture (100s of schemas with identical 
architecture, all inheriting from public) and we are performing the task 
on one table per schema.

The problem is: this is generating BIG table bloat, as you may imagine.
Running a VACUUM FULL on an ex-22GB table on a standalone test server is 
almost immediate.
If I had only one server, I'll process a table a time, with a nightly 
script, and issue a VACUUM FULL to tables that have already been processed.

But I'm in a logical replication architecture (we are using a 
multimaster system called pgEdge, but I don't think it will make big 
difference, since it's based on logical replication), and I'm building a 
test cluster.

I've been instructed to issue VACUUM FULL on both nodes, nightly, but 
before proceeding I read on docs that VACUUM FULL can disrupt logical 
replication, so I'm a bit concerned on how to proceed. Rows are cleared 
one a time (one transaction, one row, to keep errors to the record that 
issued them)

I read about extensions like pg_squeeze, but I wonder if they are still 
not dangerous for replication.

Thanks for your help.
Moreno.-

^ permalink  raw  reply  [nested|flat] 2+ messages in thread

* Re: Logical replication, need to reclaim big disk space
  2025-05-16 15:45 Logical replication, need to reclaim big disk space Moreno Andreo <[email protected]>
@ 2025-05-16 19:33 ` Achilleas Mantzios <[email protected]>
  0 siblings, 0 replies; 2+ messages in thread

From: Achilleas Mantzios @ 2025-05-16 19:33 UTC (permalink / raw)
  To: Moreno Andreo <[email protected]>; pgsql-general

On 16/5/25 18:45, Moreno Andreo wrote:

> Hi,
>     we are moving our old binary data approach, moving them from bytea 
> fields in a table to external storage (making database smaller and 
> related operations faster and smarter).
> In short, we have a job that runs in background and copies data from 
> the table to an external file and then sets the bytea field to NULL.
> (UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = <uuid>)
>
> This results, at the end of the operations, to a table that's less 
> than one tenth in size.
> We have a multi-tenant architecture (100s of schemas with identical 
> architecture, all inheriting from public) and we are performing the 
> task on one table per schema.
>
So? toasted data are kept on separate TOAST tables, unless those bytea 
cols are selected, you won't even touch them. I cannot understand what 
you are trying to achieve here.

Years ago, when I made the mistake to go for a coffee and let my 
developers "improvise" , the result was a design similar to what you are 
trying to achieve. Years after, I am seriously considering moving those 
data back to PostgreSQL.


> The problem is: this is generating BIG table bloat, as you may imagine.
> Running a VACUUM FULL on an ex-22GB table on a standalone test server 
> is almost immediate.
> If I had only one server, I'll process a table a time, with a nightly 
> script, and issue a VACUUM FULL to tables that have already been 
> processed.
>
> But I'm in a logical replication architecture (we are using a 
> multimaster system called pgEdge, but I don't think it will make big 
> difference, since it's based on logical replication), and I'm building 
> a test cluster.
>
So you use PgEdge , but you wanna lose all the benefits of multi-master 
, since your binary data won't be replicated ...
> I've been instructed to issue VACUUM FULL on both nodes, nightly, but 
> before proceeding I read on docs that VACUUM FULL can disrupt logical 
> replication, so I'm a bit concerned on how to proceed. Rows are 
> cleared one a time (one transaction, one row, to keep errors to the 
> record that issued them)
>
PgEdge is based on the old pg_logical, the old 2ndQuadrant extension, 
not the native logical replication we have since pgsql 10. But I might 
be mistaken.
> I read about extensions like pg_squeeze, but I wonder if they are 
> still not dangerous for replication.
>
What's pgEdge take on that, I mean the bytea thing you are trying to 
achieve here.
> Thanks for your help.
> Moreno.-
>
>
>






^ permalink  raw  reply  [nested|flat] 2+ messages in thread

end of thread, other threads:[~2025-05-16 19:33 UTC | newest]

Thread overview: 2+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-05-16 15:45 Logical replication, need to reclaim big disk space Moreno Andreo <[email protected]>
2025-05-16 19:33 ` Achilleas Mantzios <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox