Date: Fri, 16 May 2025 22:33:29 +0300
Subject: Re: Logical replication, need to reclaim big disk space
To: Moreno Andreo, PostgreSQL mailing lists
References: <7be05164-f62e-49ec-87c8-9c3512904d07@evolu-s.it>
From: Achilleas Mantzios
In-Reply-To: <7be05164-f62e-49ec-87c8-9c3512904d07@evolu-s.it>

On 16/5/25 18:45, Moreno Andreo wrote:
> Hi,
>     we are moving our old binary data approach, moving them from bytea
> fields in a table to external storage (making database smaller and
> related operations faster and smarter).
> In short, we have a job that runs in background and copies data from
> the table to an external file and then sets the bytea field to NULL.
> (UPDATE tbl SET blob = NULL, ref = 'path/to/file' WHERE id = )
>
> This results, at the end of the operations, to a table that's less
> than one tenth in size.
> We have a multi-tenant architecture (100s of schemas with identical
> architecture, all inheriting from public) and we are performing the
> task on one table per schema.
>
So? TOASTed data is kept in separate TOAST tables; unless those bytea
columns are selected, you won't even touch them. I cannot understand what
you are trying to achieve here.

Years ago, when I made the mistake of going for a coffee and letting my
developers "improvise", the result was a design similar to what you are
trying to achieve. Years later, I am seriously considering moving that
data back to PostgreSQL.

> The problem is: this is generating BIG table bloat, as you may imagine.
> Running a VACUUM FULL on an ex-22GB table on a standalone test server
> is almost immediate.
> If I had only one server, I'll process a table a time, with a nightly
> script, and issue a VACUUM FULL to tables that have already been
> processed.
>
> But I'm in a logical replication architecture (we are using a
> multimaster system called pgEdge, but I don't think it will make big
> difference, since it's based on logical replication), and I'm building
> a test cluster.
>
So you use pgEdge, but you want to lose all the benefits of multi-master,
since your binary data won't be replicated...

> I've been instructed to issue VACUUM FULL on both nodes, nightly, but
> before proceeding I read on docs that VACUUM FULL can disrupt logical
> replication, so I'm a bit concerned on how to proceed. Rows are
> cleared one a time (one transaction, one row, to keep errors to the
> record that issued them)
>
pgEdge is based on the old pglogical, the 2ndQuadrant extension, not
the native logical replication we have had since PostgreSQL 10. But I
might be mistaken.

> I read about extensions like pg_squeeze, but I wonder if they are
> still not dangerous for replication.
>
What is pgEdge's take on that? I mean the bytea thing you are trying to
achieve here.

> Thanks for your help.
> Moreno.-
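
To see how much of the space actually sits in TOAST rather than in the main heap, a catalog query along these lines may help. This is only a sketch: `tbl` is the placeholder table name from the quoted UPDATE, and the query assumes the table has the same name in every tenant schema, so it returns one row per schema.

```sql
-- Compare main-heap size with TOAST size for every table named 'tbl'.
-- 'tbl' is a placeholder; substitute the real table name.
SELECT n.nspname                                              AS schema_name,
       c.relname                                              AS table_name,
       pg_size_pretty(pg_relation_size(c.oid::regclass))      AS heap_size,
       pg_size_pretty(pg_relation_size(c.reltoastrelid::regclass)) AS toast_size
FROM pg_class c
JOIN pg_namespace n ON n.oid = c.relnamespace
WHERE c.relname = 'tbl'
  AND c.relkind = 'r'          -- ordinary tables only
  AND c.reltoastrelid <> 0     -- skip tables that have no TOAST table
ORDER BY pg_relation_size(c.reltoastrelid::regclass) DESC;
```

If toast_size dominates heap_size, then queries that do not select the bytea column are already reading only the small main heap.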