Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1si0YL-00E8gK-0I for pgsql-general@arkaria.postgresql.org; Sat, 24 Aug 2024 23:52:17 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1si0YI-009uvc-Bt for pgsql-general@arkaria.postgresql.org; Sat, 24 Aug 2024 23:52:14 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1si0YH-009upu-EW for pgsql-general@lists.postgresql.org; Sat, 24 Aug 2024 23:52:14 +0000 Received: from relay3-d.mail.gandi.net ([2001:4b98:dc4:8::223]) by magus.postgresql.org with esmtps (TLS1.2) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1si0YE-001L6v-7J for pgsql-general@lists.postgresql.org; Sat, 24 Aug 2024 23:52:13 +0000 Received: by mail.gandi.net (Postfix) with ESMTPSA id 412ED60002; Sat, 24 Aug 2024 23:52:06 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=vondra.me; s=gm1; t=1724543527; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fWJTH7F2yP/1NZg2iqlP9MgHEx76X6Brrctmd4esVI4=; b=MebWXz94JZh688NLeZ7Zo/2eQp+aq2ugPxLphw6oahoeIadvOPrX6JoyWVLZP8LMI7f39S d5vYEB4OcdkXO6dC3DcxU9idkeOuVicPrpI8h2quPVEqGE1XJ4nFxtG1JGxFbCF7jYaOIA B9NgfNnS01he3q8QBQRXrFVsVuUqggWdYme5bTfZU8CyJYNNbsZZVKYFv6EtrE29yBYWhD vN4Cy20O/gXzRLt3SNsab5F/ywpdnHN6TQQWJPsLym/FRxqUROGRjf1u4q0aif83nVAhNc VKUoSiV94aBvTTfg5Qqq7+B/Q2zqCrm4246I38M0QpMF2t2pcdY76tSk17MfbQ== Message-ID: <912190b3-7aaf-4f64-9870-9e96888f968e@vondra.me> Date: Sun, 25 Aug 2024 01:52:06 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: About replication minimal disk space usage To: Manan Kansara , pgsql-general@lists.postgresql.org References: Content-Language: en-US From: Tomas Vondra In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-GND-Sasl: tomas@vondra.me List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On 8/24/24 14:18, Manan Kansara wrote: > Hello All, > I have my self hosted postgres server on aws with 16gb disk space > attached to it for ml stuff and analysis stuff we are using vertex ai so > i have setup live replication of postgres using data stream service to > BigQuery table.  We use BigQuery table as data warehouse because we have > so many different data source so our data analysis and ml can > happened at one place. > but problem is there When i am starting replication in there pg_wal take > whole space about 15.8gb in some days of starting replication  > > *_Question_ *:  how can i setup something like that that optimally use > disk space so old pg_wal data that are not usable can we delete  i think > i should create one cron job which taken care whole that things but i > don't know any approach can you please guide > In future if as data grew i will attached more disk space to that > instance but i want to make optimal setup so my whole disk is not in > full usage any time and my server crash again. > Why don't you just give it more disk space? I'm not a fan of blindly throwing hardware at an issue, but 16GB is tiny these days, especially if it's shared by both data and WAL, and the time you spend optimizing this is likely more expensive than any savings. If you really want to keep this on 16GB, I think we'll need more details about what exactly you see on the instance / how it runs out of disk space. AFAIK datastream relies on logical replication, and there's a couple ways how that may consume disk space. For example, the datastream replication may pause for a while, in which case the replication slot will block removal of still-needed WAL, and if the pause is long enough, that may be an issue. Of course, we have no idea how much data you're dealing with (clearly not much, if it fits onto 16GB of disk space with everything else). Another option is that you have a huge transaction (inserting and/or modifying a lot of data at once), and the logical decoding ends up spilling the decoded transaction to disk. If you want a better answer, I think you'll have to provide a lot more details. For example, which PostgreSQL version are you using, and how is it configured? What config parameters have non-default values? regards -- Tomas Vondra