MIME-Version: 1.0
From: Manan Kansara <manan.kansara@vlo.city>
Date: Sat, 24 Aug 2024 17:48:19 +0530
Message-ID: <CANz4VOO+GSdUYVrid+YvpxPzOZ6SWiB_T9FJL2H-PqAFVa9AsA@mail.gmail.com>
Subject: About replication minimal disk space usage
To: pgsql-general@lists.postgresql.org
Content-Type: multipart/alternative; boundary="000000000000ba913206206ce32f"
Archived-At: <https://www.postgresql.org/message-id/CANz4VOO%2BGSdUYVrid%2BYvpxPzOZ6SWiB_T9FJL2H-PqAFVa9AsA%40mail.gmail.com>
Precedence: bulk

--000000000000ba913206206ce32f
Content-Type: text/plain; charset="UTF-8"

Hello All,
I have a self-hosted Postgres server on AWS with 16 GB of disk space
attached to it. For ML and analysis work we use Vertex AI, so I have set up
live replication from Postgres to a BigQuery table using a data stream
service. We use BigQuery as our data warehouse because we have many
different data sources, and this lets our data analysis and ML happen in
one place.
The problem is that once I start replication, pg_wal grows until it takes
up almost the whole disk, about 15.8 GB, within a few days.

*Question*: How can I set this up so that disk space is used optimally and
old pg_wal data that is no longer needed gets deleted? I think I should
create a cron job to take care of this, but I don't know the right
approach. Can you please guide me?
In future, as the data grows, I will attach more disk space to the
instance, but I want an optimal setup so that the disk never fills up
completely and the server does not crash again.

--000000000000ba913206206ce32f--