MIME-Version: 1.0
References: <CAFeSbqh0Mj3bm9+aCaz5g4NhKn8+t4aGF=p5vOPc5oVssveATQ@mail.gmail.com>
 <0ba329ef-62aa-4ab3-aefd-141baabced3b@aklaver.com>
In-Reply-To: <0ba329ef-62aa-4ab3-aefd-141baabced3b@aklaver.com>
From: Paul Brindusa <paulbrindusa88@gmail.com>
Date: Thu, 23 Jan 2025 11:40:16 +0000
Message-ID: <CAFeSbqijFCW9xFOfapTzebbPcv2sWgpgrS1kVfFNJ+F7sA8R=A@mail.gmail.com>
Subject: Re: Return of the pg_wal issue..
To: Adrian Klaver <adrian.klaver@aklaver.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
Content-Type: multipart/alternative; boundary="0000000000009fcf99062c5e136d"
Archived-At: <https://www.postgresql.org/message-id/CAFeSbqijFCW9xFOfapTzebbPcv2sWgpgrS1kVfFNJ%2BF7sA8R%3DA%40mail.gmail.com>
Precedence: bulk

--0000000000009fcf99062c5e136d
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Hopefully the information below gives a bit more insight into the issue.
I should also mention that the cluster replicates data to a separate MySQL
database, in case that is relevant.
It is also worth noting that this is our production cluster. We have a
pre-production cluster with essentially the same settings, and the issue
does not occur there.
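If the MySQL replication reads from PostgreSQL through a replication slot (an assumption; the message does not name the tool), a slot that falls behind or goes inactive keeps every WAL segment from its restart_lsn onward in pg_wal. The same applies to the physical slots Patroni creates for the replicas when use_slots is true. A minimal Python sketch of the byte arithmetic, equivalent to what PostgreSQL's pg_wal_lsn_diff() computes, given LSN strings such as those returned by pg_current_wal_lsn() and the restart_lsn column of pg_replication_slots; the helper names are hypothetical:

```python
def parse_lsn(lsn: str) -> int:
    """Convert a PostgreSQL LSN string such as '16/B374D848' to a byte position."""
    hi, lo = lsn.strip().split("/")
    # The part before the slash is the high 32 bits, the part after is the low 32 bits.
    return (int(hi, 16) << 32) | int(lo, 16)


def retained_bytes(current_lsn: str, restart_lsn: str) -> int:
    """Bytes of WAL between a slot's restart_lsn and the current WAL position."""
    return parse_lsn(current_lsn) - parse_lsn(restart_lsn)


if __name__ == "__main__":
    # Hypothetical LSNs for illustration only, not values from this cluster.
    print(retained_bytes("1/0", "0/FF000000"))  # 16777216 bytes (one 16 MiB segment)
```

On the primary, PostgreSQL can report the same figure directly with `SELECT slot_name, active, pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) FROM pg_replication_slots;`, which only reads catalog state. The configuration above does not set max_slot_wal_keep_size, whose default of -1 lets a slot retain WAL without limit.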

> A good deal more information is needed to troubleshoot this:
>
> 1) Postgres version(s).

postgres (PostgreSQL) 15.10

> 2) The Patroni version.

patroni 4.0.4

> 3) The Patroni configuration.

scope: postgres-cluster
name: db01
namespace: /service/

log:
  level: INFO
  traceback_level: ERROR
  format: "%(asctime)s %(levelname)s: %(message)s"
  dateformat: ""
  max_queue_size: 1000
  dir: /var/log/patroni
  file_num: 4
  file_size: 25000000
  loggers:
    patroni.postmaster: WARNING
    urllib3: WARNING

restapi:
  listen: x.x.x.98:8008
  connect_address: x.x.x.98:8008

etcd3:
  hosts: db01.local:2379,db02.local:2379,db03.local:2379


bootstrap:
  dcs:
    ttl: 30
    loop_wait: 10
    retry_timeout: 10
    maximum_lag_on_failover: 1048576
    postgresql:
      use_pg_rewind: true
      use_slots: true
      parameters:
        max_connections: 500
        superuser_reserved_connections: 5
        password_encryption: scram-sha-256
        max_locks_per_transaction: 512
        max_prepared_transactions: 0
        huge_pages: try
        shared_buffers: 128MB
        effective_cache_size: 4GB
        work_mem: 128MB
        maintenance_work_mem: 256MB
        checkpoint_timeout: 15min
        checkpoint_completion_target: 0.9
        min_wal_size: 80MB
        max_wal_size: 1GB
        wal_buffers: 32MB
        default_statistics_target: 1000
        seq_page_cost: 1
        random_page_cost: 4
        effective_io_concurrency: 2
        synchronous_commit: on
        autovacuum: on
        autovacuum_max_workers: 5
        autovacuum_vacuum_scale_factor: 0.01
        autovacuum_analyze_scale_factor: 0.01
        autovacuum_vacuum_cost_limit: 500
        autovacuum_vacuum_cost_delay: 2
        autovacuum_naptime: 1s
        max_files_per_process: 4096
        archive_mode: on
        archive_timeout: 1800s
        archive_command: cd .
        wal_level: replica
        wal_keep_size: 2GB
        max_wal_senders: 10
        max_replication_slots: 10
        hot_standby: on
        wal_log_hints: on
        wal_compression: on
        shared_preload_libraries: pgaudit
        track_io_timing: on
        log_lock_waits: on
        log_temp_files: 0
        track_activities: on
        track_counts: on
        track_functions: all
        log_checkpoints: on
        logging_collector: on
        log_truncate_on_rotation: on
        log_rotation_age: 1d
        log_rotation_size: 1GB
        log_line_prefix: '%m [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h '
        log_filename: postgresql-%Y-%m-%d.log
        log_directory: /var/log/pgsql
        log_connections: on
        log_disconnections: on
        log_statement: ddl
        log_error_verbosity: verbose
        hot_standby_feedback: on
        max_standby_streaming_delay: 30s
        wal_receiver_status_interval: 10s
        idle_in_transaction_session_timeout: 10min
        jit: off
        max_worker_processes: 24
        max_parallel_workers: 8
        max_parallel_workers_per_gather: 2
        max_parallel_maintenance_workers: 2

  initdb:
  - encoding: UTF8
  - data-checksums

  pg_hba:
  - host replication replicator 127.0.0.1/32 md5
  - host replication replicator x.x.x.98/27 scram-sha-256
  - host replication replicator x.x.x.99/27 scram-sha-256
  - host replication replicator x.x.x.100/27 scram-sha-256
  - host all all 0.0.0.0/0 md5

postgresql:
  listen: x.x.x.98:5432
  connect_address: x.x.x.98:5432
  data_dir: /var/lib/pgsql/data
  bin_dir: /usr/bin
  pgpass: /var/lib/pgsql/.pgpass_patroni
  authentication:
    replication:
      username: replicator
      password: password
    superuser:
      username: postgres
      password: password
  parameters:
    unix_socket_directories: /var/run/postgresql

  remove_data_directory_on_rewind_failure: false
  remove_data_directory_on_diverged_timelines: false

  create_replica_methods:
    - basebackup
  basebackup:
    max-rate: '100M'
    checkpoint: 'fast'

watchdog:
  mode: required
  device: /dev/watchdog
  safety_margin: 5

tags:
  nofailover: false
  noloadbalance: false
  clonefrom: false
  nosync: false

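One hedged observation on the configuration above: with archive_mode on and archive_timeout at 1800s, PostgreSQL forces a switch to a new WAL segment every 30 minutes whenever there has been any database activity, and the PostgreSQL documentation notes that segments closed early by a forced switch are still the same length as completely full files. Assuming the default 16 MB wal_segment_size (the initdb options above do not change it), a minimal Python sketch of the upper bound on file volume from forced switches alone:

```python
SEGMENT_BYTES = 16 * 1024**2   # assumed default wal_segment_size (16 MB)
ARCHIVE_TIMEOUT_S = 1800       # archive_timeout from the config above
SECONDS_PER_DAY = 24 * 60 * 60


def forced_switch_bytes_per_day(timeout_s: int, segment_bytes: int) -> int:
    """Upper bound on WAL file volume per day from archive_timeout switches alone."""
    switches = SECONDS_PER_DAY // timeout_s  # at most one forced switch per timeout
    return switches * segment_bytes


if __name__ == "__main__":
    nbytes = forced_switch_bytes_per_day(ARCHIVE_TIMEOUT_S, SEGMENT_BYTES)
    print(f"{nbytes // 1024**2} MiB/day")  # prints "768 MiB/day" (48 x 16 MiB)
```

If none of those segments were ever removed, pg_wal would grow at roughly this rate, which is close to the 1 GB/day reported below. That would explain how quickly segment files appear, not why they are kept; with archive_command set to `cd .`, which always succeeds, archiving itself should not be what holds them back.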
> 4) Definition of 'ridiculous rate'.

1 GB per day.

> 5) Relevant information from the logs.

Below is an excerpt from today's log up to this point, which I think might be
relevant, although I cannot see anything specific in it. If anything else is
needed, please let me know.

2<REDACTED>:<REDACTED> GMT [186889]: [863-1] db=,user=,app=,client= LOG:  00000: checkpoint starting: time
2<REDACTED>:<REDACTED> GMT [186889]: [864-1] db=,user=,app=,client= LOCATION:  LogCheckpointStart, xlog.c:6121
2<REDACTED>:<REDACTED> GMT [186889]: [865-1] db=,user=,app=,client= LOG:  00000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=6.563 s, sync=0.003 s, total=6.619 s; sync files=22, longest=0.002 s, average=0.001 s; distance=776 kB, estimate=56426 kB
2<REDACTED>:<REDACTED> GMT [186889]: [866-1] db=,user=,app=,client= LOCATION:  LogCheckpointEnd, xlog.c:6202
2<REDACTED>:<REDACTED> GMT [2439188]: [7-1] db=documentation-database,user=documentation-database-user,app=PostgreSQL JDBC Driver,client=<REDACTED> LOG:  00000: disconnection: session time: 0:<REDACTED> user=documentation-database-user database=documentation-database host=<REDACTED> port=56170
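Since log_checkpoints is on, the "checkpoint complete" lines can show whether checkpoints ever free WAL. A single line reporting 0 removed and 0 recycled, as above, is not conclusive, because only 776 kB of WAL had been written since the previous checkpoint; what would be telling is removed and recycled staying at zero across many checkpoints while pg_wal keeps growing. A minimal Python sketch that tallies those counts, assuming the exact PostgreSQL 15 message format shown in this excerpt (other versions word it differently); the helper names are hypothetical:

```python
import re
import sys
from typing import Iterable, Optional

# Matches the "checkpoint complete" message format shown in the excerpt above.
CHECKPOINT_RE = re.compile(
    r"checkpoint complete: .*?"
    r"(?P<added>\d+) WAL file\(s\) added, "
    r"(?P<removed>\d+) removed, "
    r"(?P<recycled>\d+) recycled;"
    r".*?distance=(?P<distance_kb>\d+) kB, estimate=(?P<estimate_kb>\d+) kB"
)


def parse_checkpoint(line: str) -> Optional[dict]:
    """Return WAL file counts and distance/estimate (kB) from one log line, or None."""
    m = CHECKPOINT_RE.search(line)
    if m is None:
        return None
    return {key: int(value) for key, value in m.groupdict().items()}


def summarize(lines: Iterable[str]) -> dict:
    """Sum added/removed/recycled over every checkpoint line found."""
    totals = {"checkpoints": 0, "added": 0, "removed": 0, "recycled": 0}
    for line in lines:
        stats = parse_checkpoint(line)
        if stats is None:
            continue  # not a "checkpoint complete" line
        totals["checkpoints"] += 1
        for key in ("added", "removed", "recycled"):
            totals[key] += stats[key]
    return totals


if __name__ == "__main__":
    # Pass log files explicitly, e.g. the files under /var/log/pgsql.
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as fh:
            print(path, summarize(fh))
```

This only reads existing log files, so it can be run against copies of the production logs without touching the database.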


@Laurenz

> I guess you are referring to
> https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/

*Yes, that is the one.*

> I listed all the reasons I know for your predicament.
> Did you do some research along these lines?

*I have looked at the points you mention in the article.*

> If yes, what did you find?

*I have not managed to test the queries yet, but I plan to try them in my
lab environment.*
*Sorry, I am being very cautious because these are our main production
databases.*

*I hope the above gives some insight into the root cause of the problem.*
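Since running queries on production is a concern, one read-only check that needs no database connection is to measure the WAL segment files themselves and count any `.ready` files in pg_wal/archive_status, which mark segments still waiting to be archived. A minimal Python sketch, assuming read access to the data directory (data_dir is /var/lib/pgsql/data in the Patroni config above); the helper name is hypothetical, and the script only reads file names and sizes:

```python
import os
import re
import sys

# WAL segment files are named with 24 uppercase hex digits (timeline, log, segment).
SEGMENT_NAME = re.compile(r"[0-9A-F]{24}")


def wal_usage(pg_wal_dir: str) -> dict:
    """Count WAL segments, their total size, the oldest name, and .ready files."""
    count = 0
    total_bytes = 0
    oldest = None
    for name in os.listdir(pg_wal_dir):
        path = os.path.join(pg_wal_dir, name)
        if SEGMENT_NAME.fullmatch(name) and os.path.isfile(path):
            count += 1
            total_bytes += os.path.getsize(path)
            # Lexicographically smallest name is the oldest within one timeline.
            if oldest is None or name < oldest:
                oldest = name
    status_dir = os.path.join(pg_wal_dir, "archive_status")
    ready = 0
    if os.path.isdir(status_dir):
        ready = sum(1 for n in os.listdir(status_dir) if n.endswith(".ready"))
    return {"segments": count, "bytes": total_bytes, "oldest": oldest, "ready": ready}


if __name__ == "__main__":
    # Pass the pg_wal path explicitly, e.g. /var/lib/pgsql/data/pg_wal
    for target in sys.argv[1:]:
        print(target, wal_usage(target))
```

As a rough heuristic (not PostgreSQL's exact removal rule), a total far above max_wal_size plus wal_keep_size (1GB + 2GB here) suggests WAL is being retained. The oldest segment name can then be compared with `pg_walfile_name(restart_lsn)` for each replication slot to see which slot, if any, is holding it.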


> Yours,
> Laurenz Albe


On Wed, Jan 22, 2025 at 6:03 PM Adrian Klaver <adrian.klaver@aklaver.com>
wrote:

> On 1/22/25 09:33, Paul Brindusa wrote:
> > Good afternoon,
> >
> > Following on from the message below, we are facing a similar issue and I'm
> > keen to get this working myself. Speaking to the DBA in my company has
> > left me a bit cold, as he is not good with Postgres.
> >
> > So I want to find a solution and fix this issue with the pg_wal files
> > filling up the drive at a ridiculous rate. I have been manually moving
> > logs to a different directory, but have had no luck finding an actual
> > solution.
> >
> > The cluster is a 3-node HA cluster running with Patroni.
> >
> > Please help me out. I should mention that I have a test cluster spun up
> > in case something needs testing.
> >
> > I also want to give a shout-out to Laurenz Albe for posting about
> > WAL files on his company blog.
> >
> > Again, any help will be greatly appreciated.
>
> A good deal more information is needed to troubleshoot this:
>
> 1) Postgres version(s).
>
> 2) The Patroni version.
>
> 3) The Patroni configuration.
>
> 4) Definition of 'ridiculous rate'.
>
> 5) Relevant information from the logs.
>
> >
> >
> > "On one of our Postgres instances the pg_wal/data folder has grown to
> > 196GB, filling up a 200GB disk.
> > This stopped postgresql.service this morning, causing two
> > applications to crash.
> > Unfortunately our database admin is on leave today, and we are trying to
> > figure out how to bring the disk usage down.
> > Any ideas or suggestions are more than welcome.
> >
> > Thank you in advance."
> >
> >
> > --
> > Kind Regards,
> > Paul Brindusa
> > paulbrindusa88@gmail.com
> >
>
> --
> Adrian Klaver
> adrian.klaver@aklaver.com
>
>

-- 
Kind Regards,
Paul Brindusa
paulbrindusa88@gmail.com


--0000000000009fcf99062c5e136d--