Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tayAs-006puP-RN for pgsql-general@arkaria.postgresql.org; Thu, 23 Jan 2025 14:27:15 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tayAr-00GQ31-1I for pgsql-general@arkaria.postgresql.org; Thu, 23 Jan 2025 14:27:13 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tayAq-00GQ2t-KF for pgsql-general@lists.postgresql.org; Thu, 23 Jan 2025 14:27:12 +0000 Received: from mail-lf1-x12c.google.com ([2a00:1450:4864:20::12c]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tayAn-0016J5-1c for pgsql-general@postgresql.org; Thu, 23 Jan 2025 14:27:12 +0000 Received: by mail-lf1-x12c.google.com with SMTP id 2adb3069b0e04-53e399e3310so1098812e87.1 for ; Thu, 23 Jan 2025 06:27:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1737642428; x=1738247228; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=F0PfgirfqS5tukuhhgWzoeO3H0aFB6ZOQpKi0V4AA+8=; b=Mjkh7JNkNcG73Tvw5KWJ54PQWXJotmMFl2L+Ar1e6TIets+BbZRBEgFIh8CMJLWB92 eRNSr6WpqzL4WsCVY1/Wr8JJSTSF2pyJQVXmz/0l5k2PJp6Gt8Zd3jvVuBdwqye7Cz6r 0tgIE3cIa/6eH/jElstu5jRtbvVBoQJgYaZVeIGHs0+j7uKuITyzSaB9nPLeXA7+yi86 XonjFrvp9BlghHlX6b0zWXv836RKz3UvfmsSVdfks6Oj7fos5txX85+u5ZOxAIcBSnmt xyMLZjjvfljcv0+vnr5UjEqbLQblIkNSNR6RDPTvm/xthjFXDpwz2Q3RwbDVo1iDAuW0 VZCQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737642428; x=1738247228; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=F0PfgirfqS5tukuhhgWzoeO3H0aFB6ZOQpKi0V4AA+8=; b=ZCn6euuojy2Ef4dDcESRyx26zmBMcf/xAhEc9stKIeGqccxzncJFin57BWJsJrdfkX 5kE6zZG5Y2TZho/rVpY5QgaXaF/CO/f4C5JDbijiJXEGNJglgaAALGwkm2hR3TXfLEpI dAeKI7H9UcOmYP/CBzg2XaL6GyQmoAAZ/vq7p87KHAv35muk0jSbg3A7qVUoZjR4a41Y HIwTMeciC3BMM0r9Y3+2PY8WvRFH9CKuHqTSJ6DDCF5tw/ZrzDWdQyfnCnaW3o1CBPN+ NOTZxdAuGnwhIn4cNP4eN4n9ze8GpmyCeq8rnZAx1KY1MooFmg5B4sEeZNiTy/DTlEWy kCXw== X-Gm-Message-State: AOJu0YzzvCXGjSrQ8XaLMnCd5xs0PU4JSL2EM+1EkjyGR4hrRlJ4sS+6 nY9RNHrpBdqEx7clrT+1GHr5vC+b+RbQ3owDJaqiqhQYNzXnakjUVaoK2KG+n+U66GDjb27Rdc4 FkU8+i8mSyLwF3+4MdALIFEtpCtmeHu6EMNk= X-Gm-Gg: ASbGncshOFBozTFjtZFqWxOBhpwdmabvS4f9OYFDr7XgnDOM1b8s5hVZx8nFcEktxHO uXU7USRS76mcqUq/dCLiOPX9H4gxZSv6Jmt2cs29iV8IYJv3XAyHWVXAhUwr5 X-Google-Smtp-Source: AGHT+IHcSoJL2+80ZFrQreR8k/5voTzZdyg+2iAfQCSy8ZPopA7S5bKL8ItLuTwKJGX9ytK8WFcojmEyjob02Ocbp4U= X-Received: by 2002:a05:6512:63:b0:542:8cb0:8892 with SMTP id 2adb3069b0e04-5439c2830damr6662618e87.53.1737642427924; Thu, 23 Jan 2025 06:27:07 -0800 (PST) MIME-Version: 1.0 References: <0ba329ef-62aa-4ab3-aefd-141baabced3b@aklaver.com> In-Reply-To: From: Saul Perdomo Date: Thu, 23 Jan 2025 09:26:51 -0500 X-Gm-Features: AbW1kvYGL-aQmzyQRXAIEcQe3SqU5565__XBkagjGorOQYAUGqiRrcRYPOAtggM Message-ID: Subject: Re: Return of the pg_wal issue.. To: Paul Brindusa Cc: pgsql-general Content-Type: multipart/alternative; boundary="000000000000a9ce06062c606707" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000a9ce06062c606707 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hey Paul, Regarding *"I've not managed to test the queries out yet. But I am planning to test out in my lab environment.* *Sorry am really cautious about this as those are the main production databases."* As a dispassionate third-party observer, I can confirm that all SELECT and SHOW queries from Laurenz's blog post are read-only. They're completely safe to run in the affected environment. On Thu, Jan 23, 2025 at 6:40=E2=80=AFAM Paul Brindusa wrote: > Hopefully the below is going to give a little bit more insight on the > issue. > I will mention as well that the cluster also replicates data to another > mysql database if it's relevant at all. > Also worth noting this is our production cluster and we have another > pre-production cluster with basically the same settings and the issue the= re > does not occur. > > A good deal more information is needed to troubleshoot this: > > 1) Postgres version(s). > > postgres (PostgreSQL) 15.10 > > 2) The Patroni version. > > patroni 4.0.4 > > 3) The Patroni configuration. > > scope: postgres-cluster > name: db01 > namespace: /service/ > > log: > level: INFO > traceback_level: ERROR > format: "%(asctime)s %(levelname)s: %(message)s" > dateformat: "" > max_queue_size: 1000 > dir: /var/log/patroni > file_num: 4 > file_size: 25000000 > loggers: > patroni.postmaster: WARNING > urllib3: WARNING > > restapi: > listen: x.x.x.98:8008 > connect_address: x.x.x.98:8008 > > etcd3: > hosts: db01.local:2379,db02.local:2379,db03.local:2379 > > > bootstrap: > dcs: > ttl: 30 > loop_wait: 10 > retry_timeout: 10 > maximum_lag_on_failover: 1048576 > postgresql: > use_pg_rewind: true > use_slots: true > parameters: > max_connections: 500 > superuser_reserved_connections: 5 > password_encryption: scram-sha-256 > max_locks_per_transaction: 512 > max_prepared_transactions: 0 > huge_pages: try > shared_buffers: 128MB > effective_cache_size: 4GB > work_mem: 128MB > maintenance_work_mem: 256MB > checkpoint_timeout: 15min > checkpoint_completion_target: 0.9 > min_wal_size: 80MB > max_wal_size: 1GB > wal_buffers: 32MB > default_statistics_target: 1000 > seq_page_cost: 1 > random_page_cost: 4 > effective_io_concurrency: 2 > synchronous_commit: on > autovacuum: on > autovacuum_max_workers: 5 > autovacuum_vacuum_scale_factor: 0.01 > autovacuum_analyze_scale_factor: 0.01 > autovacuum_vacuum_cost_limit: 500 > autovacuum_vacuum_cost_delay: 2 > autovacuum_naptime: 1s > max_files_per_process: 4096 > archive_mode: on > archive_timeout: 1800s > archive_command: cd . > wal_level: replica > wal_keep_size: 2GB > max_wal_senders: 10 > max_replication_slots: 10 > hot_standby: on > wal_log_hints: on > wal_compression: on > shared_preload_libraries: pgaudit > track_io_timing: on > log_lock_waits: on > log_temp_files: 0 > track_activities: on > track_counts: on > track_functions: all > log_checkpoints: on > logging_collector: on > log_truncate_on_rotation: on > log_rotation_age: 1d > log_rotation_size: 1GB > log_line_prefix: '%m [%p]: [%l-1] db=3D%d,user=3D%u,app=3D%a,clie= nt=3D%h ' > log_filename: postgresql-%Y-%m-%d.log > log_directory: /var/log/pgsql > log_connections: on > log_disconnections: on > log_statement: ddl > log_error_verbosity: verbose > hot_standby_feedback: on > max_standby_streaming_delay: 30s > wal_receiver_status_interval: 10s > idle_in_transaction_session_timeout: 10min > jit: off > max_worker_processes: 24 > max_parallel_workers: 8 > max_parallel_workers_per_gather: 2 > max_parallel_maintenance_workers: 2 > > initdb: > - encoding: UTF8 > - data-checksums > > pg_hba: > - host replication replicator 127.0.0.1/32 md5 > > - host replication replicator x.x.x.98/27 scram-sha-256 > > > > - host replication replicator x.x.x.99/27 scram-sha-256 > > > > - host replication replicator x.x.x.100/27 scram-sha-256 > > > - host all all 0.0.0.0/0 md5 > > postgresql: > listen: x.x.x.98:5432 > connect_address: x.x.x.98:5432 > data_dir: /var/lib/pgsql/data > bin_dir: /usr/bin > pgpass: /var/lib/pgsql/.pgpass_patroni > authentication: > replication: > username: replicator > password: password > superuser: > username: postgres > password: password > parameters: > unix_socket_directories: /var/run/postgresql > > remove_data_directory_on_rewind_failure: false > remove_data_directory_on_diverged_timelines: false > > create_replica_methods: > - basebackup > basebackup: > max-rate: '100M' > checkpoint: 'fast' > > watchdog: > mode: required > device: /dev/watchdog > safety_margin: 5 > > tags: > nofailover: false > noloadbalance: false > clonefrom: false > nosync: false > > 4) Definition of 'ridiculous rate'. > > 1GB / day > > 5) Relevant information from the logs. > > Below entry is something taken off today's log until this point in time > which I think it might be relevant. I cannot see any specifics. If there = is > anything else please let me know. > > 2: GMT [186889]: [863-1] db=3D,user=3D,app=3D,client= =3D LOG: > 00000: checkpoint starting: time > 2: GMT [186889]: [864-1] db=3D,user=3D,app=3D,client= =3D > LOCATION: LogCheckpointStart, xlog.c:6121 > 2: GMT [186889]: [865-1] db=3D,user=3D,app=3D,client= =3D LOG: > 00000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(s) added= , > 0 removed, 0 recycled; write=3D6.563 s, sync=3D0.003 s, total=3D6.619 s; = sync > files=3D22, longest=3D0.002 s, average=3D0.001 s; distance=3D776 kB, esti= mate=3D56426 > kB > 2: GMT [186889]: [866-1] db=3D,user=3D,app=3D,client= =3D > LOCATION: LogCheckpointEnd, xlog.c:6202 > 2: GMT [2439188]: [7-1] > db=3Ddocumentation-database,user=3Ddocumentation-database-user,app=3DPost= greSQL > JDBC Driver,client=3D LOG: 00000: disconnection: session time: > 0: user=3Ddocumentation-database-user > database=3Ddocumentation-database host=3D port=3D56170 > > > @Laurenz > > I guess you are referring to > https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/ > > *Yes, that is the one.* > > I listed all the reasons I know for your predicament. > Did you do some research along these lines? > > *I've had a look at the things that you have mentioned in the guide. * > > If yes, what did you find? > > *I've not managed to test the queries out yet. But I am planning to test > out in my lab environment.* > *Sorry am really cautious about this as those are the main production > databases.* > > *Hope the above is going to give a bit of insight on the root cause of th= e > problem.* > > > > Yours, > Laurenz Albe > > > > On Wed, Jan 22, 2025 at 6:03=E2=80=AFPM Adrian Klaver > wrote: > >> On 1/22/25 09:33, Paul Brindusa wrote: >> > Good afternoon, >> > >> > Following below we are facing a similar issue and im getting a real >> buzz >> > to get this working myself, speaking to my DBA in the company has >> > actually left me a bit cold as he is not good with postgres. >> > >> > So I want to try and get a solution for this and fix this issue with >> the >> > pg_wal files filling up the drive at a ridiculous rate. I have been >> > manually moving logs to a different directory but have had no luck in >> > finding an actual solution. >> > >> > The cluster is a 3 node cluster with HA which is running wirth patroni= . >> > >> > Please help me out, I will mention that I have test cluster spun up in >> > case something needs testing. >> > >> > Also want to give a shout out to Lorenz Albe's for posting stuff about >> > wal files on his company blog. >> > >> > Again any help will be greatly appreciated. >> >> A good deal more information is needed to troubleshoot this: >> >> 1) Postgres version(s). >> >> 2) The Patroni version. >> >> 3) The Patroni configuration. >> >> 4) Definition of 'ridiculous rate'. >> >> 5) Relevant information from the logs. >> >> > >> > >> > " On one of our postgres instances we have the pg_wal/data folder up t= o >> > 196GB, out of 200GB disk filled up. >> > This has stopped the posgresql.service this morning causing two >> > applications to crash. >> > Unfortunately our database admin is on leave today, and we are trying >> to >> > figure out how to get the disk down? >> > Any ideas or suggestions are more than welcome. >> > >> > Thank you in advance." >> > >> > >> > -- >> > Kind Regards, >> > Paul Brindusa >> > paulbrindusa88@gmail.com >> > >> >> -- >> Adrian Klaver >> adrian.klaver@aklaver.com >> >> > > -- > Kind Regards, > Paul Brindusa > paulbrindusa88@gmail.com > > --000000000000a9ce06062c606707 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hey Paul,

Regarding=C2= =A0

"I've not managed to test the queries out ye= t. But I am planning to test out in my lab environment.
<= b>Sorry am really cautious about this as those are the main production d= atabases."

=
As a dispassionate third-party observer, I can confirm that all SELECT= and SHOW queries from Laurenz's blog post are read-only. They're c= ompletely safe to run in the affected environment.
=C2=A0=C2=A0
On Thu, Jan 23, 2025 at 6:40=E2=80=AFAM Paul = Brindusa <paulbrindusa88@gma= il.com> wrote:
Hopefully the below is going t= o give a little bit more insight on the issue.
I will mention as = well that the cluster also replicates data to another mysql database if it&= #39;s relevant at all.
Also worth noting this is our production c= luster and we have another pre-production cluster with basically the same s= ettings and the issue there does not occur.

A good deal= more information is needed to troubleshoot this:

1) Postgres versio= n(s).

postgres (PostgreSQL) 15.10

2) The Patroni = version.

patroni 4.0.4

3) The Patroni confi= guration.

scope: postgres-cluster
name: db01
namespace: /servi= ce/

log:
=C2=A0 level: INFO
=C2=A0 traceback_level: ERROR
= =C2=A0 format: "%(asctime)s %(levelname)s: %(message)s"
=C2=A0= dateformat: ""
=C2=A0 max_queue_size: 1000
=C2=A0 dir: /va= r/log/patroni
=C2=A0 file_num: 4
=C2=A0 file_size: 25000000
=C2=A0= loggers:
=C2=A0 =C2=A0 patroni.postmaster: WARNING
=C2=A0 =C2=A0 url= lib3: WARNING

restapi:
=C2=A0 listen: x.x.x.98:8008
=C2=A0 con= nect_address: x.x.x.98:8008

etcd3:
=C2=A0 hosts: db01.local:2379,= db02.local:2379,db03.local:2379


bootstrap:
=C2=A0 dcs:
=C2= =A0 =C2=A0 ttl: 30
=C2=A0 =C2=A0 loop_wait: 10
=C2=A0 =C2=A0 retry_ti= meout: 10
=C2=A0 =C2=A0 maximum_lag_on_failover: 1048576
=C2=A0 =C2= =A0 postgresql:
=C2=A0 =C2=A0 =C2=A0 use_pg_rewind: true
=C2=A0 =C2= =A0 =C2=A0 use_slots: true
=C2=A0 =C2=A0 =C2=A0 parameters:
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 max_connections: 500
=C2=A0 =C2=A0 =C2=A0 =C2=A0 su= peruser_reserved_connections: 5
=C2=A0 =C2=A0 =C2=A0 =C2=A0 password_enc= ryption: scram-sha-256
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_locks_per_transac= tion: 512
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_prepared_transactions: 0
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 huge_pages: try =C2=A0 =C2=A0 =C2=A0
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 shared_buffers: 128MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 e= ffective_cache_size: 4GB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 work_mem: 128MB
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 maintenance_work_mem: 256MB
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 checkpoint_timeout: 15min
=C2=A0 =C2=A0 =C2=A0 =C2=A0 chec= kpoint_completion_target: 0.9
=C2=A0 =C2=A0 =C2=A0 =C2=A0 min_wal_size: = 80MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_wal_size: 1GB
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 wal_buffers: 32MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 default_statist= ics_target: 1000
=C2=A0 =C2=A0 =C2=A0 =C2=A0 seq_page_cost: 1
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 random_page_cost: 4
=C2=A0 =C2=A0 =C2=A0 =C2=A0 eff= ective_io_concurrency: 2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 synchronous_commit:= on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum: on
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 autovacuum_max_workers: 5
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_= vacuum_scale_factor: 0.01
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_analyze= _scale_factor: 0.01
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum_cost_l= imit: 500
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum_cost_delay: 2=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_naptime: 1s
=C2=A0 =C2=A0 =C2=A0= =C2=A0 max_files_per_process: 4096
=C2=A0 =C2=A0 =C2=A0 =C2=A0 archive_= mode: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 archive_timeout: 1800s
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 archive_command: cd .
=C2=A0 =C2=A0 =C2=A0 =C2=A0 w= al_level: replica
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_keep_size: 2GB
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 max_wal_senders: 10
=C2=A0 =C2=A0 =C2=A0 =C2=A0= max_replication_slots: 10
=C2=A0 =C2=A0 =C2=A0 =C2=A0 hot_standby: on=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_log_hints: on
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 wal_compression: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 shared_preload_libra= ries: pgaudit
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_io_timing: on
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 log_lock_waits: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_= temp_files: 0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_activities: on
=C2=A0= =C2=A0 =C2=A0 =C2=A0 track_counts: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track= _functions: all
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_checkpoints: on
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 logging_collector: on
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 log_truncate_on_rotation: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_rotatio= n_age: 1d
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_rotation_size: 1GB
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 log_line_prefix: '%m [%p]: [%l-1] db=3D%d,user=3D%= u,app=3D%a,client=3D%h '
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_filename: p= ostgresql-%Y-%m-%d.log
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_directory: /var/l= og/pgsql
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_connections: on
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 log_disconnections: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log= _statement: ddl
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_error_verbosity: verbose=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 hot_standby_feedback: on
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 max_standby_streaming_delay: 30s
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 wal_receiver_status_interval: 10s
=C2=A0 =C2=A0 =C2=A0 =C2=A0 idle_i= n_transaction_session_timeout: 10min
=C2=A0 =C2=A0 =C2=A0 =C2=A0 jit: of= f
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_worker_processes: 24
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 max_parallel_workers: 8
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_pa= rallel_workers_per_gather: 2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_parallel_ma= intenance_workers: 2 =C2=A0 =C2=A0 =C2=A0

=C2=A0 initdb:
=C2=A0 -= encoding: UTF8
=C2=A0 - data-checksums

=C2=A0 pg_hba:
=C2=A0 = - host replication replicator 127.0.0.1/32 md5
=C2=A0
=C2=A0 - host replication replicato= r x.x.x.98/27 scram-sha-256
=C2=A0
=C2=A0
=C2=A0
=C2=A0 - ho= st replication replicator x.x.x.99/27 scram-sha-256
=C2=A0
=C2=A0 =C2=A0
=C2=A0 - host replication replicator x.x.x.100/27 scram-sha-25= 6
=C2=A0
=C2=A0 =C2=A0
=C2=A0 - host all all 0.0.0.0/0 md5

postgresql:
=C2=A0 = listen: x.x.x.98:5432
=C2=A0 connect_address: x.x.x.98:5432
=C2=A0 da= ta_dir: /var/lib/pgsql/data
=C2=A0 bin_dir: /usr/bin
=C2=A0 pgpass: /= var/lib/pgsql/.pgpass_patroni
=C2=A0 authentication:
=C2=A0 =C2=A0 re= plication:
=C2=A0 =C2=A0 =C2=A0 username: replicator
=C2=A0 =C2=A0 = =C2=A0 password: password
=C2=A0 =C2=A0 superuser:
=C2=A0 =C2=A0 =C2= =A0 username: postgres
=C2=A0 =C2=A0 =C2=A0 password: password
=C2=A0= parameters:
=C2=A0 =C2=A0 unix_socket_directories: /var/run/postgresql<= br>
=C2=A0 remove_data_directory_on_rewind_failure: false
=C2=A0 remo= ve_data_directory_on_diverged_timelines: false

=C2=A0 create_replica= _methods:
=C2=A0 =C2=A0 - basebackup
=C2=A0 basebackup:
=C2=A0 =C2= =A0 max-rate: '100M'
=C2=A0 =C2=A0 checkpoint: 'fast' = =C2=A0 =C2=A0 =C2=A0

watchdog:
=C2=A0 mode: required
=C2=A0 de= vice: /dev/watchdog
=C2=A0 safety_margin: 5

tags:
=C2=A0 nofai= lover: false
=C2=A0 noloadbalance: false
=C2=A0 clonefrom: false
= =C2=A0 nosync: false

4) Definition of 'ridiculous rate'.
=
1GB / day

5) Relevant information from the logs.

<= /div>
Below entry is something taken off today's log=C2=A0 until th= is point in time which I think it might be relevant. I cannot see any speci= fics. If there is anything else please let me know.=C2=A0

2<REDACTED>:<REDACTED> GMT [186889]: [863-1] db=3D,user= =3D,app=3D,client=3D LOG: =C2=A000000: checkpoint starting: time
2<RE= DACTED>:<REDACTED> GMT [186889]: [864-1] db=3D,user=3D,app=3D,clie= nt=3D LOCATION: =C2=A0LogCheckpointStart, xlog.c:6121
2<REDACTED>:= <REDACTED> GMT [186889]: [865-1] db=3D,user=3D,app=3D,client=3D LOG: = =C2=A000000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(s) ad= ded, 0 removed, 0 recycled; write=3D6.563 s, sync=3D0.003 s, total=3D6.619 = s; sync files=3D22, longest=3D0.002 s, average=3D0.001 s; distance=3D776 kB= , estimate=3D56426 kB
2<REDACTED>:<REDACTED> GMT [186889]: [= 866-1] db=3D,user=3D,app=3D,client=3D LOCATION: =C2=A0LogCheckpointEnd, xlo= g.c:6202
2<REDACTED>:<REDACTED> GMT [2439188]: [7-1] db=3Ddo= cumentation-database,user=3Ddocumentation-database-user,app=3DPostgreSQL JD= BC Driver,client=3D<REDACTED> LOG: =C2=A000000: disconnection: sessio= n time: 0:<REDACTED> user=3Ddocumentation-database-user database=3Ddo= cumentation-database host=3D<REDACTED> port=3D56170



I guess you are referring to
https://www.cybertec-postgresql.com= /en/why-does-my-pg_wal-keep-growing/

Yes, that is the one.

I listed all the reasons I know for your predicament.
Did you d= o some research along these lines?

I've had= a look at the things that you have mentioned in the guide.=C2=A0
=

If yes, what did you find?

I'v= e not managed to test the queries out yet. But I am planning to test out in= my lab environment.
Sorry am really cautious about this a= s those are the main production databases.

<= div>Hope the above is going to give a bit of insight on the root cause o= f the problem.



Y= ours,
Laurenz Albe



On Wed, Jan 22, 20= 25 at 6:03=E2=80=AFPM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/22/25 09:33, Paul= Brindusa wrote:
> Good afternoon,
>
> Following below we are facing a similar issue and im getting a real bu= zz
> to get this working myself, speaking to my DBA=C2=A0 in the=C2=A0 comp= any has
> actually left me a bit cold as he is not good with postgres.
>
> So I want to try and get a solution for this and fix this issue with t= he
> pg_wal files filling up the drive at a ridiculous rate. I have been > manually moving logs to a different directory but have had no luck in =
> finding an actual solution.
>
> The cluster is a 3 node cluster with HA which is running wirth patroni= .
>
> Please help me out, I will mention that I have test cluster spun up in=
> case something needs testing.
>
> Also want to give a shout out to Lorenz Albe's for posting stuff a= bout
> wal files on his company blog.
>
> Again any help will be greatly appreciated.

A good deal more information is needed to troubleshoot this:

1) Postgres version(s).

2) The Patroni version.

3) The Patroni configuration.

4) Definition of 'ridiculous rate'.

5) Relevant information from the logs.

>
>
> " On one of our postgres instances we have the pg_wal/data folder= up to
> 196GB, out of 200GB disk filled up.
> This has stopped the posgresql.service this morning causing two
> applications to crash.
> Unfortunately our database admin is on leave today, and we are trying = to
> figure out how to get the disk down?
> Any ideas or suggestions are more than welcome.
>
> Thank you in advance."
>
>
> --
> Kind Regards,
> Paul Brindusa
> paulbrin= dusa88@gmail.com <mailto:paulbrindusa88@gmail.com>
>

--
Adrian Klaver
adrian.klave= r@aklaver.com



--
Kind Regards,
Paul Brindusa
=

--000000000000a9ce06062c606707--