Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tayOA-006rYI-9T for pgsql-general@arkaria.postgresql.org; Thu, 23 Jan 2025 14:40:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tayO8-00GcUX-UI for pgsql-general@arkaria.postgresql.org; Thu, 23 Jan 2025 14:40:56 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tayO8-00GcUP-H5 for pgsql-general@lists.postgresql.org; Thu, 23 Jan 2025 14:40:56 +0000 Received: from mail-ed1-x533.google.com ([2a00:1450:4864:20::533]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tayO5-0016Pv-0M for pgsql-general@postgresql.org; Thu, 23 Jan 2025 14:40:56 +0000 Received: by mail-ed1-x533.google.com with SMTP id 4fb4d7f45d1cf-5dc0522475eso2164726a12.1 for ; Thu, 23 Jan 2025 06:40:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1737643252; x=1738248052; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=dXJ+cyp7MTOOCozaEmFZMa5Z1Sg1X+0qoBIJvcEF3JI=; b=XhCZLhUOi/C/sEjUH0WpG0O5I9GmyIKVooaXnUR4GPLd+IrWnaUJcyPkyZr/hJi1UL UirfTAsva88yHhsKypSXW6sQMFSSkEMvXTEkPvHyrXc/KpQt1yynnnRym3oyB36DilvX Bd8W6mI8JBa+WrTRC4wAD+Tw4ArnlfRbNQtywvBTzaUcEvwzWtJkg+O+ILEag5UJiquI R+7t8CrPdsah/PHwxoq6HjYzLSH6qKfNef35HT252FRwEfPjQnpoj8P/iyccQOW44RDD e8iD44CyY5p57A2WridVxSMZqOf2UlFqAfGHzeETp1+fzIxme6m5ZemDp2ZQzvV5BCls qxfQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737643252; x=1738248052; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=dXJ+cyp7MTOOCozaEmFZMa5Z1Sg1X+0qoBIJvcEF3JI=; b=FJFYBzcAfNor8lzLmbh/p2CD0lV5tFDTFlXpDAkI9UhvMbsHUP5nLZn1mQ2EUhuzlg oHq0NJRnqllQ7/Jt4UmUuJadmitqH9VodvBpH6E40/wU+Za7Umx1YILL8ghanrMCZ8B1 lcCw95EstOy735b7YioYuoODK70Kxrop4JFQ0+s9tj2cA6Jfqc1M7HPeLh+YasAerG5y 6a1z5XIESVKtsFzViMmAw9wk8LUq06XdN5Ad/Ty90kfZQSOwZ+URGQguixSBNTR+gLX7 qu8ZRyFvNNmuhJM0dvMOVlPdizfDl3aVCZA7ZBwLkci9qeVZQRdwGiM1sex87CrtYEgl 9wRg== X-Gm-Message-State: AOJu0Yy/Cs0U6Iyk/3uJk9J2zc7U9ZFF5F4MKOz9LpI5amByFPwkavev GjTk2qkVhc3v7b37//RjV8NEVZANe94+e/TyVTzTU7j+8RnZqHE0PPEO801xRrU1u7s0DYK0w2D Q61sdwzmCZ06rphsoN0xEJyuXYwtLn8i1 X-Gm-Gg: ASbGnct9mhCtuFBUc3MlgtABFd9HS5iTDpjRVUmmy6TV3xnSPNgbnzkjpH05SHPto4S ieMmP9s0SC61+Sdp8NnKKQS/h7jpt2R98fkGXsDcOda3z7rjwdWirmvjbaWJu X-Google-Smtp-Source: AGHT+IEdlLOFZh5p3/pPlQW0CdPzg/oRg14e9w1tTpIa+AYgAfdNvy5XLmYkhGBHwhvkvxtcHDB0opO2+8QxAiF0Q6Q= X-Received: by 2002:a17:907:706:b0:ab2:eb1a:9471 with SMTP id a640c23a62f3a-ab38b3cf59fmr2686627666b.48.1737643252024; Thu, 23 Jan 2025 06:40:52 -0800 (PST) MIME-Version: 1.0 References: <0ba329ef-62aa-4ab3-aefd-141baabced3b@aklaver.com> In-Reply-To: From: Paul Brindusa Date: Thu, 23 Jan 2025 14:40:40 +0000 X-Gm-Features: AbW1kvY4pRS8s6GwSKCQvsyb-eMb6yEomEV9imvIG0Ck1MP7qIX1GuRZjsjmMAU Message-ID: Subject: Re: Return of the pg_wal issue.. To: Saul Perdomo Cc: pgsql-general Content-Type: multipart/alternative; boundary="000000000000c890b3062c609837" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000c890b3062c609837 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi Saul, Fantastic, highly appreciate that. Initially i've taken the top bit of Laurenz's post to understand what these wal files are. Any help is more than welcome, apologies for not checking that bit earlier. In the meantime i've checked those queries as well. Thank you On Thu, Jan 23, 2025 at 2:27=E2=80=AFPM Saul Perdomo wrote: > Hey Paul, > > Regarding > > *"I've not managed to test the queries out yet. But I am planning to test > out in my lab environment.* > *Sorry am really cautious about this as those are the main production > databases."* > > As a dispassionate third-party observer, I can confirm that all SELECT an= d > SHOW queries from Laurenz's blog post are read-only. They're completely > safe to run in the affected environment. > > On Thu, Jan 23, 2025 at 6:40=E2=80=AFAM Paul Brindusa > wrote: > >> Hopefully the below is going to give a little bit more insight on the >> issue. >> I will mention as well that the cluster also replicates data to another >> mysql database if it's relevant at all. >> Also worth noting this is our production cluster and we have another >> pre-production cluster with basically the same settings and the issue th= ere >> does not occur. >> >> A good deal more information is needed to troubleshoot this: >> >> 1) Postgres version(s). >> >> postgres (PostgreSQL) 15.10 >> >> 2) The Patroni version. >> >> patroni 4.0.4 >> >> 3) The Patroni configuration. >> >> scope: postgres-cluster >> name: db01 >> namespace: /service/ >> >> log: >> level: INFO >> traceback_level: ERROR >> format: "%(asctime)s %(levelname)s: %(message)s" >> dateformat: "" >> max_queue_size: 1000 >> dir: /var/log/patroni >> file_num: 4 >> file_size: 25000000 >> loggers: >> patroni.postmaster: WARNING >> urllib3: WARNING >> >> restapi: >> listen: x.x.x.98:8008 >> connect_address: x.x.x.98:8008 >> >> etcd3: >> hosts: db01.local:2379,db02.local:2379,db03.local:2379 >> >> >> bootstrap: >> dcs: >> ttl: 30 >> loop_wait: 10 >> retry_timeout: 10 >> maximum_lag_on_failover: 1048576 >> postgresql: >> use_pg_rewind: true >> use_slots: true >> parameters: >> max_connections: 500 >> superuser_reserved_connections: 5 >> password_encryption: scram-sha-256 >> max_locks_per_transaction: 512 >> max_prepared_transactions: 0 >> huge_pages: try >> shared_buffers: 128MB >> effective_cache_size: 4GB >> work_mem: 128MB >> maintenance_work_mem: 256MB >> checkpoint_timeout: 15min >> checkpoint_completion_target: 0.9 >> min_wal_size: 80MB >> max_wal_size: 1GB >> wal_buffers: 32MB >> default_statistics_target: 1000 >> seq_page_cost: 1 >> random_page_cost: 4 >> effective_io_concurrency: 2 >> synchronous_commit: on >> autovacuum: on >> autovacuum_max_workers: 5 >> autovacuum_vacuum_scale_factor: 0.01 >> autovacuum_analyze_scale_factor: 0.01 >> autovacuum_vacuum_cost_limit: 500 >> autovacuum_vacuum_cost_delay: 2 >> autovacuum_naptime: 1s >> max_files_per_process: 4096 >> archive_mode: on >> archive_timeout: 1800s >> archive_command: cd . >> wal_level: replica >> wal_keep_size: 2GB >> max_wal_senders: 10 >> max_replication_slots: 10 >> hot_standby: on >> wal_log_hints: on >> wal_compression: on >> shared_preload_libraries: pgaudit >> track_io_timing: on >> log_lock_waits: on >> log_temp_files: 0 >> track_activities: on >> track_counts: on >> track_functions: all >> log_checkpoints: on >> logging_collector: on >> log_truncate_on_rotation: on >> log_rotation_age: 1d >> log_rotation_size: 1GB >> log_line_prefix: '%m [%p]: [%l-1] db=3D%d,user=3D%u,app=3D%a,cli= ent=3D%h ' >> log_filename: postgresql-%Y-%m-%d.log >> log_directory: /var/log/pgsql >> log_connections: on >> log_disconnections: on >> log_statement: ddl >> log_error_verbosity: verbose >> hot_standby_feedback: on >> max_standby_streaming_delay: 30s >> wal_receiver_status_interval: 10s >> idle_in_transaction_session_timeout: 10min >> jit: off >> max_worker_processes: 24 >> max_parallel_workers: 8 >> max_parallel_workers_per_gather: 2 >> max_parallel_maintenance_workers: 2 >> >> initdb: >> - encoding: UTF8 >> - data-checksums >> >> pg_hba: >> - host replication replicator 127.0.0.1/32 md5 >> >> - host replication replicator x.x.x.98/27 scram-sha-256 >> >> >> >> - host replication replicator x.x.x.99/27 scram-sha-256 >> >> >> >> - host replication replicator x.x.x.100/27 scram-sha-256 >> >> >> - host all all 0.0.0.0/0 md5 >> >> postgresql: >> listen: x.x.x.98:5432 >> connect_address: x.x.x.98:5432 >> data_dir: /var/lib/pgsql/data >> bin_dir: /usr/bin >> pgpass: /var/lib/pgsql/.pgpass_patroni >> authentication: >> replication: >> username: replicator >> password: password >> superuser: >> username: postgres >> password: password >> parameters: >> unix_socket_directories: /var/run/postgresql >> >> remove_data_directory_on_rewind_failure: false >> remove_data_directory_on_diverged_timelines: false >> >> create_replica_methods: >> - basebackup >> basebackup: >> max-rate: '100M' >> checkpoint: 'fast' >> >> watchdog: >> mode: required >> device: /dev/watchdog >> safety_margin: 5 >> >> tags: >> nofailover: false >> noloadbalance: false >> clonefrom: false >> nosync: false >> >> 4) Definition of 'ridiculous rate'. >> >> 1GB / day >> >> 5) Relevant information from the logs. >> >> Below entry is something taken off today's log until this point in time >> which I think it might be relevant. I cannot see any specifics. If there= is >> anything else please let me know. >> >> 2: GMT [186889]: [863-1] db=3D,user=3D,app=3D,client= =3D LOG: >> 00000: checkpoint starting: time >> 2: GMT [186889]: [864-1] db=3D,user=3D,app=3D,client= =3D >> LOCATION: LogCheckpointStart, xlog.c:6121 >> 2: GMT [186889]: [865-1] db=3D,user=3D,app=3D,client= =3D LOG: >> 00000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(s) adde= d, >> 0 removed, 0 recycled; write=3D6.563 s, sync=3D0.003 s, total=3D6.619 s;= sync >> files=3D22, longest=3D0.002 s, average=3D0.001 s; distance=3D776 kB, est= imate=3D56426 >> kB >> 2: GMT [186889]: [866-1] db=3D,user=3D,app=3D,client= =3D >> LOCATION: LogCheckpointEnd, xlog.c:6202 >> 2: GMT [2439188]: [7-1] >> db=3Ddocumentation-database,user=3Ddocumentation-database-user,app=3DPos= tgreSQL >> JDBC Driver,client=3D LOG: 00000: disconnection: session time= : >> 0: user=3Ddocumentation-database-user >> database=3Ddocumentation-database host=3D port=3D56170 >> >> >> @Laurenz >> >> I guess you are referring to >> https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/ >> >> *Yes, that is the one.* >> >> I listed all the reasons I know for your predicament. >> Did you do some research along these lines? >> >> *I've had a look at the things that you have mentioned in the guide. * >> >> If yes, what did you find? >> >> *I've not managed to test the queries out yet. But I am planning to test >> out in my lab environment.* >> *Sorry am really cautious about this as those are the main production >> databases.* >> >> *Hope the above is going to give a bit of insight on the root cause of >> the problem.* >> >> >> >> Yours, >> Laurenz Albe >> >> >> >> On Wed, Jan 22, 2025 at 6:03=E2=80=AFPM Adrian Klaver >> wrote: >> >>> On 1/22/25 09:33, Paul Brindusa wrote: >>> > Good afternoon, >>> > >>> > Following below we are facing a similar issue and im getting a real >>> buzz >>> > to get this working myself, speaking to my DBA in the company has >>> > actually left me a bit cold as he is not good with postgres. >>> > >>> > So I want to try and get a solution for this and fix this issue with >>> the >>> > pg_wal files filling up the drive at a ridiculous rate. I have been >>> > manually moving logs to a different directory but have had no luck in >>> > finding an actual solution. >>> > >>> > The cluster is a 3 node cluster with HA which is running wirth patron= i. >>> > >>> > Please help me out, I will mention that I have test cluster spun up i= n >>> > case something needs testing. >>> > >>> > Also want to give a shout out to Lorenz Albe's for posting stuff abou= t >>> > wal files on his company blog. >>> > >>> > Again any help will be greatly appreciated. >>> >>> A good deal more information is needed to troubleshoot this: >>> >>> 1) Postgres version(s). >>> >>> 2) The Patroni version. >>> >>> 3) The Patroni configuration. >>> >>> 4) Definition of 'ridiculous rate'. >>> >>> 5) Relevant information from the logs. >>> >>> > >>> > >>> > " On one of our postgres instances we have the pg_wal/data folder up >>> to >>> > 196GB, out of 200GB disk filled up. >>> > This has stopped the posgresql.service this morning causing two >>> > applications to crash. >>> > Unfortunately our database admin is on leave today, and we are trying >>> to >>> > figure out how to get the disk down? >>> > Any ideas or suggestions are more than welcome. >>> > >>> > Thank you in advance." >>> > >>> > >>> > -- >>> > Kind Regards, >>> > Paul Brindusa >>> > paulbrindusa88@gmail.com >>> > >>> >>> -- >>> Adrian Klaver >>> adrian.klaver@aklaver.com >>> >>> >> >> -- >> Kind Regards, >> Paul Brindusa >> paulbrindusa88@gmail.com >> >> --=20 Kind Regards, Paul Brindusa paulbrindusa88@gmail.com --000000000000c890b3062c609837 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi Saul,

Fantastic, highly appreciate t= hat.
Initially i've taken=C2=A0 the top bit of Laurenz's post to= understand what these wal files are.
Any help is more than welco= me, apologies for not checking that bit earlier. In the meantime i've c= hecked those queries as well.

Thank you
=

On Thu, Jan 23, 2025 at 2:27=E2=80=AFPM Sau= l Perdomo <saul.perdomo@gmail.= com> wrote:
Hey Paul,

Regarding= =C2=A0

"I've not managed to test the queries out= yet. But I am planning to test out in my lab environment.
Sorry am really cautious about this as those are the main productio= n databases."

As a dispassionate third-party obs= erver, I can confirm that all SELECT and SHOW queries from Laurenz's bl= og post are read-only. They're completely safe to run in the affected e= nvironment.
=C2=A0=C2=A0
On Thu, Jan 23, 2025 at 6:40=E2= =80=AFAM Paul Brindusa <paulbrindusa88@gmail.com> wrote:
Hopefully the below is going to give a little bit more insight on the iss= ue.
I will mention as well that the cluster also replicates data = to another mysql database if it's relevant at all.
Also worth= noting this is our production cluster and we have another pre-production c= luster with basically the same settings and the issue there does not occur.=

A good deal more information is needed to troubleshoot= this:

1) Postgres version(s).

postgres (PostgreS= QL) 15.10

2) The Patroni version.

patroni 4= .0.4

3) The Patroni configuration.

scope: postgres-clustername: db01
namespace: /service/

log:
=C2=A0 level: INFO
= =C2=A0 traceback_level: ERROR
=C2=A0 format: "%(asctime)s %(levelna= me)s: %(message)s"
=C2=A0 dateformat: ""
=C2=A0 max_qu= eue_size: 1000
=C2=A0 dir: /var/log/patroni
=C2=A0 file_num: 4
=C2= =A0 file_size: 25000000
=C2=A0 loggers:
=C2=A0 =C2=A0 patroni.postmas= ter: WARNING
=C2=A0 =C2=A0 urllib3: WARNING

restapi:
=C2=A0 li= sten: x.x.x.98:8008
=C2=A0 connect_address: x.x.x.98:8008

etcd3:<= br>=C2=A0 hosts: db01.local:2379,db02.local:2379,db03.local:2379

bootstrap:
=C2=A0 dcs:
=C2=A0 =C2=A0 ttl: 30
=C2=A0 =C2=A0 loop_w= ait: 10
=C2=A0 =C2=A0 retry_timeout: 10
=C2=A0 =C2=A0 maximum_lag_on_= failover: 1048576
=C2=A0 =C2=A0 postgresql:
=C2=A0 =C2=A0 =C2=A0 use_= pg_rewind: true
=C2=A0 =C2=A0 =C2=A0 use_slots: true
=C2=A0 =C2=A0 = =C2=A0 parameters:
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_connections: 500
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 superuser_reserved_connections: 5
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 password_encryption: scram-sha-256
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 max_locks_per_transaction: 512
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ma= x_prepared_transactions: 0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 huge_pages: try = =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 shared_buffers: 128MB=C2=A0 =C2=A0 =C2=A0 =C2=A0 effective_cache_size: 4GB
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 work_mem: 128MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 maintenance_wo= rk_mem: 256MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 checkpoint_timeout: 15min
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 checkpoint_completion_target: 0.9
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 min_wal_size: 80MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_wal= _size: 1GB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_buffers: 32MB
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 default_statistics_target: 1000
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 seq_page_cost: 1
=C2=A0 =C2=A0 =C2=A0 =C2=A0 random_page_cost: 4<= br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 effective_io_concurrency: 2
=C2=A0 =C2=A0= =C2=A0 =C2=A0 synchronous_commit: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autova= cuum: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_max_workers: 5
=C2=A0= =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum_scale_factor: 0.01
=C2=A0 =C2=A0= =C2=A0 =C2=A0 autovacuum_analyze_scale_factor: 0.01
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 autovacuum_vacuum_cost_limit: 500
=C2=A0 =C2=A0 =C2=A0 =C2=A0= autovacuum_vacuum_cost_delay: 2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_= naptime: 1s
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_files_per_process: 4096
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 archive_mode: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0= archive_timeout: 1800s
=C2=A0 =C2=A0 =C2=A0 =C2=A0 archive_command: cd = .
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_level: replica
=C2=A0 =C2=A0 =C2=A0= =C2=A0 wal_keep_size: 2GB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_wal_senders: = 10
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_replication_slots: 10
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 hot_standby: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_log_hi= nts: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_compression: on
=C2=A0 =C2=A0= =C2=A0 =C2=A0 shared_preload_libraries: pgaudit
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 track_io_timing: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_lock_waits: o= n
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_temp_files: 0
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 track_activities: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_counts: on=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_functions: all
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 log_checkpoints: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 logging_colle= ctor: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_truncate_on_rotation: on
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 log_rotation_age: 1d
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 log_rotation_size: 1GB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_line_prefix: = '%m [%p]: [%l-1] db=3D%d,user=3D%u,app=3D%a,client=3D%h '
=C2=A0= =C2=A0 =C2=A0 =C2=A0 log_filename: postgresql-%Y-%m-%d.log
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 log_directory: /var/log/pgsql
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 log_connections: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_disconnections:= on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_statement: ddl
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 log_error_verbosity: verbose
=C2=A0 =C2=A0 =C2=A0 =C2=A0 hot_= standby_feedback: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_standby_streaming_d= elay: 30s
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_receiver_status_interval: 10s<= br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 idle_in_transaction_session_timeout: 10min=C2=A0 =C2=A0 =C2=A0 =C2=A0 jit: off
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_w= orker_processes: 24
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_parallel_workers: 8<= br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_parallel_workers_per_gather: 2
=C2=A0= =C2=A0 =C2=A0 =C2=A0 max_parallel_maintenance_workers: 2 =C2=A0 =C2=A0 =C2= =A0

=C2=A0 initdb:
=C2=A0 - encoding: UTF8
=C2=A0 - data-check= sums

=C2=A0 pg_hba:
=C2=A0 - host replication replicator 127.0.0.1/32 md5
=C2=A0 <= br>=C2=A0 - host replication replicator x.x.x.98/27 scram-sha-256
=C2=A0=
=C2=A0
=C2=A0
=C2=A0 - host replication replicator x.x.x.99/27= scram-sha-256
=C2=A0
=C2=A0
=C2=A0
=C2=A0 - host replicatio= n replicator x.x.x.100/27 scram-sha-256
=C2=A0
=C2=A0 =C2=A0
=C2= =A0 - host all all 0.0.0.0/0= md5

postgresql:
=C2=A0 listen: x.x.x.98:5432
=C2=A0 conne= ct_address: x.x.x.98:5432
=C2=A0 data_dir: /var/lib/pgsql/data
=C2=A0= bin_dir: /usr/bin
=C2=A0 pgpass: /var/lib/pgsql/.pgpass_patroni
=C2= =A0 authentication:
=C2=A0 =C2=A0 replication:
=C2=A0 =C2=A0 =C2=A0 u= sername: replicator
=C2=A0 =C2=A0 =C2=A0 password: password
=C2=A0 = =C2=A0 superuser:
=C2=A0 =C2=A0 =C2=A0 username: postgres
=C2=A0 =C2= =A0 =C2=A0 password: password
=C2=A0 parameters:
=C2=A0 =C2=A0 unix_s= ocket_directories: /var/run/postgresql

=C2=A0 remove_data_directory_= on_rewind_failure: false
=C2=A0 remove_data_directory_on_diverged_timeli= nes: false

=C2=A0 create_replica_methods:
=C2=A0 =C2=A0 - basebac= kup
=C2=A0 basebackup:
=C2=A0 =C2=A0 max-rate: '100M'
=C2= =A0 =C2=A0 checkpoint: 'fast' =C2=A0 =C2=A0 =C2=A0

watchdog:=
=C2=A0 mode: required
=C2=A0 device: /dev/watchdog
=C2=A0 safety_= margin: 5

tags:
=C2=A0 nofailover: false
=C2=A0 noloadbalance:= false
=C2=A0 clonefrom: false
=C2=A0 nosync: false

4) Definit= ion of 'ridiculous rate'.

1GB / day

5) Relevant infor= mation from the logs.

Below entry is something tak= en off today's log=C2=A0 until this point in time which I think it migh= t be relevant. I cannot see any specifics. If there is anything else please= let me know.=C2=A0

2<REDACTED>:<REDACTED= > GMT [186889]: [863-1] db=3D,user=3D,app=3D,client=3D LOG: =C2=A000000:= checkpoint starting: time
2<REDACTED>:<REDACTED> GMT [18688= 9]: [864-1] db=3D,user=3D,app=3D,client=3D LOCATION: =C2=A0LogCheckpointSta= rt, xlog.c:6121
2<REDACTED>:<REDACTED> GMT [186889]: [865-1]= db=3D,user=3D,app=3D,client=3D LOG: =C2=A000000: checkpoint complete: wrot= e 66 buffers (0.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=3D6.= 563 s, sync=3D0.003 s, total=3D6.619 s; sync files=3D22, longest=3D0.002 s,= average=3D0.001 s; distance=3D776 kB, estimate=3D56426 kB
2<REDACTED= >:<REDACTED> GMT [186889]: [866-1] db=3D,user=3D,app=3D,client=3D = LOCATION: =C2=A0LogCheckpointEnd, xlog.c:6202
2<REDACTED>:<REDA= CTED> GMT [2439188]: [7-1] db=3Ddocumentation-database,user=3Ddocumentat= ion-database-user,app=3DPostgreSQL JDBC Driver,client=3D<REDACTED> LO= G: =C2=A000000: disconnection: session time: 0:<REDACTED> user=3Ddocu= mentation-database-user database=3Ddocumentation-database host=3D<REDACT= ED> port=3D56170


=

I guess you are referring to
https://www.cybertec-postgresql.com/en/why-does-my-p= g_wal-keep-growing/

Yes, that is the one.

I listed= all the reasons I know for your predicament.
Did you do some research a= long these lines?

I've had a look at the th= ings that you have mentioned in the guide.=C2=A0

If yes, what did you find?

I've not managed to = test the queries out yet. But I am planning to test out in my lab environme= nt.
Sorry am really cautious about this as those are the m= ain production databases.

Hope the a= bove is going to give a bit of insight on the root cause of the problem.



Yours,
Laurenz = Albe



On Wed, Jan 22, 2025 at 6:03=E2=80= =AFPM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/22/25 09:33, Paul Brindusa wrote:=
> Good afternoon,
>
> Following below we are facing a similar issue and im getting a real bu= zz
> to get this working myself, speaking to my DBA=C2=A0 in the=C2=A0 comp= any has
> actually left me a bit cold as he is not good with postgres.
>
> So I want to try and get a solution for this and fix this issue with t= he
> pg_wal files filling up the drive at a ridiculous rate. I have been > manually moving logs to a different directory but have had no luck in =
> finding an actual solution.
>
> The cluster is a 3 node cluster with HA which is running wirth patroni= .
>
> Please help me out, I will mention that I have test cluster spun up in=
> case something needs testing.
>
> Also want to give a shout out to Lorenz Albe's for posting stuff a= bout
> wal files on his company blog.
>
> Again any help will be greatly appreciated.

A good deal more information is needed to troubleshoot this:

1) Postgres version(s).

2) The Patroni version.

3) The Patroni configuration.

4) Definition of 'ridiculous rate'.

5) Relevant information from the logs.

>
>
> " On one of our postgres instances we have the pg_wal/data folder= up to
> 196GB, out of 200GB disk filled up.
> This has stopped the posgresql.service this morning causing two
> applications to crash.
> Unfortunately our database admin is on leave today, and we are trying = to
> figure out how to get the disk down?
> Any ideas or suggestions are more than welcome.
>
> Thank you in advance."
>
>
> --
> Kind Regards,
> Paul Brindusa
> paulbrin= dusa88@gmail.com <mailto:paulbrindusa88@gmail.com>
>

--
Adrian Klaver
adrian.klave= r@aklaver.com



--
Kind Regards,
Paul Brindusa
=



--
Kind Regards,
Paul Brindusa
=

--000000000000c890b3062c609837--