Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tavZZ-006Rvw-Ee for pgsql-general@arkaria.postgresql.org; Thu, 23 Jan 2025 11:40:34 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tavZY-00ENwN-G4 for pgsql-general@arkaria.postgresql.org; Thu, 23 Jan 2025 11:40:32 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tavZY-00ENv3-0Z for pgsql-general@lists.postgresql.org; Thu, 23 Jan 2025 11:40:32 +0000 Received: from mail-ej1-x632.google.com ([2a00:1450:4864:20::632]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tavZU-0014vF-3B for pgsql-general@postgresql.org; Thu, 23 Jan 2025 11:40:31 +0000 Received: by mail-ej1-x632.google.com with SMTP id a640c23a62f3a-ab65fca99b6so166406666b.0 for ; Thu, 23 Jan 2025 03:40:29 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1737632428; x=1738237228; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=mEh7OX2cBUqefji3a1b3yeiO1C7e/HS2aWQCE/Vg0ps=; b=GtdtybbSO7qmCVxJPOIDDztOtDjE9BqC3H4HJdjmrJ706gr72hQUzytEP2J0WkzOUP rVVega1gnO9oIUa7WD7GH9qtDhyKube6RJd/cHG1A/ROubtGUUCK9GAHKN5Jc/ODmUsb MhF8kYc4U95Ay0OvITbS6Tuoq1aNSovB+B2hlYRUbSCu5UYDeQAgz6eLXhbTdaDl/C/0 /qpUcwGUWL5Ue9mIJkbjrFB0mV+rohbbzRVjRdZBjZgj9VD0BeZVdC9eJW/HwH+CV7aO 21nj6svFsnYwmL2ao5mPonQA6kencMF24+rkuoVOAAALLBeCv7fUBXeLrlPT1UTWP4wI wsMg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737632428; x=1738237228; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=mEh7OX2cBUqefji3a1b3yeiO1C7e/HS2aWQCE/Vg0ps=; b=b9sBw2mLuIxnwfphIs6+WJI5YqyhcBBl7FQiffOlQwprWTkOMH/wjvtL7dAe4LGA0/ Xh3v6DQRz/6K8MNVCIbmeYC0QnZMm015qc3TDPHPkby20EA0lyWg8TlFfsewrINO9QtA z4AsQkqjPeWMy+fWUe+N2PEeQEpkr2PEBW/WmpZEt/lxbQ0yjumN7fNDCDJdU2S1NAU1 gPgaitNVYfj5/pkH1N/BmJDvp6cIzs3E+llqC7Dyu3q4/oIL2zwTblsjOjzW6UmDcF3+ Ukzh/owNSbEDwuQfy0MQyoDbxbjJU3CYAIWlOlxS/x6MoMULChYKYnHU3mQ62a8C9iU8 8+Lg== X-Gm-Message-State: AOJu0YyS9acmqLPjg0BefEr+d6CrlomrCB5SwC8lt2n5uK4q8q0izlpo W63ATOj7AQECvC09bPCULYxsdqPwlNDz5vUJ4n/riCdjQYE/JRlRMPG65/qqBxv3yykEo1qawmN poCooN2JEcECxdj5puXfVTdlfhiegBE7MNO8= X-Gm-Gg: ASbGncuT/cnIhYW3UnrgrNusvF2bUHAWsY86VeGhbo+i//kFSfIJdK7R/8vrnBAnjLj StUjQLuFXl7oKUL9UPyRTbsiIvLXCVF2kzQ+Cpqe5//DxwqpZEKZYK4En64Y5 X-Google-Smtp-Source: AGHT+IFT5aFbtrtnaK+0GwfIXYo/lXaXivVTuECqa1NC9JNoGh10fLyPG04+RhsfG9E9B0TLRsLgyZHC4N5AIJ5yoZI= X-Received: by 2002:a17:907:6094:b0:aa6:33cf:b389 with SMTP id a640c23a62f3a-ab38b321474mr2233575766b.34.1737632428049; Thu, 23 Jan 2025 03:40:28 -0800 (PST) MIME-Version: 1.0 References: <0ba329ef-62aa-4ab3-aefd-141baabced3b@aklaver.com> In-Reply-To: <0ba329ef-62aa-4ab3-aefd-141baabced3b@aklaver.com> From: Paul Brindusa Date: Thu, 23 Jan 2025 11:40:16 +0000 X-Gm-Features: AbW1kva4mFkOmvlq8ugB9AlIL7MmQTAypvRJP9gzSEOcPePlLeDbSKn2mogaj_0 Message-ID: Subject: Re: Return of the pg_wal issue.. To: Adrian Klaver Cc: pgsql-general Content-Type: multipart/alternative; boundary="0000000000009fcf99062c5e136d" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000009fcf99062c5e136d Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hopefully the below is going to give a little bit more insight on the issue= . I will mention as well that the cluster also replicates data to another mysql database if it's relevant at all. Also worth noting this is our production cluster and we have another pre-production cluster with basically the same settings and the issue there does not occur. A good deal more information is needed to troubleshoot this: 1) Postgres version(s). postgres (PostgreSQL) 15.10 2) The Patroni version. patroni 4.0.4 3) The Patroni configuration. scope: postgres-cluster name: db01 namespace: /service/ log: level: INFO traceback_level: ERROR format: "%(asctime)s %(levelname)s: %(message)s" dateformat: "" max_queue_size: 1000 dir: /var/log/patroni file_num: 4 file_size: 25000000 loggers: patroni.postmaster: WARNING urllib3: WARNING restapi: listen: x.x.x.98:8008 connect_address: x.x.x.98:8008 etcd3: hosts: db01.local:2379,db02.local:2379,db03.local:2379 bootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 postgresql: use_pg_rewind: true use_slots: true parameters: max_connections: 500 superuser_reserved_connections: 5 password_encryption: scram-sha-256 max_locks_per_transaction: 512 max_prepared_transactions: 0 huge_pages: try shared_buffers: 128MB effective_cache_size: 4GB work_mem: 128MB maintenance_work_mem: 256MB checkpoint_timeout: 15min checkpoint_completion_target: 0.9 min_wal_size: 80MB max_wal_size: 1GB wal_buffers: 32MB default_statistics_target: 1000 seq_page_cost: 1 random_page_cost: 4 effective_io_concurrency: 2 synchronous_commit: on autovacuum: on autovacuum_max_workers: 5 autovacuum_vacuum_scale_factor: 0.01 autovacuum_analyze_scale_factor: 0.01 autovacuum_vacuum_cost_limit: 500 autovacuum_vacuum_cost_delay: 2 autovacuum_naptime: 1s max_files_per_process: 4096 archive_mode: on archive_timeout: 1800s archive_command: cd . wal_level: replica wal_keep_size: 2GB max_wal_senders: 10 max_replication_slots: 10 hot_standby: on wal_log_hints: on wal_compression: on shared_preload_libraries: pgaudit track_io_timing: on log_lock_waits: on log_temp_files: 0 track_activities: on track_counts: on track_functions: all log_checkpoints: on logging_collector: on log_truncate_on_rotation: on log_rotation_age: 1d log_rotation_size: 1GB log_line_prefix: '%m [%p]: [%l-1] db=3D%d,user=3D%u,app=3D%a,client= =3D%h ' log_filename: postgresql-%Y-%m-%d.log log_directory: /var/log/pgsql log_connections: on log_disconnections: on log_statement: ddl log_error_verbosity: verbose hot_standby_feedback: on max_standby_streaming_delay: 30s wal_receiver_status_interval: 10s idle_in_transaction_session_timeout: 10min jit: off max_worker_processes: 24 max_parallel_workers: 8 max_parallel_workers_per_gather: 2 max_parallel_maintenance_workers: 2 initdb: - encoding: UTF8 - data-checksums pg_hba: - host replication replicator 127.0.0.1/32 md5 - host replication replicator x.x.x.98/27 scram-sha-256 - host replication replicator x.x.x.99/27 scram-sha-256 - host replication replicator x.x.x.100/27 scram-sha-256 - host all all 0.0.0.0/0 md5 postgresql: listen: x.x.x.98:5432 connect_address: x.x.x.98:5432 data_dir: /var/lib/pgsql/data bin_dir: /usr/bin pgpass: /var/lib/pgsql/.pgpass_patroni authentication: replication: username: replicator password: password superuser: username: postgres password: password parameters: unix_socket_directories: /var/run/postgresql remove_data_directory_on_rewind_failure: false remove_data_directory_on_diverged_timelines: false create_replica_methods: - basebackup basebackup: max-rate: '100M' checkpoint: 'fast' watchdog: mode: required device: /dev/watchdog safety_margin: 5 tags: nofailover: false noloadbalance: false clonefrom: false nosync: false 4) Definition of 'ridiculous rate'. 1GB / day 5) Relevant information from the logs. Below entry is something taken off today's log until this point in time which I think it might be relevant. I cannot see any specifics. If there is anything else please let me know. 2: GMT [186889]: [863-1] db=3D,user=3D,app=3D,client=3D= LOG: 00000: checkpoint starting: time 2: GMT [186889]: [864-1] db=3D,user=3D,app=3D,client=3D LOCATION: LogCheckpointStart, xlog.c:6121 2: GMT [186889]: [865-1] db=3D,user=3D,app=3D,client=3D= LOG: 00000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=3D6.563 s, sync=3D0.003 s, total=3D6.619 s; sy= nc files=3D22, longest=3D0.002 s, average=3D0.001 s; distance=3D776 kB, estima= te=3D56426 kB 2: GMT [186889]: [866-1] db=3D,user=3D,app=3D,client=3D LOCATION: LogCheckpointEnd, xlog.c:6202 2: GMT [2439188]: [7-1] db=3Ddocumentation-database,user=3Ddocumentation-database-user,app=3DPostgr= eSQL JDBC Driver,client=3D LOG: 00000: disconnection: session time: 0: user=3Ddocumentation-database-user database=3Ddocumentation-database host=3D port=3D56170 @Laurenz I guess you are referring to https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/ *Yes, that is the one.* I listed all the reasons I know for your predicament. Did you do some research along these lines? *I've had a look at the things that you have mentioned in the guide. * If yes, what did you find? *I've not managed to test the queries out yet. But I am planning to test out in my lab environment.* *Sorry am really cautious about this as those are the main production databases.* *Hope the above is going to give a bit of insight on the root cause of the problem.* Yours, Laurenz Albe On Wed, Jan 22, 2025 at 6:03=E2=80=AFPM Adrian Klaver wrote: > On 1/22/25 09:33, Paul Brindusa wrote: > > Good afternoon, > > > > Following below we are facing a similar issue and im getting a real buz= z > > to get this working myself, speaking to my DBA in the company has > > actually left me a bit cold as he is not good with postgres. > > > > So I want to try and get a solution for this and fix this issue with th= e > > pg_wal files filling up the drive at a ridiculous rate. I have been > > manually moving logs to a different directory but have had no luck in > > finding an actual solution. > > > > The cluster is a 3 node cluster with HA which is running wirth patroni. > > > > Please help me out, I will mention that I have test cluster spun up in > > case something needs testing. > > > > Also want to give a shout out to Lorenz Albe's for posting stuff about > > wal files on his company blog. > > > > Again any help will be greatly appreciated. > > A good deal more information is needed to troubleshoot this: > > 1) Postgres version(s). > > 2) The Patroni version. > > 3) The Patroni configuration. > > 4) Definition of 'ridiculous rate'. > > 5) Relevant information from the logs. > > > > > > > " On one of our postgres instances we have the pg_wal/data folder up to > > 196GB, out of 200GB disk filled up. > > This has stopped the posgresql.service this morning causing two > > applications to crash. > > Unfortunately our database admin is on leave today, and we are trying t= o > > figure out how to get the disk down? > > Any ideas or suggestions are more than welcome. > > > > Thank you in advance." > > > > > > -- > > Kind Regards, > > Paul Brindusa > > paulbrindusa88@gmail.com > > > > -- > Adrian Klaver > adrian.klaver@aklaver.com > > --=20 Kind Regards, Paul Brindusa paulbrindusa88@gmail.com --0000000000009fcf99062c5e136d Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hopefully the below is going to give= a little bit more insight on the issue.
I will mention as well t= hat the cluster also replicates data to another mysql database if it's = relevant at all.
Also worth noting this is our production cluster= and we have another pre-production cluster with basically the same setting= s and the issue there does not occur.

A good deal more = information is needed to troubleshoot this:

1) Postgres version(s).<= div>
postgres (PostgreSQL) 15.10

2) The Patroni versio= n.

patroni 4.0.4

3) The Patroni configurati= on.

scope: postgres-cluster
name: db01
namespace: /service/
log:
=C2=A0 level: INFO
=C2=A0 traceback_level: ERROR
=C2=A0 = format: "%(asctime)s %(levelname)s: %(message)s"
=C2=A0 datefo= rmat: ""
=C2=A0 max_queue_size: 1000
=C2=A0 dir: /var/log/p= atroni
=C2=A0 file_num: 4
=C2=A0 file_size: 25000000
=C2=A0 logger= s:
=C2=A0 =C2=A0 patroni.postmaster: WARNING
=C2=A0 =C2=A0 urllib3: W= ARNING

restapi:
=C2=A0 listen: x.x.x.98:8008
=C2=A0 connect_ad= dress: x.x.x.98:8008

etcd3:
=C2=A0 hosts: db01.local:2379,db02.lo= cal:2379,db03.local:2379


bootstrap:
=C2=A0 dcs:
=C2=A0 =C2= =A0 ttl: 30
=C2=A0 =C2=A0 loop_wait: 10
=C2=A0 =C2=A0 retry_timeout: = 10
=C2=A0 =C2=A0 maximum_lag_on_failover: 1048576
=C2=A0 =C2=A0 postg= resql:
=C2=A0 =C2=A0 =C2=A0 use_pg_rewind: true
=C2=A0 =C2=A0 =C2=A0 = use_slots: true
=C2=A0 =C2=A0 =C2=A0 parameters:
=C2=A0 =C2=A0 =C2=A0= =C2=A0 max_connections: 500
=C2=A0 =C2=A0 =C2=A0 =C2=A0 superuser_reser= ved_connections: 5
=C2=A0 =C2=A0 =C2=A0 =C2=A0 password_encryption: scra= m-sha-256
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_locks_per_transaction: 512
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 max_prepared_transactions: 0
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 huge_pages: try =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 shared_buffers: 128MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 effective_cache= _size: 4GB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 work_mem: 128MB
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 maintenance_work_mem: 256MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 ch= eckpoint_timeout: 15min
=C2=A0 =C2=A0 =C2=A0 =C2=A0 checkpoint_completio= n_target: 0.9
=C2=A0 =C2=A0 =C2=A0 =C2=A0 min_wal_size: 80MB
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 max_wal_size: 1GB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_b= uffers: 32MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 default_statistics_target: 1000=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 seq_page_cost: 1
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 random_page_cost: 4
=C2=A0 =C2=A0 =C2=A0 =C2=A0 effective_io_conc= urrency: 2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 synchronous_commit: on
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 autovacuum: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacu= um_max_workers: 5
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum_scale_fa= ctor: 0.01
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_analyze_scale_factor: = 0.01
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum_cost_limit: 500
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum_cost_delay: 2
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 autovacuum_naptime: 1s
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max= _files_per_process: 4096
=C2=A0 =C2=A0 =C2=A0 =C2=A0 archive_mode: on=C2=A0 =C2=A0 =C2=A0 =C2=A0 archive_timeout: 1800s
=C2=A0 =C2=A0 =C2=A0= =C2=A0 archive_command: cd .
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_level: rep= lica
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_keep_size: 2GB
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 max_wal_senders: 10
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_replicati= on_slots: 10
=C2=A0 =C2=A0 =C2=A0 =C2=A0 hot_standby: on
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 wal_log_hints: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_comp= ression: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 shared_preload_libraries: pgaudi= t
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_io_timing: on
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 log_lock_waits: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_temp_files= : 0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_activities: on
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 track_counts: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_functio= ns: all
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_checkpoints: on
=C2=A0 =C2=A0= =C2=A0 =C2=A0 logging_collector: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_tru= ncate_on_rotation: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_rotation_age: 1d=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_rotation_size: 1GB
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 log_line_prefix: '%m [%p]: [%l-1] db=3D%d,user=3D%u,app=3D%a= ,client=3D%h '
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_filename: postgresql-= %Y-%m-%d.log
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_directory: /var/log/pgsql=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_connections: on
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 log_disconnections: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_statement:= ddl
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_error_verbosity: verbose
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 hot_standby_feedback: on
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 max_standby_streaming_delay: 30s
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_rec= eiver_status_interval: 10s
=C2=A0 =C2=A0 =C2=A0 =C2=A0 idle_in_transacti= on_session_timeout: 10min
=C2=A0 =C2=A0 =C2=A0 =C2=A0 jit: off
=C2=A0= =C2=A0 =C2=A0 =C2=A0 max_worker_processes: 24
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 max_parallel_workers: 8
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_parallel_wor= kers_per_gather: 2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_parallel_maintenance_= workers: 2 =C2=A0 =C2=A0 =C2=A0

=C2=A0 initdb:
=C2=A0 - encoding:= UTF8
=C2=A0 - data-checksums

=C2=A0 pg_hba:
=C2=A0 - host rep= lication replicator 127.0.0.1/32 md5=C2=A0
=C2=A0 - host replication replicator x.x.x.98/27 scram-sha-256<= br>=C2=A0
=C2=A0
=C2=A0
=C2=A0 - host replication replicator x.= x.x.99/27 scram-sha-256
=C2=A0
=C2=A0
=C2=A0
=C2=A0 - host r= eplication replicator x.x.x.100/27 scram-sha-256
=C2=A0
=C2=A0 =C2= =A0
=C2=A0 - host all all 0.0.0.0/0 md= 5

postgresql:
=C2=A0 listen: x.x.x.98:5432
=C2=A0 connect_addr= ess: x.x.x.98:5432
=C2=A0 data_dir: /var/lib/pgsql/data
=C2=A0 bin_di= r: /usr/bin
=C2=A0 pgpass: /var/lib/pgsql/.pgpass_patroni
=C2=A0 auth= entication:
=C2=A0 =C2=A0 replication:
=C2=A0 =C2=A0 =C2=A0 username:= replicator
=C2=A0 =C2=A0 =C2=A0 password: password
=C2=A0 =C2=A0 sup= eruser:
=C2=A0 =C2=A0 =C2=A0 username: postgres
=C2=A0 =C2=A0 =C2=A0 = password: password
=C2=A0 parameters:
=C2=A0 =C2=A0 unix_socket_direc= tories: /var/run/postgresql

=C2=A0 remove_data_directory_on_rewind_f= ailure: false
=C2=A0 remove_data_directory_on_diverged_timelines: false<= br>
=C2=A0 create_replica_methods:
=C2=A0 =C2=A0 - basebackup
=C2= =A0 basebackup:
=C2=A0 =C2=A0 max-rate: '100M'
=C2=A0 =C2=A0 = checkpoint: 'fast' =C2=A0 =C2=A0 =C2=A0

watchdog:
=C2=A0 = mode: required
=C2=A0 device: /dev/watchdog
=C2=A0 safety_margin: 5
tags:
=C2=A0 nofailover: false
=C2=A0 noloadbalance: false
= =C2=A0 clonefrom: false
=C2=A0 nosync: false

4) Definition of = 9;ridiculous rate'.

1GB / day

5) Relevant information fro= m the logs.

Below entry is something taken off tod= ay's log=C2=A0 until this point in time which I think it might be relev= ant. I cannot see any specifics. If there is anything else please let me kn= ow.=C2=A0

2<REDACTED>:<REDACTED> GMT [= 186889]: [863-1] db=3D,user=3D,app=3D,client=3D LOG: =C2=A000000: checkpoin= t starting: time
2<REDACTED>:<REDACTED> GMT [186889]: [864-1= ] db=3D,user=3D,app=3D,client=3D LOCATION: =C2=A0LogCheckpointStart, xlog.c= :6121
2<REDACTED>:<REDACTED> GMT [186889]: [865-1] db=3D,use= r=3D,app=3D,client=3D LOG: =C2=A000000: checkpoint complete: wrote 66 buffe= rs (0.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=3D6.563 s, syn= c=3D0.003 s, total=3D6.619 s; sync files=3D22, longest=3D0.002 s, average= =3D0.001 s; distance=3D776 kB, estimate=3D56426 kB
2<REDACTED>:<= ;REDACTED> GMT [186889]: [866-1] db=3D,user=3D,app=3D,client=3D LOCATION= : =C2=A0LogCheckpointEnd, xlog.c:6202
2<REDACTED>:<REDACTED>= GMT [2439188]: [7-1] db=3Ddocumentation-database,user=3Ddocumentation-data= base-user,app=3DPostgreSQL JDBC Driver,client=3D<REDACTED> LOG: =C2= =A000000: disconnection: session time: 0:<REDACTED> user=3Ddocumentat= ion-database-user database=3Ddocumentation-database host=3D<REDACTED>= port=3D56170



I guess = you are referring to
https= ://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/
<= br>Yes, that is the one.

I listed all the reasons I know for = your predicament.
Did you do some research along these lines?
=
I've had a look at the things that you have mentioned= in the guide.=C2=A0

If yes, what did you find= ?

I've not managed to test the queries out yet. Bu= t I am planning to test out in my lab environment.
Sorry a= m really cautious about this as those are the main production databases.

Hope the above is going to give a bit = of insight on the root cause of the problem.

<= br>

Yours,
Laurenz Albe



On Wed, Jan 22, 2025 at 6:03=E2=80=AFPM Adrian Klave= r <adrian.klaver@aklaver.co= m> wrote:
On 1/22/25 09:33, Paul Brindusa wrote:
> Good afternoon,
>
> Following below we are facing a similar issue and im getting a real bu= zz
> to get this working myself, speaking to my DBA=C2=A0 in the=C2=A0 comp= any has
> actually left me a bit cold as he is not good with postgres.
>
> So I want to try and get a solution for this and fix this issue with t= he
> pg_wal files filling up the drive at a ridiculous rate. I have been > manually moving logs to a different directory but have had no luck in =
> finding an actual solution.
>
> The cluster is a 3 node cluster with HA which is running wirth patroni= .
>
> Please help me out, I will mention that I have test cluster spun up in=
> case something needs testing.
>
> Also want to give a shout out to Lorenz Albe's for posting stuff a= bout
> wal files on his company blog.
>
> Again any help will be greatly appreciated.

A good deal more information is needed to troubleshoot this:

1) Postgres version(s).

2) The Patroni version.

3) The Patroni configuration.

4) Definition of 'ridiculous rate'.

5) Relevant information from the logs.

>
>
> " On one of our postgres instances we have the pg_wal/data folder= up to
> 196GB, out of 200GB disk filled up.
> This has stopped the posgresql.service this morning causing two
> applications to crash.
> Unfortunately our database admin is on leave today, and we are trying = to
> figure out how to get the disk down?
> Any ideas or suggestions are more than welcome.
>
> Thank you in advance."
>
>
> --
> Kind Regards,
> Paul Brindusa
> paulbrin= dusa88@gmail.com <mailto:paulbrindusa88@gmail.com>
>

--
Adrian Klaver
adrian.klave= r@aklaver.com



--
Kind Regards,
Paul Brindusa
=

--0000000000009fcf99062c5e136d--