Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tayYu-006soV-JF for pgsql-general@arkaria.postgresql.org; Thu, 23 Jan 2025 14:52:05 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tayYt-00GnUd-OL for pgsql-general@arkaria.postgresql.org; Thu, 23 Jan 2025 14:52:03 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tayYt-00GnUU-5E for pgsql-general@lists.postgresql.org; Thu, 23 Jan 2025 14:52:03 +0000 Received: from mail-lj1-x231.google.com ([2a00:1450:4864:20::231]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tayYq-0016mj-0F for pgsql-general@postgresql.org; Thu, 23 Jan 2025 14:52:02 +0000 Received: by mail-lj1-x231.google.com with SMTP id 38308e7fff4ca-3061513d353so10350251fa.2 for ; Thu, 23 Jan 2025 06:52:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1737643919; x=1738248719; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=OzNVYpiXpIrFJBMIZN8/soH66LHr6F9vLRHAOa3EP3A=; b=HliTCSQEfXUwD6yh5/HoXi2tXbvGm7Aa98ApQuxHYuQAkM+ACuoRZ8/ulQzivQe4fT 9LkyiRiRo5gxMBTUBJyp+TWLnHz1xhS1KKSwEukueeGW6MiHod2GlsPCJmIUp9wf2zWg rj13pMrnAT7qZSHNGQ7nQJAo69dIL2ptqKJpmERTvDmtlErkAclr1cjIGUnV5YqDGK+J R6Hfvp6sa5h5JqJCwLONwSHd6dYFTYYmKBeIKQ1WSwwCGqzouzJ35Gmz4Xerh42yV9dr bxM2nZ273zO+CZ0W7WkfEcvQMmyka4CzEr2sGoq2wPj+qszQWkxPJ/qH6tPRpYYrJLL9 d7yw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737643919; x=1738248719; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=OzNVYpiXpIrFJBMIZN8/soH66LHr6F9vLRHAOa3EP3A=; b=jaFdPOLdgrzAbfMmq06OnvhcUZUShghHdlV2cv/IM9XBk2gBti69GOAp/jT4Mm0n8L gTNUt9ZptijXKFF/AOPAII2Sx7dNtWusrz9Kl8df8NyJGJ8AB6VkHpoyXP7CHWX3hBG8 3sQISLaiAg/x/B5yxY3qAFZJrY7S0GuFjpDAV7mZsbAMuJv48beWpXs299Y0FrlF1L3m drd7sPACIqIGzT/413OgTOgYKdArBhHACCRcMgHXCnCC7mU5YBjHN4NFV3J6G+nM/WLm 9LGdZ+otd968FJjKwE/0qZJ6kkP6PM3TQlnSDLQ3iLhZmY6917aI+oxwSg3AtW+rNrKR F6yA== X-Gm-Message-State: AOJu0YwpWgpcOkIM7VNcTWG4DQ5Fodm+7cJZqi/wSiRGd6Dv0kn0kkuO AptX4Wzd1THs+c9RwxX6clqhESUrkEs+bMJ/8ZFTX/UIDMcNyCzgTKwDQktVX1AsGgogpmHNvDG O1dIupzpvIrYjB6KK6GFumUBmqIk= X-Gm-Gg: ASbGncuVl+nnlvKtLBowaJKjpNf8Tmj6G50/ujBREyu3wK5p5/nngTOUHG1n4vWkxoa hlOTMli/SqXxR/PKhEBcRFKkhgQTBstYsnZFfnBJTIGzTkvmVaglDz+XmlsaJ X-Google-Smtp-Source: AGHT+IHXSfmQsAJY25IskdjNA+7xLM4t2CqmEuwna1V9m1eKYI3joY8Rd/4b2RPErJaKcm2b4f8yuFvJe5aeJZdMQ8E= X-Received: by 2002:a2e:a542:0:b0:2ff:d83d:9155 with SMTP id 38308e7fff4ca-3072cb0e9e8mr94188691fa.27.1737643918435; Thu, 23 Jan 2025 06:51:58 -0800 (PST) MIME-Version: 1.0 References: <0ba329ef-62aa-4ab3-aefd-141baabced3b@aklaver.com> In-Reply-To: From: Saul Perdomo Date: Thu, 23 Jan 2025 09:51:42 -0500 X-Gm-Features: AbW1kvYgL5yhQDyo3gCxQ78pxFFX62nioMSHvhSAm9WnealraoOIR3eBb52D3DQ Message-ID: Subject: Re: Return of the pg_wal issue.. To: Paul Brindusa Cc: pgsql-general Content-Type: multipart/alternative; boundary="000000000000812ee0062c60c08a" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000812ee0062c60c08a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable In a nutshell: Remember the ticker-tape of old? (You know, the festive paper strips tossed out of high-rises by stockbrokers, that float on down on city parades after a world war is won in black and white movies?) Those were market-transaction serial records.. Similar to those guys, the write-ahead logfiles are records of every single DB action that isn't read-only. Now, one could think "well if I've already committed the transactions to the database, why do I need to keep the receipts?" And, well, strictly speaking you don't -- but you really want to, because they could serve to replay, *in perfect order, *the database activity for any given period. This data is *gold *when it comes to, say, recovering from disaster. This is why everybody will tell you "don't just delete these files, archive them properly!" Again, for operational purposes, you could just delete them. But you really want to make a *copy *of them before you do... you know, *just in case *something bad happens to your DB that makes you want to roll it back in time. Enter the "archive_command" function. Instead of simply deleting them, you tell PG to pass the files on to another piece of software that knows what to do with this highly valuable data. To avoid complicating matters further, at this point I'll simply recommend that you use PGBackRest for this, it's my favorite piece of software beside the PG server. Or, you know, you could just delete them. You really don't want to, though. Get PGBackRest going and the concepts will click for you as you progress along setting it up: https://pgbackrest.org/user-guide.html Cheers Saul On Thu, Jan 23, 2025 at 9:40=E2=80=AFAM Paul Brindusa wrote: > Hi Saul, > > Fantastic, highly appreciate that. > Initially i've taken the top bit of Laurenz's post to understand what > these wal files are. > Any help is more than welcome, apologies for not checking that bit > earlier. In the meantime i've checked those queries as well. > > Thank you > > > On Thu, Jan 23, 2025 at 2:27=E2=80=AFPM Saul Perdomo > wrote: > >> Hey Paul, >> >> Regarding >> >> *"I've not managed to test the queries out yet. But I am planning to tes= t >> out in my lab environment.* >> *Sorry am really cautious about this as those are the main production >> databases."* >> >> As a dispassionate third-party observer, I can confirm that all SELECT >> and SHOW queries from Laurenz's blog post are read-only. They're complet= ely >> safe to run in the affected environment. >> >> On Thu, Jan 23, 2025 at 6:40=E2=80=AFAM Paul Brindusa >> wrote: >> >>> Hopefully the below is going to give a little bit more insight on the >>> issue. >>> I will mention as well that the cluster also replicates data to another >>> mysql database if it's relevant at all. >>> Also worth noting this is our production cluster and we have another >>> pre-production cluster with basically the same settings and the issue t= here >>> does not occur. >>> >>> A good deal more information is needed to troubleshoot this: >>> >>> 1) Postgres version(s). >>> >>> postgres (PostgreSQL) 15.10 >>> >>> 2) The Patroni version. >>> >>> patroni 4.0.4 >>> >>> 3) The Patroni configuration. >>> >>> scope: postgres-cluster >>> name: db01 >>> namespace: /service/ >>> >>> log: >>> level: INFO >>> traceback_level: ERROR >>> format: "%(asctime)s %(levelname)s: %(message)s" >>> dateformat: "" >>> max_queue_size: 1000 >>> dir: /var/log/patroni >>> file_num: 4 >>> file_size: 25000000 >>> loggers: >>> patroni.postmaster: WARNING >>> urllib3: WARNING >>> >>> restapi: >>> listen: x.x.x.98:8008 >>> connect_address: x.x.x.98:8008 >>> >>> etcd3: >>> hosts: db01.local:2379,db02.local:2379,db03.local:2379 >>> >>> >>> bootstrap: >>> dcs: >>> ttl: 30 >>> loop_wait: 10 >>> retry_timeout: 10 >>> maximum_lag_on_failover: 1048576 >>> postgresql: >>> use_pg_rewind: true >>> use_slots: true >>> parameters: >>> max_connections: 500 >>> superuser_reserved_connections: 5 >>> password_encryption: scram-sha-256 >>> max_locks_per_transaction: 512 >>> max_prepared_transactions: 0 >>> huge_pages: try >>> shared_buffers: 128MB >>> effective_cache_size: 4GB >>> work_mem: 128MB >>> maintenance_work_mem: 256MB >>> checkpoint_timeout: 15min >>> checkpoint_completion_target: 0.9 >>> min_wal_size: 80MB >>> max_wal_size: 1GB >>> wal_buffers: 32MB >>> default_statistics_target: 1000 >>> seq_page_cost: 1 >>> random_page_cost: 4 >>> effective_io_concurrency: 2 >>> synchronous_commit: on >>> autovacuum: on >>> autovacuum_max_workers: 5 >>> autovacuum_vacuum_scale_factor: 0.01 >>> autovacuum_analyze_scale_factor: 0.01 >>> autovacuum_vacuum_cost_limit: 500 >>> autovacuum_vacuum_cost_delay: 2 >>> autovacuum_naptime: 1s >>> max_files_per_process: 4096 >>> archive_mode: on >>> archive_timeout: 1800s >>> archive_command: cd . >>> wal_level: replica >>> wal_keep_size: 2GB >>> max_wal_senders: 10 >>> max_replication_slots: 10 >>> hot_standby: on >>> wal_log_hints: on >>> wal_compression: on >>> shared_preload_libraries: pgaudit >>> track_io_timing: on >>> log_lock_waits: on >>> log_temp_files: 0 >>> track_activities: on >>> track_counts: on >>> track_functions: all >>> log_checkpoints: on >>> logging_collector: on >>> log_truncate_on_rotation: on >>> log_rotation_age: 1d >>> log_rotation_size: 1GB >>> log_line_prefix: '%m [%p]: [%l-1] db=3D%d,user=3D%u,app=3D%a,cl= ient=3D%h >>> ' >>> log_filename: postgresql-%Y-%m-%d.log >>> log_directory: /var/log/pgsql >>> log_connections: on >>> log_disconnections: on >>> log_statement: ddl >>> log_error_verbosity: verbose >>> hot_standby_feedback: on >>> max_standby_streaming_delay: 30s >>> wal_receiver_status_interval: 10s >>> idle_in_transaction_session_timeout: 10min >>> jit: off >>> max_worker_processes: 24 >>> max_parallel_workers: 8 >>> max_parallel_workers_per_gather: 2 >>> max_parallel_maintenance_workers: 2 >>> >>> initdb: >>> - encoding: UTF8 >>> - data-checksums >>> >>> pg_hba: >>> - host replication replicator 127.0.0.1/32 md5 >>> >>> - host replication replicator x.x.x.98/27 scram-sha-256 >>> >>> >>> >>> - host replication replicator x.x.x.99/27 scram-sha-256 >>> >>> >>> >>> - host replication replicator x.x.x.100/27 scram-sha-256 >>> >>> >>> - host all all 0.0.0.0/0 md5 >>> >>> postgresql: >>> listen: x.x.x.98:5432 >>> connect_address: x.x.x.98:5432 >>> data_dir: /var/lib/pgsql/data >>> bin_dir: /usr/bin >>> pgpass: /var/lib/pgsql/.pgpass_patroni >>> authentication: >>> replication: >>> username: replicator >>> password: password >>> superuser: >>> username: postgres >>> password: password >>> parameters: >>> unix_socket_directories: /var/run/postgresql >>> >>> remove_data_directory_on_rewind_failure: false >>> remove_data_directory_on_diverged_timelines: false >>> >>> create_replica_methods: >>> - basebackup >>> basebackup: >>> max-rate: '100M' >>> checkpoint: 'fast' >>> >>> watchdog: >>> mode: required >>> device: /dev/watchdog >>> safety_margin: 5 >>> >>> tags: >>> nofailover: false >>> noloadbalance: false >>> clonefrom: false >>> nosync: false >>> >>> 4) Definition of 'ridiculous rate'. >>> >>> 1GB / day >>> >>> 5) Relevant information from the logs. >>> >>> Below entry is something taken off today's log until this point in tim= e >>> which I think it might be relevant. I cannot see any specifics. If ther= e is >>> anything else please let me know. >>> >>> 2: GMT [186889]: [863-1] db=3D,user=3D,app=3D,clien= t=3D LOG: >>> 00000: checkpoint starting: time >>> 2: GMT [186889]: [864-1] db=3D,user=3D,app=3D,clien= t=3D >>> LOCATION: LogCheckpointStart, xlog.c:6121 >>> 2: GMT [186889]: [865-1] db=3D,user=3D,app=3D,clien= t=3D LOG: >>> 00000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(s) add= ed, >>> 0 removed, 0 recycled; write=3D6.563 s, sync=3D0.003 s, total=3D6.619 s= ; sync >>> files=3D22, longest=3D0.002 s, average=3D0.001 s; distance=3D776 kB, es= timate=3D56426 >>> kB >>> 2: GMT [186889]: [866-1] db=3D,user=3D,app=3D,clien= t=3D >>> LOCATION: LogCheckpointEnd, xlog.c:6202 >>> 2: GMT [2439188]: [7-1] >>> db=3Ddocumentation-database,user=3Ddocumentation-database-user,app=3DPo= stgreSQL >>> JDBC Driver,client=3D LOG: 00000: disconnection: session tim= e: >>> 0: user=3Ddocumentation-database-user >>> database=3Ddocumentation-database host=3D port=3D56170 >>> >>> >>> @Laurenz >>> >>> I guess you are referring to >>> https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/ >>> >>> *Yes, that is the one.* >>> >>> I listed all the reasons I know for your predicament. >>> Did you do some research along these lines? >>> >>> *I've had a look at the things that you have mentioned in the guide. * >>> >>> If yes, what did you find? >>> >>> *I've not managed to test the queries out yet. But I am planning to tes= t >>> out in my lab environment.* >>> *Sorry am really cautious about this as those are the main production >>> databases.* >>> >>> *Hope the above is going to give a bit of insight on the root cause of >>> the problem.* >>> >>> >>> >>> Yours, >>> Laurenz Albe >>> >>> >>> >>> On Wed, Jan 22, 2025 at 6:03=E2=80=AFPM Adrian Klaver >>> wrote: >>> >>>> On 1/22/25 09:33, Paul Brindusa wrote: >>>> > Good afternoon, >>>> > >>>> > Following below we are facing a similar issue and im getting a real >>>> buzz >>>> > to get this working myself, speaking to my DBA in the company has >>>> > actually left me a bit cold as he is not good with postgres. >>>> > >>>> > So I want to try and get a solution for this and fix this issue with >>>> the >>>> > pg_wal files filling up the drive at a ridiculous rate. I have been >>>> > manually moving logs to a different directory but have had no luck i= n >>>> > finding an actual solution. >>>> > >>>> > The cluster is a 3 node cluster with HA which is running wirth >>>> patroni. >>>> > >>>> > Please help me out, I will mention that I have test cluster spun up >>>> in >>>> > case something needs testing. >>>> > >>>> > Also want to give a shout out to Lorenz Albe's for posting stuff >>>> about >>>> > wal files on his company blog. >>>> > >>>> > Again any help will be greatly appreciated. >>>> >>>> A good deal more information is needed to troubleshoot this: >>>> >>>> 1) Postgres version(s). >>>> >>>> 2) The Patroni version. >>>> >>>> 3) The Patroni configuration. >>>> >>>> 4) Definition of 'ridiculous rate'. >>>> >>>> 5) Relevant information from the logs. >>>> >>>> > >>>> > >>>> > " On one of our postgres instances we have the pg_wal/data folder up >>>> to >>>> > 196GB, out of 200GB disk filled up. >>>> > This has stopped the posgresql.service this morning causing two >>>> > applications to crash. >>>> > Unfortunately our database admin is on leave today, and we are tryin= g >>>> to >>>> > figure out how to get the disk down? >>>> > Any ideas or suggestions are more than welcome. >>>> > >>>> > Thank you in advance." >>>> > >>>> > >>>> > -- >>>> > Kind Regards, >>>> > Paul Brindusa >>>> > paulbrindusa88@gmail.com >>>> > >>>> >>>> -- >>>> Adrian Klaver >>>> adrian.klaver@aklaver.com >>>> >>>> >>> >>> -- >>> Kind Regards, >>> Paul Brindusa >>> paulbrindusa88@gmail.com >>> >>> > > -- > Kind Regards, > Paul Brindusa > paulbrindusa88@gmail.com > > --000000000000812ee0062c60c08a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
In a nutshell: Remember the ticker-tape of old? (You know,= the festive paper strips tossed out of high-rises by stockbrokers, that fl= oat on down on city parades after a world war is won in=C2=A0black and whit= e movies?) Those were market-transaction serial records.. Similar to those = guys, the write-ahead logfiles are records of every single DB action that i= sn't read-only.=C2=A0

Now, one could think "wel= l if I've already committed the transactions=C2=A0to the database, why = do I need to keep the receipts?" And, well, strictly speaking you don&= #39;t -- but you really want to, because they could serve to replay, in = perfect order, the database activity for any given period. This data is= gold when it comes to, say, recovering from disaster.
This is why everybody will tell you "don't just delete= these files, archive them properly!" Again, for operational purposes,= you could just delete them. But you really want to make a copy of t= hem before=C2=A0you do... you know, just in case something bad happe= ns to your DB that makes you want to roll it back in time.

Enter the "archive_command" function. Instead of simply = deleting them, you tell PG to pass the files on to another piece of softwar= e that knows what to do with this highly valuable data. To avoid complicati= ng matters further, at this point I'll simply recommend that you use PG= BackRest for this, it's my favorite piece of software beside the PG ser= ver.=C2=A0

Or, you know, you could just delete the= m. You really don't want to, though. Get PGBackRest going and the conce= pts will click for you as you progress along setting it up:=C2=A0https://pgbackrest.org/user-guide.= html

Cheers
Saul

On Thu, Jan 23, 2025 at 9:40=E2=80=AFAM Paul Brindusa <paulbrindusa88@gmail.com> wrote= :
Hi Saul,

Fantastic, highly appreciate that.
Initia= lly i've taken=C2=A0 the top bit of Laurenz's post to understand wh= at these wal files are.
Any help is more than welcome, apologies = for not checking that bit earlier. In the meantime i've checked those q= ueries as well.

Thank you


On T= hu, Jan 23, 2025 at 2:27=E2=80=AFPM Saul Perdomo <saul.perdomo@gmail.com> wrote:=
Hey Paul,

Regarding=C2=A0

"I've not managed to test the queries out yet. But I am pla= nning to test out in my lab environment.
Sorry am r= eally cautious about this as those are the main production databases."=

As a dispassionate third-party observer, I can confi= rm that all SELECT and SHOW queries from Laurenz's blog post are read-o= nly. They're completely safe to run in the affected environment.
<= div>=C2=A0=C2=A0
On Thu, Jan 23, 2025 at 6:40=E2=80=AFAM Paul Brindu= sa <paulbr= indusa88@gmail.com> wrote:
Hopefully the belo= w is going to give a little bit more insight on the issue.
I will= mention as well that the cluster also replicates data to another mysql dat= abase if it's relevant at all.
Also worth noting this is our = production cluster and we have another pre-production cluster with basicall= y the same settings and the issue there does not occur.

A good deal more information is needed to troubleshoot this:

1) Pos= tgres version(s).

postgres (PostgreSQL) 15.10

2) = The Patroni version.

patroni 4.0.4

3) The P= atroni configuration.

scope: postgres-cluster
name: db01
names= pace: /service/

log:
=C2=A0 level: INFO
=C2=A0 traceback_level= : ERROR
=C2=A0 format: "%(asctime)s %(levelname)s: %(message)s"= ;
=C2=A0 dateformat: ""
=C2=A0 max_queue_size: 1000
=C2= =A0 dir: /var/log/patroni
=C2=A0 file_num: 4
=C2=A0 file_size: 250000= 00
=C2=A0 loggers:
=C2=A0 =C2=A0 patroni.postmaster: WARNING
=C2= =A0 =C2=A0 urllib3: WARNING

restapi:
=C2=A0 listen: x.x.x.98:8008=
=C2=A0 connect_address: x.x.x.98:8008

etcd3:
=C2=A0 hosts: db= 01.local:2379,db02.local:2379,db03.local:2379


bootstrap:
=C2= =A0 dcs:
=C2=A0 =C2=A0 ttl: 30
=C2=A0 =C2=A0 loop_wait: 10
=C2=A0 = =C2=A0 retry_timeout: 10
=C2=A0 =C2=A0 maximum_lag_on_failover: 1048576<= br>=C2=A0 =C2=A0 postgresql:
=C2=A0 =C2=A0 =C2=A0 use_pg_rewind: true=C2=A0 =C2=A0 =C2=A0 use_slots: true
=C2=A0 =C2=A0 =C2=A0 parameters:=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_connections: 500
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 superuser_reserved_connections: 5
=C2=A0 =C2=A0 =C2=A0 =C2=A0 pas= sword_encryption: scram-sha-256
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_locks_pe= r_transaction: 512
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_prepared_transactions= : 0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 huge_pages: try =C2=A0 =C2=A0 =C2=A0
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 shared_buffers: 128MB
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 effective_cache_size: 4GB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 work_mem: 1= 28MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 maintenance_work_mem: 256MB
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 checkpoint_timeout: 15min
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 checkpoint_completion_target: 0.9
=C2=A0 =C2=A0 =C2=A0 =C2=A0 min_wa= l_size: 80MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_wal_size: 1GB
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 wal_buffers: 32MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 default_= statistics_target: 1000
=C2=A0 =C2=A0 =C2=A0 =C2=A0 seq_page_cost: 1
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 random_page_cost: 4
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 effective_io_concurrency: 2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 synchronous_= commit: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum: on
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 autovacuum_max_workers: 5
=C2=A0 =C2=A0 =C2=A0 =C2=A0 auto= vacuum_vacuum_scale_factor: 0.01
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_= analyze_scale_factor: 0.01
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum= _cost_limit: 500
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum_cost_dela= y: 2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_naptime: 1s
=C2=A0 =C2=A0= =C2=A0 =C2=A0 max_files_per_process: 4096
=C2=A0 =C2=A0 =C2=A0 =C2=A0 a= rchive_mode: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 archive_timeout: 1800s
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 archive_command: cd .
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 wal_level: replica
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_keep_size: 2GB=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_wal_senders: 10
=C2=A0 =C2=A0 =C2=A0= =C2=A0 max_replication_slots: 10
=C2=A0 =C2=A0 =C2=A0 =C2=A0 hot_standb= y: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_log_hints: on
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 wal_compression: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 shared_preloa= d_libraries: pgaudit
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_io_timing: on
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 log_lock_waits: on
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 log_temp_files: 0
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_activities: on=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_counts: on
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 track_functions: all
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_checkpoints: on=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 logging_collector: on
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 log_truncate_on_rotation: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_= rotation_age: 1d
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_rotation_size: 1GB
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 log_line_prefix: '%m [%p]: [%l-1] db=3D%d,u= ser=3D%u,app=3D%a,client=3D%h '
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_file= name: postgresql-%Y-%m-%d.log
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_directory:= /var/log/pgsql
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_connections: on
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 log_disconnections: on
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 log_statement: ddl
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_error_verbosity: = verbose
=C2=A0 =C2=A0 =C2=A0 =C2=A0 hot_standby_feedback: on
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 max_standby_streaming_delay: 30s
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 wal_receiver_status_interval: 10s
=C2=A0 =C2=A0 =C2=A0 =C2=A0= idle_in_transaction_session_timeout: 10min
=C2=A0 =C2=A0 =C2=A0 =C2=A0 = jit: off
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_worker_processes: 24
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 max_parallel_workers: 8
=C2=A0 =C2=A0 =C2=A0 =C2=A0= max_parallel_workers_per_gather: 2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_para= llel_maintenance_workers: 2 =C2=A0 =C2=A0 =C2=A0

=C2=A0 initdb:
= =C2=A0 - encoding: UTF8
=C2=A0 - data-checksums

=C2=A0 pg_hba:=C2=A0 - host replication replicator 127.0.0.1/32 md5
=C2=A0
=C2=A0 - host replication r= eplicator x.x.x.98/27 scram-sha-256
=C2=A0
=C2=A0
=C2=A0
=C2= =A0 - host replication replicator x.x.x.99/27 scram-sha-256
=C2=A0
= =C2=A0
=C2=A0
=C2=A0 - host replication replicator x.x.x.100/27 scr= am-sha-256
=C2=A0
=C2=A0 =C2=A0
=C2=A0 - host all all 0.0.0.0/0 md5

postgresql:=C2=A0 listen: x.x.x.98:5432
=C2=A0 connect_address: x.x.x.98:5432
= =C2=A0 data_dir: /var/lib/pgsql/data
=C2=A0 bin_dir: /usr/bin
=C2=A0 = pgpass: /var/lib/pgsql/.pgpass_patroni
=C2=A0 authentication:
=C2=A0 = =C2=A0 replication:
=C2=A0 =C2=A0 =C2=A0 username: replicator
=C2=A0 = =C2=A0 =C2=A0 password: password
=C2=A0 =C2=A0 superuser:
=C2=A0 =C2= =A0 =C2=A0 username: postgres
=C2=A0 =C2=A0 =C2=A0 password: password=C2=A0 parameters:
=C2=A0 =C2=A0 unix_socket_directories: /var/run/post= gresql

=C2=A0 remove_data_directory_on_rewind_failure: false
=C2= =A0 remove_data_directory_on_diverged_timelines: false

=C2=A0 create= _replica_methods:
=C2=A0 =C2=A0 - basebackup
=C2=A0 basebackup:
= =C2=A0 =C2=A0 max-rate: '100M'
=C2=A0 =C2=A0 checkpoint: 'fa= st' =C2=A0 =C2=A0 =C2=A0

watchdog:
=C2=A0 mode: required
= =C2=A0 device: /dev/watchdog
=C2=A0 safety_margin: 5

tags:
=C2= =A0 nofailover: false
=C2=A0 noloadbalance: false
=C2=A0 clonefrom: f= alse
=C2=A0 nosync: false

4) Definition of 'ridiculous rate&#= 39;.

1GB / day

5) Relevant information from the logs.

Below entry is something taken off today's log=C2=A0 = until this point in time which I think it might be relevant. I cannot see a= ny specifics. If there is anything else please let me know.=C2=A0

2<REDACTED>:<REDACTED> GMT [186889]: [863-1] db= =3D,user=3D,app=3D,client=3D LOG: =C2=A000000: checkpoint starting: time2<REDACTED>:<REDACTED> GMT [186889]: [864-1] db=3D,user=3D,app= =3D,client=3D LOCATION: =C2=A0LogCheckpointStart, xlog.c:6121
2<REDAC= TED>:<REDACTED> GMT [186889]: [865-1] db=3D,user=3D,app=3D,client= =3D LOG: =C2=A000000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL f= ile(s) added, 0 removed, 0 recycled; write=3D6.563 s, sync=3D0.003 s, total= =3D6.619 s; sync files=3D22, longest=3D0.002 s, average=3D0.001 s; distance= =3D776 kB, estimate=3D56426 kB
2<REDACTED>:<REDACTED> GMT [1= 86889]: [866-1] db=3D,user=3D,app=3D,client=3D LOCATION: =C2=A0LogCheckpoin= tEnd, xlog.c:6202
2<REDACTED>:<REDACTED> GMT [2439188]: [7-1= ] db=3Ddocumentation-database,user=3Ddocumentation-database-user,app=3DPost= greSQL JDBC Driver,client=3D<REDACTED> LOG: =C2=A000000: disconnectio= n: session time: 0:<REDACTED> user=3Ddocumentation-database-user data= base=3Ddocumentation-database host=3D<REDACTED> port=3D56170


I guess you are referring to
https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-k= eep-growing/

Yes, that is the one.

I listed all th= e reasons I know for your predicament.
Did you do some research along th= ese lines?

I've had a look at the things th= at you have mentioned in the guide.=C2=A0

If y= es, what did you find?

I've not managed to test th= e queries out yet. But I am planning to test out in my lab environment.=
Sorry am really cautious about this as those are the main pro= duction databases.

Hope the above is= going to give a bit of insight on the root cause of the problem.
=



Yours,
Laurenz Albe



On Wed, Jan 22, 2025 at 6:03=E2=80=AFPM Ad= rian Klaver <adrian.klaver@aklaver.com> wrote:
On 1/22/25 09:33, Paul Brindusa wrote:
> Good afternoon,
>
> Following below we are facing a similar issue and im getting a real bu= zz
> to get this working myself, speaking to my DBA=C2=A0 in the=C2=A0 comp= any has
> actually left me a bit cold as he is not good with postgres.
>
> So I want to try and get a solution for this and fix this issue with t= he
> pg_wal files filling up the drive at a ridiculous rate. I have been > manually moving logs to a different directory but have had no luck in =
> finding an actual solution.
>
> The cluster is a 3 node cluster with HA which is running wirth patroni= .
>
> Please help me out, I will mention that I have test cluster spun up in=
> case something needs testing.
>
> Also want to give a shout out to Lorenz Albe's for posting stuff a= bout
> wal files on his company blog.
>
> Again any help will be greatly appreciated.

A good deal more information is needed to troubleshoot this:

1) Postgres version(s).

2) The Patroni version.

3) The Patroni configuration.

4) Definition of 'ridiculous rate'.

5) Relevant information from the logs.

>
>
> " On one of our postgres instances we have the pg_wal/data folder= up to
> 196GB, out of 200GB disk filled up.
> This has stopped the posgresql.service this morning causing two
> applications to crash.
> Unfortunately our database admin is on leave today, and we are trying = to
> figure out how to get the disk down?
> Any ideas or suggestions are more than welcome.
>
> Thank you in advance."
>
>
> --
> Kind Regards,
> Paul Brindusa
> paulbrin= dusa88@gmail.com <mailto:paulbrindusa88@gmail.com>
>

--
Adrian Klaver
adrian.klave= r@aklaver.com



--
Kind Regards,
Paul Brindusa
=



--
Kind Regards,
Paul Brindusa
=

--000000000000812ee0062c60c08a--