Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tazZs-0070kZ-4N for pgsql-general@arkaria.postgresql.org; Thu, 23 Jan 2025 15:57:08 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tazZq-0007Wn-FB for pgsql-general@arkaria.postgresql.org; Thu, 23 Jan 2025 15:57:06 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tazZp-0007Wf-TI for pgsql-general@lists.postgresql.org; Thu, 23 Jan 2025 15:57:06 +0000 Received: from mail-ej1-x636.google.com ([2a00:1450:4864:20::636]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1tazZm-0017NR-25 for pgsql-general@postgresql.org; Thu, 23 Jan 2025 15:57:04 +0000 Received: by mail-ej1-x636.google.com with SMTP id a640c23a62f3a-ab2c9b8aecaso195278766b.0 for ; Thu, 23 Jan 2025 07:57:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1737647821; x=1738252621; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Ndn8IWUU3WJuoyLiUn7OEAY2m33UDO5jqkucARiSmyU=; b=LmPGKY8WBMcVxppudDWfnZgBjEFEciLAnLNasuOIJlD8MmZZbcZOcGMCkhUMGNAkP0 O+KpUyreNITHRYmFJfUVOwlYX9SZ21NtPLgyZgWgxk4w5A/r9hfid8RyAFOowPQi8gAy MVO3zmu0G6Aos9hRIxsCUcE8atiSRmc30NtPY+iBYmU8g0VGmKfGPMcnm3xFk00ryd9D EXNAbUhlFqIHIroSIpEotFEBGAGKpW3TMOID5giaCAQHgC5StFxjGSvAibm8dxvvMfHj sHU1jCANL2Pbil4+xljwqZSeWwHD2AJz1j+AiSYg6i/1Z3DPwF411210ly/GX4UI/iHW lQrw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1737647821; x=1738252621; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Ndn8IWUU3WJuoyLiUn7OEAY2m33UDO5jqkucARiSmyU=; b=olqfdbmMlhr1IkUHL8n9DUq2qCJsbZLmBWKWCXiKS/OK0UwXbGeYNg+n46IysXUgIh TeXNR2KDGrM21+c8cz1R0XzcLt60c+cvyb5KmAj2gCZ8Nbe+DVtLQsgoOvh8O8CgKCMb jIXxJPov5urvvRm/3RtazWUyJJUvaN59uFKLom6rUNIQm6kjijclxl0TGOtduxt8WWUc hnI28viYrDA/XAeRWmp3TBcgGn+hFTY33U713vzxtFUxgtn85aOPrgX9eZBNoelF7DAo SSejG+MbaiYY8X/D6qjUXrmcKRywz3l2jMK3/a9exGvO2A/DpCYowSUAdlVmgFZe8hkm hLpA== X-Gm-Message-State: AOJu0Yzi6Bw95n4am8kKuZK5BgURlhpZCxQyoJZ0lF6hTfwBUiRNuwZZ hKMm3V5xIq/ayuU7ISmoRhszPc5wOE1MeAbK+QOlxGldL862b3NMhO0hJ1M0doj0IKOC0544D7o oTHmBtmy3CLuQ/YyRWgeRwWPzUTo= X-Gm-Gg: ASbGncveKohUQwnOawJHfPhg+zC+iJ1snSOz8XRWPilFVMTYQbFa5+vo4kATruNwXYF Lxf+XwgGyM5McbPPTXu80Lt6ODJT1qa7XscUCAyaawHY5eKWKyF3Q7ebcsh/D X-Google-Smtp-Source: AGHT+IHlQZ7S6S4pDwLE06giwPhgHDNcMIIEPg4XzYfUMDQ7URefkhfsIAI6HJ+IjMPbp6EU6i7wwcCIHPHU4Gtlw7I= X-Received: by 2002:a17:907:1c11:b0:ab2:f74f:3f82 with SMTP id a640c23a62f3a-ab38b3da0cemr2364455466b.57.1737647819970; Thu, 23 Jan 2025 07:56:59 -0800 (PST) MIME-Version: 1.0 References: <0ba329ef-62aa-4ab3-aefd-141baabced3b@aklaver.com> In-Reply-To: From: Paul Brindusa Date: Thu, 23 Jan 2025 15:56:48 +0000 X-Gm-Features: AbW1kvZQM06DUUGqLtQg95GIn9VRK7JvMxQwiUuDdoKqWs5ISAvf-ipW-TsCzZ8 Message-ID: Subject: Re: Return of the pg_wal issue.. To: Saul Perdomo Cc: pgsql-general Content-Type: multipart/alternative; boundary="0000000000000df9bf062c61a9f5" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000000df9bf062c61a9f5 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Wow, hats off to you kind sir! Will defo look into this. Thank you for your support with this. On Thu, Jan 23, 2025 at 2:51=E2=80=AFPM Saul Perdomo wrote: > In a nutshell: Remember the ticker-tape of old? (You know, the festive > paper strips tossed out of high-rises by stockbrokers, that float on down > on city parades after a world war is won in black and white movies?) Thos= e > were market-transaction serial records.. Similar to those guys, the > write-ahead logfiles are records of every single DB action that isn't > read-only. > > Now, one could think "well if I've already committed the transactions to > the database, why do I need to keep the receipts?" And, well, strictly > speaking you don't -- but you really want to, because they could serve to > replay, *in perfect order, *the database activity for any given period. > This data is *gold *when it comes to, say, recovering from disaster. > > This is why everybody will tell you "don't just delete these files, > archive them properly!" Again, for operational purposes, you could just > delete them. But you really want to make a *copy *of them before you > do... you know, *just in case *something bad happens to your DB that > makes you want to roll it back in time. > > Enter the "archive_command" function. Instead of simply deleting them, yo= u > tell PG to pass the files on to another piece of software that knows what > to do with this highly valuable data. To avoid complicating matters > further, at this point I'll simply recommend that you use PGBackRest for > this, it's my favorite piece of software beside the PG server. > > Or, you know, you could just delete them. You really don't want to, > though. Get PGBackRest going and the concepts will click for you as you > progress along setting it up: https://pgbackrest.org/user-guide.html > > Cheers > Saul > > On Thu, Jan 23, 2025 at 9:40=E2=80=AFAM Paul Brindusa > wrote: > >> Hi Saul, >> >> Fantastic, highly appreciate that. >> Initially i've taken the top bit of Laurenz's post to understand what >> these wal files are. >> Any help is more than welcome, apologies for not checking that bit >> earlier. In the meantime i've checked those queries as well. >> >> Thank you >> >> >> On Thu, Jan 23, 2025 at 2:27=E2=80=AFPM Saul Perdomo >> wrote: >> >>> Hey Paul, >>> >>> Regarding >>> >>> *"I've not managed to test the queries out yet. But I am planning to >>> test out in my lab environment.* >>> *Sorry am really cautious about this as those are the main production >>> databases."* >>> >>> As a dispassionate third-party observer, I can confirm that all SELECT >>> and SHOW queries from Laurenz's blog post are read-only. They're comple= tely >>> safe to run in the affected environment. >>> >>> On Thu, Jan 23, 2025 at 6:40=E2=80=AFAM Paul Brindusa >>> wrote: >>> >>>> Hopefully the below is going to give a little bit more insight on the >>>> issue. >>>> I will mention as well that the cluster also replicates data to anothe= r >>>> mysql database if it's relevant at all. >>>> Also worth noting this is our production cluster and we have another >>>> pre-production cluster with basically the same settings and the issue = there >>>> does not occur. >>>> >>>> A good deal more information is needed to troubleshoot this: >>>> >>>> 1) Postgres version(s). >>>> >>>> postgres (PostgreSQL) 15.10 >>>> >>>> 2) The Patroni version. >>>> >>>> patroni 4.0.4 >>>> >>>> 3) The Patroni configuration. >>>> >>>> scope: postgres-cluster >>>> name: db01 >>>> namespace: /service/ >>>> >>>> log: >>>> level: INFO >>>> traceback_level: ERROR >>>> format: "%(asctime)s %(levelname)s: %(message)s" >>>> dateformat: "" >>>> max_queue_size: 1000 >>>> dir: /var/log/patroni >>>> file_num: 4 >>>> file_size: 25000000 >>>> loggers: >>>> patroni.postmaster: WARNING >>>> urllib3: WARNING >>>> >>>> restapi: >>>> listen: x.x.x.98:8008 >>>> connect_address: x.x.x.98:8008 >>>> >>>> etcd3: >>>> hosts: db01.local:2379,db02.local:2379,db03.local:2379 >>>> >>>> >>>> bootstrap: >>>> dcs: >>>> ttl: 30 >>>> loop_wait: 10 >>>> retry_timeout: 10 >>>> maximum_lag_on_failover: 1048576 >>>> postgresql: >>>> use_pg_rewind: true >>>> use_slots: true >>>> parameters: >>>> max_connections: 500 >>>> superuser_reserved_connections: 5 >>>> password_encryption: scram-sha-256 >>>> max_locks_per_transaction: 512 >>>> max_prepared_transactions: 0 >>>> huge_pages: try >>>> shared_buffers: 128MB >>>> effective_cache_size: 4GB >>>> work_mem: 128MB >>>> maintenance_work_mem: 256MB >>>> checkpoint_timeout: 15min >>>> checkpoint_completion_target: 0.9 >>>> min_wal_size: 80MB >>>> max_wal_size: 1GB >>>> wal_buffers: 32MB >>>> default_statistics_target: 1000 >>>> seq_page_cost: 1 >>>> random_page_cost: 4 >>>> effective_io_concurrency: 2 >>>> synchronous_commit: on >>>> autovacuum: on >>>> autovacuum_max_workers: 5 >>>> autovacuum_vacuum_scale_factor: 0.01 >>>> autovacuum_analyze_scale_factor: 0.01 >>>> autovacuum_vacuum_cost_limit: 500 >>>> autovacuum_vacuum_cost_delay: 2 >>>> autovacuum_naptime: 1s >>>> max_files_per_process: 4096 >>>> archive_mode: on >>>> archive_timeout: 1800s >>>> archive_command: cd . >>>> wal_level: replica >>>> wal_keep_size: 2GB >>>> max_wal_senders: 10 >>>> max_replication_slots: 10 >>>> hot_standby: on >>>> wal_log_hints: on >>>> wal_compression: on >>>> shared_preload_libraries: pgaudit >>>> track_io_timing: on >>>> log_lock_waits: on >>>> log_temp_files: 0 >>>> track_activities: on >>>> track_counts: on >>>> track_functions: all >>>> log_checkpoints: on >>>> logging_collector: on >>>> log_truncate_on_rotation: on >>>> log_rotation_age: 1d >>>> log_rotation_size: 1GB >>>> log_line_prefix: '%m [%p]: [%l-1] >>>> db=3D%d,user=3D%u,app=3D%a,client=3D%h ' >>>> log_filename: postgresql-%Y-%m-%d.log >>>> log_directory: /var/log/pgsql >>>> log_connections: on >>>> log_disconnections: on >>>> log_statement: ddl >>>> log_error_verbosity: verbose >>>> hot_standby_feedback: on >>>> max_standby_streaming_delay: 30s >>>> wal_receiver_status_interval: 10s >>>> idle_in_transaction_session_timeout: 10min >>>> jit: off >>>> max_worker_processes: 24 >>>> max_parallel_workers: 8 >>>> max_parallel_workers_per_gather: 2 >>>> max_parallel_maintenance_workers: 2 >>>> >>>> initdb: >>>> - encoding: UTF8 >>>> - data-checksums >>>> >>>> pg_hba: >>>> - host replication replicator 127.0.0.1/32 md5 >>>> >>>> - host replication replicator x.x.x.98/27 scram-sha-256 >>>> >>>> >>>> >>>> - host replication replicator x.x.x.99/27 scram-sha-256 >>>> >>>> >>>> >>>> - host replication replicator x.x.x.100/27 scram-sha-256 >>>> >>>> >>>> - host all all 0.0.0.0/0 md5 >>>> >>>> postgresql: >>>> listen: x.x.x.98:5432 >>>> connect_address: x.x.x.98:5432 >>>> data_dir: /var/lib/pgsql/data >>>> bin_dir: /usr/bin >>>> pgpass: /var/lib/pgsql/.pgpass_patroni >>>> authentication: >>>> replication: >>>> username: replicator >>>> password: password >>>> superuser: >>>> username: postgres >>>> password: password >>>> parameters: >>>> unix_socket_directories: /var/run/postgresql >>>> >>>> remove_data_directory_on_rewind_failure: false >>>> remove_data_directory_on_diverged_timelines: false >>>> >>>> create_replica_methods: >>>> - basebackup >>>> basebackup: >>>> max-rate: '100M' >>>> checkpoint: 'fast' >>>> >>>> watchdog: >>>> mode: required >>>> device: /dev/watchdog >>>> safety_margin: 5 >>>> >>>> tags: >>>> nofailover: false >>>> noloadbalance: false >>>> clonefrom: false >>>> nosync: false >>>> >>>> 4) Definition of 'ridiculous rate'. >>>> >>>> 1GB / day >>>> >>>> 5) Relevant information from the logs. >>>> >>>> Below entry is something taken off today's log until this point in >>>> time which I think it might be relevant. I cannot see any specifics. I= f >>>> there is anything else please let me know. >>>> >>>> 2: GMT [186889]: [863-1] db=3D,user=3D,app=3D,clie= nt=3D >>>> LOG: 00000: checkpoint starting: time >>>> 2: GMT [186889]: [864-1] db=3D,user=3D,app=3D,clie= nt=3D >>>> LOCATION: LogCheckpointStart, xlog.c:6121 >>>> 2: GMT [186889]: [865-1] db=3D,user=3D,app=3D,clie= nt=3D >>>> LOG: 00000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(= s) >>>> added, 0 removed, 0 recycled; write=3D6.563 s, sync=3D0.003 s, total= =3D6.619 s; >>>> sync files=3D22, longest=3D0.002 s, average=3D0.001 s; distance=3D776 = kB, >>>> estimate=3D56426 kB >>>> 2: GMT [186889]: [866-1] db=3D,user=3D,app=3D,clie= nt=3D >>>> LOCATION: LogCheckpointEnd, xlog.c:6202 >>>> 2: GMT [2439188]: [7-1] >>>> db=3Ddocumentation-database,user=3Ddocumentation-database-user,app=3DP= ostgreSQL >>>> JDBC Driver,client=3D LOG: 00000: disconnection: session ti= me: >>>> 0: user=3Ddocumentation-database-user >>>> database=3Ddocumentation-database host=3D port=3D56170 >>>> >>>> >>>> @Laurenz >>>> >>>> I guess you are referring to >>>> https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing= / >>>> >>>> *Yes, that is the one.* >>>> >>>> I listed all the reasons I know for your predicament. >>>> Did you do some research along these lines? >>>> >>>> *I've had a look at the things that you have mentioned in the guide. * >>>> >>>> If yes, what did you find? >>>> >>>> *I've not managed to test the queries out yet. But I am planning to >>>> test out in my lab environment.* >>>> *Sorry am really cautious about this as those are the main production >>>> databases.* >>>> >>>> *Hope the above is going to give a bit of insight on the root cause of >>>> the problem.* >>>> >>>> >>>> >>>> Yours, >>>> Laurenz Albe >>>> >>>> >>>> >>>> On Wed, Jan 22, 2025 at 6:03=E2=80=AFPM Adrian Klaver < >>>> adrian.klaver@aklaver.com> wrote: >>>> >>>>> On 1/22/25 09:33, Paul Brindusa wrote: >>>>> > Good afternoon, >>>>> > >>>>> > Following below we are facing a similar issue and im getting a real >>>>> buzz >>>>> > to get this working myself, speaking to my DBA in the company has >>>>> > actually left me a bit cold as he is not good with postgres. >>>>> > >>>>> > So I want to try and get a solution for this and fix this issue wit= h >>>>> the >>>>> > pg_wal files filling up the drive at a ridiculous rate. I have been >>>>> > manually moving logs to a different directory but have had no luck >>>>> in >>>>> > finding an actual solution. >>>>> > >>>>> > The cluster is a 3 node cluster with HA which is running wirth >>>>> patroni. >>>>> > >>>>> > Please help me out, I will mention that I have test cluster spun up >>>>> in >>>>> > case something needs testing. >>>>> > >>>>> > Also want to give a shout out to Lorenz Albe's for posting stuff >>>>> about >>>>> > wal files on his company blog. >>>>> > >>>>> > Again any help will be greatly appreciated. >>>>> >>>>> A good deal more information is needed to troubleshoot this: >>>>> >>>>> 1) Postgres version(s). >>>>> >>>>> 2) The Patroni version. >>>>> >>>>> 3) The Patroni configuration. >>>>> >>>>> 4) Definition of 'ridiculous rate'. >>>>> >>>>> 5) Relevant information from the logs. >>>>> >>>>> > >>>>> > >>>>> > " On one of our postgres instances we have the pg_wal/data folder u= p >>>>> to >>>>> > 196GB, out of 200GB disk filled up. >>>>> > This has stopped the posgresql.service this morning causing two >>>>> > applications to crash. >>>>> > Unfortunately our database admin is on leave today, and we are >>>>> trying to >>>>> > figure out how to get the disk down? >>>>> > Any ideas or suggestions are more than welcome. >>>>> > >>>>> > Thank you in advance." >>>>> > >>>>> > >>>>> > -- >>>>> > Kind Regards, >>>>> > Paul Brindusa >>>>> > paulbrindusa88@gmail.com >>>>> > >>>>> >>>>> -- >>>>> Adrian Klaver >>>>> adrian.klaver@aklaver.com >>>>> >>>>> >>>> >>>> -- >>>> Kind Regards, >>>> Paul Brindusa >>>> paulbrindusa88@gmail.com >>>> >>>> >> >> -- >> Kind Regards, >> Paul Brindusa >> paulbrindusa88@gmail.com >> >> --=20 Kind Regards, Paul Brindusa paulbrindusa88@gmail.com --0000000000000df9bf062c61a9f5 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Wow, hats off to you kind sir!
Will defo look into thi= s. Thank you for your support with this.

On Thu, Jan 23, 2025 at 2:51= =E2=80=AFPM Saul Perdomo <saul.perdomo@gmail.com> wrote:
In a nutshell: Remembe= r the ticker-tape of old? (You know, the festive paper strips tossed out of= high-rises by stockbrokers, that float on down on city parades after a wor= ld war is won in=C2=A0black and white movies?) Those were market-transactio= n serial records.. Similar to those guys, the write-ahead logfiles are reco= rds of every single DB action that isn't read-only.=C2=A0

Now, one could think "well if I've already committed the tra= nsactions=C2=A0to the database, why do I need to keep the receipts?" A= nd, well, strictly speaking you don't -- but you really want to, becaus= e they could serve to replay, in perfect order, the database activit= y for any given period. This data is gold when it comes to, say, rec= overing from disaster.

This is why everybody will = tell you "don't just delete these files, archive them properly!&qu= ot; Again, for operational purposes, you could just delete them. But you re= ally want to make a copy of them before=C2=A0you do... you know, = just in case something bad happens to your DB that makes you want to ro= ll it back in time.

Enter the "archive_comman= d" function. Instead of simply deleting them, you tell PG to pass the = files on to another piece of software that knows what to do with this highl= y valuable data. To avoid complicating matters further, at this point I'= ;ll simply recommend that you use PGBackRest for this, it's my favorite= piece of software beside the PG server.=C2=A0

Or,= you know, you could just delete them. You really don't want to, though= . Get PGBackRest going and the concepts will click for you as you progress = along setting it up:=C2=A0https://pgbackrest.org/user-guide.html
Cheers
Saul

On Thu, Jan 23, 2025 at 9:40=E2=80= =AFAM Paul Brindusa <paulbrindusa88@gmail.com> wrote:
Hi Saul,

Fantastic, highly appreciate that.
Initially i've taken=C2=A0 = the top bit of Laurenz's post to understand what these wal files are.
Any help is more than welcome, apologies for not checking that bit= earlier. In the meantime i've checked those queries as well.

Thank you


On Thu, Jan 23, 2025 at 2:27= =E2=80=AFPM Saul Perdomo <saul.perdomo@gmail.com> wrote:
Hey P= aul,

Regarding=C2=A0

"I've no= t managed to test the queries out yet. But I am planning to test out in my = lab environment.
Sorry am really cautious about thi= s as those are the main production databases."

A= s a dispassionate third-party observer, I can confirm that all SELECT and S= HOW queries from Laurenz's blog post are read-only. They're complet= ely safe to run in the affected environment.
=C2=A0=C2=A0
= On Thu, Jan 23, 2025 at 6:40=E2=80=AFAM Paul Brindusa <paulbrindusa88@gmail.com&g= t; wrote:
Hopefully the below is going to give a lit= tle bit more insight on the issue.
I will mention as well that th= e cluster also replicates data to another mysql database if it's releva= nt at all.
Also worth noting this is our production cluster and w= e have another pre-production cluster with basically the same settings and = the issue there does not occur.

A good deal more inform= ation is needed to troubleshoot this:

1) Postgres version(s).
postgres (PostgreSQL) 15.10

2) The Patroni version.

patroni 4.0.4

3) The Patroni configuration.
scope: postgres-cluster
name: db01
namespace: /service/

l= og:
=C2=A0 level: INFO
=C2=A0 traceback_level: ERROR
=C2=A0 format= : "%(asctime)s %(levelname)s: %(message)s"
=C2=A0 dateformat: = ""
=C2=A0 max_queue_size: 1000
=C2=A0 dir: /var/log/patroni=
=C2=A0 file_num: 4
=C2=A0 file_size: 25000000
=C2=A0 loggers:
= =C2=A0 =C2=A0 patroni.postmaster: WARNING
=C2=A0 =C2=A0 urllib3: WARNING=

restapi:
=C2=A0 listen: x.x.x.98:8008
=C2=A0 connect_address:= x.x.x.98:8008

etcd3:
=C2=A0 hosts: db01.local:2379,db02.local:23= 79,db03.local:2379


bootstrap:
=C2=A0 dcs:
=C2=A0 =C2=A0 tt= l: 30
=C2=A0 =C2=A0 loop_wait: 10
=C2=A0 =C2=A0 retry_timeout: 10
= =C2=A0 =C2=A0 maximum_lag_on_failover: 1048576
=C2=A0 =C2=A0 postgresql:=
=C2=A0 =C2=A0 =C2=A0 use_pg_rewind: true
=C2=A0 =C2=A0 =C2=A0 use_sl= ots: true
=C2=A0 =C2=A0 =C2=A0 parameters:
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 max_connections: 500
=C2=A0 =C2=A0 =C2=A0 =C2=A0 superuser_reserved_= connections: 5
=C2=A0 =C2=A0 =C2=A0 =C2=A0 password_encryption: scram-sh= a-256
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_locks_per_transaction: 512
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 max_prepared_transactions: 0
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 huge_pages: try =C2=A0 =C2=A0 =C2=A0
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 shared_buffers: 128MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 effective_cache_si= ze: 4GB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 work_mem: 128MB
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 maintenance_work_mem: 256MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 check= point_timeout: 15min
=C2=A0 =C2=A0 =C2=A0 =C2=A0 checkpoint_completion_t= arget: 0.9
=C2=A0 =C2=A0 =C2=A0 =C2=A0 min_wal_size: 80MB
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 max_wal_size: 1GB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_buff= ers: 32MB
=C2=A0 =C2=A0 =C2=A0 =C2=A0 default_statistics_target: 1000=C2=A0 =C2=A0 =C2=A0 =C2=A0 seq_page_cost: 1
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 random_page_cost: 4
=C2=A0 =C2=A0 =C2=A0 =C2=A0 effective_io_concurr= ency: 2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 synchronous_commit: on
=C2=A0 =C2= =A0 =C2=A0 =C2=A0 autovacuum: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_= max_workers: 5
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum_scale_facto= r: 0.01
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_analyze_scale_factor: 0.0= 1
=C2=A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum_cost_limit: 500
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 autovacuum_vacuum_cost_delay: 2
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 autovacuum_naptime: 1s
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_fil= es_per_process: 4096
=C2=A0 =C2=A0 =C2=A0 =C2=A0 archive_mode: on
=C2= =A0 =C2=A0 =C2=A0 =C2=A0 archive_timeout: 1800s
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 archive_command: cd .
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_level: replica=
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_keep_size: 2GB
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 max_wal_senders: 10
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_replication_s= lots: 10
=C2=A0 =C2=A0 =C2=A0 =C2=A0 hot_standby: on
=C2=A0 =C2=A0 = =C2=A0 =C2=A0 wal_log_hints: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_compress= ion: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 shared_preload_libraries: pgaudit=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_io_timing: on
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 log_lock_waits: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_temp_files: 0<= br>=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_activities: on
=C2=A0 =C2=A0 =C2=A0= =C2=A0 track_counts: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 track_functions: al= l
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_checkpoints: on
=C2=A0 =C2=A0 =C2= =A0 =C2=A0 logging_collector: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_truncat= e_on_rotation: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_rotation_age: 1d
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 log_rotation_size: 1GB
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 log_line_prefix: '%m [%p]: [%l-1] db=3D%d,user=3D%u,app=3D%a,cli= ent=3D%h '
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_filename: postgresql-%Y-%= m-%d.log
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_directory: /var/log/pgsql
= =C2=A0 =C2=A0 =C2=A0 =C2=A0 log_connections: on
=C2=A0 =C2=A0 =C2=A0 = =C2=A0 log_disconnections: on
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_statement:= ddl
=C2=A0 =C2=A0 =C2=A0 =C2=A0 log_error_verbosity: verbose
=C2=A0 = =C2=A0 =C2=A0 =C2=A0 hot_standby_feedback: on
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 max_standby_streaming_delay: 30s
=C2=A0 =C2=A0 =C2=A0 =C2=A0 wal_rec= eiver_status_interval: 10s
=C2=A0 =C2=A0 =C2=A0 =C2=A0 idle_in_transacti= on_session_timeout: 10min
=C2=A0 =C2=A0 =C2=A0 =C2=A0 jit: off
=C2=A0= =C2=A0 =C2=A0 =C2=A0 max_worker_processes: 24
=C2=A0 =C2=A0 =C2=A0 =C2= =A0 max_parallel_workers: 8
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_parallel_wor= kers_per_gather: 2
=C2=A0 =C2=A0 =C2=A0 =C2=A0 max_parallel_maintenance_= workers: 2 =C2=A0 =C2=A0 =C2=A0

=C2=A0 initdb:
=C2=A0 - encoding:= UTF8
=C2=A0 - data-checksums

=C2=A0 pg_hba:
=C2=A0 - host rep= lication replicator 127.0= .0.1/32 md5
=C2=A0
=C2=A0 - host replication replicator x.x.x.98= /27 scram-sha-256
=C2=A0
=C2=A0
=C2=A0
=C2=A0 - host replica= tion replicator x.x.x.99/27 scram-sha-256
=C2=A0
=C2=A0
=C2=A0 <= br>=C2=A0 - host replication replicator x.x.x.100/27 scram-sha-256
=C2= =A0
=C2=A0 =C2=A0
=C2=A0 - host all all 0.0.0.0/0 md5

postgresql:
=C2=A0 listen: = x.x.x.98:5432
=C2=A0 connect_address: x.x.x.98:5432
=C2=A0 data_dir: = /var/lib/pgsql/data
=C2=A0 bin_dir: /usr/bin
=C2=A0 pgpass: /var/lib/= pgsql/.pgpass_patroni
=C2=A0 authentication:
=C2=A0 =C2=A0 replicatio= n:
=C2=A0 =C2=A0 =C2=A0 username: replicator
=C2=A0 =C2=A0 =C2=A0 pas= sword: password
=C2=A0 =C2=A0 superuser:
=C2=A0 =C2=A0 =C2=A0 usernam= e: postgres
=C2=A0 =C2=A0 =C2=A0 password: password
=C2=A0 parameters= :
=C2=A0 =C2=A0 unix_socket_directories: /var/run/postgresql

=C2= =A0 remove_data_directory_on_rewind_failure: false
=C2=A0 remove_data_di= rectory_on_diverged_timelines: false

=C2=A0 create_replica_methods:<= br>=C2=A0 =C2=A0 - basebackup
=C2=A0 basebackup:
=C2=A0 =C2=A0 max-ra= te: '100M'
=C2=A0 =C2=A0 checkpoint: 'fast' =C2=A0 =C2= =A0 =C2=A0

watchdog:
=C2=A0 mode: required
=C2=A0 device: /dev= /watchdog
=C2=A0 safety_margin: 5

tags:
=C2=A0 nofailover: fal= se
=C2=A0 noloadbalance: false
=C2=A0 clonefrom: false
=C2=A0 nosy= nc: false

4) Definition of 'ridiculous rate'.

1GB / d= ay

5) Relevant information from the logs.

B= elow entry is something taken off today's log=C2=A0 until this point in= time which I think it might be relevant. I cannot see any specifics. If th= ere is anything else please let me know.=C2=A0

2&l= t;REDACTED>:<REDACTED> GMT [186889]: [863-1] db=3D,user=3D,app=3D,= client=3D LOG: =C2=A000000: checkpoint starting: time
2<REDACTED>:= <REDACTED> GMT [186889]: [864-1] db=3D,user=3D,app=3D,client=3D LOCAT= ION: =C2=A0LogCheckpointStart, xlog.c:6121
2<REDACTED>:<REDACTE= D> GMT [186889]: [865-1] db=3D,user=3D,app=3D,client=3D LOG: =C2=A000000= : checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(s) added, 0 remo= ved, 0 recycled; write=3D6.563 s, sync=3D0.003 s, total=3D6.619 s; sync fil= es=3D22, longest=3D0.002 s, average=3D0.001 s; distance=3D776 kB, estimate= =3D56426 kB
2<REDACTED>:<REDACTED> GMT [186889]: [866-1] db= =3D,user=3D,app=3D,client=3D LOCATION: =C2=A0LogCheckpointEnd, xlog.c:6202<= br>2<REDACTED>:<REDACTED> GMT [2439188]: [7-1] db=3Ddocumentati= on-database,user=3Ddocumentation-database-user,app=3DPostgreSQL JDBC Driver= ,client=3D<REDACTED> LOG: =C2=A000000: disconnection: session time: 0= :<REDACTED> user=3Ddocumentation-database-user database=3Ddocumentati= on-database host=3D<REDACTED> port=3D56170

<= br>

I guess you are referring to
https://www.cybertec-postgresql.= com/en/why-does-my-pg_wal-keep-growing/

Yes, that is the one.=

I listed all the reasons I know for your predicament.
Did yo= u do some research along these lines?

I've = had a look at the things that you have mentioned in the guide.=C2=A0

If yes, what did you find?

I= 9;ve not managed to test the queries out yet. But I am planning to test out= in my lab environment.
Sorry am really cautious about thi= s as those are the main production databases.

Hope the above is going to give a bit of insight on the root caus= e of the problem.



Yours,
Laurenz Albe



On Wed, Jan 22,= 2025 at 6:03=E2=80=AFPM Adrian Klaver <adrian.klaver@aklaver.com> wrote:
=
On 1/22/25 09:33, P= aul Brindusa wrote:
> Good afternoon,
>
> Following below we are facing a similar issue and im getting a real bu= zz
> to get this working myself, speaking to my DBA=C2=A0 in the=C2=A0 comp= any has
> actually left me a bit cold as he is not good with postgres.
>
> So I want to try and get a solution for this and fix this issue with t= he
> pg_wal files filling up the drive at a ridiculous rate. I have been > manually moving logs to a different directory but have had no luck in =
> finding an actual solution.
>
> The cluster is a 3 node cluster with HA which is running wirth patroni= .
>
> Please help me out, I will mention that I have test cluster spun up in=
> case something needs testing.
>
> Also want to give a shout out to Lorenz Albe's for posting stuff a= bout
> wal files on his company blog.
>
> Again any help will be greatly appreciated.

A good deal more information is needed to troubleshoot this:

1) Postgres version(s).

2) The Patroni version.

3) The Patroni configuration.

4) Definition of 'ridiculous rate'.

5) Relevant information from the logs.

>
>
> " On one of our postgres instances we have the pg_wal/data folder= up to
> 196GB, out of 200GB disk filled up.
> This has stopped the posgresql.service this morning causing two
> applications to crash.
> Unfortunately our database admin is on leave today, and we are trying = to
> figure out how to get the disk down?
> Any ideas or suggestions are more than welcome.
>
> Thank you in advance."
>
>
> --
> Kind Regards,
> Paul Brindusa
> paulbrin= dusa88@gmail.com <mailto:paulbrindusa88@gmail.com>
>

--
Adrian Klaver
adrian.klave= r@aklaver.com



--
Kind Regards,
Paul Brindusa
=



--
Kind Regards,
Paul Brindusa
=



--
Kind Regards,
Paul Brindusa
=

--0000000000000df9bf062c61a9f5--