public inbox for [email protected]help / color / mirror / Atom feed
Re: Return of the pg_wal issue.. 5+ messages / 3 participants [nested] [flat]
* Re: Return of the pg_wal issue.. @ 2025-01-22 18:03 Adrian Klaver <[email protected]> 0 siblings, 1 reply; 5+ messages in thread From: Adrian Klaver @ 2025-01-22 18:03 UTC (permalink / raw) To: Paul Brindusa <[email protected]>; pgsql-general On 1/22/25 09:33, Paul Brindusa wrote: > Good afternoon, > > Following below we are facing a similar issue and im getting a real buzz > to get this working myself, speaking to my DBA in the company has > actually left me a bit cold as he is not good with postgres. > > So I want to try and get a solution for this and fix this issue with the > pg_wal files filling up the drive at a ridiculous rate. I have been > manually moving logs to a different directory but have had no luck in > finding an actual solution. > > The cluster is a 3 node cluster with HA which is running wirth patroni. > > Please help me out, I will mention that I have test cluster spun up in > case something needs testing. > > Also want to give a shout out to Lorenz Albe's for posting stuff about > wal files on his company blog. > > Again any help will be greatly appreciated. A good deal more information is needed to troubleshoot this: 1) Postgres version(s). 2) The Patroni version. 3) The Patroni configuration. 4) Definition of 'ridiculous rate'. 5) Relevant information from the logs. > > > " On one of our postgres instances we have the pg_wal/data folder up to > 196GB, out of 200GB disk filled up. > This has stopped the posgresql.service this morning causing two > applications to crash. > Unfortunately our database admin is on leave today, and we are trying to > figure out how to get the disk down? > Any ideas or suggestions are more than welcome. > > Thank you in advance." > > > -- > Kind Regards, > Paul Brindusa > [email protected] <mailto:[email protected]> > -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Return of the pg_wal issue.. @ 2025-01-23 11:40 Paul Brindusa <[email protected]> parent: Adrian Klaver <[email protected]> 0 siblings, 2 replies; 5+ messages in thread From: Paul Brindusa @ 2025-01-23 11:40 UTC (permalink / raw) To: Adrian Klaver <[email protected]>; +Cc: pgsql-general Hopefully the below is going to give a little bit more insight on the issue. I will mention as well that the cluster also replicates data to another mysql database if it's relevant at all. Also worth noting this is our production cluster and we have another pre-production cluster with basically the same settings and the issue there does not occur. A good deal more information is needed to troubleshoot this: 1) Postgres version(s). postgres (PostgreSQL) 15.10 2) The Patroni version. patroni 4.0.4 3) The Patroni configuration. scope: postgres-cluster name: db01 namespace: /service/ log: level: INFO traceback_level: ERROR format: "%(asctime)s %(levelname)s: %(message)s" dateformat: "" max_queue_size: 1000 dir: /var/log/patroni file_num: 4 file_size: 25000000 loggers: patroni.postmaster: WARNING urllib3: WARNING restapi: listen: x.x.x.98:8008 connect_address: x.x.x.98:8008 etcd3: hosts: db01.local:2379,db02.local:2379,db03.local:2379 bootstrap: dcs: ttl: 30 loop_wait: 10 retry_timeout: 10 maximum_lag_on_failover: 1048576 postgresql: use_pg_rewind: true use_slots: true parameters: max_connections: 500 superuser_reserved_connections: 5 password_encryption: scram-sha-256 max_locks_per_transaction: 512 max_prepared_transactions: 0 huge_pages: try shared_buffers: 128MB effective_cache_size: 4GB work_mem: 128MB maintenance_work_mem: 256MB checkpoint_timeout: 15min checkpoint_completion_target: 0.9 min_wal_size: 80MB max_wal_size: 1GB wal_buffers: 32MB default_statistics_target: 1000 seq_page_cost: 1 random_page_cost: 4 effective_io_concurrency: 2 synchronous_commit: on autovacuum: on autovacuum_max_workers: 5 autovacuum_vacuum_scale_factor: 0.01 autovacuum_analyze_scale_factor: 0.01 autovacuum_vacuum_cost_limit: 500 autovacuum_vacuum_cost_delay: 2 autovacuum_naptime: 1s max_files_per_process: 4096 archive_mode: on archive_timeout: 1800s archive_command: cd . wal_level: replica wal_keep_size: 2GB max_wal_senders: 10 max_replication_slots: 10 hot_standby: on wal_log_hints: on wal_compression: on shared_preload_libraries: pgaudit track_io_timing: on log_lock_waits: on log_temp_files: 0 track_activities: on track_counts: on track_functions: all log_checkpoints: on logging_collector: on log_truncate_on_rotation: on log_rotation_age: 1d log_rotation_size: 1GB log_line_prefix: '%m [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h ' log_filename: postgresql-%Y-%m-%d.log log_directory: /var/log/pgsql log_connections: on log_disconnections: on log_statement: ddl log_error_verbosity: verbose hot_standby_feedback: on max_standby_streaming_delay: 30s wal_receiver_status_interval: 10s idle_in_transaction_session_timeout: 10min jit: off max_worker_processes: 24 max_parallel_workers: 8 max_parallel_workers_per_gather: 2 max_parallel_maintenance_workers: 2 initdb: - encoding: UTF8 - data-checksums pg_hba: - host replication replicator 127.0.0.1/32 md5 - host replication replicator x.x.x.98/27 scram-sha-256 - host replication replicator x.x.x.99/27 scram-sha-256 - host replication replicator x.x.x.100/27 scram-sha-256 - host all all 0.0.0.0/0 md5 postgresql: listen: x.x.x.98:5432 connect_address: x.x.x.98:5432 data_dir: /var/lib/pgsql/data bin_dir: /usr/bin pgpass: /var/lib/pgsql/.pgpass_patroni authentication: replication: username: replicator password: password superuser: username: postgres password: password parameters: unix_socket_directories: /var/run/postgresql remove_data_directory_on_rewind_failure: false remove_data_directory_on_diverged_timelines: false create_replica_methods: - basebackup basebackup: max-rate: '100M' checkpoint: 'fast' watchdog: mode: required device: /dev/watchdog safety_margin: 5 tags: nofailover: false noloadbalance: false clonefrom: false nosync: false 4) Definition of 'ridiculous rate'. 1GB / day 5) Relevant information from the logs. Below entry is something taken off today's log until this point in time which I think it might be relevant. I cannot see any specifics. If there is anything else please let me know. 2<REDACTED>:<REDACTED> GMT [186889]: [863-1] db=,user=,app=,client= LOG: 00000: checkpoint starting: time 2<REDACTED>:<REDACTED> GMT [186889]: [864-1] db=,user=,app=,client= LOCATION: LogCheckpointStart, xlog.c:6121 2<REDACTED>:<REDACTED> GMT [186889]: [865-1] db=,user=,app=,client= LOG: 00000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(s) added, 0 removed, 0 recycled; write=6.563 s, sync=0.003 s, total=6.619 s; sync files=22, longest=0.002 s, average=0.001 s; distance=776 kB, estimate=56426 kB 2<REDACTED>:<REDACTED> GMT [186889]: [866-1] db=,user=,app=,client= LOCATION: LogCheckpointEnd, xlog.c:6202 2<REDACTED>:<REDACTED> GMT [2439188]: [7-1] db=documentation-database,user=documentation-database-user,app=PostgreSQL JDBC Driver,client=<REDACTED> LOG: 00000: disconnection: session time: 0:<REDACTED> user=documentation-database-user database=documentation-database host=<REDACTED> port=56170 @Laurenz I guess you are referring to https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/ *Yes, that is the one.* I listed all the reasons I know for your predicament. Did you do some research along these lines? *I've had a look at the things that you have mentioned in the guide. * If yes, what did you find? *I've not managed to test the queries out yet. But I am planning to test out in my lab environment.* *Sorry am really cautious about this as those are the main production databases.* *Hope the above is going to give a bit of insight on the root cause of the problem.* Yours, Laurenz Albe On Wed, Jan 22, 2025 at 6:03 PM Adrian Klaver <[email protected]> wrote: > On 1/22/25 09:33, Paul Brindusa wrote: > > Good afternoon, > > > > Following below we are facing a similar issue and im getting a real buzz > > to get this working myself, speaking to my DBA in the company has > > actually left me a bit cold as he is not good with postgres. > > > > So I want to try and get a solution for this and fix this issue with the > > pg_wal files filling up the drive at a ridiculous rate. I have been > > manually moving logs to a different directory but have had no luck in > > finding an actual solution. > > > > The cluster is a 3 node cluster with HA which is running wirth patroni. > > > > Please help me out, I will mention that I have test cluster spun up in > > case something needs testing. > > > > Also want to give a shout out to Lorenz Albe's for posting stuff about > > wal files on his company blog. > > > > Again any help will be greatly appreciated. > > A good deal more information is needed to troubleshoot this: > > 1) Postgres version(s). > > 2) The Patroni version. > > 3) The Patroni configuration. > > 4) Definition of 'ridiculous rate'. > > 5) Relevant information from the logs. > > > > > > > " On one of our postgres instances we have the pg_wal/data folder up to > > 196GB, out of 200GB disk filled up. > > This has stopped the posgresql.service this morning causing two > > applications to crash. > > Unfortunately our database admin is on leave today, and we are trying to > > figure out how to get the disk down? > > Any ideas or suggestions are more than welcome. > > > > Thank you in advance." > > > > > > -- > > Kind Regards, > > Paul Brindusa > > [email protected] <mailto:[email protected]> > > > > -- > Adrian Klaver > [email protected] > > -- Kind Regards, Paul Brindusa [email protected] ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Return of the pg_wal issue.. @ 2025-01-23 14:26 Saul Perdomo <[email protected]> parent: Paul Brindusa <[email protected]> 1 sibling, 1 reply; 5+ messages in thread From: Saul Perdomo @ 2025-01-23 14:26 UTC (permalink / raw) To: Paul Brindusa <[email protected]>; +Cc: pgsql-general Hey Paul, Regarding *"I've not managed to test the queries out yet. But I am planning to test out in my lab environment.* *Sorry am really cautious about this as those are the main production databases."* As a dispassionate third-party observer, I can confirm that all SELECT and SHOW queries from Laurenz's blog post are read-only. They're completely safe to run in the affected environment. On Thu, Jan 23, 2025 at 6:40 AM Paul Brindusa <[email protected]> wrote: > Hopefully the below is going to give a little bit more insight on the > issue. > I will mention as well that the cluster also replicates data to another > mysql database if it's relevant at all. > Also worth noting this is our production cluster and we have another > pre-production cluster with basically the same settings and the issue there > does not occur. > > A good deal more information is needed to troubleshoot this: > > 1) Postgres version(s). > > postgres (PostgreSQL) 15.10 > > 2) The Patroni version. > > patroni 4.0.4 > > 3) The Patroni configuration. > > scope: postgres-cluster > name: db01 > namespace: /service/ > > log: > level: INFO > traceback_level: ERROR > format: "%(asctime)s %(levelname)s: %(message)s" > dateformat: "" > max_queue_size: 1000 > dir: /var/log/patroni > file_num: 4 > file_size: 25000000 > loggers: > patroni.postmaster: WARNING > urllib3: WARNING > > restapi: > listen: x.x.x.98:8008 > connect_address: x.x.x.98:8008 > > etcd3: > hosts: db01.local:2379,db02.local:2379,db03.local:2379 > > > bootstrap: > dcs: > ttl: 30 > loop_wait: 10 > retry_timeout: 10 > maximum_lag_on_failover: 1048576 > postgresql: > use_pg_rewind: true > use_slots: true > parameters: > max_connections: 500 > superuser_reserved_connections: 5 > password_encryption: scram-sha-256 > max_locks_per_transaction: 512 > max_prepared_transactions: 0 > huge_pages: try > shared_buffers: 128MB > effective_cache_size: 4GB > work_mem: 128MB > maintenance_work_mem: 256MB > checkpoint_timeout: 15min > checkpoint_completion_target: 0.9 > min_wal_size: 80MB > max_wal_size: 1GB > wal_buffers: 32MB > default_statistics_target: 1000 > seq_page_cost: 1 > random_page_cost: 4 > effective_io_concurrency: 2 > synchronous_commit: on > autovacuum: on > autovacuum_max_workers: 5 > autovacuum_vacuum_scale_factor: 0.01 > autovacuum_analyze_scale_factor: 0.01 > autovacuum_vacuum_cost_limit: 500 > autovacuum_vacuum_cost_delay: 2 > autovacuum_naptime: 1s > max_files_per_process: 4096 > archive_mode: on > archive_timeout: 1800s > archive_command: cd . > wal_level: replica > wal_keep_size: 2GB > max_wal_senders: 10 > max_replication_slots: 10 > hot_standby: on > wal_log_hints: on > wal_compression: on > shared_preload_libraries: pgaudit > track_io_timing: on > log_lock_waits: on > log_temp_files: 0 > track_activities: on > track_counts: on > track_functions: all > log_checkpoints: on > logging_collector: on > log_truncate_on_rotation: on > log_rotation_age: 1d > log_rotation_size: 1GB > log_line_prefix: '%m [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h ' > log_filename: postgresql-%Y-%m-%d.log > log_directory: /var/log/pgsql > log_connections: on > log_disconnections: on > log_statement: ddl > log_error_verbosity: verbose > hot_standby_feedback: on > max_standby_streaming_delay: 30s > wal_receiver_status_interval: 10s > idle_in_transaction_session_timeout: 10min > jit: off > max_worker_processes: 24 > max_parallel_workers: 8 > max_parallel_workers_per_gather: 2 > max_parallel_maintenance_workers: 2 > > initdb: > - encoding: UTF8 > - data-checksums > > pg_hba: > - host replication replicator 127.0.0.1/32 md5 > > - host replication replicator x.x.x.98/27 scram-sha-256 > > > > - host replication replicator x.x.x.99/27 scram-sha-256 > > > > - host replication replicator x.x.x.100/27 scram-sha-256 > > > - host all all 0.0.0.0/0 md5 > > postgresql: > listen: x.x.x.98:5432 > connect_address: x.x.x.98:5432 > data_dir: /var/lib/pgsql/data > bin_dir: /usr/bin > pgpass: /var/lib/pgsql/.pgpass_patroni > authentication: > replication: > username: replicator > password: password > superuser: > username: postgres > password: password > parameters: > unix_socket_directories: /var/run/postgresql > > remove_data_directory_on_rewind_failure: false > remove_data_directory_on_diverged_timelines: false > > create_replica_methods: > - basebackup > basebackup: > max-rate: '100M' > checkpoint: 'fast' > > watchdog: > mode: required > device: /dev/watchdog > safety_margin: 5 > > tags: > nofailover: false > noloadbalance: false > clonefrom: false > nosync: false > > 4) Definition of 'ridiculous rate'. > > 1GB / day > > 5) Relevant information from the logs. > > Below entry is something taken off today's log until this point in time > which I think it might be relevant. I cannot see any specifics. If there is > anything else please let me know. > > 2<REDACTED>:<REDACTED> GMT [186889]: [863-1] db=,user=,app=,client= LOG: > 00000: checkpoint starting: time > 2<REDACTED>:<REDACTED> GMT [186889]: [864-1] db=,user=,app=,client= > LOCATION: LogCheckpointStart, xlog.c:6121 > 2<REDACTED>:<REDACTED> GMT [186889]: [865-1] db=,user=,app=,client= LOG: > 00000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(s) added, > 0 removed, 0 recycled; write=6.563 s, sync=0.003 s, total=6.619 s; sync > files=22, longest=0.002 s, average=0.001 s; distance=776 kB, estimate=56426 > kB > 2<REDACTED>:<REDACTED> GMT [186889]: [866-1] db=,user=,app=,client= > LOCATION: LogCheckpointEnd, xlog.c:6202 > 2<REDACTED>:<REDACTED> GMT [2439188]: [7-1] > db=documentation-database,user=documentation-database-user,app=PostgreSQL > JDBC Driver,client=<REDACTED> LOG: 00000: disconnection: session time: > 0:<REDACTED> user=documentation-database-user > database=documentation-database host=<REDACTED> port=56170 > > > @Laurenz > > I guess you are referring to > https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/ > > *Yes, that is the one.* > > I listed all the reasons I know for your predicament. > Did you do some research along these lines? > > *I've had a look at the things that you have mentioned in the guide. * > > If yes, what did you find? > > *I've not managed to test the queries out yet. But I am planning to test > out in my lab environment.* > *Sorry am really cautious about this as those are the main production > databases.* > > *Hope the above is going to give a bit of insight on the root cause of the > problem.* > > > > Yours, > Laurenz Albe > > > > On Wed, Jan 22, 2025 at 6:03 PM Adrian Klaver <[email protected]> > wrote: > >> On 1/22/25 09:33, Paul Brindusa wrote: >> > Good afternoon, >> > >> > Following below we are facing a similar issue and im getting a real >> buzz >> > to get this working myself, speaking to my DBA in the company has >> > actually left me a bit cold as he is not good with postgres. >> > >> > So I want to try and get a solution for this and fix this issue with >> the >> > pg_wal files filling up the drive at a ridiculous rate. I have been >> > manually moving logs to a different directory but have had no luck in >> > finding an actual solution. >> > >> > The cluster is a 3 node cluster with HA which is running wirth patroni. >> > >> > Please help me out, I will mention that I have test cluster spun up in >> > case something needs testing. >> > >> > Also want to give a shout out to Lorenz Albe's for posting stuff about >> > wal files on his company blog. >> > >> > Again any help will be greatly appreciated. >> >> A good deal more information is needed to troubleshoot this: >> >> 1) Postgres version(s). >> >> 2) The Patroni version. >> >> 3) The Patroni configuration. >> >> 4) Definition of 'ridiculous rate'. >> >> 5) Relevant information from the logs. >> >> > >> > >> > " On one of our postgres instances we have the pg_wal/data folder up to >> > 196GB, out of 200GB disk filled up. >> > This has stopped the posgresql.service this morning causing two >> > applications to crash. >> > Unfortunately our database admin is on leave today, and we are trying >> to >> > figure out how to get the disk down? >> > Any ideas or suggestions are more than welcome. >> > >> > Thank you in advance." >> > >> > >> > -- >> > Kind Regards, >> > Paul Brindusa >> > [email protected] <mailto:[email protected]> >> > >> >> -- >> Adrian Klaver >> [email protected] >> >> > > -- > Kind Regards, > Paul Brindusa > [email protected] > > ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Return of the pg_wal issue.. @ 2025-01-23 14:40 Paul Brindusa <[email protected]> parent: Saul Perdomo <[email protected]> 0 siblings, 0 replies; 5+ messages in thread From: Paul Brindusa @ 2025-01-23 14:40 UTC (permalink / raw) To: Saul Perdomo <[email protected]>; +Cc: pgsql-general Hi Saul, Fantastic, highly appreciate that. Initially i've taken the top bit of Laurenz's post to understand what these wal files are. Any help is more than welcome, apologies for not checking that bit earlier. In the meantime i've checked those queries as well. Thank you On Thu, Jan 23, 2025 at 2:27 PM Saul Perdomo <[email protected]> wrote: > Hey Paul, > > Regarding > > *"I've not managed to test the queries out yet. But I am planning to test > out in my lab environment.* > *Sorry am really cautious about this as those are the main production > databases."* > > As a dispassionate third-party observer, I can confirm that all SELECT and > SHOW queries from Laurenz's blog post are read-only. They're completely > safe to run in the affected environment. > > On Thu, Jan 23, 2025 at 6:40 AM Paul Brindusa <[email protected]> > wrote: > >> Hopefully the below is going to give a little bit more insight on the >> issue. >> I will mention as well that the cluster also replicates data to another >> mysql database if it's relevant at all. >> Also worth noting this is our production cluster and we have another >> pre-production cluster with basically the same settings and the issue there >> does not occur. >> >> A good deal more information is needed to troubleshoot this: >> >> 1) Postgres version(s). >> >> postgres (PostgreSQL) 15.10 >> >> 2) The Patroni version. >> >> patroni 4.0.4 >> >> 3) The Patroni configuration. >> >> scope: postgres-cluster >> name: db01 >> namespace: /service/ >> >> log: >> level: INFO >> traceback_level: ERROR >> format: "%(asctime)s %(levelname)s: %(message)s" >> dateformat: "" >> max_queue_size: 1000 >> dir: /var/log/patroni >> file_num: 4 >> file_size: 25000000 >> loggers: >> patroni.postmaster: WARNING >> urllib3: WARNING >> >> restapi: >> listen: x.x.x.98:8008 >> connect_address: x.x.x.98:8008 >> >> etcd3: >> hosts: db01.local:2379,db02.local:2379,db03.local:2379 >> >> >> bootstrap: >> dcs: >> ttl: 30 >> loop_wait: 10 >> retry_timeout: 10 >> maximum_lag_on_failover: 1048576 >> postgresql: >> use_pg_rewind: true >> use_slots: true >> parameters: >> max_connections: 500 >> superuser_reserved_connections: 5 >> password_encryption: scram-sha-256 >> max_locks_per_transaction: 512 >> max_prepared_transactions: 0 >> huge_pages: try >> shared_buffers: 128MB >> effective_cache_size: 4GB >> work_mem: 128MB >> maintenance_work_mem: 256MB >> checkpoint_timeout: 15min >> checkpoint_completion_target: 0.9 >> min_wal_size: 80MB >> max_wal_size: 1GB >> wal_buffers: 32MB >> default_statistics_target: 1000 >> seq_page_cost: 1 >> random_page_cost: 4 >> effective_io_concurrency: 2 >> synchronous_commit: on >> autovacuum: on >> autovacuum_max_workers: 5 >> autovacuum_vacuum_scale_factor: 0.01 >> autovacuum_analyze_scale_factor: 0.01 >> autovacuum_vacuum_cost_limit: 500 >> autovacuum_vacuum_cost_delay: 2 >> autovacuum_naptime: 1s >> max_files_per_process: 4096 >> archive_mode: on >> archive_timeout: 1800s >> archive_command: cd . >> wal_level: replica >> wal_keep_size: 2GB >> max_wal_senders: 10 >> max_replication_slots: 10 >> hot_standby: on >> wal_log_hints: on >> wal_compression: on >> shared_preload_libraries: pgaudit >> track_io_timing: on >> log_lock_waits: on >> log_temp_files: 0 >> track_activities: on >> track_counts: on >> track_functions: all >> log_checkpoints: on >> logging_collector: on >> log_truncate_on_rotation: on >> log_rotation_age: 1d >> log_rotation_size: 1GB >> log_line_prefix: '%m [%p]: [%l-1] db=%d,user=%u,app=%a,client=%h ' >> log_filename: postgresql-%Y-%m-%d.log >> log_directory: /var/log/pgsql >> log_connections: on >> log_disconnections: on >> log_statement: ddl >> log_error_verbosity: verbose >> hot_standby_feedback: on >> max_standby_streaming_delay: 30s >> wal_receiver_status_interval: 10s >> idle_in_transaction_session_timeout: 10min >> jit: off >> max_worker_processes: 24 >> max_parallel_workers: 8 >> max_parallel_workers_per_gather: 2 >> max_parallel_maintenance_workers: 2 >> >> initdb: >> - encoding: UTF8 >> - data-checksums >> >> pg_hba: >> - host replication replicator 127.0.0.1/32 md5 >> >> - host replication replicator x.x.x.98/27 scram-sha-256 >> >> >> >> - host replication replicator x.x.x.99/27 scram-sha-256 >> >> >> >> - host replication replicator x.x.x.100/27 scram-sha-256 >> >> >> - host all all 0.0.0.0/0 md5 >> >> postgresql: >> listen: x.x.x.98:5432 >> connect_address: x.x.x.98:5432 >> data_dir: /var/lib/pgsql/data >> bin_dir: /usr/bin >> pgpass: /var/lib/pgsql/.pgpass_patroni >> authentication: >> replication: >> username: replicator >> password: password >> superuser: >> username: postgres >> password: password >> parameters: >> unix_socket_directories: /var/run/postgresql >> >> remove_data_directory_on_rewind_failure: false >> remove_data_directory_on_diverged_timelines: false >> >> create_replica_methods: >> - basebackup >> basebackup: >> max-rate: '100M' >> checkpoint: 'fast' >> >> watchdog: >> mode: required >> device: /dev/watchdog >> safety_margin: 5 >> >> tags: >> nofailover: false >> noloadbalance: false >> clonefrom: false >> nosync: false >> >> 4) Definition of 'ridiculous rate'. >> >> 1GB / day >> >> 5) Relevant information from the logs. >> >> Below entry is something taken off today's log until this point in time >> which I think it might be relevant. I cannot see any specifics. If there is >> anything else please let me know. >> >> 2<REDACTED>:<REDACTED> GMT [186889]: [863-1] db=,user=,app=,client= LOG: >> 00000: checkpoint starting: time >> 2<REDACTED>:<REDACTED> GMT [186889]: [864-1] db=,user=,app=,client= >> LOCATION: LogCheckpointStart, xlog.c:6121 >> 2<REDACTED>:<REDACTED> GMT [186889]: [865-1] db=,user=,app=,client= LOG: >> 00000: checkpoint complete: wrote 66 buffers (0.4%); 0 WAL file(s) added, >> 0 removed, 0 recycled; write=6.563 s, sync=0.003 s, total=6.619 s; sync >> files=22, longest=0.002 s, average=0.001 s; distance=776 kB, estimate=56426 >> kB >> 2<REDACTED>:<REDACTED> GMT [186889]: [866-1] db=,user=,app=,client= >> LOCATION: LogCheckpointEnd, xlog.c:6202 >> 2<REDACTED>:<REDACTED> GMT [2439188]: [7-1] >> db=documentation-database,user=documentation-database-user,app=PostgreSQL >> JDBC Driver,client=<REDACTED> LOG: 00000: disconnection: session time: >> 0:<REDACTED> user=documentation-database-user >> database=documentation-database host=<REDACTED> port=56170 >> >> >> @Laurenz >> >> I guess you are referring to >> https://www.cybertec-postgresql.com/en/why-does-my-pg_wal-keep-growing/ >> >> *Yes, that is the one.* >> >> I listed all the reasons I know for your predicament. >> Did you do some research along these lines? >> >> *I've had a look at the things that you have mentioned in the guide. * >> >> If yes, what did you find? >> >> *I've not managed to test the queries out yet. But I am planning to test >> out in my lab environment.* >> *Sorry am really cautious about this as those are the main production >> databases.* >> >> *Hope the above is going to give a bit of insight on the root cause of >> the problem.* >> >> >> >> Yours, >> Laurenz Albe >> >> >> >> On Wed, Jan 22, 2025 at 6:03 PM Adrian Klaver <[email protected]> >> wrote: >> >>> On 1/22/25 09:33, Paul Brindusa wrote: >>> > Good afternoon, >>> > >>> > Following below we are facing a similar issue and im getting a real >>> buzz >>> > to get this working myself, speaking to my DBA in the company has >>> > actually left me a bit cold as he is not good with postgres. >>> > >>> > So I want to try and get a solution for this and fix this issue with >>> the >>> > pg_wal files filling up the drive at a ridiculous rate. I have been >>> > manually moving logs to a different directory but have had no luck in >>> > finding an actual solution. >>> > >>> > The cluster is a 3 node cluster with HA which is running wirth patroni. >>> > >>> > Please help me out, I will mention that I have test cluster spun up in >>> > case something needs testing. >>> > >>> > Also want to give a shout out to Lorenz Albe's for posting stuff about >>> > wal files on his company blog. >>> > >>> > Again any help will be greatly appreciated. >>> >>> A good deal more information is needed to troubleshoot this: >>> >>> 1) Postgres version(s). >>> >>> 2) The Patroni version. >>> >>> 3) The Patroni configuration. >>> >>> 4) Definition of 'ridiculous rate'. >>> >>> 5) Relevant information from the logs. >>> >>> > >>> > >>> > " On one of our postgres instances we have the pg_wal/data folder up >>> to >>> > 196GB, out of 200GB disk filled up. >>> > This has stopped the posgresql.service this morning causing two >>> > applications to crash. >>> > Unfortunately our database admin is on leave today, and we are trying >>> to >>> > figure out how to get the disk down? >>> > Any ideas or suggestions are more than welcome. >>> > >>> > Thank you in advance." >>> > >>> > >>> > -- >>> > Kind Regards, >>> > Paul Brindusa >>> > [email protected] <mailto:[email protected]> >>> > >>> >>> -- >>> Adrian Klaver >>> [email protected] >>> >>> >> >> -- >> Kind Regards, >> Paul Brindusa >> [email protected] >> >> -- Kind Regards, Paul Brindusa [email protected] ^ permalink raw reply [nested|flat] 5+ messages in thread
* Re: Return of the pg_wal issue.. @ 2025-01-23 16:47 Adrian Klaver <[email protected]> parent: Paul Brindusa <[email protected]> 1 sibling, 0 replies; 5+ messages in thread From: Adrian Klaver @ 2025-01-23 16:47 UTC (permalink / raw) To: Paul Brindusa <[email protected]>; +Cc: pgsql-general On 1/23/25 03:40, Paul Brindusa wrote: > Hopefully the below is going to give a little bit more insight on the issue. > I will mention as well that the cluster also replicates data to another > mysql database if it's relevant at all. Yeah, how is that done? > Also worth noting this is our production cluster and we have another > pre-production cluster with basically the same settings and the issue > there does not occur. And the difference in settings is? > -- > Kind Regards, > Paul Brindusa > [email protected] <mailto:[email protected]> > -- Adrian Klaver [email protected] ^ permalink raw reply [nested|flat] 5+ messages in thread
end of thread, other threads:[~2025-01-23 16:47 UTC | newest] Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2025-01-22 18:03 Re: Return of the pg_wal issue.. Adrian Klaver <[email protected]> 2025-01-23 11:40 ` Paul Brindusa <[email protected]> 2025-01-23 14:26 ` Saul Perdomo <[email protected]> 2025-01-23 14:40 ` Paul Brindusa <[email protected]> 2025-01-23 16:47 ` Adrian Klaver <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox