Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sAp2x-00CX5f-3e for pgsql-general@arkaria.postgresql.org; Sat, 25 May 2024 10:54:44 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sAp2x-00EeCs-4C for pgsql-general@arkaria.postgresql.org; Sat, 25 May 2024 10:54:43 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sAp2w-00EeCh-Ov for pgsql-general@lists.postgresql.org; Sat, 25 May 2024 10:54:42 +0000 Received: from uucp.dinoex.org ([2a0b:f840::12]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sAp2r-000gc1-0A for pgsql-general@lists.postgresql.org; Sat, 25 May 2024 10:54:41 +0000 Received: from uucp.dinoex.org (uucp.dinoex.org [IPv6:2a0b:f840:0:0:0:0:0:12]) by uucp.dinoex.org (8.18.1/8.18.1) with ESMTPS id 44PAs6UK091455 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO) for ; Sat, 25 May 2024 12:54:07 +0200 (CEST) (envelope-from pmc@citylink.dinoex.sub.org) ARC-Seal: i=1; a=rsa-sha256; d=uucp.dinoex.org; s=M20221114; t=1716634449; cv=none; b=VmhPR3+qN6pSK/zR1JokvY9Cd9FVc8Mx7t3TGM13yF9hKCcZJQgVbLNt9CMasFbzncGl6ClqASyMlAAG7DmjNwkSZtbUTtm6w+4j2CtdAKjtHEyT+dzrP3y85TLVMWxH757QpGM+U4D0XW0mu4cKVPAp9jksFQyOdjzIn0HnNV8= ARC-Message-Signature: i=1; a=rsa-sha256; d=uucp.dinoex.org; s=M20221114; t=1716634449; c=relaxed/simple; bh=fTA3u4RZUubi7UKY1Mp/SJEKLydE99JEUPanOASqfyk=; h=Received:Received:Received:Received:X-Authentication-Warning:Date: From:To:Subject:Message-ID:MIME-Version:Content-Type: Content-Disposition:X-Milter:X-Greylist; b=jkDxCaw98MA9Vnfr0PNHI5H/ydPvo1+P5nQSF4cUy8n0Oqzx8fY17T/t5+fHHr2ONaWdh0yqDvHBNRIqbuF+Q6R0vlk0qvCArSHAr5RQO7ZcSi5uJXjLQQBZYz1NIcNYv0ufyV8Y4CPnM07fcvvvvlBdO7vZIjG/tFNidbkG0kI= ARC-Authentication-Results: i=1; uucp.dinoex.org X-MDaemon-Deliver-To: Received: (from uucp@localhost) by uucp.dinoex.org (8.18.1/8.18.1/Submit) with UUCP id 44PAs6uO091453 for pgsql-general@lists.postgresql.org; Sat, 25 May 2024 12:54:06 +0200 (CEST) (envelope-from pmc@citylink.dinoex.sub.org) Received: from disp.intra.daemon.contact (disp-e.intra.daemon.contact [IPv6:fd00:0:0:0:0:0:0:112]) by admn.intra.daemon.contact (8.18.1/8.18.1) with ESMTPS id 44PAqsdw056188 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=OK) for ; Sat, 25 May 2024 12:52:54 +0200 (CEST) (envelope-from pmc@citylink.dinoex.sub.org) Received: from disp.intra.daemon.contact (localhost [127.0.0.1]) by disp.intra.daemon.contact (8.18.1/8.18.1) with ESMTPS id 44PAppC6044827 (version=TLSv1.3 cipher=TLS_AES_256_GCM_SHA384 bits=256 verify=NO) for ; Sat, 25 May 2024 12:51:51 +0200 (CEST) (envelope-from pmc@citylink.dinoex.sub.org) Received: (from pmc@localhost) by disp.intra.daemon.contact (8.18.1/8.18.1/Submit) id 44PApob7044826 for pgsql-general@lists.postgresql.org; Sat, 25 May 2024 12:51:50 +0200 (CEST) (envelope-from pmc@citylink.dinoex.sub.org) X-Authentication-Warning: disp.intra.daemon.contact: pmc set sender to pmc@citylink.dinoex.sub.org using -f Date: Sat, 25 May 2024 12:51:50 +0200 From: Peter To: pgsql-general@lists.postgresql.org Subject: Autovacuum endless loop in heap_page_prune()? Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline X-Milter: Spamilter (Reciever: uucp.dinoex.org; Sender-ip: 0:0:2a0b:f840::; Sender-helo: uucp.dinoex.org;) X-Greylist: Sender passed SPF test, not delayed by milter-greylist-4.6.4 (uucp.dinoex.org [IPv6:2a0b:f840:0:0:0:0:0:12]); Sat, 25 May 2024 12:54:09 +0200 (CEST) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Good morning, I just found Autovacuum run for 6 hours on a 8 GB table, VACUUM query doesnt cancel, cluster doesn't stop, autovacuum worker is not killable, truss shows no activity, after kill -6 this backtrace: * thread #1, name = 'postgres', stop reason = signal SIGABRT * frame #0: 0x0000000000548063 postgres`HeapTupleSatisfiesVacuumHorizon + 531 frame #1: 0x000000000054aed9 postgres`heap_page_prune + 537 frame #2: 0x000000000054e38a postgres`heap_vacuum_rel + 3626 frame #3: 0x00000000006af382 postgres`vacuum_rel + 626 frame #4: 0x00000000006aeeeb postgres`vacuum + 1611 frame #5: 0x00000000007b4664 postgres`do_autovacuum + 4292 frame #6: 0x00000000007b2342 postgres`AutoVacWorkerMain + 866 frame #7: 0x00000000007b1f97 postgres`StartAutoVacWorker + 39 frame #8: 0x00000000007ba0df postgres`sigusr1_handler + 783 frame #9: 0x00000008220da627 libthr.so.3`___lldb_unnamed_symbol683 + 215 frame #10: 0x00000008220d9b1a libthr.so.3`___lldb_unnamed_symbol664 + 314 frame #11: 0x00007ffffffff913 frame #12: 0x00000000007bba25 postgres`ServerLoop + 1541 frame #13: 0x00000000007b9467 postgres`PostmasterMain + 3207 frame #14: 0x000000000071a566 postgres`main + 758 frame #15: 0x00000000004f9995 postgres`_start + 261 After restart, no problems reported yet. Storyline: this is the file-list table of my backup/archive system, contains ~50 mio records. Recently I found a flaw in the backup system, so that some old records weren't removed. I wrote a script to do this, that script did run first at 04:15 and reported it had now removed a lot of old data. I looked into pgadmin4 and it reported 9 mio dead tuples. I wondered why autovacuum wouldn't run, and the stats showed that regular daily vacuum had last run at 03:15 and autovacuum at 03:18. I wondered why it wouldn't start again, and went to sleep. Having consumed 300 minutes cpu now at 11:00, it did start at around 05:00. No messages of hardware errors, good old Haswell Xeon/EP with all ECC. Could there be a compute error somewhere, and does autovacuum lock out ordinary kill signals during some time? We're at Rel. 15.6 on FreeBSD 13.3 *** Changed values: shared_buffers = 40MB temp_buffers = 20MB work_mem = 50MB maintenance_work_mem = 50MB max_stack_depth = 40MB dynamic_shared_memory_type = posix max_files_per_process = 200 effective_io_concurrency = 5 synchronous_commit = off wal_sync_method = fsync full_page_writes = off wal_compression = on wal_init_zero = off wal_writer_delay = 2000ms checkpoint_timeout = 180min checkpoint_completion_target = 0.0 max_wal_size = 2GB archive_mode = on archive_timeout = 86400 seq_page_cost = 0.5 random_page_cost = 0.7 effective_cache_size = 1GB default_statistics_target = 1000 autovacuum = on autovacuum_naptime = 5min autovacuum_vacuum_scale_factor = 0.01 autovacuum_analyze_scale_factor = 0.05