Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uAqmx-002x1P-0M for pgsql-novice@arkaria.postgresql.org; Fri, 02 May 2025 13:50:51 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1uAqmt-009CMB-Um for pgsql-novice@arkaria.postgresql.org; Fri, 02 May 2025 13:50:49 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1uAqmt-009CM3-8U for pgsql-novice@lists.postgresql.org; Fri, 02 May 2025 13:50:48 +0000 Received: from mail-pf1-x431.google.com ([2607:f8b0:4864:20::431]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1uAqms-000h8f-00 for pgsql-novice@lists.postgresql.org; Fri, 02 May 2025 13:50:47 +0000 Received: by mail-pf1-x431.google.com with SMTP id d2e1a72fcca58-7394945d37eso1778952b3a.3 for ; Fri, 02 May 2025 06:50:46 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1746193845; x=1746798645; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=JDBvHAcri1gh+UlOodt8Ut8IiS8BJeoSw7/ctdevJac=; b=fNyPMqek9KF/FPq7SbN9hrrvZEc0Y+g9+5fKV6Stx904uNz+daCSp7JhdzolLyUWhG Q2cJlgDEatLsjh/L2ZGxjKziIJzcj5NiSpThVvmL2MLspt0SSQla8v/ZVc6C8HjIf92i 5vjVf55BAUmb3VGW9u/2pt4RME5WrcQcTVHg0rYIavOxioO34l2AJytvqwiNAVkh1zsi p9h5SpSNPO5EJ0l4pU6i0Sh0sPLRYYcA7fn4Iwhy/9bvzm+OCUsl2sDhBbFu0TkkDilS AjELwzh/uwX2uoJrEUcgBbdXLcS78G5AmWdA6oqZ42F6MzvENnUhflm1Xaq2gtkWxbN/ AWEA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1746193845; x=1746798645; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=JDBvHAcri1gh+UlOodt8Ut8IiS8BJeoSw7/ctdevJac=; b=j2MGSS0JkMWZy2Wcs0cOc0Ulq0Y0DvlB7jaxRWYg8fCXWKrf7fGnxQEAGbSwGz7K5u ILFG6VoCkH3eA/w6naPsqNuBNARDS7nZQptthepFX2sesEm4VepbGQ/RO3WPFNS49kDw kRvtlYLRwy1hRKEBOABIH6Ifj7pkrqveBPURhBmtfVSUiBOMXHGaT3QEzrbHtdoi96kQ FDsfGXY1X38SDaU+uFwpxwX8fYCEMI+TPt+yhFsuLwBevR/zKiR3rZ6cHBqv0jO3/C6i epKBz0Nxh2TRUpQbxIEovo0TVM4A+Xyhy8/nFROpyqyKMy84kq/wyfNltR/fzsyUTiJv /a5Q== X-Gm-Message-State: AOJu0YwcR+LJBizkbq6mlv6zwxq/yUqvOOP4xEFCn+RBqIUv7AnebsI9 V4RwgvkjLwT53oHWLi54l5dCPAqerIp+WtdEWJUS/GlX+xnI9bfX1Oqo/K7cx9Y5hpFYhxq8pFJ FpQYqNMfNJ4EosXOl1GH/2QQDD2GQNTXz X-Gm-Gg: ASbGnctKPEcZbjhsnzj8/euq52rmzGtv0M6jl1PQ0F6jlp6pwwLVCwfQ+LNA18Ne2NM bOhafRLfdjt0AmNzWKus7QuMPOB6sk/+g32dc2xy4mXLXNreXw/WFdD60eaj/5Ns2u1Zid70j8t bQ1hdHd+AyGuq5V0ezRCHWmw== X-Google-Smtp-Source: AGHT+IHQgYwTRqNoDMomCytU8ireFVblZyYOe48pMDzE20YmduLxp62iI1WfmqvGlWjfpTxch1mthjw8gY7qC5+xeWU= X-Received: by 2002:a17:90b:41:b0:2ee:8e75:4aeb with SMTP id 98e67ed59e1d1-30a4e5ce2c1mr5358024a91.17.1746193844872; Fri, 02 May 2025 06:50:44 -0700 (PDT) MIME-Version: 1.0 From: Leo Date: Fri, 2 May 2025 09:50:08 -0400 X-Gm-Features: ATxdqUGQX-ioo1YA8A3DoXppBleXWcAzV5pzwz6RVmq5_qWAH3SK01qXmLsnIv8 Message-ID: Subject: Vacuum Questions To: pgsql-novice@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000d52b1c0634276f21" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000d52b1c0634276f21 Content-Type: text/plain; charset="UTF-8" I have been working on AWS PostgreSQL RDS for a few years, but still not very experienced when it comes to performance issues. Plus RDS is slightly different from the pure PostgreSQL. I am trying to comprehend exactly how vacuum works. Here is what I am trying to do. I am purging old records from a table (500 million rows, but I am doing it in sets of 50,000,000 with a smaller loop of 100,000). That works just fine. Because of the amount of data/rows deleted, I disabled the autovacuum for this table (I want to have control over vacuum, autovacuum does not complete anyway due to the timeout, sizing, etc settings that I do not want to change system wide). I will put the autovacuum back once I am done of course. The issue is when I start vacuuming. This table has 4 indexes and a PK that I worry about. The PK takes about 30 minutes to vacuum and two of the indexes take about an hour each. The problem comes in for the other 2 indexes - they take 12+ hours each to vacuum: psql:/home/backup/leo/fws_vacuum.sql:6: INFO: vacuuming "public.pc_workflowlog" psql:/home/backup/leo/fws_vacuum.sql:6: INFO: launched 4 parallel vacuum workers for index vacuuming (planned: 4) psql:/home/backup/leo/fws_vacuum.sql:6: INFO: scanned index "pc_workflowlog_pk" to remove 50000000 row versions DETAIL: CPU: user: 191.03 s, system: 12.43 s, elapsed: 1711.22 s psql:/home/backup/leo/fws_vacuum.sql:6: INFO: scanned index "workflowlo_u_publicid_g6uqp9lkn6e8" to remove 50000000 row versions DETAIL: CPU: user: 325.75 s, system: 19.75 s, elapsed: 2674.24 s psql:/home/backup/leo/fws_vacuum.sql:6: INFO: scanned index "workflowlo_n_workflow_2tc9k2hdtry9v" to remove 50000000 row versions DETAIL: CPU: user: 312.17 s, system: 16.94 s, elapsed: 3097.88 s *psql:/home/backup/leo/fws_vacuum.sql:6: INFO: scanned index "workflowlo_n_frozenseti_2kjkbjgf3c6ro" to remove 50000000 row versions* *DETAIL: CPU: user: 41187.70 s, system: 216.14 s, elapsed: 42749.36 s* *psql:/home/backup/leo/fws_vacuum.sql:6: INFO: scanned index "workflowlo_n_userid_14kqw6qdsnndw" to remove 50000000 row versions * *DETAIL: CPU: user: 41280.66 s, system: 216.67 s, elapsed: 42832.16 s* psql:/home/backup/leo/fws_vacuum.sql:6: INFO: "pc_workflowlog": removed 50000000 row versions in 1129870 pages I've increased max_parallel_maintenance_workers to 8 for the session and it used parallel 4 (one for each index I assume) to handle it and the two indexes were done in ~ an hour. What I am trying to figure out is how to force the other two large indexes to be vacuumed in parallel - a few workers going against an index. It seems it is possible to do, the index size is large enough to kick in, but I have not been able to figure it out yet. Most of the parameters are at default values. What am I missing? I have a few other questions. Does vacuum time depend on the number of dead rows only and the size of the table, or does the entire storage allocation (including dead tuples) also affect it? Would it be more beneficial to drop the two large indexes, purge, vacuum, and recreate the indexes after make more sense (I know it needs to be tested)? Lastly, is it better to delete all the rows (500 mil) instead of doing it in smaller batches, and vacuum only once? The current size of the table is about 1T and the indexes add another 1.5T to it. Truncate is not an option as I am only deleting rows older than 6 months. Client was not doing purging for years, but will do it after the clean up. P.S. This is my very first post here, please advise if it is the wrong channel. Thank you in advance. --000000000000d52b1c0634276f21 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
I have been working on AWS PostgreSQL RDS for a few years,= but still not very experienced=C2=A0when it comes to performance issues.= =C2=A0 Plus RDS is slightly different from the pure=C2=A0PostgreSQL.
I am trying=C2=A0to comprehend=C2=A0exactly how vacuum works.<= /div>

Here is what I am trying to do.

I am purging old records from a table (500 million rows, but I am do= ing it in sets of=C2=A0 50,000,000 with a smaller loop of 100,000).=C2=A0 T= hat works just fine.

Because of the amount of data= /rows deleted, I disabled=C2=A0the autovacuum for this table (I want to hav= e control over vacuum, autovacuum does not complete anyway due to the timeo= ut, sizing, etc settings that I do not want to change system wide).=C2=A0 I= will put the autovacuum back once I am done of course.

The issue is when I start vacuuming.=C2=A0 This table has 4 indexes a= nd a PK that I worry about.=C2=A0 The PK takes about 30 minutes to vacuum a= nd two of the indexes take about an hour each.=C2=A0 The problem comes in f= or the other 2 indexes - they take 12+ hours each to vacuum:

=

psql:/= home/backup/leo/fws_vacuum.sql:6: INFO:=C2=A0 vacuuming "public.pc_workflowlog"

psql:/= home/backup/leo/fws_vacuum.sql:6: INFO:=C2=A0 launched 4 parallel vacuum workers for index vacuumin= g (planned: 4)

psql:/= home/backup/leo/fws_vacuum.sql:6: INFO:=C2=A0 scanned index "pc_workflowlog_pk" to remove= 50000000 row versions

DETAIL= :=C2=A0 CPU: user: 191.0= 3 s, system: 12.43 s, elapsed: 1711.22 s

psql:/= home/backup/leo/fws_vacuum.sql:6: INFO:=C2=A0 scanned index "workflowlo_u_publicid_g6uqp9lkn6e= 8" to remove 50000000 row versions

DETAIL= :=C2=A0 CPU: user: 325.7= 5 s, system: 19.75 s, elapsed: 2674.24 s

psql:/= home/backup/leo/fws_vacuum.sql:6: INFO:=C2=A0 scanned index "workflowlo_n_workflow_2tc9k2hdtry= 9v" to remove 50000000 row versions

DETAIL= :=C2=A0 CPU: user: 312.1= 7 s, system: 16.94 s, elapsed: 3097.88 s

psq= l:/home/backup/leo/fws_vacuum.sql:6: INFO:=C2=A0 scanned index "workflowlo_n_frozenseti_2kjkbj= gf3c6ro" to remove 50000000 row versions

DET= AIL:=C2=A0 CPU: user: 41= 187.70 s, system: 216.14 s, elapsed: 42749.36 s

psq= l:/home/backup/leo/fws_vacuum.sql:6: INFO:=C2=A0 scanned index "workflowlo_n_userid_14kqw6qdsn= ndw" to remove 50000000 row versions=C2=A0

DET= AIL:=C2=A0 CPU: user: 41= 280.66 s, system: 216.67 s, elapsed: 42832.16 s

psql:/= home/backup/leo/fws_vacuum.sql:6: INFO:=C2=A0 "pc_workflowlog": removed 50000000 row vers= ions in 1129870 pages


I've increase= d=C2=A0max_parallel_maintenance_workers to 8 for the session and it used pa= rallel 4 (one for each index I assume) to handle it and the two indexes wer= e done in ~ an hour.=C2=A0 What I am trying to figure out is how to force t= he other two large indexes to be vacuumed in parallel - a few workers going= against an index.=C2=A0 It seems it is possible to do, the index size is l= arge enough to kick in, but I have not been able to figure it out yet.=C2= =A0 Most of the parameters are at default values.

= What am I missing?

I have a few other questions.= =C2=A0 Does vacuum=C2=A0time depend on the number of dead rows only and the= size of the table, or does the entire storage allocation (including dead t= uples) also affect it?

Would it be more beneficial= to drop the two large indexes, purge, vacuum, and recreate the indexes aft= er make more sense (I know it needs to be tested)?

Lastly, is it better to delete all the rows (500 mil) instead of doing it = in smaller batches, and vacuum only once?

The curr= ent size of the table is about 1T and the indexes add another 1.5T to it.

Truncate is not an option as I am only deleting row= s older than 6 months.=C2=A0 Client was not doing purging for years, but wi= ll do it after the clean up.

P.S. This is my very = first post here, please advise if it is the wrong channel.=C2=A0 Thank you = in advance.
--000000000000d52b1c0634276f21--