Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbne9-001VuE-5j for pgsql-admin@arkaria.postgresql.org; Wed, 07 Aug 2024 20:52:37 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sbne7-00A47c-DR for pgsql-admin@arkaria.postgresql.org; Wed, 07 Aug 2024 20:52:35 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbne6-00A47U-So for pgsql-admin@lists.postgresql.org; Wed, 07 Aug 2024 20:52:35 +0000 Received: from qs51p00im-qukt01072502.me.com ([17.57.155.15]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbne3-003bHM-VE for pgsql-admin@lists.postgresql.org; Wed, 07 Aug 2024 20:52:33 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=icloud.com; s=1a1hai; t=1723063950; bh=yZUbrJ0CNHWAol9gH2QDKBOOSHXFUgSVYNkaDIPAH/k=; h=From:Message-Id:Content-Type:Mime-Version:Subject:Date:To; b=O6y5tmWtuSmqaA3d4HdXlDQvN4JXstRzxfMXcVQYkO4gKpgD9I6xz/ZM6klviXuwU RCs1hBNLwtVbt4l93OT+zoIdNEJ1qOdC53hk8vndKh7NcNc+rSLEuv8cXMxmaigqna gJr2NmewbWkvwYu3SkbOSQp0nBpSyt8+dSU6iKo2s7aKpJB8wLwOZjOTncfys8ANRr Ph5726hTaXiOlDxGipO/ZuGC8atDlx+EoH5Z0Cg3K9cmYKhjtvlnDoBcDf84AjDHXC UE2BOuCwMLl10uPaUStwiKqEVNqe2M0onJH0evNG/3KnnBPhQ4G8AwNEptoUdDUkeJ EsSdPOGW7Q0dg== Received: from smtpclient.apple (qs51p00im-dlb-asmtp-mailmevip.me.com [17.57.155.28]) by qs51p00im-qukt01072502.me.com (Postfix) with ESMTPSA id 0126B6EC02C5; Wed, 7 Aug 2024 20:52:28 +0000 (UTC) From: Rui DeSousa Message-Id: <15620280-DD56-4868-AE03-A39642D1CAFA@icloud.com> Content-Type: multipart/alternative; boundary="Apple-Mail=_E5BC3B4D-0AB2-45F8-8736-8E2CD5D2C56F" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3696.120.41.1.8\)) Subject: Re: Schedule pg_repack job with pg_cron Date: Wed, 7 Aug 2024 16:52:25 -0400 In-Reply-To: Cc: Pgsql-admin To: Ron Johnson References: <69C14B4B-0DED-442A-BD74-3CABA4DD64BD@icloud.com> X-Mailer: Apple Mail (2.3696.120.41.1.8) X-Proofpoint-GUID: 4ySbnvS6Zmodi-bK43U9igEzQC0xaZHk X-Proofpoint-ORIG-GUID: 4ySbnvS6Zmodi-bK43U9igEzQC0xaZHk X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1039,Hydra:6.0.680,FMLib:17.12.28.16 definitions=2024-08-07_11,2024-08-07_01,2024-05-17_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 mlxlogscore=999 bulkscore=0 phishscore=0 malwarescore=0 mlxscore=0 suspectscore=0 adultscore=0 spamscore=0 clxscore=1015 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2308100000 definitions=main-2408070145 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Apple-Mail=_E5BC3B4D-0AB2-45F8-8736-8E2CD5D2C56F Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Aug 7, 2024, at 3:39 PM, Ron Johnson = wrote: >=20 > On Wed, Aug 7, 2024 at 3:29=E2=80=AFPM Rui DeSousa = > wrote: >> On Aug 7, 2024, at 9:24 AM, jacob ndinkwa > wrote: >>=20 >> To schedule a pg_repack job using pg_cron on an Amazon RDS for = PostgreSQL instance, you need to follow a few steps. However, it=E2=80=99s= important to note that pg_cron is only supported on certain versions of = Amazon RDS, and pg_repack is also a separate extension that must be = installed and enabled. >>=20 >=20 > Is scheduling pg_repack just a bad idea and just introducing just more = bloat? Why not just tune auto vacuum? >=20 > 80/20 rule=E2=80=A6 most schemas are going to have their large/hot = tables, etc and data has a natural life cycle. If you have a heathy = application then bloat is not an issue as free space is used by new = tuples. Each database has a data flow to it depending on the maturity = and nature of the application/database. Exiting tuples make room for new = tuples, etc. >=20 > If your have to vacuum full / pg_repack your tables on a scheduled = bases then I think there is something very wrong with your application. >=20 > Pg_repack will do more harm in the long run. i.e. the entire time = pg_repack is running xmin is frozen thus creating more bloat everywhere = else!=20 >=20 > Bloat is overrated; especially in a transaction system where all your = data access patterns should be well defined and not doing full table = scans. Just focus on identifying bloated indexes periodically and = rebuilding those. There should be no need to vacuum full tables under = normal circumstances. >=20 > Part of a properly-maintained system is regularly archive/purging = (whether that be dropping date-based partitions, or deleting old data = from unpartitioned tables or tables partitioned by something other than = a date). >=20 > For example, I gave a list of tables (all intertwined via FK = constraints) to the application support people, and they returned the = list stating how many weeks or months of data to retain in each table. = Every Saturday night a cron job goes through and deletes the old data = from, and then "manually" vacuum-analyzes them. >=20 > No bloat... >=20 > --=20 > Death to America, and butter sauce. > Iraq lobster! It really comes down to life cycle of the given data and I don=E2=80=99t = think I would say regularly archive/purging data is a requirement. I = have worked on all these type of systems; i.e.: 1. Regulatory applications where the transaction needed to be keep in = the system for at least 7 to 10 years.=20 2. Application where legally the data needed to be removed from the = system within 3 months (licensed data); new data was entering the system = just as fast as it was exiting daily (total of 2TB of data). 3. A highly transactional system where the data was purged within 30 = days; however, it was steamed in realtime to a datamart where was keep = forever and no purge cycle was defined. Having a cron job to run vacuum-analyze manually is a security blanket. = It is not needed if auto vacuum is tuned; however, it doesn=E2=80=99t = hurt and it is a good CYA plan. =20 Needing to run Vacuum FULL on a regular bases is a symptom to a larger = underling issue with the system and that=E2=80=99s my entire point.=20= --Apple-Mail=_E5BC3B4D-0AB2-45F8-8736-8E2CD5D2C56F Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8

On Aug 7, 2024, at 3:39 PM, Ron Johnson <ronljohnsonjr@gmail.com> wrote:

On Wed, Aug 7, 2024 at 3:29=E2=80=AF= PM Rui DeSousa <rui.desousa@icloud.com> wrote:
On Aug 7, 2024, at 9:24 AM, jacob ndinkwa = <jndinkwa@gmail.com> wrote:

To = schedule a pg_repack job using pg_cron on an Amazon RDS for PostgreSQL = instance, you need to follow a few steps. However, it=E2=80=99s = important to note that pg_cron is only supported on certain versions of = Amazon RDS, and pg_repack is also a separate extension that must be = installed and enabled.


Is scheduling pg_repack just a bad idea and = just introducing just more bloat? Why not just tune auto = vacuum?

80/20 = rule=E2=80=A6 most schemas are going to have their large/hot tables, etc = and data has a natural life cycle.  If you have a heathy = application then bloat is not an issue as free space is used by new = tuples.  Each database has a data flow to it depending on the = maturity and nature of the application/database. Exiting tuples make = room for new tuples, etc.

If your have to vacuum full / pg_repack your tables on a = scheduled bases then I think there is something very wrong with your = application.

Pg_repack will do more harm in the long run.  i.e. the = entire time pg_repack is running xmin is frozen thus creating more bloat = everywhere else! 

Bloat is overrated; especially in a transaction system where = all your data access patterns should be well defined and not doing full = table scans. Just focus on identifying bloated indexes periodically and = rebuilding those.  There should be no need to vacuum full tables = under normal circumstances.

Part of a properly-maintained system is = regularly archive/purging (whether that be dropping = date-based partitions, or deleting old data from unpartitioned tables or = tables partitioned by something other than a date).

For example, I gave a list of tables = (all intertwined via FK constraints) to the application support people, = and they returned the list stating how many weeks or months of data to = retain in each table.  Every Saturday night a cron job goes through = and deletes the old data from, and then "manually" vacuum-analyzes = them.

No = bloat...

--
Death to America, and butter sauce.
Iraq = lobster!


It really comes down to = life cycle of the given data and I don=E2=80=99t think I would say = regularly archive/purging data is a requirement.  I have worked on = all these type of systems;

i.e.:

1. Regulatory applications where the transaction needed to be = keep in the system for at least 7 to 10 years. 
2. Application where legally the data needed to be removed = from the system within 3 months (licensed data); new data was entering = the system just as fast as it was exiting daily (total of 2TB of = data).
3. A highly transactional system where the = data was purged within 30 days; however, it was steamed in realtime to a = datamart where was keep forever and no purge cycle was = defined.

Having = a cron job to run vacuum-analyze manually is a security blanket. =  It is not needed if auto vacuum is tuned; however, it doesn=E2=80=99= t hurt and it is a good CYA plan.  

Needing to run Vacuum FULL on a regular = bases is a symptom to a larger underling issue with the system and = that=E2=80=99s my entire point. 
= --Apple-Mail=_E5BC3B4D-0AB2-45F8-8736-8E2CD5D2C56F--