Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sTZls-0006vA-E6 for pgsql-general@arkaria.postgresql.org; Tue, 16 Jul 2024 04:26:36 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sTZlq-00FD71-Ld for pgsql-general@arkaria.postgresql.org; Tue, 16 Jul 2024 04:26:34 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sTZlq-00FD6t-8Z for pgsql-general@lists.postgresql.org; Tue, 16 Jul 2024 04:26:34 +0000 Received: from mail-vs1-xe2f.google.com ([2607:f8b0:4864:20::e2f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sTZlo-002LvZ-0e for pgsql-general@lists.postgresql.org; Tue, 16 Jul 2024 04:26:33 +0000 Received: by mail-vs1-xe2f.google.com with SMTP id ada2fe7eead31-48fe6ddc33bso1938924137.1 for ; Mon, 15 Jul 2024 21:26:31 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1721103989; x=1721708789; darn=lists.postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=cSg6TaTxtdG6NFC9nLXP+EzxHHEMVWn6Dw0uwg0iX4c=; b=WaBakbw30ojAG4I+bmlrW5jpJ3+/6Va4r/QBbdsmimSI+CsKMlmDXMkuvY7GtAX+4L T0icGJUC6swpBzAUYY3Q3ABFzNeKz5eNFwZRO899ovFQ9xsysporrWOQWZ5F4Mbjs1HB zTSH/9d8pLQlacdZZGPTIf06cbDrUhGN3pkwDGbInTZa1PPB7OiMLYb1KCpGWLjWIq/R p7NP4P5xkN6iUYdQfLZpFJsbPEFtWyIgmknG2ZbIAICMKqFReZN+4BVY+wtQQikPDzdh gej+ShhBwezFE/naR59+7XZQjxBrIfcnych1WY69T+Aok7RUyKh+cQbdzYgPjV3Wk+je vRaw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1721103989; x=1721708789; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=cSg6TaTxtdG6NFC9nLXP+EzxHHEMVWn6Dw0uwg0iX4c=; b=qL/QUuYDKtJ/G5VgVsFL/E3i0dEsN8cMOdE2ClYkCVqjaqWsE7y1nv18EFYu5tmsjG 92uv/juMM9QVREjch0GXDSW0nncVpzYYG07kqu4srh2jLPd2PJ/qiWCQ6iZYDLh+viy7 099L/cTCxCofOsCbcH/BxHVHP7xaWv4VFkZlMVxdMW1vx5oiu4RqTD5ja8yPfBrHKF2k pI0N0iuaJADRY1xmE94tCNEYU4cPJoNFLpmkmEOmrNHFLw0f0XbySXCC6jL8QXidTghM BrvZKnS3TtjpL2QgN9AA39f0dzoMag0foBCJbihrY3FtGo+mViJz4PuL+Lnmmv085RtV T1gA== X-Gm-Message-State: AOJu0YxVTD6ZFauSRl7NZOzz4sP1a1g3rvU+sDImaf3y8wnRxCu+G9Ac 0NyF9aP7+HqXq7Ab1UDUi8sItrgv0dXYUrdYKST1hNWxDqUBXoNx1G5uXoEpr8UGKZLUKt4joGf A6PBirfHzooxw38jDHWzYFVEhfQn8Vw== X-Google-Smtp-Source: AGHT+IF5PdsK08sUK+KaU+yZChgFe3ngzSVnsLBdJoW878p4yz/bcwt4Qpb9deOUQliboXnCkQ9LZOaKque5p+SGVN4= X-Received: by 2002:a05:6102:4b84:b0:48f:e69d:5953 with SMTP id ada2fe7eead31-4914c4e9f2amr1339371137.19.1721103989245; Mon, 15 Jul 2024 21:26:29 -0700 (PDT) MIME-Version: 1.0 References: <20240713220545.cgjghaggksov3xkt@hjp.at> <20240715142744.7e7a3wtdfxinochf@hjp.at> <20240716003731.m75sye7qq74wadhe@hjp.at> In-Reply-To: <20240716003731.m75sye7qq74wadhe@hjp.at> From: sud Date: Tue, 16 Jul 2024 09:56:16 +0530 Message-ID: Subject: Re: Dropping column from big table To: pgsql-general@lists.postgresql.org, "Peter J. Holzer" Content-Type: multipart/alternative; boundary="000000000000e68d8d061d55bf51" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000e68d8d061d55bf51 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Tue, Jul 16, 2024 at 6:07=E2=80=AFAM Peter J. Holzer = wrote: > > > But the only issue would be "VACUUM FULL" will take a table lock and > also it > > may take longer to run this vacuum on the full table considering the > size of > > the table in TB's. Thus, is it fine to just leave it post execution of > the > > "update" statement , so that the normal vacuum operation (which will be > online > > operation) and that will take care of the removal of old rows ? > > This is unlikely to recover the space. > > The UPDATE will duplicate all rows. Since - presumably - there isn't > much free space within each partition the new rows will go at the end of > each partition, effectively doubling its size. > > A regular VACUUM (whether autovacuum or invoked manually) will then > remove the old rows. and make the space available for new data. But > since that newly free space is at the beginning of each partition it > can't be returned to the OS. It would be available for new data written > to those partitions I guess not much is written to old partitons. > > You could, however, do this in small steps and vacuum after each. > Something like this (in Python) > > conn =3D ... > csr =3D conn.cursor() > for i in range(100): > csr.execute("UPDATE tab set id =3D id WHERE id % 100 =3D %s", (i,= )) > conn.commit() > csr.execute("VACUUM tab") > conn.commit() > > That might just be able to squeeze the new rows in between the existing > rows and not grow the table. > > > > And also, As you mentioned we may also leave it as is and wait for the > > partition to be dropped, so that the dropped column with "not null" > values > > which are still lying under the hood and are occupying space will be > removed > > automatically. But even then, is that dropped column still lying in the > rows > > with null values in it throughout its lifecycle, till the table exists > in the > > database? > > Yes. But a nullable column with a null value takes only a single bit of > storage, so that's negligible. > > > Thank you so much. Normal vacuum marks the space occupied by the dead tuples as free or reusable but vacuum full removes those completely. However even with "vacuum full", the old rows will be removed completely from the storage , but the new rows will always be there with the 'dropped' column still existing under the hood along with the table storage, with just carrying "null" values in it. However, as it's a single bit of storage so will be having negligible overhead. If we want to fully remove that column from the table , we may have to create a new table and dump the data into that from the existing table and then rename it back to old. Is this understanding correct? --000000000000e68d8d061d55bf51 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Tue, Jul 16, 2024 at 6:07=E2=80=AFAM P= eter J. Holzer <hjp-pgsql@hjp.at= > wrote:

> But the only issue would be "VACUUM FULL" will take a table = lock and also it
> may take longer to run this vacuum on the full table considering the s= ize of
> the table in TB's. Thus, is it fine to just leave it post executio= n of the
> "update" statement , so that the normal vacuum operation (wh= ich will be online
> operation) and that will take care of the removal of old rows ?

This is unlikely to recover the space.

The UPDATE will duplicate all rows. Since - presumably - there isn't much free space within each partition the new rows will go at the end of each partition, effectively doubling its size.

A regular VACUUM (whether autovacuum or invoked manually) will then
remove the old rows. and make the space available for new data. But
since that newly free space is at the beginning of each partition it
can't be returned to the OS. It would be available for new data written=
to those partitions I guess not much is written to old partitons.

You could, however, do this in small steps and vacuum after each.
Something like this (in Python)

=C2=A0 =C2=A0 conn =3D ...
=C2=A0 =C2=A0 csr =3D conn.cursor()
=C2=A0 =C2=A0 for i in range(100):
=C2=A0 =C2=A0 =C2=A0 =C2=A0 csr.execute("UPDATE tab set id =3D id WHER= E id % 100 =3D %s", (i,))
=C2=A0 =C2=A0 =C2=A0 =C2=A0 conn.commit()
=C2=A0 =C2=A0 =C2=A0 =C2=A0 csr.execute("VACUUM tab")
=C2=A0 =C2=A0 =C2=A0 =C2=A0 conn.commit()

That might just be able to squeeze the new rows in between the existing
rows and not grow the table.


> =C2=A0And also, As you mentioned we may also leave it as is and wait f= or the
> partition to be dropped, so that the dropped column with "not nul= l" values
> which are still lying under the hood and are occupying space will be r= emoved
> automatically. But even then, is that dropped column still lying in th= e rows
> with null values in it throughout its lifecycle, till the table exists= in the
> database?

Yes. But a nullable column with a null value takes only a single bit of
storage, so that's negligible.

=C2=A0 =C2=A0=C2=A0

Thank you so mu= ch.

Normal vacuum marks the space occupied by the = dead tuples as free or reusable but vacuum full removes those completely. H= owever even with "vacuum full", the old rows will be removed comp= letely from the storage , but the new rows will always be there with the &#= 39;dropped' column still existing under the hood along with the table s= torage,=C2=A0 with just carrying "null" values in it. However,=C2= =A0 as it's a single bit of storage so will be having negligible overhe= ad. If we want to fully remove that column from the table , we may have to = create a new table and dump the data into that from the existing table and = then rename it back to old. Is this understanding correct?
--000000000000e68d8d061d55bf51--