public inbox for [email protected]help / color / mirror / Atom feed
Re: Dropping column from big table 7+ messages / 3 participants [nested] [flat]
* Re: Dropping column from big table @ 2024-07-15 20:30 sud <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: sud @ 2024-07-15 20:30 UTC (permalink / raw) To: [email protected]; Laurenz Albe <[email protected]>; Peter J. Holzer <[email protected]> On Mon, Jul 15, 2024 at 7:58 PM Peter J. Holzer <[email protected]> wrote: > > > Hm, true. > > > > You can always do > > > > UPDATE tab SET id = id; > > > > followed by > > > > VACUUM (FULL) tab; > > Yes, that should work. It needs about twice the size of the table in > temporary space, though. > > Since the OP wrote that the table is "daily ... and 90 partitions" > (which understand that there is one partition per day and partitions are > kept for 90 days) it might be better to just wait. After 90 days all the > partitions with the obsolete column will be gone. > > > Thank You very much. As I understand, after dropping a column, it will still internally hold the "NOT NULL" values in that column for all the rows, even though it's not visible outside. So, after the DROP column, it will force update any of the columns as below, Then it will force create another copy of each of the rows even if the column is updated to the same value. The new copy will have the dropped column with values as NULL. And the post "VACUUM FULL '' will clean all the rows with "NOT NULL '' values of that dropped column and thus reclaim the space. But the only issue would be "VACUUM FULL" will take a table lock and also it may take longer to run this vacuum on the full table considering the size of the table in TB's. Thus, is it fine to just leave it post execution of the "update" statement , so that the normal vacuum operation (which will be online operation) and that will take care of the removal of old rows ? UPDATE tab SET id = id; VACUUM (FULL) tab; And also, As you mentioned we may also leave it as is and wait for the partition to be dropped, so that the dropped column with "not null" values which are still lying under the hood and are occupying space will be removed automatically. But even then, is that dropped column still lying in the rows with null values in it throughout its lifecycle, till the table exists in the database? Seems there is no other option exist to drop the column with space reclaimed from the table in immediate effect, other than above discussed. ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Dropping column from big table @ 2024-07-16 00:37 Peter J. Holzer <[email protected]> parent: sud <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: Peter J. Holzer @ 2024-07-16 00:37 UTC (permalink / raw) To: [email protected] On 2024-07-16 02:00:27 +0530, sud wrote: > > On Mon, Jul 15, 2024 at 7:58 PM Peter J. Holzer <[email protected]> wrote: > > Hm, true. > > > > You can always do > > > > UPDATE tab SET id = id; > > > > followed by > > > > VACUUM (FULL) tab; > > Yes, that should work. It needs about twice the size of the table in > temporary space, though. > > Since the OP wrote that the table is "daily ... and 90 partitions" > (which understand that there is one partition per day and partitions are > kept for 90 days) it might be better to just wait. After 90 days all the > partitions with the obsolete column will be gone. > > > > Thank You very much. > > As I understand, after dropping a column, it will still internally hold the > "NOT NULL" values in that column for all the rows, even though it's not visible > outside. > > So, after the DROP column, it will force update any of the columns as below, > Then it will force create another copy of each of the rows even if the column > is updated to the same value. The new copy will have the dropped column with > values as NULL. And the post "VACUUM FULL '' will clean all the rows with "NOT > NULL '' values of that dropped column and thus reclaim the space. Correct. > But the only issue would be "VACUUM FULL" will take a table lock and also it > may take longer to run this vacuum on the full table considering the size of > the table in TB's. Thus, is it fine to just leave it post execution of the > "update" statement , so that the normal vacuum operation (which will be online > operation) and that will take care of the removal of old rows ? This is unlikely to recover the space. The UPDATE will duplicate all rows. Since - presumably - there isn't much free space within each partition the new rows will go at the end of each partition, effectively doubling its size. A regular VACUUM (whether autovacuum or invoked manually) will then remove the old rows. and make the space available for new data. But since that newly free space is at the beginning of each partition it can't be returned to the OS. It would be available for new data written to those partitions I guess not much is written to old partitons. You could, however, do this in small steps and vacuum after each. Something like this (in Python) conn = ... csr = conn.cursor() for i in range(100): csr.execute("UPDATE tab set id = id WHERE id % 100 = %s", (i,)) conn.commit() csr.execute("VACUUM tab") conn.commit() That might just be able to squeeze the new rows in between the existing rows and not grow the table. > And also, As you mentioned we may also leave it as is and wait for the > partition to be dropped, so that the dropped column with "not null" values > which are still lying under the hood and are occupying space will be removed > automatically. But even then, is that dropped column still lying in the rows > with null values in it throughout its lifecycle, till the table exists in the > database? Yes. But a nullable column with a null value takes only a single bit of storage, so that's negligible. hp -- _ | Peter J. Holzer | Story must make more sense than reality. |_|_) | | | | | [email protected] | -- Charles Stross, "Creative writing __/ | http://www.hjp.at/ | challenge!" Attachments: [application/pgp-signature] signature.asc (833B, 2-signature.asc) download ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Dropping column from big table @ 2024-07-16 04:26 sud <[email protected]> parent: Peter J. Holzer <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: sud @ 2024-07-16 04:26 UTC (permalink / raw) To: [email protected]; Peter J. Holzer <[email protected]> On Tue, Jul 16, 2024 at 6:07 AM Peter J. Holzer <[email protected]> wrote: > > > But the only issue would be "VACUUM FULL" will take a table lock and > also it > > may take longer to run this vacuum on the full table considering the > size of > > the table in TB's. Thus, is it fine to just leave it post execution of > the > > "update" statement , so that the normal vacuum operation (which will be > online > > operation) and that will take care of the removal of old rows ? > > This is unlikely to recover the space. > > The UPDATE will duplicate all rows. Since - presumably - there isn't > much free space within each partition the new rows will go at the end of > each partition, effectively doubling its size. > > A regular VACUUM (whether autovacuum or invoked manually) will then > remove the old rows. and make the space available for new data. But > since that newly free space is at the beginning of each partition it > can't be returned to the OS. It would be available for new data written > to those partitions I guess not much is written to old partitons. > > You could, however, do this in small steps and vacuum after each. > Something like this (in Python) > > conn = ... > csr = conn.cursor() > for i in range(100): > csr.execute("UPDATE tab set id = id WHERE id % 100 = %s", (i,)) > conn.commit() > csr.execute("VACUUM tab") > conn.commit() > > That might just be able to squeeze the new rows in between the existing > rows and not grow the table. > > > > And also, As you mentioned we may also leave it as is and wait for the > > partition to be dropped, so that the dropped column with "not null" > values > > which are still lying under the hood and are occupying space will be > removed > > automatically. But even then, is that dropped column still lying in the > rows > > with null values in it throughout its lifecycle, till the table exists > in the > > database? > > Yes. But a nullable column with a null value takes only a single bit of > storage, so that's negligible. > > > Thank you so much. Normal vacuum marks the space occupied by the dead tuples as free or reusable but vacuum full removes those completely. However even with "vacuum full", the old rows will be removed completely from the storage , but the new rows will always be there with the 'dropped' column still existing under the hood along with the table storage, with just carrying "null" values in it. However, as it's a single bit of storage so will be having negligible overhead. If we want to fully remove that column from the table , we may have to create a new table and dump the data into that from the existing table and then rename it back to old. Is this understanding correct? ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Dropping column from big table @ 2024-07-16 04:52 David G. Johnston <[email protected]> parent: sud <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: David G. Johnston @ 2024-07-16 04:52 UTC (permalink / raw) To: sud <[email protected]>; +Cc: [email protected] <[email protected]>; Peter J. Holzer <[email protected]> On Monday, July 15, 2024, sud <[email protected]> wrote: > > However even with "vacuum full", the old rows will be removed completely > from the storage , but the new rows will always be there with the 'dropped' > column still existing under the hood along with the table storage, with > just carrying "null" values in it. […] Is this understanding correct? > No. The table rewrite process involves creating new tuples that exactly conform to the current row specification. The potentially non-null data present in live tuples for columns that have been dropped are not copied into the newly constructed tuples. https://github.com/postgres/postgres/blob/d2b74882cab84b9f4fdce0f2f32e892ba9164f5c/src/backend/acces... David J. ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Dropping column from big table @ 2024-07-16 04:56 David G. Johnston <[email protected]> parent: David G. Johnston <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: David G. Johnston @ 2024-07-16 04:56 UTC (permalink / raw) To: sud <[email protected]>; +Cc: [email protected] <[email protected]>; Peter J. Holzer <[email protected]> On Monday, July 15, 2024, David G. Johnston <[email protected]> wrote: > On Monday, July 15, 2024, sud <[email protected]> wrote: > >> >> However even with "vacuum full", the old rows will be removed completely >> from the storage , but the new rows will always be there with the 'dropped' >> column still existing under the hood along with the table storage, with >> just carrying "null" values in it. […] Is this understanding correct? >> > > No. The table rewrite process involves creating new tuples that exactly > conform to the current row specification. The potentially non-null data > present in live tuples for columns that have been dropped are not copied > into the newly constructed tuples. > > https://github.com/postgres/postgres/blob/d2b74882cab84b9f4fdce0f2f32e89 > 2ba9164f5c/src/backend/access/heap/heapam_handler.c#L2499 > > My bad, stopped at the code comment. Apparently the data is just nulled, not removed, the current row descriptor contains those columns with “is dropped” and since this behavior doesn’t change the catalogs in this way the new ones must as well. We just get the space back. David J. ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Dropping column from big table @ 2024-07-16 05:04 sud <[email protected]> parent: David G. Johnston <[email protected]> 0 siblings, 1 reply; 7+ messages in thread From: sud @ 2024-07-16 05:04 UTC (permalink / raw) To: David G. Johnston <[email protected]>; +Cc: [email protected] <[email protected]>; Peter J. Holzer <[email protected]> On Tue, Jul 16, 2024 at 10:26 AM David G. Johnston < [email protected]> wrote: > > > On Monday, July 15, 2024, David G. Johnston <[email protected]> > wrote: > >> On Monday, July 15, 2024, sud <[email protected]> wrote: >> >>> >>> However even with "vacuum full", the old rows will be removed completely >>> from the storage , but the new rows will always be there with the 'dropped' >>> column still existing under the hood along with the table storage, with >>> just carrying "null" values in it. […] Is this understanding correct? >>> >> >> No. The table rewrite process involves creating new tuples that exactly >> conform to the current row specification. The potentially non-null data >> present in live tuples for columns that have been dropped are not copied >> into the newly constructed tuples. >> >> >> https://github.com/postgres/postgres/blob/d2b74882cab84b9f4fdce0f2f32e892ba9164f5c/src/backend/acces... >> >> > My bad, stopped at the code comment. Apparently the data is just nulled, > not removed, the current row descriptor contains those columns with “is > dropped” and since this behavior doesn’t change the catalogs in this way > the new ones must as well. We just get the space back. > > > Thank you for the confirmation. And if someone wants to fully remove that column from the table , then the only option is to create a new table with an exact set of active columns and insert the data into that from the existing/old table and then rename it back to old. Is this correct understanding? ^ permalink raw reply [nested|flat] 7+ messages in thread
* Re: Dropping column from big table @ 2024-07-16 05:42 David G. Johnston <[email protected]> parent: sud <[email protected]> 0 siblings, 0 replies; 7+ messages in thread From: David G. Johnston @ 2024-07-16 05:42 UTC (permalink / raw) To: sud <[email protected]>; +Cc: [email protected] <[email protected]>; Peter J. Holzer <[email protected]> On Monday, July 15, 2024, sud <[email protected]> wrote: > > Thank you for the confirmation. > And if someone wants to fully remove that column from the table , then the > only option is to create a new table with an exact set of active columns > and insert the data into that from the existing/old table and then rename > it back to old. Is this correct understanding? > You haven’t removed the column from the table, you’ve removed the whole dang table while creating a new, wholly unrelated, one that just happens to have the same name. But the name of a table isn’t its primary key, though some places do use it for lookup purposes. David J. ^ permalink raw reply [nested|flat] 7+ messages in thread
end of thread, other threads:[~2024-07-16 05:42 UTC | newest] Thread overview: 7+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-07-15 20:30 Re: Dropping column from big table sud <[email protected]> 2024-07-16 00:37 ` Peter J. Holzer <[email protected]> 2024-07-16 04:26 ` sud <[email protected]> 2024-07-16 04:52 ` David G. Johnston <[email protected]> 2024-07-16 04:56 ` David G. Johnston <[email protected]> 2024-07-16 05:04 ` sud <[email protected]> 2024-07-16 05:42 ` David G. Johnston <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox