MIME-Version: 1.0
References: <CAKna9VaJ_qHKBnw4O-VT3xGmzqThCuZ=LFXx-hPdw7E6RoqmeA@mail.gmail.com>
 <CAD=mzVUmXmkdvvMG30G1=D4Kq3WqnzGo=0ov9JnRCs1p=KJiTQ@mail.gmail.com>
 <CAKna9VZRc4+Vzbt6qPGMCauE84isPtz-wE_KX9AOt7WKfhwjiQ@mail.gmail.com>
 <CAD=mzVUX13ZM16kP4QhY+F5XiLr=ezCXftKOTKA4eUvhphgOJw@mail.gmail.com>
 <CAKna9Vb_mx+dX02XOV6mpr8RFC-5io38kM6=4xRHQj_MUvQ+aQ@mail.gmail.com>
 <CAKAnmmJ6fqyYafLB_im75oxxfTuCLUY0ftBPU57pUm0g+pm6FQ@mail.gmail.com>
 <CAKna9VZJ4fginFJZenGQxWs9eAw9Z8g-YkdnOFcie5RvuJ=5OQ@mail.gmail.com> <CAKAnmmLv72uk1p8+zmWpkC+BTatrdmRe_NpRbwRsi1LAU-cJFQ@mail.gmail.com>
In-Reply-To: <CAKAnmmLv72uk1p8+zmWpkC+BTatrdmRe_NpRbwRsi1LAU-cJFQ@mail.gmail.com>
From: Lok P <loknath.73@gmail.com>
Date: Sat, 10 Aug 2024 00:52:05 +0530
Message-ID: <CAKna9VZGwtNx9NAZ0QjdT-WhtFETAaFzpUsvM6R90mjaAoP3vA@mail.gmail.com>
Subject: Re: Column type modification in big tables
To: Greg Sabino Mullane <htamfids@gmail.com>
Cc: sud <suds1434@gmail.com>, pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000c096e5061f450f26"
Archived-At: <https://www.postgresql.org/message-id/CAKna9VZGwtNx9NAZ0QjdT-WhtFETAaFzpUsvM6R90mjaAoP3vA%40mail.gmail.com>
Precedence: bulk

--000000000000c096e5061f450f26
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Fri, Aug 9, 2024 at 9:19 PM Greg Sabino Mullane <htamfids@gmail.com>
wrote:

> On Fri, Aug 9, 2024 at 6:39 AM Lok P <loknath.73@gmail.com> wrote:
>
>> Thank you so much. I will definitely evaluate this approach. My only
>> concern is that this data moves downstream with exactly the same data
>> type and length, so could this cause the downstream code to break when
>> it uses this column in join or filter criteria? Also, I believe the
>> optimizer won't be able to utilize this information while preparing the
>> execution plan.
>>
>
> Yes, this is not as ideal as rewriting the table, but you asked for
> other approaches :) As to the impact on your downstream stuff, I think
> you have to try and see. It's not clear what you mean by the optimizer;
> it's not going to really care about numeric(10) versus numeric(8) or
> varchar(20) vs varchar(2). It's possible the varchar -> numeric could
> cause issues, but without real-world queries and data we cannot say.
>
>
>> Another thing: my understanding is that if we run the "validate
>> constraint" command after this "check constraint with not valid"
>> command, it will do a full table scan across all the partitions, but
>> that is still beneficial compared to updating the column values for
>> every row. Correct me if I'm wrong.
>>
>
> Yes, it needs to scan the entire table, but it takes only a lightweight
> lock, won't block concurrent access, will not need to detoast, and makes
> no table or index updates. Compare that with an entire table rewrite,
> which will do heavy locking, take up tons of I/O, update all the indexes,
> and generate quite a lot of WAL.
>
>
Thank you so much Greg.
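
To check that I have understood the constraint approach correctly, here is
a rough sketch of what I think it looks like. The constraint name and the
bound are placeholders I made up (they assume a target of three integer
digits on the foobar/mycol example used further down), so please correct
me if this is not what you meant.

```sql
-- Step 1: add the check without scanning the existing rows.
-- "mycol_max_3_digits" is a made-up name; the bound assumes the target
-- is equivalent to NUMERIC(3).
ALTER TABLE foobar
    ADD CONSTRAINT mycol_max_3_digits
    CHECK (mycol > -1000 AND mycol < 1000) NOT VALID;

-- Step 2: later, scan the existing rows and mark the constraint valid.
ALTER TABLE foobar VALIDATE CONSTRAINT mycol_max_3_digits;
```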

Now consider the other option, where we are able to get a large downtime
window to do this activity.

Some teammates suggested altering the column with the "USING" clause. I
don't really understand the difference. Also, when I tested on a simple
table, the "USING" form seemed to take more time than the plain ALTER. And
I don't see any way to check the progress or the estimated completion
time. Can you share your thoughts on this?

ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) USING
mycol::NUMERIC(3);
VS
ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) ;
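
A comparison along the following lines should be enough to reproduce what
I am describing. It is only a sketch: the table names, column types and
row count are invented for illustration and are nothing like the real
table, and \timing is a psql meta-command, so it assumes psql. The last
query is the closest thing to "progress" I could find; it only shows how
long the statement has been running, not how far along it is.

```sql
-- Two identical throwaway tables (hypothetical names), one per ALTER form.
CREATE TABLE foobar_plain (id bigint, mycol numeric(10));
INSERT INTO foobar_plain
SELECT g, g % 1000 FROM generate_series(1, 1000000) AS g;
CREATE TABLE foobar_using AS TABLE foobar_plain;

\timing on

-- Form 1: rely on the default conversion.
ALTER TABLE foobar_plain ALTER COLUMN mycol TYPE numeric(3);

-- Form 2: spell the conversion out with USING.
ALTER TABLE foobar_using ALTER COLUMN mycol TYPE numeric(3)
    USING mycol::numeric(3);

-- From a second session, while an ALTER is running: elapsed time and
-- wait events only, not a percentage complete.
SELECT pid, state, wait_event_type, wait_event,
       now() - query_start AS running_for
FROM pg_stat_activity
WHERE query ILIKE 'alter table%';
```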

*****
Another thing that comes to mind is whether we should just create a new
partitioned table (say new_part_table) from scratch, using the DDL of the
existing table (say old_part_table), load the data into it with
"insert into new_part_table .. select .. from old_part_table", and then
create the indexes and constraints, something like the steps below.

Would this approach be faster/better than the simple "alter table alter
column" approach above, considering we will have 4-6 hours of downtime for
altering three different columns on this ~5TB table?


*-- Steps*
Create a table exactly the same as the existing partitioned table, but with
the modified column types/lengths.

drop indexes;  (except maybe the PK and FK indexes)
drop constraints;

insert into new_part_table (...) select (...) from old_part_table;

create indexes concurrently;
create constraints;  (This table is also a child of another partitioned
table, so creating the foreign key may be resource-consuming here too.)

drop the old_part_table;
rename the new_part_table to old_part_table;
rename all the partitions;

VACUUM  old_part_table;
ANALYZE  old_part_table;
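
To make the steps above concrete, here is a rough SQL sketch of the
copy-and-swap idea. The column list, the partition key and bounds, and all
object names other than old_part_table/new_part_table are invented for
illustration. It differs from the steps in three ways, all of them my own
assumptions: the old table is renamed rather than dropped (so there is
something to fall back to), the partition renames and the foreign key are
left out, and plain CREATE INDEX is used because, as far as I can tell,
CONCURRENTLY cannot be used directly on a partitioned parent.

```sql
-- New parent with the modified column type (hypothetical columns).
CREATE TABLE new_part_table (
    id         bigint     NOT NULL,
    mycol      numeric(3),
    created_on date       NOT NULL
) PARTITION BY RANGE (created_on);

-- One partition shown; the real table needs one per existing range.
CREATE TABLE new_part_table_2024_08 PARTITION OF new_part_table
    FOR VALUES FROM ('2024-08-01') TO ('2024-09-01');

-- Load before any indexes exist; the cast fails if a value does not fit.
INSERT INTO new_part_table (id, mycol, created_on)
SELECT id, mycol::numeric(3), created_on
FROM old_part_table;

-- Build the primary key and indexes after the load.
ALTER TABLE new_part_table ADD PRIMARY KEY (id, created_on);
CREATE INDEX new_part_table_mycol_idx ON new_part_table (mycol);

-- Swap the names in one transaction.
BEGIN;
ALTER TABLE old_part_table RENAME TO old_part_table_bak;
ALTER TABLE new_part_table RENAME TO old_part_table;
COMMIT;

ANALYZE old_part_table;
```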

--000000000000c096e5061f450f26--