Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1scl2S-00AFab-Tf for pgsql-general@arkaria.postgresql.org; Sat, 10 Aug 2024 12:17:41 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1scl2R-008Rq4-KB for pgsql-general@arkaria.postgresql.org; Sat, 10 Aug 2024 12:17:39 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1scl2R-008Rpv-7i for pgsql-general@lists.postgresql.org; Sat, 10 Aug 2024 12:17:39 +0000 Received: from mail-vk1-xa2c.google.com ([2607:f8b0:4864:20::a2c]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1scl2K-0047od-UU for pgsql-general@lists.postgresql.org; Sat, 10 Aug 2024 12:17:38 +0000 Received: by mail-vk1-xa2c.google.com with SMTP id 71dfb90a1353d-4f521a22d74so848234e0c.2 for ; Sat, 10 Aug 2024 05:17:32 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1723292251; x=1723897051; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=wd9VnK+/nU6zJSiXcv2WlJ6j9REIVNceU99aeE+PFWY=; b=Y9Yxq+VC5I8WOMpXpeMawb4HyQOTyEyVn4TR1oowf3tTNtatu20R9vXEeukCFaHlbt uJEvaBM70gnLa5SMUJxVwYOKWsDQeiQdzaQq1hJ3Lp2T1dk3Qkt3GMdwK1l/k+3e+x5m WePFVMj3m9aB+QiXYmsIs6dPSZ0K2ITTqrjrPMog+BOh4sSy8JLIeHKgBvYlMXLdqdHs kNUstxHfVvwp8Q75x3XVNV5Q68jgssKwK9hxOUofoL6u9MIEJzYlY78Wy0SCz1kuhn5X MIApf721KgnKRO5J4FFVdR9ZRHXtjWIGMZSo1mfkl443O24cTQMw2o2AL0+eXQNkegTW dXiw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723292251; x=1723897051; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=wd9VnK+/nU6zJSiXcv2WlJ6j9REIVNceU99aeE+PFWY=; b=Ws/z7QOqzs0PCy9AEaEyV5ma5sAVqc2ogTWDXGiyeN+i5Yz839gLuLXzcApxyP86be OfnaG1JkpWrMmxs24+zhiTL8nrsoWG8JAaD75PI2af2fw5smurSKOksiNOejfgkJ62AN TMixqfVy9CLxR+S7IusjEUG+xe5epTHkgL8CxND9182wMJbQYll+rumgsNlWEhHnlqM7 EapUCeLtQuCyzTBVwb/8QY9fstCM6TbY5I4ZhS+esdl5Uk5Oi949cX04HcOw+/YRRXPk cAWouYBsuhbNs95JcPq4/ZP8mJ0mqBjnNN9TuPUBO4chTejh/+UTQSdtFI1d1/9Sdl+u 4Epw== X-Forwarded-Encrypted: i=1; AJvYcCW/y45SBDfXmePleL1LE99Zak2ApokX/bAF2Jq8NTKLY1lgU4EeKaPNzgxFPD2NdllciCBQQpjsvr4cs/6iXfLWnr2v3oc4dpSUzeUuRKtJDB1U X-Gm-Message-State: AOJu0YzXZ4iYg2RcfaDnsqMnq6+sqCU0SGpo7zuWqYDH9pxxQzS2cS3J nSPuZ43DTgi7CrhZqmL+M++CC9yZ/TIk2g70hB8LkP6cjGpWzgxmD7si3XYe2ds7IxwIAY/nlRi ln3rANqIafiIqRLb0ROg4niNT7NmjHg== X-Google-Smtp-Source: AGHT+IGDQzAkzXicmoSkRNMNp9nl3JTutgOD4PJyscbn0jsU46GdpNGWCykKWwThBTcJQmfEPuPKyb7faMwKOsW1Dnc= X-Received: by 2002:a05:6122:169c:b0:4eb:5cb9:f219 with SMTP id 71dfb90a1353d-4f912ba8b11mr5168129e0c.0.1723292250609; Sat, 10 Aug 2024 05:17:30 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: sud Date: Sat, 10 Aug 2024 17:47:18 +0530 Message-ID: Subject: Re: Column type modification in big tables To: Lok P Cc: Greg Sabino Mullane , pgsql-general Content-Type: multipart/alternative; boundary="0000000000007121cd061f533eae" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000007121cd061f533eae Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sat, Aug 10, 2024 at 12:52=E2=80=AFAM Lok P wrote= : > > On Fri, Aug 9, 2024 at 9:19=E2=80=AFPM Greg Sabino Mullane > wrote: > >> On Fri, Aug 9, 2024 at 6:39=E2=80=AFAM Lok P wrot= e: >> >>> Thank you so much. Will definitely try to evaluate this approach. The >>> Only concern I have is , as this data is moving downstream with exactly= the >>> same data type and length , so will it cause the downstream code to bre= ak >>> while using this column in the join or filter criteria. Also I believe = the >>> optimizer won't be able to utilize this information while preparing the >>> execution plan. >>> >> >> Yes, this is not as ideal as rewriting the table, but you asked for >> another approaches :) As to the impact of your downstream stuff, I think >> you have to try and see. Not clear what you mean by the optimizer, it's = not >> going to really care about numeric(10) versus numeric(8) or varchar(20) = vs >> varchar(2). It's possible the varchar -> numeric could cause issues, but >> without real-world queries and data we cannot say. >> >> >>> Another thing , correct me if wrong, My understanding is , if we want >>> to run the "validate constraint" command after running this "check >>> constraint with not valid" command, this will do a full table scan acro= ss >>> all the partitions , but it's still beneficial as compared to updating = the >>> columns values for each rows. Correct me if I'm wrong. >>> >> >> Yes, it needs to scan the entire table, but it's a lightweight lock, >> won't block concurrent access, will not need to detoast, and makes no ta= ble >> or index updates. Versus an entire table rewrite which will do heavy >> locking, take up tons of I/O, update all the indexes, and generate quite= a >> lot of WAL. >> >> > Thank you so much Greg. > > Considering the option, if we are able to get large down time to get this > activity done. > > Some teammates suggested altering the column with "USING" Clause. I am no= t > really able to understand the difference, also when i tested on a simple > table, it seems the "USING" clause takes more time as compared to normal > ALTER. But again I don't see any way to see the progress and estimated > completion time. Can you share your thoughts on this? > > ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) USING > mycol::NUMERIC(3); > VS > ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) ; > > ***** > Another thing also comes to my mind whether we should just create a new > partition table(say new_part_table) from scratch from the DDL of the > existing table(say old_part_table) and then load the data into it using > command (insert into new_part_table.. select..from old_part_table). Then > create indexes and constraints etc, something as below. > > Will this approach be faster/better as compared to the simple "alter tabl= e > alter column approach" as above, considering we will have 4-6 hours of > downtime for altering three different columns on this ~5TB table? > > > *-- Steps* > Create table exactly same as existing partition table but with the > modified column types/lengths. > > drop indexes ; (Except PK and FK indexes may be..) > drop constraints; > > insert into new_part_table (...) select (...) from old_part_table; > > create indexes concurrently ; > create constraints; (But this table is also a child table to another > partition table, so creating the foreign key may be resource consuming he= re > too). > > drop the old_part_table; > rename the new_part_table to old_part_table; > rename all the partitions; > > VACUUM old_part_table ; > ANALYZE old_part_table ; > My 2cents. If you have enough time then from a simplicity point of view, your single line alter command may look good, but how are you going to see the amount of progress it has made so far and how much time it's going to take to finish. And you got ~6hrs of down time but if it fails at 5th hour then you will be in a bad position. --0000000000007121cd061f533eae Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable

On Sat, Aug 10, 2024 at 12:52=E2=80=AFAM = Lok P <loknath.73@gmail.com&= gt; wrote:

On Fri, Aug 9, 2024 at 9:19=E2=80=AFPM Greg S= abino Mullane <h= tamfids@gmail.com> wrote:
On Fri, Aug 9, 2024 at 6= :39=E2=80=AFAM Lok P <loknath.73@gmail.com> wrote:
Thank you so much. Will definitely try to evaluate this appro= ach. The Only concern I have is , as this data is moving downstream with ex= actly the same data type and length , so will it cause the downstream code = to break while using this column in the join or filter criteria. Also I bel= ieve the optimizer won't be able to utilize this information while prep= aring the execution plan.

Y= es, this is not as ideal as rewriting the table, but you asked for another= =C2=A0approaches :) As to the impact of your downstream stuff, I think you = have to try and see. Not clear what you mean by the optimizer, it's not= going to really care about numeric(10) versus numeric(8) or varchar(20) vs= varchar(2). It's possible the varchar -> numeric could cause issues= , but without real-world queries and data we cannot say.
=C2=A0
=C2=A0Another thing , correct me if wrong, My u= nderstanding is=C2=A0 , if we want to run the "validate constraint&quo= t; command after running this "check constraint with not valid" c= ommand, this will do a full table scan across all the partitions , but it&#= 39;s=C2=A0still beneficial as compared to updating the columns values for e= ach rows. Correct me if I'm wrong.=C2=A0
=

Yes, it needs to scan the entire table, but it's a = lightweight lock, won't block concurrent access, will not need to detoa= st, and makes no table or index updates. Versus an entire table rewrite whi= ch will do heavy locking, take up tons of I/O, update all the indexes, and = generate quite a lot of WAL.=C2=A0


Thank you so much Greg.

Considering the o= ption, if we are able to get large down time to get this activity done.
Some teammates suggested altering the column with "USING" Cl= ause. I am not really able to understand the difference, =C2=A0also when i = tested on a simple table, it seems the "USING" clause takes more = time as compared to normal ALTER. But again I don't see any way to see = the progress and estimated completion time. Can you share your thoughts on = this?=C2=A0

ALTER TABLE foobar ALTER COLUMN mycol TYPE NUMERIC(3) US= ING mycol::NUMERIC(3);
VS
ALTER TABLE foobar ALTER COLUMN mycol TYPE = NUMERIC(3) ;

*****
Another thing als= o comes to my mind whether we should just create a new partition table(say = new_part_table) from scratch from the DDL of the existing table(say old_par= t_table) and then load the data into it using command (insert into new_part= _table.. select..from old_part_table). Then create indexes and constraints = etc, something as below.

Will this approach be faster/better as com= pared to the simple "alter table alter column approach" as above,= considering we will have 4-6 hours of downtime for altering three differen= t columns on this ~5TB table?

-- Steps

Create table ex= actly same as existing partition table but with the modified column types/l= engths.

drop indexes ;=C2=A0 (Except=C2=A0PK and FK indexes may be..)
drop constraints;

inser= t into new_part_table (...) select (...) from old_part_table;

create= indexes concurrently ;
create constraints; (But this table is also a ch= ild table to another partition table, so creating the foreign key may be re= source consuming here too).

drop the old_part_table;
rename the n= ew_part_table to old_part_table;
rename all the partitions;

VACUU= M=C2=A0 old_part_table=C2=A0 ;
ANALYZE=C2=A0 old_part_table=C2=A0 ;=C2=A0

<= div>
My 2cents.=C2=A0
If you have enough time then = from a simplicity point of view, your single line alter command may look go= od, but how are you going to see=C2=A0 the amount of progress it has made s= o far and how much time it's going to take to finish. And you got ~6hrs= of down time but if it fails at 5th hour then you will be in a bad positio= n.=C2=A0


--0000000000007121cd061f533eae--