Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u9aMp-003idK-8x for pgsql-admin@arkaria.postgresql.org; Tue, 29 Apr 2025 02:06:39 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1u9aMm-004Z1z-RW for pgsql-admin@arkaria.postgresql.org; Tue, 29 Apr 2025 02:06:37 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1u9aMm-004Z1r-Ej for pgsql-admin@lists.postgresql.org; Tue, 29 Apr 2025 02:06:37 +0000 Received: from mail-oo1-xc2e.google.com ([2607:f8b0:4864:20::c2e]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1u9aMl-0006cd-1E for pgsql-admin@postgresql.org; Tue, 29 Apr 2025 02:06:37 +0000 Received: by mail-oo1-xc2e.google.com with SMTP id 006d021491bc7-605f7d3215cso2746599eaf.0 for ; Mon, 28 Apr 2025 19:06:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1745892393; x=1746497193; darn=postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=MNxIi+YtdDrs9iUVI++gEg+xpgSw+0ERa4Lfe0SfXrg=; b=YNkLzGlGv14yI6lUZVJDKAt0QEGblek4optPwtuWD3xJ0Ec6KIppXllhuLaoNEwEEF /lF427uzDi1UZqkueoHwqjrqx4hKPNdgTqlPqSIw40ToCfJRN+aRuC05wwA0lfnqiYJM pqiO3uXvGjn04TmTfWdSsSp35m2qtj9ifM24cG5RSRuNmcP3jIpgANFb3Y6cYy/YN+Uc QSBdTJPy7FuBW3fVFmR0RzRViSEluD9LrywMIsvV0HcA5GyHrFmbdAtEfiV9iFuE+W5V Ba9HfIk9x9gZan41eyf0yz+KH+G6Yrn8N5W4bQRsWHkrsiKFrI20Off8SIVlIW98Rc7l iSxQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1745892393; x=1746497193; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=MNxIi+YtdDrs9iUVI++gEg+xpgSw+0ERa4Lfe0SfXrg=; b=OOKoJNru/E50zwLOiD6JcSs15wZAMcvsu9/RM2RSyxQleikmOzhnHfjCVgmx3HSaTT k1xiNO5+p3FBMpIkde6paNEKFqAeGWn88eLMfkP7HRTUgHW1cfhlqrQyN+yRNI2XRv00 M6EoXd9raMivm+Xsb6ydOTaE5F8137KagGUSh3VjAauvKXU6md+yqhFmg7jQGUxCDrv4 UEjUkafyaRY7KtLaeeBmtJClJnEiXaLvBmbbTceEqdZN6oOZyQxOhIg+T0hqHEXxQARo 7fPjz9vYFz4u3qS1mh2eQOFOf06G3pr+I3ltuALU2cW1dOuH6Oh/l2YkitkpkHKDYYW4 9elg== X-Gm-Message-State: AOJu0Yz1R5ty6+O/dqxmG+7r3xeczxV7hPYinAFeFD0ieWjh58+1BF3j sQbTEsvo8obrlLfEyrL/kpFuY+fghXCmOZ6ZCmIsPBm0Ya5HOoGOkjXDXQXCERCZZijirP1pJ3o 23r0jiin5+y1YII5FTv0Dy6owXtCAaiqx X-Gm-Gg: ASbGncv8+IhzO5PYXP5kb8E9E2/Su0KERBCkAq3QJ95De7oUilZChJ4sDGt45QoPsUt RuSKRTLpSiHkrgXlYIisnd8c5KGM5ahUzmrC1jkvqEr5NRIFQLC+dPExPKFyzb7ZQXE+oQs1Leh Xh4lk36b1rWdcotKOlhCGPgIE= X-Google-Smtp-Source: AGHT+IHX9sl0zISilGywB5hg8Ew78LWw3WAPsNOSL3qLjPlljKxEmp+gQ/1hSgAb4czht4/w5ly9a5KEkI+2j7VRmDc= X-Received: by 2002:a05:6820:3090:b0:602:7078:df55 with SMTP id 006d021491bc7-60685ee9193mr509984eaf.5.1745892392580; Mon, 28 Apr 2025 19:06:32 -0700 (PDT) MIME-Version: 1.0 References: <17230BE4-D6C6-422F-8537-6E2381B30AD8@gmail.com> In-Reply-To: <17230BE4-D6C6-422F-8537-6E2381B30AD8@gmail.com> From: Ron Johnson Date: Mon, 28 Apr 2025 22:06:21 -0400 X-Gm-Features: ATxdqUHVr53I_qzwpN26osIF5bqdTVCsvtLXofb-yXBQxRXRfIySlw-vRfU_-SA Message-ID: Subject: Re: Adding New Column with default value. To: pgsql-admin Content-Type: multipart/alternative; boundary="000000000000e044620633e13fa2" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000e044620633e13fa2 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Mon, Apr 28, 2025 at 9:13=E2=80=AFPM Ed Sabol w= rote: > On Apr 28, 2025, at 1:24 PM, Gambhir Singh > wrote: > > Row Count - 50 Billion > > I've never dealt with a table that huge personally, but my concern would > be that ALTER TABLE will lock the table for a very long time. Is this in = a > production environment with active usage of this table? Just SELECTs or a= re > we talking UPDATEs and INSERTs as well? If so, you might need to do > something more complicated than just ALTER TABLE. > > If you have enough disk space in the storage area for this database to > have two identical copies of this 50 billion row table (with indexes!), y= ou > could make a copy of the table and either ALTER that copy or add the new > column at the same time as making the copy and then, in a single > transaction, rename the two tables to swap them. If you do it this way, t= he > new table will replace the old table seamlessly without interrupting usag= e > of the table. Somewhere in there, you might need to re-sync the two table= s > to make sure any rows that got inserted or updated while you were making > the copy are incorporated into the new version of the table as well. > > Just some initial thoughts on how I would accomplish this and things I > would consider when deciding how to do it. > > COPY TO of that table, and then COPY FROM into a new table would let OP experiment. Since it's 50Bn rows, COPY TO of a quarter of the rows is probably adequate. Hopefully this bolding comes through: "When a column is added with *ADD COLUMN and a non-volatile DEFAULT* is specified, the default is evaluated at the time of the statement and the result stored in the table's metadata. That value will be used for the column for all existing rows. If no DEFAULT is specified, NULL is used. *In neither case is a rewrite of the table required.*" According to https://www.postgresql.org/docs/17/sql-altertable.html, "*Addi= ng a *CHECK or *NOT NULL* constraint requires scanning the table to verify that existing rows meet the constraint, but *does not require a table rewrit*e." That's probably pretty fast, even if an exclusive lock is required. Thus, I'd probably try this on the table copy: ALTER TABLE foo ADD COLUMN bar BIGINT NOT NULL DEFAULT 0; UPDATE foo SET bar =3D 0 WHERE pk between 0*1000+0 AND 0*1000+9999; UPDATE foo SET bar =3D 0 WHERE pk between 1*1000+0 AND 1*1000+9999; UPDATE foo SET bar =3D 0 WHERE pk between 2*1000+0 AND 2*1000+9999; etc. The UPDATE statement would be in a bash loop, with the 0, 1, 2, 3... a variable. I'd also stick an occasional VACUUM in the bash script. --=20 Death to , and butter sauce. Don't boil me, I'm still alive. lobster! --000000000000e044620633e13fa2 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Mon, Apr 28, 2025 at 9:13=E2=80=AFPM E= d Sabol <edwardjsabol@gmail.co= m> wrote:
On Apr 28, 2025, at 1:24 PM, = Gambhir Singh <gambhir.singh05@gmail.com> wrote:
> Row Count - 50 Billion

I've never dealt with a table that huge personally, but my concern woul= d be that ALTER TABLE will lock the table for a very long time. Is this in = a production environment with active usage of this table? Just SELECTs or a= re we talking UPDATEs and INSERTs as well? If so, you might need to do some= thing more complicated than just ALTER TABLE.

If you have enough disk space in the storage area for this database to have= two identical copies of this 50 billion row table (with indexes!), you cou= ld make a copy of the table and either ALTER that copy or add the new colum= n at the same time as making the copy and then, in a single transaction, re= name the two tables to swap them. If you do it this way, the new table will= replace the old table seamlessly without interrupting usage of the table. = Somewhere in there, you might need to re-sync the two tables to make sure a= ny rows that got inserted or updated while you were making the copy are inc= orporated into the new version of the table as well.

Just some initial thoughts on how I would accomplish this and things I woul= d consider when deciding how to do it.


<= div>COPY TO of that table, and then COPY FROM into a new table would let OP= experiment.=C2=A0 Since it's 50Bn rows, COPY TO of a quarter of the ro= ws is probably adequate.

Hopefully this bolding co= mes through:

"When a colu= mn is added with=C2=A0ADD COLUMN=C2=A0and a non-volatile=C2= =A0DEFAULT=C2=A0is specified, the default is evaluated at th= e time of the statement and the result stored in the table's metadata. = That value will be used for the column for all existing rows. If no=C2=A0DEFAULT=C2=A0is specified, NULL is used. In neither case is a r= ewrite of the table required."

Acc= ording=C2=A0to=C2=A0https://www.postgresql.org/docs/17/sql-altertable.html, &qu= ot;Adding a=C2=A0CHECK=C2=A0or=C2=A0<= /span>NOT NULL=C2=A0constraint requires scanning the table to ve= rify that existing rows meet the constraint, but does not require a tabl= e rewrite."

Th= at's probably pretty fast, even if an exclusive lock is required.
=

Thus, I'd probably try this on the table copy:
ALTER TABLE foo ADD COLUMN bar BIGINT NOT NULL DEFAULT 0;
<= div>UPDATE foo SET bar =3D 0 WHERE pk between 0*1000+0 AND 0*1000+9999;
UPDATE foo SET bar =3D 0 WHERE pk between 1*1000+0 AND 1*1000+9= 999;
UPDATE foo SET bar =3D 0 WHERE pk between 2*1000+0= AND 2*1000+9999;
etc.

=
The UPDATE statement would be in a bash loop, with the 0, 1, 2, = 3... a variable.

I'd also stick an occasional = VACUUM in the bash script.

--
Death to <Redacted>, and butter sauce.
Don't boil m= e, I'm still alive.
<Redacted> lobster!
<= /div>
--000000000000e044620633e13fa2--