Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tGGoD-001ubS-6I for pgsql-general@arkaria.postgresql.org; Wed, 27 Nov 2024 12:06:17 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tGGoB-00EC6N-Sn for pgsql-general@arkaria.postgresql.org; Wed, 27 Nov 2024 12:06:15 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tGGoB-00EC6F-Eq for pgsql-general@lists.postgresql.org; Wed, 27 Nov 2024 12:06:15 +0000 Received: from mail-wm1-x333.google.com ([2a00:1450:4864:20::333]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tGGo8-004JhG-Nz for pgsql-general@lists.postgresql.org; Wed, 27 Nov 2024 12:06:14 +0000 Received: by mail-wm1-x333.google.com with SMTP id 5b1f17b1804b1-434aa472617so6135565e9.3 for ; Wed, 27 Nov 2024 04:06:12 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1732709171; x=1733313971; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=HHf79AhLm498UROGwKV8Yts/qJnu3IvSjauGVdgXWoU=; b=AnPnRQwVGsDHVG2I0LwAP43vjhvOe3AsC2sIIZ8Tg4TNvnIq92eaJFzbsxFFcV0Kdu ZhES0ay/20Q9ke1mjjtjHro+zaFUUUxmfwlNjZ6kW4Qcrn+tdcvaAEZQ3MktkDXT4c+x yNyNmvwgeNCKmIt/8NByMtbkfGy9LfIufrY06rBG0o7LcGvQzCL+ki9BHGd0v2fBb/LX Sa6oRQ26S8XEAE7UGyMq9N0qgAlOIN8LRc7W+4CuoGMIXzyO4uoewdOK/IuPiYRE3s6W ZSa+TZlbRu5bBPkF6bvDGAIQddfmtl2UjrnDwxOQ7nvbxwxGK1jIdqf5Hl6RQAge1jEM H/Ig== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1732709171; x=1733313971; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=HHf79AhLm498UROGwKV8Yts/qJnu3IvSjauGVdgXWoU=; b=a6H/ZljxKkVCCSbHcJybb8idTMHr1TbASdJbIczk0Eb363HUDFutdCKVvI54sa8vYB d3KS0+ZEHsV9x3th+gygIu4ZhqhkNfTDFK4kqWN7T0bNydJxdFwFFXcDxuvFXinFZPEx GuYN1T3/pEGzAFJQw38g/mGAbA0SoG9rQ0C+N1IdT4Ao279iyItdVafPkVsYRXUUrhsY TIoTnsxHWeMRugn1QrKMqkSIGVf+0zFaqqiZVQKLrufWUgcC1loM6jN+7aqx7xh7xGpT yzOUDeK0JN9ZtYi6FBicS3Rd74Klwm7LVDLweZA4n1R0XLfSBr6BoHepc/bh39UIbeh4 Hejw== X-Gm-Message-State: AOJu0YynRKJ1t42N/O89S7yClzK0MploRKgPxpS8rq2hnMElrg4qLaGv yv7Es7lPcXSQaTSHH+Gjio5jX40BHz/EUVmVv79/nGGjxBypWh3wOeeMaMqnJEViDKzzLTmrQLl PZsUi/bb097XwloXhg77O7G1LV0A= X-Gm-Gg: ASbGncvRMrJCactzTzeIuv8ND/2HsHIZ8SnPDToica/HlJOThsDooYtIEjZ35Rha0wu iY9pdwkYfd1ZlpgaD0W3ikf1uEOSHz7Muz5/QSXwOXXOUqR7rXfXjVBnN7QfblA== X-Google-Smtp-Source: AGHT+IFt4zoKqmGM3QV3GMohe4vhA60PXuHgXR6+xmPhGx/VwVpFx7USZ6NnikGmzLO7FtgwEoFRsYNUgpzC2zQPbJo= X-Received: by 2002:a05:600c:154f:b0:434:a802:e99a with SMTP id 5b1f17b1804b1-434a9db8000mr25527925e9.4.1732709170546; Wed, 27 Nov 2024 04:06:10 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Philip Couling Date: Wed, 27 Nov 2024 12:05:59 +0000 Message-ID: Subject: Re: Validating check constraints without a table scan? To: =?UTF-8?Q?Torsten_F=C3=B6rtsch?= Cc: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000009c11610627e3ca82" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000009c11610627e3ca82 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable It looks like updating pg_constraint isn't an option for AWS RDS due to the way AWS doesn't give you superuser access. Thanks a lot for the suggestion anyway. On Fri, 15 Nov 2024 at 15:28, Torsten F=C3=B6rtsch wrote: > On Fri, Nov 15, 2024 at 9:38=E2=80=AFAM Philip Couling wrote: > >> Is there a solid reason why adding a check constraint does not use >> existing indexes for validation. >> >> We are currently looking at partitioning a multi TB table leaving all >> existing data in place and simply attaching it as a partition to a new >> table. To prevent locking, we are trying to add an INVALID check constra= int >> first and then validate it. >> >> I can trivially prove the invalid constraint is valid with a simple >> SELECT which will use an existing index and return instantaneously. But >> AFAIK Theres no way to mark a constraint as valid without scanning all t= he >> rows. >> > > Most likely your query is not exactly the same as the check constraint. > Think about NULL and similar. > > >> This operation is really problematic on a production database with heavy >> IO load. >> >> Is there a solid ready why validating check constraints cannot use >> existing indexes? If I can prove the constraint is valid so trivially wi= th >> a SELECT, then why can Postgres not do the same (or similar)? >> > > Here is what has worked for me many times: > > 1. create the check constraint as NOT VALID. From now on no new or update= d > row is allowed to violate it. > 2. check if the constraint holds with a query on a binary replica. Make > sure the query starts only when the constraint is visible on the replica. > 3. on the master: UPDATE pg_constraint SET conisvalidated=3Dtrue WHERE > conname=3D'your_constraint_name' AND conrelid=3D'schema.table'::regclass > > How you perform step 2 is irrelevant. Checking it on a replica would > simply avoid the load on the master. You just need to make sure there is = no > conflicting data in the table. > > WARNING, you need to be damn sure of your data if you do that. But if you > are, it works. > > Here is the procedure how I solved the same problem for some of our > multi-TB tables (PG14): > > The table has a column called transaction_time. We wanted to partition by > that column. For some historical reason the column did not have a NOT NUL= L > constraint. However, there was no way our processes could insert NULL in > that column and there was no row with NULL in that field. So, first was t= o > add the NOT NULL constraint: > > BEGIN; > > ALTER TABLE my.table > ADD CONSTRAINT transaction_time_not_null CHECK(transaction_time IS NOT > NULL) NOT VALID; > > UPDATE pg_constraint > SET convalidated=3Dtrue > WHERE conname =3D 'transaction_time_not_null' > AND conrelid =3D 'my.table'::REGCLASS > RETURNING conname, conrelid::REGCLASS, convalidated; > > COMMIT; > > Now for cosmetic purposes we first turn the check constraint above into a > normal NOT NULL constraint: > > BEGIN; > > SET LOCAL client_min_messages =3D 'debug4'; > -- expecting this message > -- DEBUG: existing constraints on column "table.transaction_time" are > sufficient to prove that it does not contain nulls > ALTER TABLE my.table > ALTER COLUMN transaction_time SET NOT NULL; > RESET client_min_messages; > > ALTER TABLE my.table > DROP CONSTRAINT transaction_time_not_null; > > COMMIT; > > If you set client_min_messages to something like debug4, then the databas= e > tells you if it wants to scan the table or if existing constraints are > sufficient to prove the condition. > > transaction_time in our case is never in the future. Also database > transactions are short-lived. Nothing exceeds 30 seconds. So, I can trust > the effect of any action performed more than 30 seconds ago in the databa= se > is visible. > > So, I set the time after which new rows go to the new partition at least > 10 minutes from now at the next hour boundary. 30 seconds would be good > enough. I chose 10 minutes just for extra safety. > > SELECT date_trunc('hour', now()+'1h 10min'::interval)::timestamp AS > switch_time\gset > > Next comes the actual change: > > BEGIN; > > -- rename the existing table > ALTER TABLE my.table RENAME TO table_old; > > -- drop triggers. We will recreate them later. > DROP TRIGGER ... ON my.table_old; > DROP TRIGGER ...; > > -- create partitioned table > CREATE TABLE my.table ( > LIKE my.table_old > INCLUDING DEFAULTS > INCLUDING CONSTRAINTS > ) > PARTITION BY RANGE (transaction_time); > > -- recreate triggers > CREATE TRIGGER ... > BEFORE DELETE ON my.table > FOR EACH ROW EXECUTE FUNCTION my,trigger_fun(); > CREATE TRIGGER ...; > > -- create the partition for future rows > CREATE TABLE my.table_current PARTITION OF my.table > FOR VALUES FROM (:'switch_time') TO ('infinity'); > > -- and some indexes and FK > ALTER TABLE my.table_current ADD PRIMARY KEY (id); > CREATE INDEX ON my.table_current (...); > ALTER TABLE my.table_current > ADD CONSTRAINT fk_name > FOREIGN KEY (...) > REFERENCES ...(...) ON UPDATE ... ON DELETE ...; > > -- now attach the old table as partition > ALTER TABLE my.table_old > ADD CONSTRAINT partition_boundaries > CHECK('-infinity'<=3Dtransaction_time AND transaction_time<:'switch_time'= ) > NOT VALID; > > -- for procedural reasons we know the constraint is valid. Let's make PG > believe it too. > UPDATE pg_constraint > SET convalidated=3Dtrue > WHERE conname=3D'partition_boundaries' > AND conrelid=3D'my.table_old'::REGCLASS > RETURNING conname, conrelid::REGCLASS, convalidated; > > -- now attach it. We use again debug4 to check if the table scan is > avoided. > SET LOCAL client_min_messages =3D 'debug4'; > -- expecting > -- DEBUG: partition constraint for table "table_old" is implied by > existing constraints > ALTER TABLE my.table ATTACH PARTITION my.table_old > FOR VALUES FROM ('-infinity') TO (:'switch_time'); > RESET client_min_messages; > > -- drop the now unnecessary constraint > ALTER TABLE my.table_old > DROP CONSTRAINT partition_boundaries; > > COMMIT; > > Once the new partition gets too full, we will use a similar procedure to > adjust the partition boundary of the new partition and then create the ne= xt > partition. > --0000000000009c11610627e3ca82 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
It looks like updating pg_constraint isn't an opt= ion for AWS RDS due to the way AWS doesn't give you superuser access.
Thanks a lot for the suggestion anyway.

On Fri, 15 Nov 2024 = at 15:28, Torsten F=C3=B6rtsch <tfoertsch123@gmail.com> wrote:
On Fri, Nov 15, 2= 024 at 9:38=E2=80=AFAM Philip Couling <couling@gmail.com> wrote:
Is there a so= lid reason why adding a check constraint does not use existing indexes for = validation.

We are currently l= ooking at partitioning a multi TB table leaving all existing data in place = and simply attaching it as a partition to a new table. To prevent locking, = we are trying to add an INVALID check constraint first and then validate it= .

I can trivially prove = the invalid constraint is valid with a simple SELECT which will use an exis= ting index and return instantaneously. But AFAIK Theres no way to mark a co= nstraint as valid without scanning all the rows.
Most likely your query is not exactly the same as the check con= straint. Think about NULL and similar.
=C2=A0
This operation is really problematic on a production database with heavy= IO load.

Is there a sol= id ready why validating check constraints cannot use existing indexes? If I= can prove the constraint is valid so trivially with a SELECT, then why can= Postgres not do the same (or similar)?

Here is what has worked for me many times:

1. c= reate the check constraint as NOT VALID. From now on no new or updated row = is allowed to violate it.
2. check if the constraint holds with a= query on a binary replica. Make sure the query starts only when the constr= aint is visible on the replica.
3. on the master: UPDATE pg_const= raint SET=C2=A0conisvalidated=3Dtrue=C2=A0WHERE conname=3D'your_constra= int_name' AND conrelid=3D'schema.table'::regclass
How you perform step 2 is irrelevant. Checking it on a replica = would simply avoid the load on the master. You just need to make sure there= is no conflicting data in the table.

WARNING, you= need to be damn sure of your data if you do that. But if you are, it=C2=A0= works.

Here is the procedure how I solved the same= problem for some of our multi-TB tables (PG14):

T= he table has a column called transaction_time. We wanted to partition by th= at column. For some historical reason the column did not have a NOT NULL co= nstraint. However, there was no way our=C2=A0processes could insert NULL in= that column and there was no row with NULL in that field. So, first was to= add the NOT NULL constraint:

BEGIN;

ALTER = TABLE my.table
ADD CONSTRAINT transaction_time_not_null CHECK(transactio= n_time IS NOT NULL) NOT VALID;

UPDATE pg_constraint
=C2=A0 =C2=A0= SET convalidated=3Dtrue
=C2=A0WHERE conname =3D 'transaction_time_no= t_null'
=C2=A0 =C2=A0AND conrelid =3D 'my.table'::REGCLASSRETURNING conname, conrelid::REGCLASS, convalidated;

COMMIT;
=

Now for cosmetic purposes we first turn the check const= raint above into a normal NOT NULL constraint:

BEG= IN;

SET LOCAL client_min_messages =3D 'debug4';
-- expect= ing this message
-- DEBUG: =C2=A0existing constraints on column "ta= ble.transaction_time" are sufficient to prove that it does not contain= nulls
ALTER TABLE my.table
=C2=A0 =C2=A0 =C2=A0 ALTER COLUMN transac= tion_time SET NOT NULL;
RESET client_min_messages;

ALTER TABLE my= .table
=C2=A0 =C2=A0 =C2=A0 DROP CONSTRAINT transaction_time_not_null;

COMMIT;

If you set cli= ent_min_messages to something like debug4, then the database tells you if i= t wants to scan the table or if existing constraints are sufficient to prov= e the condition.

transaction_time in our case= is never in the future. Also database transactions are short-lived. Nothin= g exceeds 30 seconds. So, I can trust the effect of any action performed mo= re than 30 seconds ago in the database is visible.

So, = I set the time after which new rows go to the new partition at least 10 min= utes from now at the next hour boundary. 30 seconds would be good enough. I= chose 10 minutes just for extra safety.

SELECT da= te_trunc('hour', now()+'1h 10min'::interval)::timestamp AS = switch_time\gset

Next comes the actual change:=

BEGIN;

-- rename the exi= sting table
ALTER TABLE my.table RENAME TO table_old;

-- drop triggers. We will recreate them later.
DROP TRIGGE= R ... ON my.table_old;
DROP TRIGGER ...;

-- create par= titioned table
CREATE TABLE my.table (
=C2=A0 =C2=A0 LIKE my.t= able_old
=C2=A0 =C2=A0 INCLUDING DEFAULTS
=C2=A0 =C2=A0 INCLUDING CON= STRAINTS
)
PARTITION BY RANGE (transaction_time);

<= /div>
-- recreate triggers
CREATE TRIGGER ...
BEFORE DELET= E ON my.table
=C2=A0 =C2=A0FOR EACH ROW EXECUTE FUNCTION my,trigger_fun(= );
CREATE TRIGGER ...;

-- create the= partition for future rows
CREATE TABLE my.table_current PARTITIO= N OF my.table
=C2=A0 FOR VALUES FROM (:'switch_time') TO ('i= nfinity');

-- and some indexes and FK
ALTER TABLE my.table_current ADD PRIMARY KEY (id);
CREATE INDEX ON= my.table_current (...);
ALTER TABLE my.table_current
=C2= =A0 ADD CONSTRAINT fk_name
=C2=A0 =C2=A0 =C2=A0 FOREIGN KEY (...)
=C2= =A0 =C2=A0 =C2=A0 REFERENCES ...(...) ON UPDATE ... ON DELETE ...;

-- now attach the old table as partition
ALT= ER TABLE my.table_old
ADD CONSTRAINT partition_boundaries
CHECK('= -infinity'<=3Dtransaction_time AND transaction_time<:'switch_= time')
=C2=A0 NOT VALID;

-- for procedural reasons we know th= e constraint is valid. Let's make PG believe it too.
UPDATE p= g_constraint
=C2=A0 =C2=A0SET convalidated=3Dtrue
=C2=A0WHERE conname= =3D'partition_boundaries'
=C2=A0 =C2=A0AND conrelid=3D'my.ta= ble_old'::REGCLASS
RETURNING conname, conrelid::REGCLASS, con= validated;

-- now attach it. We use again debug4 to check if the tab= le scan is avoided.
SET LOCAL client_min_messages =3D 'debug4';<= br>-- expecting
-- DEBUG: =C2=A0partition constraint for table "tab= le_old" is implied by existing constraints
ALTER TABLE my.table ATT= ACH PARTITION my.table_old
=C2=A0 FOR VALUES FROM ('-infinity') = TO (:'switch_time');
RESET client_min_messages;

-- drop t= he now unnecessary constraint
ALTER TABLE my.table_old
=C2=A0DROP CON= STRAINT partition_boundaries;

COMMIT;

Once the new partition gets too full, we will use a simila= r procedure to adjust the partition boundary of the new partition and then = create the next partition.
--0000000000009c11610627e3ca82--