Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tDfdu-002QOL-8a for pgsql-general@arkaria.postgresql.org; Wed, 20 Nov 2024 08:00:54 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tDfds-003BJZ-V4 for pgsql-general@arkaria.postgresql.org; Wed, 20 Nov 2024 08:00:52 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tCGwO-00E2FO-4S for pgsql-general@lists.postgresql.org; Sat, 16 Nov 2024 11:26:12 +0000 Received: from mail-wr1-x435.google.com ([2a00:1450:4864:20::435]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tCGwJ-002FCL-QY for pgsql-general@lists.postgresql.org; Sat, 16 Nov 2024 11:26:11 +0000 Received: by mail-wr1-x435.google.com with SMTP id ffacd0b85a97d-38207c86695so1901641f8f.2 for ; Sat, 16 Nov 2024 03:26:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1731756367; x=1732361167; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=fevsRGGQtdxbuM7qwFjhDV2BplkHleYeLFBa+tWtfd4=; b=Lbymu/NlgwquZAv8eRgi76Pa7uPkQqCMejnYzLgS04cr3ypp2FnAkNaUKcNG9jUO5R zE1fwWRupY/pzmvEomSMhFF8RBNI/FJer9lsTQIG4ZzpQws/X4KDI3ovvgmIh3+Nz3WE 4hrh8oBTaVtrXJmQwnirE/67U5kBrZC5P3/2Uz0Kb4LM4qUQTM/R1doMA30UdckkYcIi Y0Yrq+0fe+9OONkxMWx6b8Al6klIW/rbzXEj0tULqDJOPnU28LUiGAaiBa6CKmn5j+az PqUzxN27+HScfZbir2jkr22RNcV8iY3CzlXvqbWogS7W0FzRSYiwOSrcMmMT4KAk2ZxA Ftcg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1731756367; x=1732361167; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=fevsRGGQtdxbuM7qwFjhDV2BplkHleYeLFBa+tWtfd4=; b=l62FCWQhKnENiCOmYFVWQ1aqsK5Qwgpt5opSmtfO1jSU6fNddaWNx63/LN13+kkV5b R6H6YKLTMqptAcO2LYf3XFsflmnECWRbNW7fJTPPFLepSuzSKZDmenTGxwJW/GSjzxwq av/iH2FVHioI/RpgtAWXHAoQ3ehMPfdAULKw7m2qxEKvmMYXGU8lPQvj6jnGjYJgtpst Q6DHzUXwoiakkgo7v7jNeWxLBcvee4FNubNOmtsUnUYWyD0oGc4hv6ElExXJWQo9cCfo 8eHzKpCaNkAD1CPPRNLl4yUUQweXyo7Lfg/1g26QV8CXKWhJTtys7zzH3tsgiOD75B70 P0Ig== X-Gm-Message-State: AOJu0Yy9f7CWNq2X4mOmz9JoEvLbOFIM9SzVy4qOZ+k979iuqtYaAlWD wBGtSJucgnyZaL/g1Mppz2YKNQNtTOIhJa2etwE3JgRy0h818auq+sHi0l8A5hOSbl6OeOSs7t1 7KuzxyUgidV73zXqnl+YYm4fwOyY= X-Google-Smtp-Source: AGHT+IEfaXzR+KN596UNPU4ODuq1kDS27sd4NhqBZ5vZvgdofxHlGlczV8mEfbUbS4/2N+KKtaT1MBpIxFuIwQ2FpH4= X-Received: by 2002:a05:6000:3cf:b0:37d:5133:8cba with SMTP id ffacd0b85a97d-3822590682emr4366476f8f.20.1731756366656; Sat, 16 Nov 2024 03:26:06 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Philip Couling Date: Sat, 16 Nov 2024 11:25:55 +0000 Message-ID: Subject: Re: Validating check constraints without a table scan? To: =?UTF-8?Q?Torsten_F=C3=B6rtsch?= Cc: pgsql-general@lists.postgresql.org Content-Type: multipart/alternative; boundary="0000000000001287fe062705f3d5" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000001287fe062705f3d5 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thanks for the suggestion of directly updating the system catalog, that may be the =E2=80=9Cbest=E2=80=9D option. =E2=80=9C*Most likely your query is not exactly the same as the check const= raint. Think about NULL and similar.*=E2=80=9D No that=E2=80=99s not sufficient to explain it. The SELECT is obviously the negation of the check constraint, so there is a difference there. But besides that, it=E2=80=99s very clear that Postgres does not make use of in= dexes while validating constraints. I suspect it doesn=E2=80=99t involve the quer= y planner at all. On Fri, 15 Nov 2024 at 15:28, Torsten F=C3=B6rtsch wrote: > On Fri, Nov 15, 2024 at 9:38=E2=80=AFAM Philip Couling wrote: > >> Is there a solid reason why adding a check constraint does not use >> existing indexes for validation. >> >> We are currently looking at partitioning a multi TB table leaving all >> existing data in place and simply attaching it as a partition to a new >> table. To prevent locking, we are trying to add an INVALID check constra= int >> first and then validate it. >> >> I can trivially prove the invalid constraint is valid with a simple >> SELECT which will use an existing index and return instantaneously. But >> AFAIK Theres no way to mark a constraint as valid without scanning all t= he >> rows. >> > > Most likely your query is not exactly the same as the check constraint. > Think about NULL and similar. > > >> This operation is really problematic on a production database with heavy >> IO load. >> >> Is there a solid ready why validating check constraints cannot use >> existing indexes? If I can prove the constraint is valid so trivially wi= th >> a SELECT, then why can Postgres not do the same (or similar)? >> > > Here is what has worked for me many times: > > 1. create the check constraint as NOT VALID. From now on no new or update= d > row is allowed to violate it. > 2. check if the constraint holds with a query on a binary replica. Make > sure the query starts only when the constraint is visible on the replica. > 3. on the master: UPDATE pg_constraint SET conisvalidated=3Dtrue WHERE > conname=3D'your_constraint_name' AND conrelid=3D'schema.table'::regclass > > How you perform step 2 is irrelevant. Checking it on a replica would > simply avoid the load on the master. You just need to make sure there is = no > conflicting data in the table. > > WARNING, you need to be damn sure of your data if you do that. But if you > are, it works. > > Here is the procedure how I solved the same problem for some of our > multi-TB tables (PG14): > > The table has a column called transaction_time. We wanted to partition by > that column. For some historical reason the column did not have a NOT NUL= L > constraint. However, there was no way our processes could insert NULL in > that column and there was no row with NULL in that field. So, first was t= o > add the NOT NULL constraint: > > BEGIN; > > ALTER TABLE my.table > ADD CONSTRAINT transaction_time_not_null CHECK(transaction_time IS NOT > NULL) NOT VALID; > > UPDATE pg_constraint > SET convalidated=3Dtrue > WHERE conname =3D 'transaction_time_not_null' > AND conrelid =3D 'my.table'::REGCLASS > RETURNING conname, conrelid::REGCLASS, convalidated; > > COMMIT; > > Now for cosmetic purposes we first turn the check constraint above into a > normal NOT NULL constraint: > > BEGIN; > > SET LOCAL client_min_messages =3D 'debug4'; > -- expecting this message > -- DEBUG: existing constraints on column "table.transaction_time" are > sufficient to prove that it does not contain nulls > ALTER TABLE my.table > ALTER COLUMN transaction_time SET NOT NULL; > RESET client_min_messages; > > ALTER TABLE my.table > DROP CONSTRAINT transaction_time_not_null; > > COMMIT; > > If you set client_min_messages to something like debug4, then the databas= e > tells you if it wants to scan the table or if existing constraints are > sufficient to prove the condition. > > transaction_time in our case is never in the future. Also database > transactions are short-lived. Nothing exceeds 30 seconds. So, I can trust > the effect of any action performed more than 30 seconds ago in the databa= se > is visible. > > So, I set the time after which new rows go to the new partition at least > 10 minutes from now at the next hour boundary. 30 seconds would be good > enough. I chose 10 minutes just for extra safety. > > SELECT date_trunc('hour', now()+'1h 10min'::interval)::timestamp AS > switch_time\gset > > Next comes the actual change: > > BEGIN; > > -- rename the existing table > ALTER TABLE my.table RENAME TO table_old; > > -- drop triggers. We will recreate them later. > DROP TRIGGER ... ON my.table_old; > DROP TRIGGER ...; > > -- create partitioned table > CREATE TABLE my.table ( > LIKE my.table_old > INCLUDING DEFAULTS > INCLUDING CONSTRAINTS > ) > PARTITION BY RANGE (transaction_time); > > -- recreate triggers > CREATE TRIGGER ... > BEFORE DELETE ON my.table > FOR EACH ROW EXECUTE FUNCTION my,trigger_fun(); > CREATE TRIGGER ...; > > -- create the partition for future rows > CREATE TABLE my.table_current PARTITION OF my.table > FOR VALUES FROM (:'switch_time') TO ('infinity'); > > -- and some indexes and FK > ALTER TABLE my.table_current ADD PRIMARY KEY (id); > CREATE INDEX ON my.table_current (...); > ALTER TABLE my.table_current > ADD CONSTRAINT fk_name > FOREIGN KEY (...) > REFERENCES ...(...) ON UPDATE ... ON DELETE ...; > > -- now attach the old table as partition > ALTER TABLE my.table_old > ADD CONSTRAINT partition_boundaries > CHECK('-infinity'<=3Dtransaction_time AND transaction_time<:'switch_time'= ) > NOT VALID; > > -- for procedural reasons we know the constraint is valid. Let's make PG > believe it too. > UPDATE pg_constraint > SET convalidated=3Dtrue > WHERE conname=3D'partition_boundaries' > AND conrelid=3D'my.table_old'::REGCLASS > RETURNING conname, conrelid::REGCLASS, convalidated; > > -- now attach it. We use again debug4 to check if the table scan is > avoided. > SET LOCAL client_min_messages =3D 'debug4'; > -- expecting > -- DEBUG: partition constraint for table "table_old" is implied by > existing constraints > ALTER TABLE my.table ATTACH PARTITION my.table_old > FOR VALUES FROM ('-infinity') TO (:'switch_time'); > RESET client_min_messages; > > -- drop the now unnecessary constraint > ALTER TABLE my.table_old > DROP CONSTRAINT partition_boundaries; > > COMMIT; > > Once the new partition gets too full, we will use a similar procedure to > adjust the partition boundary of the new partition and then create the ne= xt > partition. > --0000000000001287fe062705f3d5 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thanks for the suggestion of directly updating the system= catalog, that may be the =E2=80=9Cbest=E2=80=9D option.

=E2=80=9CMost likely your query is not exactly the same as the check constraint. Th= ink about NULL and similar.=E2=80=9D

No that=E2=80= =99s not sufficient to explain it. The SELECT is obviously the negation of = the check constraint, so there is a difference there. But besides that, it= =E2=80=99s very clear that Postgres does not make use of indexes while vali= dating constraints. I suspect it doesn=E2=80=99t involve the query planner = at all.

On Fri, 15 Nov 2024 at 15:28, Torsten F=C3=B6rtsch <tfoertsch123@gmail.com> wrote:
On = Fri, Nov 15, 2024 at 9:38=E2=80=AFAM Philip Couling <couling@gmail.com> wrote:
<= div class=3D"gmail_quote">
Is there a solid reason why adding a check constraint does not use existin= g indexes for validation.

We a= re currently looking at partitioning a multi TB table leaving all existing = data in place and simply attaching it as a partition to a new table. To pre= vent locking, we are trying to add an INVALID check constraint first and th= en validate it.

I can tr= ivially prove the invalid constraint is valid with a simple SELECT which wi= ll use an existing index and return instantaneously. But AFAIK Theres no wa= y to mark a constraint as valid without scanning all the rows.

Most likely your query is not exactly the same as= the check constraint. Think about NULL and similar.
=C2=A0
=
<= div dir=3D"auto">This operation is really problematic on a production datab= ase with heavy IO load.

= Is there a solid ready why validating check constraints cannot use existing= indexes? If I can prove the constraint is valid so trivially with a SELECT= , then why can Postgres not do the same (or similar)?

Here is what has worked for me many times:

<= /div>
1. create the check constraint as NOT VALID. From now on no new o= r updated row is allowed to violate it.
2. check if the constrain= t holds with a query on a binary replica. Make sure the query starts only w= hen the constraint is visible on the replica.
3. on the master: U= PDATE pg_constraint SET=C2=A0conisvalidated=3Dtrue=C2=A0WHERE conname=3D= 9;your_constraint_name' AND conrelid=3D'schema.table'::regclass=

How you perform step 2 is irrelevant. Checking it= on a replica would simply avoid the load on the master. You just need to m= ake sure there is no conflicting data in the table.

WARNING, you need to be damn sure of your data if you do that. But if you= are, it=C2=A0works.

Here is the procedure how I s= olved the same problem for some of our multi-TB tables (PG14):
The table has a column called transaction_time. We wanted to p= artition by that column. For some historical reason the column did not have= a NOT NULL constraint. However, there was no way our=C2=A0processes could = insert NULL in that column and there was no row with NULL in that field. So= , first was to add the NOT NULL constraint:

BEGIN;=

ALTER TABLE my.table
ADD CONSTRAINT transaction_time_not_null CH= ECK(transaction_time IS NOT NULL) NOT VALID;

UPDATE pg_constraint=C2=A0 =C2=A0SET convalidated=3Dtrue
=C2=A0WHERE conname =3D 'trans= action_time_not_null'
=C2=A0 =C2=A0AND conrelid =3D 'my.table= 9;::REGCLASS
RETURNING conname, conrelid::REGCLASS, convalidated;
COMMIT;

Now for cosmetic purposes we first turn t= he check constraint above into a normal NOT NULL constraint:

=
BEGIN;

SET LOCAL client_min_messages =3D 'debug4'= ;
-- expecting this message
-- DEBUG: =C2=A0existing constraints on c= olumn "table.transaction_time" are sufficient to prove that it do= es not contain nulls
ALTER TABLE my.table
=C2=A0 =C2=A0 =C2=A0 ALTER = COLUMN transaction_time SET NOT NULL;
RESET client_min_messages;

= ALTER TABLE my.table
=C2=A0 =C2=A0 =C2=A0 DROP CONSTRAINT transaction_ti= me_not_null;

COMMIT;

= If you set client_min_messages to something like debug4, then the database = tells you if it wants to scan the table or if existing constraints are suff= icient to prove the condition.

transaction_ti= me in our case is never in the future. Also database transactions are short= -lived. Nothing exceeds 30 seconds. So, I can trust the effect of any actio= n performed more than 30 seconds ago in the database is visible.
=
So, I set the time after which new rows go to the new partition a= t least 10 minutes from now at the next hour boundary. 30 seconds would be = good enough. I chose 10 minutes just for extra safety.

=
SELECT date_trunc('hour', now()+'1h 10min'::interval):= :timestamp AS switch_time\gset

Next comes the = actual change:

BEGIN;

-- = rename the existing table
ALTER TABLE my.table RENAME TO table_ol= d;

-- drop triggers. We will recreate them later.
DROP TRIGGER ... ON my.table_old;
DROP TRIGGER ...;

-- create partitioned table
CREATE TABLE my.table (
=C2=A0 = =C2=A0 LIKE my.table_old
=C2=A0 =C2=A0 INCLUDING DEFAULTS
=C2=A0 =C2= =A0 INCLUDING CONSTRAINTS
)
PARTITION BY RANGE (transaction_time);

-- recreate triggers
CREATE TRIGGER ..= .
BEFORE DELETE ON my.table
=C2=A0 =C2=A0FOR EACH ROW EXECUTE FUNCTIO= N my,trigger_fun();
CREATE TRIGGER ...;

<= div>-- create the partition for future rows
CREATE TABLE my.table= _current PARTITION OF my.table
=C2=A0 FOR VALUES FROM (:'switch_time= ') TO ('infinity');

-- and some in= dexes and FK
ALTER TABLE my.table_current ADD PRIMARY KEY (id);CREATE INDEX ON my.table_current (...);
ALTER TABLE my.tabl= e_current
=C2=A0 ADD CONSTRAINT fk_name
=C2=A0 =C2=A0 =C2=A0 FOREIGN = KEY (...)
=C2=A0 =C2=A0 =C2=A0 REFERENCES ...(...) ON UPDATE ... ON DELE= TE ...;

-- now attach the old table as partiti= on
ALTER TABLE my.table_old
ADD CONSTRAINT partition_boundarie= s
CHECK('-infinity'<=3Dtransaction_time AND transaction_time&= lt;:'switch_time')
=C2=A0 NOT VALID;

-- for procedural re= asons we know the constraint is valid. Let's make PG believe it too.
UPDATE pg_constraint
=C2=A0 =C2=A0SET convalidated=3Dtrue
=C2= =A0WHERE conname=3D'partition_boundaries'
=C2=A0 =C2=A0AND conre= lid=3D'my.table_old'::REGCLASS
RETURNING conname, conreli= d::REGCLASS, convalidated;

-- now attach it. We use again debug4 to = check if the table scan is avoided.
SET LOCAL client_min_messages =3D &#= 39;debug4';
-- expecting
-- DEBUG: =C2=A0partition constraint for= table "table_old" is implied by existing constraints
ALTER TA= BLE my.table ATTACH PARTITION my.table_old
=C2=A0 FOR VALUES FROM ('= -infinity') TO (:'switch_time');
RESET client_min_messages;<= br>
-- drop the now unnecessary constraint
ALTER TABLE my.table_old=C2=A0DROP CONSTRAINT partition_boundaries;

= COMMIT;

Once the new partition gets too full, we w= ill use a similar procedure to adjust the partition boundary of the new par= tition and then create the next partition.
--0000000000001287fe062705f3d5--