From: Marcos Magueta
Date: Mon, 5 Jan 2026 14:49:47 -0300
Subject: Re: WIP - xmlvalidate implementation from TODO list
To: Jim Jones
Cc: Andrey Borodin, Kirill Reshke, PostgreSQL Hackers
In-Reply-To: <70c72cb1-a39f-41b3-bfe3-e32ee7fda9c4@uni-muenster.de>

Thank you all for the careful review!
I'll go through the review topics and fix the tests and code today, but I have a couple of questions about a catalog.

If we were to implement a catalog, I believe registration would either copy the schema through an insert into a specified relation (created on demand) or into something in the catalog, like pg_xmlschema. That is a realistic change I could work on. But what about the privilege level and file fetch support? I believe file access is not really an issue if the user is sufficiently privileged, so should it mirror COPY FROM? I haven't looked at its implementation, but I suppose it already has security checks at the user privilege level. A valid alternative is to not deal with privileges at all and to keep the same restrictions already in place on fetching arbitrary extensions to a specified schema; that way we are just moving the schema definition to another command that runs before validation is invoked, and ignoring any references outside of the plain text specified (therefore not using file:// the way IBM does, just text).

Surprisingly, the standard (I only have the 2016 edition here) leaves a great deal of freedom in how to implement the registration. It just specifies what a registered schema should have:

An XML namespace NS contained in a registered XML Schema is non-deterministic if NS contains a global element declaration schema component that is non-deterministic.
A registered XML Schema is non-deterministic if it contains a non-deterministic XML namespace.
A registered XML Schema is described by a registered XML Schema descriptor. A registered XML Schema descriptor includes:
- The target namespace URI of the registered XML Schema.
- The schema location URI of the registered XML Schema.
- The <registered XML Schema name> of the registered XML Schema.
- An indication of whether the registered XML Schema is permanently registered.
- An indication of whether the registered XML Schema is non-deterministic.
- An unordered collection of the namespaces defined by the registered XML Schema (the target namespace is one of these namespaces).
- For each namespace defined by the registered XML Schema, an unordered collection of the global element declaration schema components in that namespace, with an indication for each global element declaration schema component whether that global element declaration schema component is non-deterministic.
NOTE 9 - Without Feature X161, “Advanced Information Schema for registered XML Schemas”, information whether an XML Schema is deterministic, information about the collection of namespaces defined in that XML Schema, and, for each such namespace information about the global element declaration schema components in that namespace, is not available in the XML_SCHEMAS, XML_SCHEMA_NAMESPACES, and XML_SCHEMA_ELEMENTS views.
A registered XML Schema is identified by its <registered XML Schema name>.
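
To make the descriptor above a bit more concrete, here is a rough C sketch of how it could be modelled in memory. Every type, field, and function name below is hypothetical and made up for illustration; nothing here exists in PostgreSQL, and a real pg_xmlschema catalog would be laid out differently. The one thing it tries to show is that the two "non-deterministic" indications can be derived bottom-up from the per-element flags, exactly as the quoted definitions describe, instead of being stored separately.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one global element declaration schema component. */
typedef struct XmlSchemaElementDecl
{
    const char *name;
    bool        nondeterministic;   /* per-element indication */
} XmlSchemaElementDecl;

/* Hypothetical: one namespace defined by the registered XML Schema. */
typedef struct XmlSchemaNamespace
{
    const char *uri;
    const XmlSchemaElementDecl *elements;   /* unordered collection */
    size_t      nelements;
} XmlSchemaNamespace;

/* Hypothetical: the registered XML Schema descriptor itself. */
typedef struct RegisteredXmlSchema
{
    const char *schema_name;        /* <registered XML Schema name>, the identifier */
    const char *target_namespace;   /* target namespace URI */
    const char *schema_location;    /* schema location URI */
    bool        permanent;          /* permanently registered? */
    const XmlSchemaNamespace *namespaces;   /* includes the target namespace */
    size_t      nnamespaces;
} RegisteredXmlSchema;

/* A namespace is non-deterministic if any of its global elements is. */
static bool
xmlschema_namespace_is_nondeterministic(const XmlSchemaNamespace *ns)
{
    for (size_t i = 0; i < ns->nelements; i++)
    {
        if (ns->elements[i].nondeterministic)
            return true;
    }
    return false;
}

/* A schema is non-deterministic if any of its namespaces is. */
static bool
xmlschema_is_nondeterministic(const RegisteredXmlSchema *schema)
{
    for (size_t i = 0; i < schema->nnamespaces; i++)
    {
        if (xmlschema_namespace_is_nondeterministic(&schema->namespaces[i]))
            return true;
    }
    return false;
}
```

Whether the schema-level flag should be computed like this or stored at registration time is an open design choice; storing it would avoid rescanning but would need to be kept consistent with the per-element data.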

I am tempted to go with a pg_xmlschema definition in the catalog and an interface like the one IBM has, but still restricting file access. Dealing with the security problems of file access sounds excruciating. Any opinions?

Regards, Magueta.