Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vSKfn-00Bfp6-0u for pgsql-hackers@arkaria.postgresql.org; Sun, 07 Dec 2025 19:43:59 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vSKfk-00DoVI-2B for pgsql-hackers@arkaria.postgresql.org; Sun, 07 Dec 2025 19:43:56 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vSKfk-00DoV9-13 for pgsql-hackers@lists.postgresql.org; Sun, 07 Dec 2025 19:43:56 +0000 Received: from mail-lj1-x22d.google.com ([2a00:1450:4864:20::22d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1vSKfh-003iqH-2H for pgsql-hackers@lists.postgresql.org; Sun, 07 Dec 2025 19:43:56 +0000 Received: by mail-lj1-x22d.google.com with SMTP id 38308e7fff4ca-37a875e3418so28552491fa.1 for ; Sun, 07 Dec 2025 11:43:53 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1765136632; x=1765741432; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=6mIIbgMCvxYzpLTmBXQJ0aKAfuXkHh80TXiVubw0ebg=; b=QTPt71Sanm/XP2uTOpktDyEd77svriRqihsgm7+mrSi3VeWgCxOhXW9hIdMrUqalw9 pVboSxA/PsPko+EAXqAnQeBRHQQQyBmQTJ29qRv6Xqd6xpLiZazS8rPxXsoy1ouF1V6t fXa3Je+g/tQTrwf8lRcrazj5t30JdH+aRVlZUL1qnQ/UVYZmvFEUZw4LwOW/HfE8+w+3 ScLgVPQB4fAyoAHjm71WMLSJaQdqfN7UCXWK4iLenZXekLG611RlnhN/f4HIAAcAreMH 2qNiiaQiPlzfegzBGa3P0KyepC0jCEG0WcHsCZ24MUtszKaqIKmwAWi0/WemRwuVt/jC pR1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1765136632; x=1765741432; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=6mIIbgMCvxYzpLTmBXQJ0aKAfuXkHh80TXiVubw0ebg=; b=BDDuyBbuahQ4VKa1avieuVi2wcXnj0IxuOhzWhei0Rvhd8a0W4GoERre77mdi9twFJ oVnnmpczAjXZu3DjlrwCpP9vxIgrD5QbBdh9bdBj+mlE69e2zhaxji9wcrVsp5h6vGwB f2TI7oDIH8UDw1F5CnZ0DQz0Ky1lS5lpGYSaX0UMGtQJvgRb+G2iIlVxx4IB9weeq8Mi z93yx7WHX9+wZ1XjPi9EJaAzmmFQv27IHsYCsSDRDMpGeVeuoiv6djo0F/k0ho1GfOll iGqfjfE5qRgxubAayneXPRIuh6SDnfEGqFBP3KpmVzAWAayGQL4Rwc7q+cestBDIqBEq qW7g== X-Gm-Message-State: AOJu0YyNYFSJQKLryyVih0RmcJUzXb2kzWSTiRNJjIrTf6i9BrfNVozh 6BqtIsZ9pTes33KkXQzfhPZCBryM/sy1OBsEMIxx/UPds1fP2JJ89sOS63HPfwIUaXAz7pdkSF6 kv0hFpPlUABEj9/K/Y5sG997M3L8di7/8qhltfH0QWg== X-Gm-Gg: ASbGnctqh8IKnWg3IQO7qSH7w8EsGXFkc7ic7walCgkg4N++mdsBSWoIDoQJNuAA4vB B8NZqvDibUTtClo/TXjr10ph1vVZzGkyTSmceNUTDnCky6zOX3oUWt3x7skPLlc5xaCf8/hymBo JEiKSASzOHi8D5axJShwEFRd8AIh3PQAytWmTLJ4C1nX+jCN9XVxIA0iPhnnrHD4jWHw7221BZd DG08Q1PPPXoOszaxl6VQDfw+CZ5l82lZCiSrz048dBSj1fsukAEyUElO4WgIUxGRlWDkM5+XBgV UtFVy9mR3Rqk5A01K6KYGzaOBaU= X-Google-Smtp-Source: AGHT+IEY0rsfzEDfrdFmxzWvpeSeGwMEzl35oRwiWC7EBFz9l4tqCU3iACmTUPiw/dzP8PNQcPvk0km/dUdkOvuCxVs= X-Received: by 2002:a05:6512:2390:b0:595:7f66:b6a with SMTP id 2adb3069b0e04-598853c4d87mr1142141e87.35.1765136631381; Sun, 07 Dec 2025 11:43:51 -0800 (PST) MIME-Version: 1.0 References: In-Reply-To: From: Marcos Magueta Date: Sun, 7 Dec 2025 16:43:40 -0300 X-Gm-Features: AQt7F2re8TlkOZ19I1d71fHYBQGNyFmiTzKekLGGoCJ5bb9zRQMcn-5bKJMbGuU Message-ID: Subject: Re: WIP - xmlvalidate implementation from TODO list To: Kirill Reshke Cc: pgsql-hackers@lists.postgresql.org Content-Type: multipart/alternative; boundary="000000000000e4c2a3064561e575" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000e4c2a3064561e575 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Thank you for your kind review! Before I continue with the implementation, I would like to address your concerns and discuss it further and see if it's worth carrying on. 1) Will do! 2) The issue is the following: ``` ./../../src/test/regress/pg_regress --temp-instance=3D./tmp_check --inputdir=3D. --bindir=3D --dlpath=3D. --max-concurrent-tests=3D20 --schedule=3D./parallel_schedule # +++ regress check in src/test/regress +++ # initializing database system by copying initdb template # could not exec "sh": No such file or directory Bail out!# postmaster failed, examine ``` This is likely due to my setup on Nix. If any command assumes paths on conventional *nix I am often in trouble. I am checking that with a friend, but any insights are welcome. The out file I generated was while being completely blindfolded. By the way, the diff you sent is assuming global paths for some reason, so I couldn't apply it without manually changing them. 3) From what I understand, that refers to ISO/IEC 9075-14:2016, chapter and section 11.6, page 245: - Parts of the grammar that reference an URI: ::=3D | - NO NAMESPACE, which unspecifies the qualification but can access something through a LOCATI ON - ID which should allow access to a registered schema So they amount to: ACCORDING TO XMLSCHEMA URI [LOCATION ] ACCORDING TO XMLSCHEMA NO NAMESPACE [LOCATION ] ACCORDING TO XMLSCHEMA ID What I did is rely on the protection mechanisms that are already implemented to just side-step the issue of arbitrary retrieval. `PgXmlErrorContext * pg_xml_init(PgXmlStrictness strictness)` starts the xml error context preventing any attempt by libxml2 to load an external entity (DTD, XSD from URL, local file, etc.) returns an empty string instead. Check around the line 1420 of xml.c: ``` /* * Also, install an entity loader to prevent unwanted fetches of external * files and URLs. */ errcxt->saved_entityfunc =3D xmlGetExternalEntityLoader(); xmlSetExternalEntityLoader(xmlPgEntityLoader); ``` So since I am relying on a TEXT for the schema, there should be no issues of that sort. It does however cut part of the grammar that handles locations, which is part of the standard, and would require this feature to be much bigger in scope... 4) I suppose you refer to doc/src/sgml/func/func-xml.sgml. Will do 5) Hmm my intent was to simply handle TEXT on the xmlschema portion, so the expr rule on that side is indeed an oversight. Now about the first argument, that is just following the pattern already specified in other xml functions, like XMLPARSE and XMLSERIALIZE, which have the same specified in the grammar. So that might have alraedy diverged from the standard a while back... This is currently grammatically valid, for example: ``` select xmlparse(DOCUMENT (select oid from pg_class limit 1)); ERROR: invalid XML document DETAIL: line 1: Start tag expected, '<' not found 2619 ^ ``` As a summary, it does not fully implement schemas as first-class objects, as that would require extra parts of the grammar specified in the standard, so I capitulated to use schemas as provided text. That's already safe in my understanding given the shielding in place. If we are to implement the rest, I think other serious concerns would arise, like role management, how to store schemas, etc. And when it comes to that, is it worth all the trouble just for xml? I would like this feature and I think the solution of relying on text is decent, since the cost of complying 100% seems very high for low returns. Em dom., 7 de dez. de 2025 =C3=A0s 04:34, Kirill Reshke escreveu: > On Sun, 7 Dec 2025 at 04:38, Marcos Magueta > wrote: > > > > Hello! > > > > I am likely one of the few people still using XML, but I noticed XSD > schema validation is still a TODO on postgresql, which I have some person= al > use cases for. > > > > In this patch I attempt to implement XMLVALIDATE with the already in us= e > libxml following a version I got my hands on of the SQL/XML standard of > 2016. > > > > In short, I had to add the ACCORDING word to comply with it and > completely ignored the version that was already in the src that fetches > arbitrary schemas (it refers to validations of dtds, which is another mor= e > troublesome TODO). > > > > I had problems running the regression tests on my machine, so I could > only test the feature by spawning a modified instance of postgresql and > issuing queries through psql, therefore I am marking it as WIP. If anyone > can assert the tests pass, I would be glad. > > > > Also, this is my first patch, so I might have not followed standard > practices as best as I could, so please pay particular attention to that = on > review. > > > > Cheers, > > Marcos Magueta. > > HI! > > 1) > > + // Default case since nothing got returned > > + // out of the normal path for validation calls to libxml > > PostgreSQL uses /**/ comments style. > > 2) > XML regression test suite fails, see attached. By the way, what are > your issues with running `make check` ? > > 3) > By the way, in [0] we have this > > ` > The function in PostgreSQL produces an =E2=80=9Cunimplemented=E2=80=9D er= ror, because > PostgreSQL does not have any implementation of the mechanism assumed > in the standard for registering schemas in advance, which is necessary > to address the security implications of a function that could refer to > schemas from arbitrary locations. > ` > > How does your patch resolve this? I did not find any change in this area > > 4) > Also I want to mention that without a doc, the patch is not in a > commitable shape. > > 5) I am a bit surprised by this grammar rule: > > > XMLVALIDATE '(' document_or_content a_expr ACCORDING TO XMLSCHEMA > a_expr ')' > > this allow a wide class of expressions accepted by parser, like > > `SELECT xmlvalidate(DOCUMENT (select oid from pg_class) ACCORDING TO > XMLSCHEMA (select 32)) AS is_valid FROM xml_validation_test;` > > Is this expected? a_expr is way more than string constants and column > references.. If yes, the regression test that you added, does not > test this.. > > > p.s. I failed to find in google SQL/XML standard of 2016. So, I cannot > double-check if this feature is compliant with it... > > [0] https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL/XML_Standards > > -- > Best regards, > Kirill Reshke > --000000000000e4c2a3064561e575 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Thank you for your kind review!

Before I continue w= ith the implementation, I would like to address
your concerns and discus= s it further and see if it's worth carrying
on.

1) Will do!
2) The issue is the following:
```
./../../src/test/regress/pg_= regress --temp-instance=3D./tmp_check --inputdir=3D. --bindir=3D =C2=A0 =C2= =A0 --dlpath=3D. --max-concurrent-tests=3D20 =C2=A0--schedule=3D./parallel_= schedule =C2=A0
# +++ regress check in src/test/regress +++
# initial= izing database system by copying initdb template
# could not exec "= sh": No such file or directory
Bail out!# postmaster failed, examin= e
```

This is likely due to my setup on Nix. If any command assum= es paths on
conventional *nix I am often in trouble. I am checking that = with a
friend, but any insights are welcome. The out file I generated wa= s
while being completely blindfolded.

By the way, the diff you se= nt is assuming global paths for some
reason, so I couldn't apply it = without manually changing them.

3) From what I understand, that refe= rs to ISO/IEC 9075-14:2016, chapter and section 11.6, page 245:
=C2=A0- = Parts of the grammar that reference an URI: <XML valid according to what= > ::=3D <XML valid according to URI> | <XML valid according to = identifier>
=C2=A0- NO NAMESPACE, which unspecifies the qualification= but can access something through a LOCATI ON
=C2=A0- ID which should al= low access to a registered schema
So they amount to:
=C2=A0 ACCORDING= TO XMLSCHEMA URI <uri> [LOCATION <uri>]
=C2=A0 ACCORDING TO= XMLSCHEMA NO NAMESPACE [LOCATION <uri>]
=C2=A0 ACCORDING TO XMLSC= HEMA ID <registered_schema_name>

What I did is rely on the pro= tection mechanisms that are already
implemented to just side-step the is= sue of arbitrary retrieval.

`PgXmlErrorContext * pg_xml_init(PgXmlSt= rictness strictness)` starts
the xml error context preventing any attemp= t by libxml2 to load an
external entity (DTD, XSD from URL, local file, = etc.) returns an empty
string instead. Check around the line 1420 of xml= .c:

```
/*
=C2=A0* Also, install an entity loader to prevent u= nwanted fetches of external
=C2=A0* files and URLs.
=C2=A0*/
errcx= t->saved_entityfunc =3D xmlGetExternalEntityLoader();
xmlSetExternalE= ntityLoader(xmlPgEntityLoader);
```

So since I am relying on a TE= XT for the schema, there should be no
issues of that sort. It does howev= er cut part of the grammar that
handles locations, which is part of the = standard, and would require
this feature to be much bigger in scope...
4) I suppose you refer to doc/src/sgml/func/func-xml.sgml. Will do
5) Hmm my intent was to simply handle TEXT on the xmlschema portion,so the expr rule on that side is indeed an oversight. Now about the
fi= rst argument, that is just following the pattern already specified
in ot= her xml functions, like XMLPARSE and XMLSERIALIZE, which have the
same &= lt;XML Value Expression> specified in the grammar. So that might
have= alraedy diverged from the standard a while back...

This is currentl= y grammatically valid, for example:
```
select xmlparse(DOCUMENT =C2= =A0(select oid from pg_class limit 1));
ERROR: =C2=A0invalid XML documen= t
DETAIL: =C2=A0line 1: Start tag expected, '<' not found
= 2619
^
```

As a summary, it does not fully implement schemas a= s first-class
objects, as that would require extra parts of the grammar = specified in
the standard, so I capitulated to use schemas as provided t= ext. That's
already safe in my understanding given the shielding in = place. If we
are to implement the rest, I think other serious concerns w= ould arise,
like role management, how to store schemas, etc. And when it= comes
to that, is it worth all the trouble just for xml? I would like t= his
feature and I think the solution of relying on text is decent, since=
the cost of complying 100% seems very high for low returns.

Em dom., 7 de dez. de 2025 =C3=A0s 04:34, Kirill Reshke <reshkekirill@gmail.com> escreve= u:
On Sun, 7 Dec= 2025 at 04:38, Marcos Magueta <maguetamarcos@gmail.com> wrote:
>
> Hello!
>
> I am likely one of the few people still using XML, but I noticed XSD s= chema validation is still a TODO on postgresql, which I have some personal = use cases for.
>
> In this patch I attempt to implement XMLVALIDATE with the already in u= se libxml following a version I got my hands on of the SQL/XML standard of = 2016.
>
> In short, I had to add the ACCORDING word to comply with it and comple= tely ignored the version that was already in the src that fetches arbitrary= schemas (it refers to validations of dtds, which is another more troubleso= me TODO).
>
> I had problems running the regression tests on my machine, so I could = only test the feature by spawning a modified instance of postgresql and iss= uing queries through psql, therefore I am marking it as WIP. If anyone can = assert the tests pass, I would be glad.
>
> Also, this is my first patch, so I might have not followed standard pr= actices as best as I could, so please pay particular attention to that on r= eview.
>
> Cheers,
> Marcos Magueta.

HI!

1)
> + // Default case since nothing got returned
> + // out of the normal path for validation calls to libxml

PostgreSQL uses /**/ comments style.

2)
XML regression test suite fails, see attached. By the way, what are
your issues with running `make check` ?

3)
By the way, in [0] we have this

`
The function in PostgreSQL produces an =E2=80=9Cunimplemented=E2=80=9D erro= r, because
PostgreSQL does not have any implementation of the mechanism assumed
in the standard for registering schemas in advance, which is necessary
to address the security implications of a function that could refer to
schemas from arbitrary locations.
`

How does your patch resolve this? I did not find any change in this area
4)
Also I want to mention that without a doc, the patch is not in a
commitable shape.

5) I am a bit surprised by this grammar rule:

>=C2=A0 XMLVALIDATE '(' document_or_content a_expr ACCORDING TO = XMLSCHEMA a_expr ')'

this allow a wide class of expressions accepted by parser, like

`SELECT xmlvalidate(DOCUMENT=C2=A0 (select oid from pg_class) ACCORDING TO<= br> XMLSCHEMA (select 32)) AS is_valid FROM xml_validation_test;`

Is this expected? a_expr is way more than string constants and column
references..=C2=A0 If yes, the regression test that you added, does not
test this..


p.s. I failed to find in google SQL/XML standard of 2016. So, I cannot
double-check if this feature is compliant with it...

[0] https://wiki.postgresql.org/wiki= /PostgreSQL_vs_SQL/XML_Standards

--
Best regards,
Kirill Reshke
--000000000000e4c2a3064561e575--