From: Marcos Magueta
Date: Wed, 31 Dec 2025 17:26:55 -0300
Subject: Re: WIP - xmlvalidate implementation from TODO list
To: Kirill Reshke
Cc: pgsql-hackers@lists.postgresql.org

Hello again!

Is there any interest in this? I understand PostgreSQL has bigger fish
to fry, but I would like to at least know, in case this was just
forgotten.

Regards!

On Fri, Dec 19, 2025 at 00:25, Marcos Magueta <maguetamarcos@gmail.com>
wrote:

> Hello again!
>
> I took some time to actually finish this feature. I think the answers
> to the previous questions are now clearer. I checked the
> initialization, and the protections have indeed been in place since
> commit a4b0c0aaf093a015bebe83a24c183e10a66c8c39, which specifically
> states:
>
> > Prevent access to external files/URLs via XML entity references.
> >
> > xml_parse() would attempt to fetch external files or URLs as needed to
> > resolve DTD and entity references in an XML value, thus allowing
> > unprivileged database users to attempt to fetch data with the privileges
> > of the database server.  While the external data wouldn't get returned
> > directly to the user, portions of it could be exposed in error messages
> > if the data didn't parse as valid XML; and in any case the mere ability
> > to check existence of a file might be useful to an attacker.
> >
> > The ideal solution to this would still allow fetching of references that
> > are listed in the host system's XML catalogs, so that documents can be
> > validated according to installed DTDs.  However, doing that with the
> > available libxml2 APIs appears complex and error-prone, so we're not going
> > to risk it in a security patch that necessarily hasn't gotten wide review.
> > So this patch merely shuts off all access, causing any external fetch to
> > silently expand to an empty string.  A future patch may improve this.
>
> Given that, the obvious choice for the xmlvalidate implementation
> was not to rely on external schema sources from the host's XML
> catalogs. The implementation therefore relies solely on expressions
> that evaluate to a schema in plain text.
>
> I added the requested documentation and a bunch of tests for each
> scenario. I would appreciate another round of reviews whenever someone
> has the time and patience.
>
> Lastly, to satisfy curiosity: I had issues with make check, as
> mentioned earlier in this thread. They were resolved when I changed
> `execl` to `execlp` in `pg_regress.c`. I of course did not commit
> that change, but other people I know have hit the very same issue
> while relying on immutable package managers.