public inbox for [email protected]  
help / color / mirror / Atom feed
From: Noah Misch <[email protected]>
To: Andrew Dunstan <[email protected]>
Cc: Mahendra Singh Thalor <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: jian he <[email protected]>
Cc: Srinath Reddy <[email protected]>
Cc: [email protected]
Subject: Re: Non-text mode for pg_dumpall
Date: Mon, 21 Jul 2025 17:53:39 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
	<CAKYtNAoE1xUcMmBq-1qCaZGUDQLfMSVa6hzBVA-PqbmSTzPf5A@mail.gmail.com>
	<CAKYtNAq530UTAtnVs5xfONQ0j6vS-Sys50p5+SNfC7G7_ghCVQ@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<CAKYtNAr-6_CCz+tnJ0T2597Ar4iAZXkE1qPimsO9EY-Kn7LvzQ@mail.gmail.com>
	<[email protected]>
	<CAKYtNAqXTvfAw-y4FHvzprg72cFrC63cge9xMVDO_R4=NHc5rA@mail.gmail.com>
	<CAKYtNAppY=vVtTMdkxRPV48cHKyzLeYF6mrPk15Wczme7e9=ww@mail.gmail.com>
	<[email protected]>

On Mon, Jul 21, 2025 at 04:41:03PM -0400, Andrew Dunstan wrote:
> On 2025-07-17 Th 6:18 AM, Mahendra Singh Thalor wrote
> > > > > > > --- a/src/bin/pg_dump/pg_restore.c
> > > > > > > +++ b/src/bin/pg_dump/pg_restore.c
> > > > > > > +/*
> > > > > > > + * read_one_statement
> > > > > > > + *
> > > > > > > + * This will start reading from passed file pointer using fgetc and read till
> > > > > > > + * semicolon(sql statement terminator for global.dat file)
> > > > > > > + *
> > > > > > > + * EOF is returned if end-of-file input is seen; time to shut down.
> > > > > > What makes it okay to use this particular subset of SQL lexing?
> > > > > To support complex syntax, we used this code from another file.
> > > > I'm hearing that you copied this code from somewhere.  Running
> > > > "git grep 'time to shut down'" suggests you copied it from
> > > > InteractiveBackend().  Is that right?  I do see other similarities between
> > > > read_one_statement() and InteractiveBackend().
> > > > 
> > > > Copying InteractiveBackend() provides negligible assurance that this is the
> > > > right subset of SQL lexing.  Only single-user mode uses InteractiveBackend().
> > > > Single-user mode survives mostly as a last resort for recovering from having
> > > > reached xidStopLimit, is rarely used, and only superusers write queries to it.
> > > Yes, we copied this from InteractiveBackend to read statements from
> > > global.dat file.
> 
> Maybe we should ensure that identifiers with CR or LF are turned into
> Unicode quoted identifiers, so each SQL statement would always only occupy
> one line.

Interesting.  That might work.

> Or just reject role and tablespace names with CR or LF altogether,
> just as we do for database names.

There are other ways to get multi-line statements.  Non-exhaustive list:

- pg_db_role_setting.setconfig
- pg_shdescription.description
- pg_shseclabel.label
- pg_tablespace.spcoptions (if we add a text option in the future)

I think this decision about lexing also ties to other unfinished open item
work of aligning "pg_dumpall -Fd;pg_restore [options]" behavior with "pg_dump
-Fd;pg_restore [options]".  "pg_restore --no-privileges" should not restore
pg_tablespace.spcacl, and "pg_restore --no-comments" should not emit COMMENT
statements.

I suspect this is going to end with a structured dump like we use on the
pg_dump (per-database) side.  It's not an accident that v17 pg_restore doesn't
lex text files to do its job.  pg_dumpall deals with a more-limited set of
statements than pg_dump deals with, but they're not _that much_ more limited.
I won't veto a lexing-based approach if it gets the behaviors right, but I
don't have high hopes for it getting the behaviors right and staying that way.

(I almost said "pg_restore --no-owner" should not restore
pg_tablespace.spcowner, but v17 "pg_dumpall --no-owner" does restore it.  One
could argue for or against aligning $SUBJECT behavior w/ v17's mistake there.)





view thread (100+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
  Subject: Re: Non-text mode for pg_dumpall
  In-Reply-To: <[email protected]>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox