public inbox for [email protected]
help / color / mirror / Atom feedFrom: Mahendra Singh Thalor <[email protected]>
To: Noah Misch <[email protected]>
Cc: Andrew Dunstan <[email protected]>
Cc: Álvaro Herrera <[email protected]>
Cc: jian he <[email protected]>
Cc: Srinath Reddy <[email protected]>
Cc: [email protected]
Subject: Re: Non-text mode for pg_dumpall
Date: Thu, 17 Jul 2025 15:46:41 +0530
Message-ID: <CAKYtNAqXTvfAw-y4FHvzprg72cFrC63cge9xMVDO_R4=NHc5rA@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<CAKYtNArtO8T-u5=E4ibxFfW+zQ8dBmOZQFgRa36+27NFkdtK0Q@mail.gmail.com>
<[email protected]>
<CAKYtNAoE1xUcMmBq-1qCaZGUDQLfMSVa6hzBVA-PqbmSTzPf5A@mail.gmail.com>
<CAKYtNAq530UTAtnVs5xfONQ0j6vS-Sys50p5+SNfC7G7_ghCVQ@mail.gmail.com>
<[email protected]>
<[email protected]>
<CAKYtNAr-6_CCz+tnJ0T2597Ar4iAZXkE1qPimsO9EY-Kn7LvzQ@mail.gmail.com>
<[email protected]>
Thanks Noah for the feedback.
On Wed, 16 Jul 2025 at 05:50, Noah Misch <[email protected]> wrote:
>
> On Thu, Jul 10, 2025 at 12:21:03AM +0530, Mahendra Singh Thalor wrote:
> > On Wed, 9 Jul 2025 at 02:58, Noah Misch <[email protected]> wrote:
> > > On Fri, Apr 04, 2025 at 04:11:05PM -0400, Andrew Dunstan wrote:
> > > > Thanks. I have pushed these now with a few further small tweaks.
> > >
> > > This drops all databases:
> > >
> > > pg_dumpall --clean -Fd -f /tmp/dump
> > > pg_restore -d template1 --globals-only /tmp/dump
> > >
> > > That didn't match my expectations given this help text:
> > >
> > > $ pg_restore --help|grep global
> > > -g, --globals-only restore only global objects, no databases
> >
> > Databases are global objects so due to --clean command, we are putting
> > drop commands in global.dat for all the databases. While restoring, we
> > used the "--globals-only" option so we are dropping all these
> > databases by global.dat file.
> >
> > Please let us know your expectations for this specific case.
>
> Be consistent with "pg_dump". A quick check suggests "pg_dump --clean"
> affects plain format only. For non-plain formats, only the pg_restore
> argument governs the final commands:
>
> $ rm -r /tmp/dump; pg_dump --clean -Fd -f /tmp/dump && pg_restore -f- /tmp/dump | grep DROP
> $ rm -r /tmp/dump; pg_dump --clean -Fd -f /tmp/dump && pg_restore --clean -f- /tmp/dump | grep DROP
> DROP TABLE public.example;
>
> That said, you should audit code referencing the --clean flag and see if
> there's more to it than that quick test suggests. Note that aligning with
> pg_dump will require changes for object types beyond databases. "pg_restore
> --clean" of a global dump should emit DROP TABLESPACE and DROP ROLE as
> appropriate, regardless of whether the original pg_dumpall had --clean.
>
> For my earlier example (pg_dumpall --clean; pg_restore --globals-only) I
> expect the same outcome as plain-format "pg_dumpall --globals-only", which is
> no databases dropped or created. The help line says "no databases". Plain
> "pg_dumpall --globals-only" and even "pg_dumpall --globals-only --clean" do
> not drop or create databases.
To pg_restore, we are giving a dump of pg_dumpall which has a
global.dat file and we have drop commands in the global.dat file so
when we are using 'globals-only', we are dropping databases as we have
DROP commands.
As of now, we don't have any filter for global.dat file in restore. If
a user wants to restore only globals(without droping db), then they
should use 'globals-only' in pg_dumpall.
Or if we don't want to DROP databases by global.dat file, then we
should add a filter in pg_restore (hard to implement as we have SQL
commands in global.dat file). I think, for this case, we can do some
more doc changes.
Example: pg_restore --globals-only : this will restore the global.dat
file(including all drop commands). It might drop databases if any drop
commands.
@Andrew Dunstan Please add your opinion.
>
> > > commit 1495eff wrote:
> > > > --- a/src/bin/pg_dump/pg_dumpall.c
> > > > +++ b/src/bin/pg_dump/pg_dumpall.c
> > >
> > > > @@ -1612,9 +1683,27 @@ dumpDatabases(PGconn *conn)
> > > > continue;
> > > > }
> > > >
> > > > + /*
> > > > + * If this is not a plain format dump, then append dboid and dbname to
> > > > + * the map.dat file.
> > > > + */
> > > > + if (archDumpFormat != archNull)
> > > > + {
> > > > + if (archDumpFormat == archCustom)
> > > > + snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\".dmp", db_subdir, oid);
> > > > + else if (archDumpFormat == archTar)
> > > > + snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\".tar", db_subdir, oid);
> > > > + else
> > > > + snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\"", db_subdir, oid);
> > >
> > > Use appendShellString() instead. Plain mode already does that for the
> > > "pg_dumpall -f" argument, which is part of db_subdir here. We don't want
> > > weird filename characters to work out differently for plain vs. non-plain
> > > mode. Also, it's easier to search for appendShellString() than to search for
> > > open-coded shell quoting.
> >
> > Yes, we can use appendShellString also. We are using snprintf in the
> > pg_dump.c file also.
> > Ex: snprintf(tagbuf, sizeof(tagbuf), "LARGE OBJECTS %u..%u",
> > loinfo->looids[0], loinfo->looids[loinfo->numlos - 1]);
>
> It's true snprintf() is not banned in these programs, but don't use it to do
> the quoting for OS shell command lines or fragments thereof. dbfilepath is a
> fragment of an OS shell command line. The LARGE OBJECTS string is not one of
> those. Hence, the LARGE OBJECTS scenario should keep using snprintf().
>
> > If we want to use appendShellString, I can write a patch for these.
> > Please let me know your opinion.
>
> Use appendShellString() for shell quoting. Don't attempt to use it for other
> purposes.
Okay. Fixed in attached patch.
>
> > > > --- a/src/bin/pg_dump/pg_restore.c
> > > > +++ b/src/bin/pg_dump/pg_restore.c
> > >
> > > > +/*
> > > > + * read_one_statement
> > > > + *
> > > > + * This will start reading from passed file pointer using fgetc and read till
> > > > + * semicolon(sql statement terminator for global.dat file)
> > > > + *
> > > > + * EOF is returned if end-of-file input is seen; time to shut down.
> > >
> > > What makes it okay to use this particular subset of SQL lexing?
> >
> > To support complex syntax, we used this code from another file.
>
> I'm hearing that you copied this code from somewhere. Running
> "git grep 'time to shut down'" suggests you copied it from
> InteractiveBackend(). Is that right? I do see other similarities between
> read_one_statement() and InteractiveBackend().
>
> Copying InteractiveBackend() provides negligible assurance that this is the
> right subset of SQL lexing. Only single-user mode uses InteractiveBackend().
> Single-user mode survives mostly as a last resort for recovering from having
> reached xidStopLimit, is rarely used, and only superusers write queries to it.
Yes, we copied this from InteractiveBackend to read statements from
global.dat file.
>
> > > > +/*
> > > > + * get_dbnames_list_to_restore
> > > > + *
> > > > + * This will mark for skipping any entries from dbname_oid_list that pattern match an
> > > > + * entry in the db_exclude_patterns list.
> > > > + *
> > > > + * Returns the number of database to be restored.
> > > > + *
> > > > + */
> > > > +static int
> > > > +get_dbnames_list_to_restore(PGconn *conn,
> > > > + SimpleOidStringList *dbname_oid_list,
> > > > + SimpleStringList db_exclude_patterns)
> > > > +{
> > > > + int count_db = 0;
> > > > + PQExpBuffer query;
> > > > + PGresult *res;
> > > > +
> > > > + query = createPQExpBuffer();
> > > > +
> > > > + if (!conn)
> > > > + pg_log_info("considering PATTERN as NAME for --exclude-database option as no db connection while doing pg_restore.");
> > >
> > > When do we not have a connection here? We'd need to document this behavior
> > > variation if it stays, but I'd prefer if we can just rely on having a
> > > connection.
> >
> > Yes, we can document this behavior.
>
> My review asked a question there. I don't see an answer to that question.
> Would you answer that question?
Example: if there is no active database, even postgres/template1, then
we will consider PATTEREN as NAME. This is the rare case.
In attached patch, I added one doc line also for this case.
> > @@ -1612,9 +1683,27 @@ dumpDatabases(PGconn *conn)
> > continue;
> > }
> >
> > + /*
> > + * If this is not a plain format dump, then append dboid and dbname to
> > + * the map.dat file.
> > + */
> > + if (archDumpFormat != archNull)
> > + {
> > + if (archDumpFormat == archCustom)
> > + snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\".dmp", db_subdir, oid);
> > + else if (archDumpFormat == archTar)
> > + snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\".tar", db_subdir, oid);
> > + else
> > + snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\"", db_subdir, oid);
>
> Use appendShellString() instead. Plain mode already does that for the
> "pg_dumpall -f" argument, which is part of db_subdir here. We don't want
> weird filename characters to work out differently for plain vs. non-plain
> mode. Also, it's easier to search for appendShellString() than to search for
> open-coded shell quoting.
Fixed.
>
> > @@ -1641,19 +1727,30 @@ dumpDatabases(PGconn *conn)
> > if (filename)
> > fclose(OPF);
> >
> > - ret = runPgDump(dbname, create_opts);
> > + ret = runPgDump(dbname, create_opts, dbfilepath, archDumpFormat);
> > if (ret != 0)
> > pg_fatal("pg_dump failed on database \"%s\", exiting", dbname);
> >
> > if (filename)
> > {
> > - OPF = fopen(filename, PG_BINARY_A);
> > + char global_path[MAXPGPATH];
> > +
> > + if (archDumpFormat != archNull)
> > + snprintf(global_path, MAXPGPATH, "%s/global.dat", filename);
> > + else
> > + snprintf(global_path, MAXPGPATH, "%s", filename);
> > +
> > + OPF = fopen(global_path, PG_BINARY_A);
> > if (!OPF)
> > pg_fatal("could not re-open the output file \"%s\": %m",
> > - filename);
> > + global_path);
>
> Minor item: plain mode benefits from reopening, because pg_dump appended to
> the plain output file. There's no analogous need to reopen global.dat, since
> just this one process writes to global.dat.
Fixed.
>
> > @@ -1672,17 +1770,36 @@ runPgDump(const char *dbname, const char *create_opts)
> > initPQExpBuffer(&connstrbuf);
> > initPQExpBuffer(&cmd);
> >
> > - printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin,
> > - pgdumpopts->data, create_opts);
> > -
> > /*
> > - * If we have a filename, use the undocumented plain-append pg_dump
> > - * format.
> > + * If this is not a plain format dump, then append file name and dump
> > + * format to the pg_dump command to get archive dump.
> > */
> > - if (filename)
> > - appendPQExpBufferStr(&cmd, " -Fa ");
> > + if (archDumpFormat != archNull)
> > + {
> > + printfPQExpBuffer(&cmd, "\"%s\" -f %s %s", pg_dump_bin,
> > + dbfile, create_opts);
> > +
> > + if (archDumpFormat == archDirectory)
> > + appendPQExpBufferStr(&cmd, " --format=directory ");
> > + else if (archDumpFormat == archCustom)
> > + appendPQExpBufferStr(&cmd, " --format=custom ");
> > + else if (archDumpFormat == archTar)
> > + appendPQExpBufferStr(&cmd, " --format=tar ");
> > + }
> > else
> > - appendPQExpBufferStr(&cmd, " -Fp ");
> > + {
> > + printfPQExpBuffer(&cmd, "\"%s\" %s %s", pg_dump_bin,
> > + pgdumpopts->data, create_opts);
>
> This uses pgdumpopts for plain mode only, so many pg_dumpall options silently
> have no effect in non-plain mode. Example:
>
> strace -f pg_dumpall --lock-wait-timeout=10 2>&1 >/dev/null | grep exec
> strace -f pg_dumpall --lock-wait-timeout=10 -Fd -f /tmp/dump3 2>&1 >/dev/null | grep exec
Fixed.
> > + /* If database is already created, then don't set createDB flag. */
> > + if (opts->cparams.dbname)
> > + {
> > + PGconn *test_conn;
> > +
> > + test_conn = ConnectDatabase(db_cell->str, NULL, opts->cparams.pghost,
> > + opts->cparams.pgport, opts->cparams.username, TRI_DEFAULT,
> > + false, progname, NULL, NULL, NULL, NULL);
> > + if (test_conn)
> > + {
> > + PQfinish(test_conn);
> > +
> > + /* Use already created database for connection. */
> > + opts->createDB = 0;
> > + opts->cparams.dbname = db_cell->str;
> > + }
> > + else
> > + {
> > + /* we'll have to create it */
> > + opts->createDB = 1;
> > + opts->cparams.dbname = connected_db;
> > + }
>
> In released versions, "pg_restore --create" fails if the database exists, and
> pg_restore w/o --create fails unless the database exists. I think we should
> continue that pattern in this new feature. If not, pg_restore should document
> how it treats pg_dumpall-sourced dumps with the "create if not exists"
> semantics appearing here.
Added one more doc line for this case.
Here, I am attaching a patch. Please let me know feedback.
--
Thanks and Regards
Mahendra Singh Thalor
EnterpriseDB: http://www.enterprisedb.com
Attachments:
[application/octet-stream] v01_db2978-WIP-implement-setNodeValue-function.patch (5.2K, 2-v01_db2978-WIP-implement-setNodeValue-function.patch)
download | inline diff:
From ff9b28250dc0c016b37477536ece0b14b6d07c03 Mon Sep 17 00:00:00 2001
From: Mahendra Singh Thalor <[email protected]>
Date: Wed, 9 Jul 2025 11:42:47 +0530
Subject: [PATCH] implement setNodeValue function
DB-2978
---
contrib/dbms_xmldom/dbms_xmldom.c | 58 ++++++++++++++++++-
contrib/dbms_xmldom/dbms_xmldom.sql.in | 13 +++++
contrib/dbms_xmldom/dbms_xmldom_public.sql.in | 1 +
3 files changed, 71 insertions(+), 1 deletion(-)
diff --git a/contrib/dbms_xmldom/dbms_xmldom.c b/contrib/dbms_xmldom/dbms_xmldom.c
index ed49245b0cb..a297ddca6d3 100644
--- a/contrib/dbms_xmldom/dbms_xmldom.c
+++ b/contrib/dbms_xmldom/dbms_xmldom.c
@@ -42,6 +42,7 @@ PG_FUNCTION_INFO_V1(dbms_xmldom_free_document);
PG_FUNCTION_INFO_V1(dbms_xmldom_set_version);
PG_FUNCTION_INFO_V1(dbms_xmldom_get_nodename);
PG_FUNCTION_INFO_V1(dbms_xmldom_get_nodevalue);
+PG_FUNCTION_INFO_V1(dbms_xmldom_set_nodevalue);
PG_FUNCTION_INFO_V1(dbms_xmldom_get_firstchild);
PG_FUNCTION_INFO_V1(dbms_xmldom_get_childnodes);
PG_FUNCTION_INFO_V1(dbms_xmldom_get_nodelistlength);
@@ -49,7 +50,6 @@ PG_FUNCTION_INFO_V1(dbms_xmldom_get_nodelistitem);
PG_FUNCTION_INFO_V1(dbms_xmldom_make_element);
PG_FUNCTION_INFO_V1(dbms_xmldom_replace_child);
PG_FUNCTION_INFO_V1(dbms_xmldom_remove_child);
-PG_FUNCTION_INFO_V1(dbms_xmldom_set_node_value);
/* Function Declarations */
@@ -1287,6 +1287,62 @@ dbms_xmldom_get_nodevalue(PG_FUNCTION_ARGS)
PG_RETURN_VARCHAR_P(cstring_to_text((char *) nodevalue));
}
+/*
+ * dbms_xmldom_set_nodevalue
+ *
+ * Implements the functionality of dbms_xmldom.setNodeValue.
+ * sets the value to DOMNode
+ */
+Datum
+dbms_xmldom_set_nodevalue(PG_FUNCTION_ARGS)
+{
+ char docHashKey[HASHKEYLEN];
+ uint32 docid = 0;
+ uint64 nodeid = 0;
+ DocInfoPtr docInfo = NULL;
+ DocNodeInfoPtr nodeInfo = NULL;
+ char *nodevalue = NULL;
+ char *nodevaluenull = NULL;
+
+ /* If node is NULL, then return. */
+ if (PG_ARGISNULL(0))
+ PG_RETURN_NULL();
+
+ if (PG_ARGISNULL(1) || strlen(text_to_cstring(PG_GETARG_TEXT_P(1))) == 0)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("invalid value for parameter \"%s\"", "tagName")));
+
+ nodevalue = text_to_cstring(PG_GETARG_TEXT_P(1));
+
+ memcpy(docHashKey, VARDATA_ANY(PG_GETARG_TEXT_P(0)), HASHKEYLEN);
+ extract_ids(docHashKey, &docid, &nodeid);
+
+ docInfo = getDocInfo(docid);
+ if (!docInfo)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ err_redwood_sqlcode(-31181),
+ errmsg("invalid value for parameter \"%s\"", "n")));
+
+ nodeInfo =
+ (DocNodeInfoPtr) hash_search(docInfo->nodeInfo,
+ &nodeid,
+ HASH_FIND,
+ NULL);
+
+ Assert(nodeInfo && nodeInfo->node);
+
+ /* Now set new vaule. */
+// xmlNodeSetContent((xmlNodePtr) nodeInfo->node, (const xmlChar *) nodevaluenull);
+// xmlNodeAddContent((xmlNodePtr) nodeInfo->node, (const xmlChar *) nodevalue);
+
+ xmlNodeSetName((xmlNodePtr) nodeInfo->node, (const xmlChar *) nodevalue);
+// xmlNodeSetContent((xmlNodePtr) nodeInfo->node, (const xmlChar *) nodevalue);
+
+ PG_RETURN_TEXT_P(cstring_to_text_with_len(docHashKey, HASHKEYLEN));
+}
+
/*
* dbms_xmldom_get_firstchild
*
diff --git a/contrib/dbms_xmldom/dbms_xmldom.sql.in b/contrib/dbms_xmldom/dbms_xmldom.sql.in
index 2236612d754..2eae94ec656 100644
--- a/contrib/dbms_xmldom/dbms_xmldom.sql.in
+++ b/contrib/dbms_xmldom/dbms_xmldom.sql.in
@@ -65,6 +65,10 @@ CREATE FUNCTION dbms_xmldom_getNodeValue(nodeid RAW) RETURNS VARCHAR2
AS '$libdir/dbms_xmldom', 'dbms_xmldom_get_nodevalue'
LANGUAGE C IMMUTABLE PARALLEL SAFE;
+CREATE FUNCTION dbms_xmldom_setNodeValue(nodeid RAW, name IN VARCHAR2) RETURNS RAW
+AS '$libdir/dbms_xmldom', 'dbms_xmldom_set_nodevalue'
+LANGUAGE C IMMUTABLE PARALLEL SAFE;
+
CREATE FUNCTION dbms_xmldom_getFirstChild(nodeid RAW) RETURNS RAW
AS '$libdir/dbms_xmldom', 'dbms_xmldom_get_firstchild'
LANGUAGE C IMMUTABLE PARALLEL SAFE;
@@ -145,6 +149,14 @@ CREATE OR REPLACE PACKAGE BODY dbms_xmldom IS
return dbms_xmldom_getNodeValue(n.id);
END;
+ FUNCTION setNodeValue(n domnode, name IN VARCHAR2) RETURN DOMNode SET search_path = pg_catalog, pg_temp IS
+ DECLARE
+ node DOMNode;
+ BEGIN
+ n.id = dbms_xmldom_setNodeValue(n.id, name);
+ return node;
+ END;
+
FUNCTION getFirstChild(n DOMNode) RETURN DOMNode SET search_path = pg_catalog, pg_temp IS
DECLARE
node DOMNode;
@@ -199,6 +211,7 @@ CREATE OR REPLACE PACKAGE BODY dbms_xmldom IS
return node;
END;
+
FUNCTION createElement(doc DOMDocument, tagName IN VARCHAR2) RETURN DOMElement SET search_path = pg_catalog, pg_temp IS
DECLARE
node DOMElement;
diff --git a/contrib/dbms_xmldom/dbms_xmldom_public.sql.in b/contrib/dbms_xmldom/dbms_xmldom_public.sql.in
index b7b1686e1fe..8fd2463394d 100644
--- a/contrib/dbms_xmldom/dbms_xmldom_public.sql.in
+++ b/contrib/dbms_xmldom/dbms_xmldom_public.sql.in
@@ -32,6 +32,7 @@ CREATE OR REPLACE PACKAGE dbms_xmldom AUTHID CURRENT_USER AS
FUNCTION getNodeName(n DOMNode) RETURN VARCHAR2;
FUNCTION getNodeValue(n domnode) RETURN VARCHAR2;
+ FUNCTION setNodeValue(n domnode, name IN VARCHAR2) RETURN DOMNode;
FUNCTION getFirstChild(n DOMNode) RETURN DOMNode;
FUNCTION getChildNodes(n DOMNode) RETURN DOMNodeList;
FUNCTION appendChild(n DOMNode, newChild IN DOMNode) RETURN DOMNode;
--
2.39.3
view thread (100+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected], [email protected], [email protected], [email protected], [email protected], [email protected]
Subject: Re: Non-text mode for pg_dumpall
In-Reply-To: <CAKYtNAqXTvfAw-y4FHvzprg72cFrC63cge9xMVDO_R4=NHc5rA@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox