Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wW0zD-002Kkb-2x for pgsql-hackers@arkaria.postgresql.org; Sun, 07 Jun 2026 00:03:32 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1wW0yA-00GfYU-2V for pgsql-hackers@arkaria.postgresql.org; Sun, 07 Jun 2026 00:02:26 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1wW0yA-00GfYL-0D for pgsql-hackers@lists.postgresql.org; Sun, 07 Jun 2026 00:02:26 +0000 Received: from mail-dl1-x122a.google.com ([2607:f8b0:4864:20::122a]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1wW0y6-00000001SnU-0PYX for pgsql-hackers@lists.postgresql.org; Sun, 07 Jun 2026 00:02:23 +0000 Received: by mail-dl1-x122a.google.com with SMTP id a92af1059eb24-13721dfd471so4139052c88.1 for ; Sat, 06 Jun 2026 17:02:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=leadboat.com; s=google; t=1780790541; x=1781395341; darn=lists.postgresql.org; h=user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=sMVngKrxvYcvfS7Nf+e5L6NmeH1HJ67+zHtynz5LItI=; b=Hqu7zGjQp8J6c/nIBx/G8tebL+4UOOPWizJArLCvxJtNiGdvOKKc551mht4A8IiOgF xNTLUUZJRxU1Yt8WxNmDVRA0izhrEz6/+gZelTdhH8mvofCkvXi0wYL5PDGhFReD2qdQ Mf0DTTbSlfS6AAQb0+jMcnOoSdR+eBURplGV4= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20251104; t=1780790541; x=1781395341; h=user-agent:in-reply-to:content-transfer-encoding :content-disposition:mime-version:references:message-id:subject:cc :to:from:date:x-gm-gg:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=sMVngKrxvYcvfS7Nf+e5L6NmeH1HJ67+zHtynz5LItI=; b=GVpLv+0Tqw9WGaGz3N6kTCyeBivXscrdoHunOtFo1Gdm/Sg0x9GlI3SnpqjuFEnVxK 5V3lZ/eA2+EX6lIhcZq4mtO96Va/aVsygPk/MBYOTNJ89Hbtu04LdI1edByuXG5pihD1 7aMrCxnpE8bUdtLei3YoLCNUovm9qLsC3FeLdoM5CBl4+dsLvwgPGt+06gFrYVw7yFvN eUC71s6/lhU9xV1hcOlRa548kXWdYI31o0hjbSyotl0Mst576PbOEiq7tapocWSwEwKj BMhbD2JkU9LEZfP9JJMDalL2zQizj5++hLuhE3raHopgziCerhgg0idRd+kl+zr2dsT8 Ygug== X-Forwarded-Encrypted: i=1; AFNElJ+5m3rJjH+w6K4dyYmg+kV45T8lbSQy+TiWzuGcd5n0yUIxShKYyWDA+r3vFnJteKcGzb9+wdYLzG1JJRSW@lists.postgresql.org X-Gm-Message-State: AOJu0YzEE35VQ9QEZGTQxiE0WdHsiirbSWC9wQzEQhtASXqKluvuDZAI LdyH+e5XRJ0QnLnUGezD3xifFr4AutQAD/nAhyZo29bfAVbmfKc+ehxCqy1sysN64g== X-Gm-Gg: Acq92OHc2ssmkNcdA+blxiFXcRmp19bQUbJMFEvR/MYPhtZSkUhpXp+8o2g1DIXA65y D5ZevuBNZ+LqGo35oxytznnuZnRzLZPQW3+WmdsAihwjsdgDH94GdvK0VTdFrztKfhUeUgXIgBj fIzxxh+llb9oLq5MlgH5/D/PVszOAgdprWQJCBt3JlMZqk9j/IkFpDkJAzYoEf3AGMrgdlZJLIY t6soXW3zZu5xrN7lUK2NkRqw+jjyx/shjJmT5WGxTZw1oQkbjOkRxEMdg4S6q31ifjpiCEfsYcz s2Cdqox0TwoTqs/eBVOSSf6Sxh8dAaKd8jX05lUCHlXAGg9i/1DmIW4C7DrQo2sgSF6XSHTIrzW oD2hwZrjnYu26kioQykLvWmtcGDzg6GDk/umBvj1frED8uRCwms1TdxQppRARHS+8H5wQzvYDx+ Je/GVgX5LhkGldosEm9oVtSNIxrT/7qoYEzQfqU6TFC2nyOOtO/OfxmptrxzEQ19Bs6Q== X-Received: by 2002:a05:7022:487:b0:137:ed87:817e with SMTP id a92af1059eb24-13806713123mr5242847c88.28.1780790540874; Sat, 06 Jun 2026 17:02:20 -0700 (PDT) Received: from microsoft.com (c-73-15-160-255.hsd1.ca.comcast.net. [73.15.160.255]) by smtp.gmail.com with ESMTPSA id a92af1059eb24-137f54c9c12sm9118309c88.6.2026.06.06.17.02.19 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sat, 06 Jun 2026 17:02:20 -0700 (PDT) Date: Sat, 6 Jun 2026 17:02:18 -0700 From: Noah Misch To: Andrew Dunstan Cc: jian he , Mahendra Singh Thalor , tushar , Vaibhav Dalvi , pgsql-hackers@lists.postgresql.org Subject: Re: Non-text mode for pg_dumpall Message-ID: <20260607000218.96.noahmisch@microsoft.com> References: <4022765f-38ee-48a3-b246-615b3f8e1c23@dunslane.net> <59d3616f-6d6d-40d5-87e2-e019e350b52d@dunslane.net> <7faf3a59-cfe1-45cb-a972-55e05560b414@dunslane.net> <2e7ada75-422c-4153-9437-ea0ce8d63521@dunslane.net> <84764ef6-2bcb-43f7-9037-d7b589857da9@dunslane.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <84764ef6-2bcb-43f7-9037-d7b589857da9@dunslane.net> User-Agent: Mutt/2.3.0 (2026-01-25) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Feb 26, 2026 at 09:02:48AM -0500, Andrew Dunstan wrote: > pushed with a slight tweak. Having now reviewed commit 763aaa0, I don't think it's ready to remain part of v19. While some points from my v18 review are now resolved, other points still seem unresolved. I didn't find discussion of the unresolved points. I also see new issues. Wider issues: > @@ -796,24 +964,45 @@ dropRoles(PGconn *conn) > + if (archDumpFormat == archNull) > + { > + appendPQExpBuffer(delQry, "DROP ROLE %s%s;\n", > + if_exists ? "IF EXISTS " : "", > + fmtId(rolename)); > + fprintf(OPF, "%s", delQry->data); > + } > + else > + { > + appendPQExpBuffer(delQry, "DROP ROLE IF EXISTS %s;\n", > + fmtId(rolename)); > + > + ArchiveEntry(fout, > + nilCatalogId, /* catalog ID */ > + createDumpId(), /* dump ID */ > + ARCHIVE_OPTS(.tag = psprintf("ROLE %s", fmtId(rolename)), > + .description = "DROP_GLOBAL", > + .section = SECTION_PRE_DATA, > + .createStmt = delQry->data)); > + } pg_dumpall.c should call ArchiveEntry() in the same way pg_dump.c does. In pg_dump.c, per-object-class support code calls ArchiveEntry() unconditionally, and the object-independent infra of pg_backup_archiver.c deals with the difference between plain and non-plain formats. There should be one appendPQExpBuffer(delQry, "DROP ROLE ..."), not one for plain format and another for non-plain formats. Having two creates excess risk of format-specific bugs, something pg_dump.c has long avoided well. (I'm echoing my postgr.es/m/20250708212819.09.nmisch@google.com review. I wrote, "The strength of the archiver architecture shows in how rarely new features need format-specific logic and how rarely format-specific bugs get reported." That holds for the way pg_dump.c uses the archiver, but it doesn't hold for the way pg_dumpall.c now uses the archiver.) Separately, DROP should be in dropStmt, not in createStmt. This likely entails refactoring to merge dumpRoles() and dropRoles() into one function, per the style of pg_dump.c. > + /* > + * For pg_dumpall archives, --clean implies --if-exists since global > + * objects may not exist in the target cluster. > + */ > + if (opts->dropSchema && !opts->if_exists) > + { > + opts->if_exists = 1; > + pg_log_info("--if-exists is implied by --clean for pg_dumpall archives"); > + } The last comment of postgr.es/m/20250708212819.09.nmisch@google.com disagreed with this decision, so that remains unresolved. > + /* If database is already created, then don't set createDB flag. */ > + if (tmpopts->cparams.dbname) > + { > + PGconn *test_conn; > + > + test_conn = ConnectDatabase(dbidname->str, NULL, tmpopts->cparams.pghost, > + tmpopts->cparams.pgport, tmpopts->cparams.username, TRI_DEFAULT, > + false, progname, NULL, NULL, NULL, NULL); > + if (test_conn) > + { > + PQfinish(test_conn); > + > + /* Use already created database for connection. */ > + tmpopts->createDB = 0; > + tmpopts->cparams.dbname = dbidname->str; > + } > + else > + { > + /* We'll have to create it */ > + tmpopts->createDB = 1; > + tmpopts->cparams.dbname = connected_db; > + } postgr.es/m/20250708212819.09.nmisch@google.com called for this to change. None of this is meant to say the feature is impossible. But I don’t think this commit is at the point where post-commit fixups are the right workflow. I recommend reverting, posting a new version, and letting commitfest review finish. Localized issues: There's new code to write map and globals files, but I don't see corresponding code to fsync those files, like we fsync a plain-format dump. > @@ -3027,6 +3049,16 @@ _tocEntryRequired(TocEntry *te, teSection curSection, ArchiveHandle *AH) > return 0; > } > > + /* > + * Global object TOC entries (e.g., ROLEs or TABLESPACEs) must not be > + * ignored. > + */ > + if (strcmp(te->desc, "ROLE") == 0 || > + strcmp(te->desc, "ROLE PROPERTIES") == 0 || > + strcmp(te->desc, "TABLESPACE") == 0 || > + strcmp(te->desc, "DROP_GLOBAL") == 0) > + return REQ_SCHEMA; > + If I delete this addition, no src/bin/pg_dump test fails. Would you explain the rationale for this addition? > + if (comment_buf->data[0] != '\0') > + ArchiveEntry(fout, > + nilCatalogId, /* catalog ID */ > + createDumpId(), /* dump ID */ > + ARCHIVE_OPTS(.tag = tag, > + .description = "COMMENT", > + .section = SECTION_PRE_DATA, > + .createStmt = comment_buf->data)); > + > + if (seclabel_buf->data[0] != '\0') > + ArchiveEntry(fout, > + nilCatalogId, /* catalog ID */ > + createDumpId(), /* dump ID */ > + ARCHIVE_OPTS(.tag = tag, > + .description = "SECURITY LABEL", > + .section = SECTION_PRE_DATA, > + .createStmt = seclabel_buf->data)); COMMENT and SECURITY LABEL should use .deps and .nDeps to record a dependency on the main object. Representative example from pg_dump.c: ArchiveEntry(fout, nilCatalogId, createDumpId(), ARCHIVE_OPTS(.tag = target->data, .namespace = tbinfo->dobj.namespace->dobj.name, .owner = tbinfo->rolname, .description = "SECURITY LABEL", .section = SECTION_NONE, .createStmt = query->data, .deps = &(tbinfo->dobj.dumpId), .nDeps = 1)); This probably has no user-visible consequences, since "/* Parallel execution is not supported for global object restoration. */". Still, best to follow the standard. > + /* > + * If this is not a plain format dump, then append dboid and dbname to > + * the map.dat file. > + */ The first six lines inside the block aren't for the purpose described here. The one line that is for this purpose has its own comment. Please delete this comment. > + if (archDumpFormat != archNull) > + { > + if (archDumpFormat == archCustom) > + snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\".dmp", db_subdir, oid); > + else if (archDumpFormat == archTar) > + snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\".tar", db_subdir, oid); > + else > + snprintf(dbfilepath, MAXPGPATH, "\"%s\"/\"%s\"", db_subdir, oid); My review in postgr.es/m/20250708212819.09.nmisch@google.com called for this to change. I'm finding no discussion of the rationale for not changing it. > + > + /* Put one line entry for dboid and dbname in map file. */ > + fprintf(map_file, "%s %s\n", oid, dbname); > + } > +runPgDump(const char *dbname, const char *create_opts, char *dbfile) > + if (archDumpFormat != archNull) > + { > + printfPQExpBuffer(&cmd, "\"%s\" %s -f %s %s", pg_dump_bin, > + pgdumpopts->data, dbfile, create_opts); A different "-f" argument is already in pgdumpopts. That works, but it's untidy. > + pg_fatal("option %s cannot exclude %s when restoring a pg_dumpall archive", > + "--section", "--pre-data"); s/--pre-data/pre-data/ > + /* > + * Always restore global objects, even if --exclude-database results > + * in zero databases to process. If 'globals-only' is set, exit > + * immediately. I think this comment is out of date, because the code doesn't have an exit: > + */ > + snprintf(global_path, MAXPGPATH, "%s/toc.glo", inputFileSpec); > + > + n_errors = restore_global_objects(global_path, tmpopts); > + > + if (globals_only) > + pg_log_info("database restoring skipped because option %s was specified", > + "-g/--globals-only"); > + else > + { > + /* Now restore all the databases from map.dat */ > + n_errors = n_errors + restore_all_databases(inputFileSpec, db_exclude_patterns, > + opts, numWorkers); > + } > + > + /* Free db pattern list. */ > + simple_string_list_destroy(&db_exclude_patterns); > + } > + /* Don't output TOC entry comments when restoring globals */ > + ((ArchiveHandle *) AH)->noTocComments = 1; Why not? > +static int > +get_dbnames_list_to_restore(PGconn *conn, > + SimplePtrList *dbname_oid_list, > + SimpleStringList db_exclude_patterns) > +{ ... > + if (pg_strcasecmp(dbidname->str, pat_cell->val) == 0) > + skip_db_restore = true; > + /* Otherwise, try a pattern match if there is a connection */ The code assumes a connection unconditionally (which seems right to me): > + else > + { > + int dotcnt; > + > + appendPQExpBufferStr(query, "SELECT 1 "); > + processSQLNamePattern(conn, query, pat_cell->val, false, > + false, NULL, db_lit->data, > + NULL, NULL, NULL, &dotcnt); > + > + if (dotcnt > 0) > + { > + pg_log_error("improper qualified name (too many dotted names): %s", > + dbidname->str); > + PQfinish(conn); > + exit_nicely(1); > + } > + > + res = executeQuery(conn, query->data);