Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ue1GQ-0044aj-Bm for pgsql-hackers@arkaria.postgresql.org; Tue, 22 Jul 2025 00:53:51 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1ue1GN-003wGU-FH for pgsql-hackers@arkaria.postgresql.org; Tue, 22 Jul 2025 00:53:47 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1ue1GN-003wGF-6E for pgsql-hackers@lists.postgresql.org; Tue, 22 Jul 2025 00:53:47 +0000 Received: from mail-pf1-x42a.google.com ([2607:f8b0:4864:20::42a]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.96) (envelope-from ) id 1ue1GJ-0008LY-0e for pgsql-hackers@lists.postgresql.org; Tue, 22 Jul 2025 00:53:46 +0000 Received: by mail-pf1-x42a.google.com with SMTP id d2e1a72fcca58-74b56b1d301so3241936b3a.1 for ; Mon, 21 Jul 2025 17:53:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=leadboat.com; s=google; t=1753145622; x=1753750422; darn=lists.postgresql.org; h=user-agent:in-reply-to:content-disposition:mime-version:references :message-id:subject:cc:to:from:date:from:to:cc:subject:date :message-id:reply-to; bh=kxnChCE57AyYXO3I89/a46RekLMrMZVoWJEbTDjnIPg=; b=SKPhiXKKlORvvX52nwD4VrjfOGQcGdPAqmuKnGIoyrn1dxXxmla3T7U4VfPNMgmBqx ccmEksTs5rDqjUuSILfh/qou1aslw1E1vJnR5gJQ/gHxBp+wKSgDbSG9dhXogj6mCgq3 XflA6eBUwQcdXy0Wg4YVvNFAe26bkoDxjrDNI= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1753145622; x=1753750422; h=user-agent:in-reply-to:content-disposition:mime-version:references :message-id:subject:cc:to:from:date:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=kxnChCE57AyYXO3I89/a46RekLMrMZVoWJEbTDjnIPg=; b=QWFJ/OZ7VjPdEWW+29AKLzg5a19T2ITgb6EB9bJ1/NvvKWrGZs0AFSSxbqpk2Hjfqn XfiQlSzvOUdnXRhjnkPk+5yGw5MJEKYciOYSAwH4pzFsA6lEwjhSiBLhsfeQ8U0EITZA EooJn8WxQaOffBhYX2+Rn4JEFJ1KDnZfXsqbsaTqq2PIjdLhRuGGi3H9avRIRz/lV7h3 kQcoy9/SAiG+UX72IyqlA+/LzB34AgKJAXUKt2bBtHxaVWepPUzuR7Q4Qm63TzapfGWj 8OANFn9RLQuW3UV9/H0Z9bo6/A0ZETH4q0lHxkZc+W9S8+DTaz8senqgiq6/nUVP4v41 X2hg== X-Forwarded-Encrypted: i=1; AJvYcCX/PyeE8SfRmTDjiIwq0mm5EO+cHYjiD+UkYHQXcGMr4Qon9MDGXqkxwyPL/WvCMQU88ZFiNI1JW1FEhYP7@lists.postgresql.org X-Gm-Message-State: AOJu0Yx3YdJBk+062WcbtWLvNTYrJVPJAW0vJv2ott1JeqUSFzfL+1Mm XnpVhYzvKq96noP/PPVl9EutLxfGGLO/ykI/iyBeBpZhlorPUQOEh0YSjbjUlzz40w== X-Gm-Gg: ASbGncu33Qm0qjaQ7LSI1nbOIhOZP1YWgeGeS2hNGJwXsQ8yG8PZmfVOYLPaMsEOBKI FUQGKgdXxstB6eZYqDq+x8scalpY89Cci7I0oiDVGPL+/Tgq1PDdKR4egAAq+g3tcKjGCBcNels OBNLrtBozPesvyU1BHRof72LexPvdZsFFGFPXwxVSCUDUNMFwoi3N+mGG0kv00Ixa3RlWGvU+Fl mfl7aXKNqxGl1OnNrzvnvlHah/ioVsKoUrmbBuYockzHqzaXHX9h9GqZR8/vtYTHfZswqh5X4G7 rglC6QOhqx5njvAEMZ64CSbAi+pLdvNNyLj3IbpOj/l3pEIUSfCkhK2eDGmE/7AaBAN9juV7FL7 +gkiSYKCWlkf9t67+nNG9d0s1CIXUMmVjUYKxN39kNEwQS6PpscE= X-Google-Smtp-Source: AGHT+IG/sTKsKB1Hx9Lj3hb7Ae/cxqU5k59zyOWxtOA5J/MGSuugqcpPldxMFXGIcQOBYlBt42GTxw== X-Received: by 2002:a05:6a00:bd84:b0:748:f74f:6d27 with SMTP id d2e1a72fcca58-75725480a1bmr34251648b3a.24.1753145621802; Mon, 21 Jul 2025 17:53:41 -0700 (PDT) Received: from google.com (c-73-15-160-255.hsd1.ca.comcast.net. [73.15.160.255]) by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-759cb1546a6sm6446733b3a.92.2025.07.21.17.53.40 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Mon, 21 Jul 2025 17:53:41 -0700 (PDT) Date: Mon, 21 Jul 2025 17:53:39 -0700 From: Noah Misch To: Andrew Dunstan Cc: Mahendra Singh Thalor , =?iso-8859-1?Q?=C1lvaro?= Herrera , jian he , Srinath Reddy , pgsql-hackers@lists.postgresql.org Subject: Re: Non-text mode for pg_dumpall Message-ID: <20250722005339.ca.nmisch@google.com> References: <616efe2c-3986-43cf-b88c-4435849acf9e@dunslane.net> <948154fe-0278-4f4c-8f5a-085e12f03163@dunslane.net> <20250708212819.09.nmisch@google.com> <20250716001957.c6.nmisch@google.com> <3f22a8bb-29e8-40cc-97a1-309181da2c13@dunslane.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3f22a8bb-29e8-40cc-97a1-309181da2c13@dunslane.net> User-Agent: Mutt/2.2.12 (2023-09-09) List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, Jul 21, 2025 at 04:41:03PM -0400, Andrew Dunstan wrote: > On 2025-07-17 Th 6:18 AM, Mahendra Singh Thalor wrote > > > > > > > --- a/src/bin/pg_dump/pg_restore.c > > > > > > > +++ b/src/bin/pg_dump/pg_restore.c > > > > > > > +/* > > > > > > > + * read_one_statement > > > > > > > + * > > > > > > > + * This will start reading from passed file pointer using fgetc and read till > > > > > > > + * semicolon(sql statement terminator for global.dat file) > > > > > > > + * > > > > > > > + * EOF is returned if end-of-file input is seen; time to shut down. > > > > > > What makes it okay to use this particular subset of SQL lexing? > > > > > To support complex syntax, we used this code from another file. > > > > I'm hearing that you copied this code from somewhere. Running > > > > "git grep 'time to shut down'" suggests you copied it from > > > > InteractiveBackend(). Is that right? I do see other similarities between > > > > read_one_statement() and InteractiveBackend(). > > > > > > > > Copying InteractiveBackend() provides negligible assurance that this is the > > > > right subset of SQL lexing. Only single-user mode uses InteractiveBackend(). > > > > Single-user mode survives mostly as a last resort for recovering from having > > > > reached xidStopLimit, is rarely used, and only superusers write queries to it. > > > Yes, we copied this from InteractiveBackend to read statements from > > > global.dat file. > > Maybe we should ensure that identifiers with CR or LF are turned into > Unicode quoted identifiers, so each SQL statement would always only occupy > one line. Interesting. That might work. > Or just reject role and tablespace names with CR or LF altogether, > just as we do for database names. There are other ways to get multi-line statements. Non-exhaustive list: - pg_db_role_setting.setconfig - pg_shdescription.description - pg_shseclabel.label - pg_tablespace.spcoptions (if we add a text option in the future) I think this decision about lexing also ties to other unfinished open item work of aligning "pg_dumpall -Fd;pg_restore [options]" behavior with "pg_dump -Fd;pg_restore [options]". "pg_restore --no-privileges" should not restore pg_tablespace.spcacl, and "pg_restore --no-comments" should not emit COMMENT statements. I suspect this is going to end with a structured dump like we use on the pg_dump (per-database) side. It's not an accident that v17 pg_restore doesn't lex text files to do its job. pg_dumpall deals with a more-limited set of statements than pg_dump deals with, but they're not _that much_ more limited. I won't veto a lexing-based approach if it gets the behaviors right, but I don't have high hopes for it getting the behaviors right and staying that way. (I almost said "pg_restore --no-owner" should not restore pg_tablespace.spcowner, but v17 "pg_dumpall --no-owner" does restore it. One could argue for or against aligning $SUBJECT behavior w/ v17's mistake there.)