public inbox for [email protected]  
From: Ron Johnson <[email protected]>
To: Tom Lane <[email protected]>
Cc: Pgsql-admin <[email protected]>
Subject: Re: Request For Feature: pg_dump
Date: Fri, 22 May 2026 13:20:58 -0400
Message-ID: <CANzqJaCJGS9S=ibJ_rOmP=acA7n0XCA5R4FbMH+AH3WURhey8Q@mail.gmail.com>
In-Reply-To: <[email protected]>
References: <CANzqJaCNaqhQ9duPP+eKU2OR1wjGYW-FBQ2FZXqnBRb6XAnKQA@mail.gmail.com>
	<[email protected]>

On Fri, May 22, 2026 at 12:53 PM Tom Lane <[email protected]> wrote:

> Ron Johnson <[email protected]> writes:
> > In --format=directory mode, remove .dat files with zero data records, and
> > mark that table's toc.dat entry that it's an empty table.
>
> > Justification: *lots* of empty tables means *lots* of teeny-tiny files in
> > the DB's dump directory.  That unnecessarily bloats the fs, and makes "du
> > -c" really really slow.
>
> Evidence please?  Most file systems that I've looked at optimize
> zero-size files pretty well.
>

They aren't zero bytes.
It's those pesky 5-byte files (or 14 bytes, or whatever size gzip and lz4
produce).  66 thousand tiny files plus 8 thousand files with data in them
make for a 2.4MB directory.  That's big and slow.

$ find . -size 14c | wc
  66180   66180 1191240

$ zstd -dk 2115841.dat.zst
2115841.dat.zst     : 5 bytes

$ cat 2115841.dat
\.

$ dir | grep " 14 " | head -n20
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115841.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115842.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115843.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115844.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115845.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115851.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115899.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115901.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115902.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115903.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115905.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115907.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115909.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115913.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115915.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115917.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115919.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115923.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115926.dat.zst
-rw-r--r-- 1 postgres postgres         14 2026-05-22 00:50:30 2115931.dat.zst
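
A rough Python sketch for reproducing this kind of count on another dump
directory, in case the shell one-liners above are awkward to adapt.  The
function name summarize_small_files and the default 14-byte threshold are
assumptions made for illustration; nothing here is part of pg_dump.  Unlike
"find -size 14c", which matches files of exactly 14 bytes, it counts every
regular file at or below the threshold.

```python
import os


def summarize_small_files(dump_dir, max_bytes=14):
    """Summarize tiny files directly inside dump_dir (not recursive).

    Returns (small_count, small_total_bytes, all_count), where "small"
    means a regular file whose size is <= max_bytes.  The threshold is
    an assumption: 14 bytes is the size of the empty .dat.zst files
    shown in the listing above.
    """
    small_count = 0
    small_total = 0
    all_count = 0
    with os.scandir(dump_dir) as entries:
        for entry in entries:
            # Skip subdirectories, symlinks and other non-regular entries.
            if not entry.is_file(follow_symlinks=False):
                continue
            all_count += 1
            size = entry.stat(follow_symlinks=False).st_size
            if size <= max_bytes:
                small_count += 1
                small_total += size
    return small_count, small_total, all_count


if __name__ == "__main__":
    import sys

    target = sys.argv[1] if len(sys.argv) > 1 else "."
    small, small_bytes, total = summarize_small_files(target)
    print(f"{small} of {total} files are <= 14 bytes ({small_bytes} bytes total)")
```

Run it with the dump directory as the only argument; with no argument it
inspects the current directory.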

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

