public inbox for [email protected]
Request For Feature: pg_dump
5+ messages / 3 participants
* Request For Feature: pg_dump
From: Ron Johnson <[email protected]> @ 2026-05-22 13:32 UTC
To: Pgsql-admin <[email protected]>

In --format=directory mode, omit .dat files that contain zero data records, and mark that table's toc.dat entry as an empty table.

Justification: *lots* of empty tables means *lots* of teeny-tiny files in the DB's dump directory. That unnecessarily bloats the fs, and makes "du -c" really really slow.

"But why are there sooo many empty tables? You shouldn't have so many empty tables!"

Yeah, well, software (especially 3rd-party software that must be generic to satisfy the varying needs of a large and varied customer base) can't always be perfectly tuned to the precise and immediate needs of a particular site. Partitioning makes that much, much worse.

We've survived this long without it, but pgBackRest has a similar feature (though implemented differently from how pg_dump would do it), and it's *really* handy.

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!
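The request above can be illustrated with a toy sketch. This is not pg_dump code and does not use the real toc.dat format: the function `write_dump_dir`, the `manifest.json` file, and the table names are all hypothetical, chosen only to show the idea of recording "this table is empty" in an index instead of writing a data file for it.

```python
import json
import os
import tempfile

def write_dump_dir(tables, out_dir):
    """Write one data file per non-empty table; record empty ones in an index.

    `tables` maps a table name to a list of already-formatted row strings.
    The layout (NAME.dat plus manifest.json) is a made-up stand-in for a
    directory-format dump, not pg_dump's actual on-disk format.
    """
    manifest = {}
    for name, rows in tables.items():
        if not rows:
            # Empty table: no data file at all, only a flag in the index.
            manifest[name] = {"empty": True, "file": None}
            continue
        filename = f"{name}.dat"
        with open(os.path.join(out_dir, filename), "w", encoding="utf-8") as f:
            f.writelines(row + "\n" for row in rows)
            f.write("\\.\n")  # COPY-style end-of-data marker
        manifest[name] = {"empty": False, "file": filename}
    with open(os.path.join(out_dir, "manifest.json"), "w", encoding="utf-8") as f:
        json.dump(manifest, f, indent=2)
    return manifest

# Demo: one table with rows, one empty partition.
with tempfile.TemporaryDirectory() as d:
    demo_manifest = write_dump_dir(
        {"orders": ["1\tfoo", "2\tbar"], "orders_2027": []}, d
    )
    demo_files = sorted(os.listdir(d))
    print(demo_files)
```

A restore tool reading such an index would create the empty table from its schema entry and simply skip the data-loading step for it.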
* Re: Request For Feature: pg_dump
From: Tom Lane <[email protected]> @ 2026-05-22 16:52 UTC
To: Ron Johnson <[email protected]>; +Cc: Pgsql-admin <[email protected]>

Ron Johnson <[email protected]> writes:
> In --format=directory mode, remove .dat files with zero data records, and
> mark that table's toc.dat entry that it's an empty table.

> Justification: *lots* of empty tables means *lots* of teeny-tiny files in
> the DB's dump directory. That unnecessarily bloats the fs, and makes "du
> -c" really really slow.

Evidence please? Most file systems that I've looked at optimize
zero-size files pretty well.

regards, tom lane
* Re: Request For Feature: pg_dump
From: Ron Johnson <[email protected]> @ 2026-05-22 17:20 UTC
To: Tom Lane <[email protected]>; +Cc: Pgsql-admin <[email protected]>

On Fri, May 22, 2026 at 12:53 PM Tom Lane <[email protected]> wrote:

> Evidence please? Most file systems that I've looked at optimize
> zero-size files pretty well.

They aren't zero bytes. It's those pesky 5-byte files (14 bytes once zstd-compressed, or whatever size gzip and lz4 produce). 66 thousand tiny files plus 8 thousand files with data in them makes for a 2.4MB directory. That's big and slow.

$ find . -size 14c | wc
  66180   66180 1191240

$ zstd -dk 2115841.dat.zst
2115841.dat.zst : 5 bytes

$ cat 2115841.dat
\.

$ dir | grep " 14 " | head -n20
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115841.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115842.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115843.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115844.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115845.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115851.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115899.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115901.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115902.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115903.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115905.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115907.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115909.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115913.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115915.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115917.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115919.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115923.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115926.dat.zst
-rw-r--r-- 1 postgres postgres 14 2026-05-22 00:50:30 2115931.dat.zst
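The measurement in this message (how many files in the dump directory are tiny, and how many carry real data) can also be taken in a single pass over the directory. The following is a minimal sketch, not part of the thread: the helper name `tally_small_files` and the 64-byte cutoff are assumptions made for illustration, and the demo builds a throwaway directory instead of reading a real dump.

```python
import os
import tempfile

def tally_small_files(directory, threshold=64):
    """Count regular files at or below `threshold` bytes versus larger ones.

    `threshold` is an arbitrary illustrative cutoff, not a pg_dump setting.
    """
    small = large = small_bytes = 0
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            size = entry.stat(follow_symlinks=False).st_size
            if size <= threshold:
                small += 1
                small_bytes += size
            else:
                large += 1
    return {"small": small, "large": large, "small_bytes": small_bytes}

# Demo: three 14-byte files (like the .dat.zst files listed above)
# and one larger file.
with tempfile.TemporaryDirectory() as demo_dir:
    for i in range(3):
        with open(os.path.join(demo_dir, f"{i}.dat.zst"), "wb") as f:
            f.write(b"x" * 14)
    with open(os.path.join(demo_dir, "big.dat.zst"), "wb") as f:
        f.write(b"x" * 1000)
    demo_counts = tally_small_files(demo_dir)
    print(demo_counts)
```

Pointing `tally_small_files` at a real dump directory would give the small-versus-large split directly, without one `find` invocation per size.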
* Re: Request For Feature: pg_dump
From: Holger Jakobs <[email protected]> @ 2026-05-22 18:09 UTC
To: [email protected]

On 22.05.26 at 19:20, Ron Johnson wrote:
> They aren't zero bytes.
> [...] 66 thousand tiny files plus 8 thousand files with data in
> them makes for a 2.4MB directory. That's big and slow.
> [...]

Maybe just not compressing empty files would already do the job.

I think any file below a certain size isn't worth compressing.

Regards,

Holger

-- 
Holger Jakobs
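The size-threshold idea in this message can be sketched as follows. This is an illustration only, not how pg_dump decides anything: `maybe_compress` and its 64-byte `min_size` are hypothetical, gzip from the Python standard library stands in for zstd, and the 5-byte payload is assumed to be `\.` followed by three newlines, which matches the 5-byte size reported earlier in the thread.

```python
import gzip

# Assumed content of an "empty table" data file: the COPY terminator
# plus trailing newlines, 5 bytes in total.
TERMINATOR_ONLY = b"\\.\n\n\n"

def maybe_compress(payload: bytes, min_size: int = 64):
    """Return (data, compressed_flag).

    Payloads shorter than `min_size` are stored as-is, because the
    compression container overhead exceeds any possible saving.
    `min_size` is an arbitrary illustrative cutoff.
    """
    if len(payload) < min_size:
        return payload, False
    return gzip.compress(payload), True

# Compressing the tiny payload makes it bigger, not smaller.
tiny_compressed_len = len(gzip.compress(TERMINATOR_ONLY))
print(len(TERMINATOR_ONLY), tiny_compressed_len)

stored, was_compressed = maybe_compress(TERMINATOR_ONLY)
big_payload = b"a" * 1000
big_stored, big_was_compressed = maybe_compress(big_payload)
```

Note that a threshold like this only shrinks the tiny files; it does not reduce their number, which is the part of the problem that affects directory size and `du` speed.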
* Re: Request For Feature: pg_dump
From: Ron Johnson <[email protected]> @ 2026-05-22 18:41 UTC
To: Pgsql-admin <[email protected]>

On Fri, May 22, 2026 at 2:09 PM Holger Jakobs <[email protected]> wrote:

> Maybe just not compressing empty files would already do the job.

The files aren't empty, though, since they have the terminating "\."

> I think any file below a certain size isn't worth compressing.