public inbox for [email protected]  
deb package sizes
10+ messages / 5 participants

* deb package sizes
@ 2025-01-09 08:53 Jeremy Schneider <[email protected]>
  2025-01-09 09:07 ` Re: deb package sizes Christoph Berg <[email protected]>
  2025-01-09 15:43 ` Re: deb package sizes Álvaro Hernández <[email protected]>
  0 siblings, 2 replies; 10+ messages in thread

From: Jeremy Schneider @ 2025-01-09 08:53 UTC (permalink / raw)
  To: [email protected]

Hello, I hope this is the right mailing list for this topic.

Recently, I've been spending some time looking at the official Postgres
docker images. https://hub.docker.com/_/postgres/

I think there are a lot of people using these to quickly spin up a
postgres database for testing on their local dev machine. Right now,
they are also the base image used for building CloudNativePG production
postgres images.

These official docker images are a repackaging of the PGDG debian
packages, combined with a minimal set of debian OS packages. Docker
images are built using both debian stable and debian oldstable branches
(with tags like "17.1-bookworm" and "17.1-bullseye").

With docker images, we want the container images to be as small as
possible. I have spent a little time looking at the make-up of the
official docker images from a size perspective, which is driven by
debian package sizes.

Before adding any PGDG postgres packages or dependencies, our base OS
container image is 74MB and includes about 88 debian packages.

We install only 5 PGDG postgres packages: postgresql,
postgresql-client, postgresql-client-common, postgresql-common and
libpq5. The "common" packages are tiny, libpq is 1MB, client is 10MB
and the postgresql package itself is 60MB.

What's more interesting is all of the additional dependencies that the
postgresql package pulls in: an extra 53 debian packages that are over
250MB in total size.

The biggest size contributors are libllvm & libz3 (143MB), libperl &
perl-modules (45MB total) and libicu (36MB). These three things alone
make up 64% of the total postgres-specific bytes.
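A quick check of these figures against the per-package table in the PS below (sizes are the dpkg Installed-Size values in KB; 355572 KB is the grand total of everything the postgres image adds over debian:bookworm-slim):

```latex
\begin{aligned}
\text{libllvm16} + \text{libz3-4} &= 120542 + 22767 = 143309\ \text{KB} \\
\text{libperl5.36} + \text{perl-modules-5.36} &= 28862 + 17816 = 46678\ \text{KB} \\
\text{libicu72} &= 36170\ \text{KB} \\
\frac{143309 + 46678 + 36170}{355572} &= \frac{226157}{355572} \approx 63.6\% \\
355572 - \underbrace{(133 + 667 + 1068 + 9947 + 59671)}_{\text{5 PGDG packages}} &= 355572 - 71486 = 284086\ \text{KB}
\end{aligned}
```

So the three groups account for about 64% as stated, and the 53 non-PGDG dependencies come to roughly 284 MB, consistent with "over 250MB".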

I'm wondering if there might be any support for providing a
"postgresql-slim" package on PGDG which excludes llvm and python? I
think this might almost cut the total install size in half, and I think
there might be many users who would value having the option.

Even though ICU is a larger package, I would argue for still
including it in a "slim" build. Because of the drama around glibc
collation I view ICU as especially important to make available.

Interested to know others' thoughts about having a slimmer package.

Thanks,
Jeremy Schneider



PS. here are the commands I used to get the sizes (apologies that the
formatting isn't great) and the full list of postgresql-specific
packages


docker run --rm debian:bookworm-slim dpkg-query --show \
  --showformat='${Package}\t${Installed-Size} KB\n' > base-pkgs


docker run --rm postgres:17-bookworm dpkg-query --show \
  --showformat='${Package}\t${Installed-Size} KB\n' > pg-pkgs


docker run --rm postgres:17-bookworm apt rdepends libz3-4

libz3-4
Reverse Depends:
  Depends: libllvm16 (>= 4.8.12)


diff -b base-pkgs pg-pkgs | grep '^>' | sort -k3 -n |
  awk '{total+=$3; printf "%-30s %s", $0,
    "| running total size: " total
    " KB | running total percentage: " total*100/355572 "%\n"}'

package                        size  running total  running %
netbase                       36 KB          36 KB  0.0101245%
libkeyutils1                  40 KB          76 KB  0.021374%
libnpth0                      50 KB         126 KB  0.0354359%
sensible-utils                56 KB         182 KB  0.0511851%
ssl-cert                      64 KB         246 KB  0.0691843%
libgdbm-compat4               70 KB         316 KB  0.0888709%
libsasl2-modules-db           77 KB         393 KB  0.110526%
readline-common               89 KB         482 KB  0.135556%
libnss-wrapper                99 KB         581 KB  0.163399%
libio-pty-perl               103 KB         684 KB  0.192366%
libassuan0                   117 KB         801 KB  0.225271%
libgdbm6                     129 KB         930 KB  0.26155%
libkrb5support0              133 KB        1063 KB  0.298955%
postgresql-client-common     133 KB        1196 KB  0.336359%
pinentry-curses              140 KB        1336 KB  0.375733%
libsasl2-2                   167 KB        1503 KB  0.422699%
libbsd0                      202 KB        1705 KB  0.479509%
ucf                          214 KB        1919 KB  0.539694%
libjson-perl                 244 KB        2163 KB  0.608316%
libedit2                     258 KB        2421 KB  0.680875%
libk5crypto3                 260 KB        2681 KB  0.753996%
libipc-run-perl              267 KB        2948 KB  0.829087%
less                         313 KB        3261 KB  0.917114%
libksba8                     316 KB        3577 KB  1.00598%
libncursesw6                 412 KB        3989 KB  1.12185%
libgssapi-krb5-2             424 KB        4413 KB  1.2411%
libreadline8                 475 KB        4888 KB  1.37469%
libxslt1.1                   504 KB        5392 KB  1.51643%
libldap-2.5-0                553 KB        5945 KB  1.67195%
gpg-wks-server               657 KB        6602 KB  1.85673%
postgresql-common            667 KB        7269 KB  2.04431%
perl                         669 KB        7938 KB  2.23246%
gpg-wks-client               682 KB        8620 KB  2.42426%
gpgconf                      803 KB        9423 KB  2.6501%
gnupg                        885 KB       10308 KB  2.89899%
gpgsm                        992 KB       11300 KB  3.17798%
libpq5                      1068 KB       12368 KB  3.47834%
libkrb5-3                   1076 KB       13444 KB  3.78095%
xz-utils                    1226 KB       14670 KB  4.12575%
dirmngr                     1328 KB       15998 KB  4.49923%
gpg-agent                   1348 KB       17346 KB  4.87834%
gpg                         1581 KB       18927 KB  5.32297%
libsqlite3-0                1682 KB       20609 KB  5.79601%
gnupg-utils                 1836 KB       22445 KB  6.31236%
libxml2                     1866 KB       24311 KB  6.83715%
zstd                        2102 KB       26413 KB  7.42831%
openssl                     2296 KB       28709 KB  8.07403%
libc-l10n                   4348 KB       33057 KB  9.29685%
gnupg-l10n                  4874 KB       37931 KB  10.6676%
libssl3                     6021 KB       43952 KB  12.3609%
postgresql-client-17        9947 KB       53899 KB  15.1584%
locales                    15845 KB       69744 KB  19.6146%
perl-modules-5.36          17816 KB       87560 KB  24.6251%
libz3-4                    22767 KB      110327 KB  31.028%
libperl5.36                28862 KB      139189 KB  39.1451%
libicu72                   36170 KB      175359 KB  49.3174%
postgresql-17              59671 KB      235030 KB  66.0991%
libllvm16                 120542 KB      355572 KB  100%
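The awk step above hardcodes the grand total (355572) in the percentage calculation, so that number has to be found in a separate pass first, and it passes diff's "> " prefix through in `$0`. A minimal sketch of a variant that buffers the lines and computes the total itself, assuming the same `base-pkgs`/`pg-pkgs` files from the dpkg-query commands above; the `running_totals` name is hypothetical and not part of the original post:

```shell
# running_totals (hypothetical helper): reads "package<TAB>N KB" lines on stdin,
# sorts them by size, and prints each package with its size, the running total,
# and the running total as a percentage of the grand total.
running_totals() {
  sort -t "$(printf '\t')" -k2,2n |
    awk -F '\t' '
      {
        name[NR] = $1
        size[NR] = $2 + 0   # "36 KB" + 0 -> 36 (awk uses the leading number)
        grand += size[NR]
      }
      END {
        # Walk the buffered lines again, now that the grand total is known.
        for (i = 1; i <= NR; i++) {
          run += size[i]
          printf "%-26s %9s  %13s  %.4f%%\n", name[i], size[i] " KB", run " KB", run * 100 / grand
        }
      }'
}

# Usage, with the files produced by the dpkg-query commands above:
#   diff -b base-pkgs pg-pkgs | grep '^>' | sed 's/^> //' | running_totals
```

Buffering in awk arrays keeps it a single pipeline at the cost of holding all rows in memory, which is negligible for a few dozen packages.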







* Re: deb package sizes
  2025-01-09 08:53 deb package sizes Jeremy Schneider <[email protected]>
@ 2025-01-09 09:07 ` Christoph Berg <[email protected]>
  2025-01-09 16:06   ` Re: deb package sizes Álvaro Hernández <[email protected]>
  1 sibling, 1 reply; 10+ messages in thread

From: Christoph Berg @ 2025-01-09 09:07 UTC (permalink / raw)
  To: [email protected]

Re: Jeremy Schneider
> I'm wondering if there might be any support for providing a
> "postgresql-slim" package on PGDG which excludes llvm and python? I
> think this might almost cut the total install size in half, and I think
> there might be many users who would value having the option.

Hi,

could you explain why 250 MB is too much? Disk space these days is
ultra cheap and removing functionality (query JITing) does have cost
as well.

> Even though ICU is a larger package, I would argue for still
> including it in a "slim" build. Because of the drama around glibc
> collation I view ICU as especially important to make available.

Note that ICU does not fix the collation drama either, you will have
to reindex on ICU upgrades as well.

Christoph






* Re: deb package sizes
  2025-01-09 08:53 deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 09:07 ` Re: deb package sizes Christoph Berg <[email protected]>
@ 2025-01-09 16:06   ` Álvaro Hernández <[email protected]>
  2025-01-09 17:08     ` Re: deb package sizes Jeremy Schneider <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Álvaro Hernández @ 2025-01-09 16:06 UTC (permalink / raw)
  To: Christoph Berg <[email protected]>; [email protected]



On 9/1/25 10:07, Christoph Berg wrote:
> Re: Jeremy Schneider
>> I'm wondering if there might be any support for providing a
>> "postgresql-slim" package on PGDG which excludes llvm and python? I
>> think this might almost cut the total install size in half, and I think
>> there might be many users who would value having the option.
> Hi,
>
> could you explain why 250 MB is too much? Disk space these days is
> ultra cheap

     Hi Christoph.

     Container images are meant to contain only the files needed to run
the process that the image runs. As such, any additional file poses two
main problems:

* Disk space is cheap. Bandwidth not so much. Time to start a container
may have a notable cost. Making container images slimmer helps in all
these dimensions. When you run the same container image in many places,
with high frequency, and end up pulling it multiple times, all of that
has a cost. In particular for Postgres, time pulling and running an
image may affect uptime. So it can become quite important.

* Security analysis. Unneeded files (especially binaries, but not only)
may introduce security vulnerabilities that the image would otherwise
not have. For many, container images must pass vulnerability scans, and
the more (unneeded) packages are present, the bigger the chance that
some of them contain vulnerabilities. In any case, it's a basic security
principle to include only the files needed to run the process, and no
more.

>   and removing functionality (query JITing) does have cost
> as well.

     If it can be made optional, then users can decide whether they want 
container images with this functionality or not.

>> Even though ICU is a larger package, I would argue for still
>> including it in a "slim" build. Because of the drama around glibc
>> collation I view ICU as especially important to make available.
> Note that ICU does not fix the collation drama either, you will have
> to reindex on ICU upgrades as well.

     Agreed that it doesn't solve the whole drama, but reindexes are not 
needed if container images for upgrades are provided while keeping the 
ICU version constant (which is doable).

     Álvaro







* Re: deb package sizes
  2025-01-09 08:53 deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 09:07 ` Re: deb package sizes Christoph Berg <[email protected]>
  2025-01-09 16:06   ` Re: deb package sizes Álvaro Hernández <[email protected]>
@ 2025-01-09 17:08     ` Jeremy Schneider <[email protected]>
  2025-01-09 22:40       ` Re: deb package sizes Álvaro Hernández <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Jeremy Schneider @ 2025-01-09 17:08 UTC (permalink / raw)
  To: Álvaro Hernández <[email protected]>; +Cc: Christoph Berg <[email protected]>; [email protected]

On Thu, 9 Jan 2025 17:06:57 +0100
Álvaro Hernández <[email protected]> wrote:

> On 9/1/25 10:07, Christoph Berg wrote:
> > Re: Jeremy Schneider  
> >> I'm wondering if there might be any support for providing a
> >> "postgresql-slim" package on PGDG which excludes llvm and python? I
> >> think this might almost cut the total install size in half, and I
> >> think there might be many users who would value having the option.
> >>  
> > Hi,
> >
> > could you explain why 250 MB is too much? Disk space these days is
> > ultra cheap  
> 
>      Hi Christoph.
> 
>      Container images allow (are meant to) contain only the necessary 
> files needed to run the process that will be run when the image is
> run. As such, any additional file poses two main problems:
> 
> * Disk space is cheap. Bandwidth not so much. Time to start a
> 
> * Security analysis. Unneeded files (specially binaries, but not

Another concern is the impact of image rebuilds as dependencies are
updated. Tianon (a primary maintainer of the docker images) has noted
that they limit how often the debian base containers are rebuilt,
because every rebuild of the base container triggers an avalanche of
downstream rebuilds. CNPG was doing daily rebuilds for a while, and every
time any python dependency was updated you'd get a new image - boto3 was
notorious for very frequent updates. So with a different image version
for every day, a single server running multiple copies of postgres might
easily end up with several image versions on it as the copies are
slowly updated.


> 
> >   and removing functionality (query JITing) does have cost
> > as well.  
> 
>      If it can be made optional, then users can decide whether they
> want container images with this functionality or not.

To be clear, I definitely want the "default" postgres packages to keep
JIT. I was just suggesting a non-default "slim" alternative.

Honestly I don't know whether this would introduce a lot of complexity
in dependency management between debian packages, or how feasible it
would be to actually do it... but I wanted to ask the question and raise
the topic.

> >> Even though ICU is a larger package, I would argue for still
> >> including it in a "slim" build. Because of the drama around glibc
> >> collation I view ICU as especially important to make available.  
> > Note that ICU does not fix the collation drama either, you will have
> > to reindex on ICU upgrades as well.  
> 
>      Agreed that it doesn't solve the whole drama, but reindexes are
> not needed if container images for upgrades are provided while
> keeping the ICU version constant (which is doable).

Yes, I'm well aware that ICU isn't really changing anything about the
rebuild requirement - I've said many times that people should default to
the builtin C collation starting with pg17, and set linguistic collation
at the column or query level. The big advantage of this is that it's much
easier to know everything that needs rebuilding, since postgres does good
dependency tracking of objects using non-default collation.

But with ICU there is at least the option that someone could rebuild an
old version and run it on the new debian release. That's nearly
impossible with glibc.

-Jeremy






* Re: deb package sizes
  2025-01-09 08:53 deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 09:07 ` Re: deb package sizes Christoph Berg <[email protected]>
  2025-01-09 16:06   ` Re: deb package sizes Álvaro Hernández <[email protected]>
  2025-01-09 17:08     ` Re: deb package sizes Jeremy Schneider <[email protected]>
@ 2025-01-09 22:40       ` Álvaro Hernández <[email protected]>
  2025-01-10 09:52         ` Re: deb package sizes Magnus Hagander <[email protected]>
  0 siblings, 1 reply; 10+ messages in thread

From: Álvaro Hernández @ 2025-01-09 22:40 UTC (permalink / raw)
  To: Jeremy Schneider <[email protected]>; +Cc: Christoph Berg <[email protected]>; [email protected]



On 9/1/25 18:08, Jeremy Schneider wrote:
> On Thu, 9 Jan 2025 17:06:57 +0100
> Álvaro Hernández<[email protected]>  wrote:
>
>> On 9/1/25 10:07, Christoph Berg wrote:
>>> Re: Jeremy Schneider
>>>> I'm wondering if there might be any support for providing a
>>>> "postgresql-slim" package on PGDG which excludes llvm and python? I
>>>> think this might almost cut the total install size in half, and I
>>>> think there might be many users who would value having the option.
>>>>   
>>> Hi,
>>>
>>> could you explain why 250 MB is too much? Disk space these days is
>>> ultra cheap
>>       Hi Christoph.
>>
>>       Container images allow (are meant to) contain only the necessary
>> files needed to run the process that will be run when the image is
>> run. As such, any additional file poses two main problems:
>>
>> * Disk space is cheap. Bandwidth not so much. Time to start a
>>
>> * Security analysis. Unneeded files (specially binaries, but not
> Another concern is the impact of image rebuilds as dependencies are
> updated. Tianon (a primary maintainer of the docker images) has noted
> that they limit frequency of the debian base containers, because every
> rebuild of the base container triggers an avalance of downstream
> rebuilds. CNPG was doing daily rebuilds for awhile, and every time any
> python dependency was updated you'd get a new image - boto3 was
> notorious for very frequent updates. So with a different image version
> for every day, a single server running multiple copies of postgres might
> easily end up with multiple image versions on the server as copies are
> slowly updated.

     I see this as a symptom of a different, bigger issue: package
versions, and all transitive dependencies, should be version pinned when
building container images. I haven't seen many examples of people making
the effort to do this. But it's the only way to re-run image builds and
guarantee reproducible outputs. Once you have this in place, you can
decide how and when to upgrade which versions.

     Actually, even version pinning is not enough, unless the package 
system guarantees that a version of a package is strictly immutable (and 
AFAIK this is usually not the case). So digest pinning is essentially 
required.
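A minimal sketch of what recording and comparing package pins could look like, assuming Debian-based images and a local `docker`. The helper names `pin_list` and `pin_drift` are hypothetical; `dpkg-query`'s `${Version}` field, `comm`, and apt's `package=version` install syntax are standard tools. Since the point above is that version pinning alone may not be enough, the usage comment also shows how to look up an image's registry digest so a build can reference `image@sha256:...` instead of a mutable tag.

```shell
# pin_list (hypothetical helper): print "package=version" for every package
# installed in the given image, sorted, in the form `apt-get install` accepts.
pin_list() {
  docker run --rm "$1" dpkg-query --show --showformat='${Package}=${Version}\n' | sort
}

# pin_drift (hypothetical helper): compare two sorted pin files. Lines only in
# the first file are printed as-is; lines only in the second are tab-indented.
pin_drift() {
  comm -3 "$1" "$2"
}

# Usage sketch:
#   pin_list postgres:17-bookworm > today.pins
#   pin_drift yesterday.pins today.pins   # which package versions changed?
#   # For an image pulled from a registry, print its reference with digest:
#   docker image inspect --format '{{index .RepoDigests 0}}' postgres:17-bookworm
```

Comparing pin files between builds is one way to spot the "inconsistent environments" problem before rolling an image out.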

> But with ICU there is at least the option that someone could rebuild an
> old version and run it on the new debian release. That's nearly
> impossible with glibc.
>

     Exactly, and this is doable.


     Álvaro


-- 

Alvaro Hernandez


-----------
OnGres



* Re: deb package sizes
  2025-01-09 08:53 deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 09:07 ` Re: deb package sizes Christoph Berg <[email protected]>
  2025-01-09 16:06   ` Re: deb package sizes Álvaro Hernández <[email protected]>
  2025-01-09 17:08     ` Re: deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 22:40       ` Re: deb package sizes Álvaro Hernández <[email protected]>
@ 2025-01-10 09:52         ` Magnus Hagander <[email protected]>
  2025-01-10 11:32           ` Re: deb package sizes Álvaro Hernández <[email protected]>
  2025-01-21 09:26           ` Re: deb package sizes Cédric Villemain <[email protected]>
  0 siblings, 2 replies; 10+ messages in thread

From: Magnus Hagander @ 2025-01-10 09:52 UTC (permalink / raw)
  To: Álvaro Hernández <[email protected]>; +Cc: Jeremy Schneider <[email protected]>; Christoph Berg <[email protected]>; [email protected]

On Thu, Jan 9, 2025 at 11:40 PM Álvaro Hernández <[email protected]> wrote:

>
> On 9/1/25 18:08, Jeremy Schneider wrote:
>> On Thu, 9 Jan 2025 17:06:57 +0100
>> Álvaro Hernández <[email protected]> wrote:
>>
>>> On 9/1/25 10:07, Christoph Berg wrote:
>>>> Re: Jeremy Schneider
>>>>> I'm wondering if there might be any support for providing a
>>>>> "postgresql-slim" package on PGDG which excludes llvm and python? I
>>>>> think this might almost cut the total install size in half, and I
>>>>> think there might be many users who would value having the option.
>>>>
>>>> Hi,
>>>>
>>>> could you explain why 250 MB is too much? Disk space these days is
>>>> ultra cheap
>>>
>>>      Hi Christoph.
>>>
>>>      Container images allow (are meant to) contain only the necessary
>>> files needed to run the process that will be run when the image is
>>> run. As such, any additional file poses two main problems:
>>>
>>> * Disk space is cheap. Bandwidth not so much. Time to start a
>>>
>>> * Security analysis. Unneeded files (specially binaries, but not
>>
>> Another concern is the impact of image rebuilds as dependencies are
>> updated. Tianon (a primary maintainer of the docker images) has noted
>> that they limit frequency of the debian base containers, because every
>> rebuild of the base container triggers an avalance of downstream
>> rebuilds. CNPG was doing daily rebuilds for awhile, and every time any
>> python dependency was updated you'd get a new image - boto3 was
>> notorious for very frequent updates. So with a different image version
>> for every day, a single server running multiple copies of postgres might
>> easily end up with multiple image versions on the server as copies are
>> slowly updated.
>
>     I see this as a symptom of a different, bigger issue: that package
> versions, and all transitive dependencies, should be version pinned when
> building container images. I haven't seen too many examples of taking the
> effort to do this. But it's the only way to have a way to re-run building
> images and guarantee outputs that are reproducible. Once you have this in
> place, you can decide how and when you upgrade which versions.
>

I'm guessing most container builders are just not interested in doing that
much work. It's easier to just "always upgrade", but as noted that comes
with a whole different set of problems. It's only really feasible if you
manage to first reduce the set of dependencies substantially.



>
>     Actually, even version pinning is not enough, unless the package
> system guarantees that a version of a package is strictly immutable (and
> AFAIK this is usually not the case). So digest pinning is essentially
> required.
>

Debian (since that's what we were talking about) is actually doing a very
good job of that these days, though they're not all the way there yet. But
https://tests.reproducible-builds.org/debian/reproducible.html shows they're
doing really well.


-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



* Re: deb package sizes
  2025-01-09 08:53 deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 09:07 ` Re: deb package sizes Christoph Berg <[email protected]>
  2025-01-09 16:06   ` Re: deb package sizes Álvaro Hernández <[email protected]>
  2025-01-09 17:08     ` Re: deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 22:40       ` Re: deb package sizes Álvaro Hernández <[email protected]>
  2025-01-10 09:52         ` Re: deb package sizes Magnus Hagander <[email protected]>
@ 2025-01-10 11:32           ` Álvaro Hernández <[email protected]>
  2025-01-10 12:17             ` Re: deb package sizes Christoph Berg <[email protected]>
  1 sibling, 1 reply; 10+ messages in thread

From: Álvaro Hernández @ 2025-01-10 11:32 UTC (permalink / raw)
  To: Magnus Hagander <[email protected]>; +Cc: Jeremy Schneider <[email protected]>; Christoph Berg <[email protected]>; [email protected]



On 10/1/25 10:52, Magnus Hagander wrote:
> On Thu, Jan 9, 2025 at 11:40 PM Álvaro Hernández <[email protected]> wrote:
>
>
>
>     On 9/1/25 18:08, Jeremy Schneider wrote:
>>     On Thu, 9 Jan 2025 17:06:57 +0100
>>     Álvaro Hernández<[email protected]>  <mailto:[email protected]>  wrote:
>>
>>>     On 9/1/25 10:07, Christoph Berg wrote:
>>>>     Re: Jeremy Schneider
>>>>>     I'm wondering if there might be any support for providing a
>>>>>     "postgresql-slim" package on PGDG which excludes llvm and python? I
>>>>>     think this might almost cut the total install size in half, and I
>>>>>     think there might be many users who would value having the option.
>>>>>       
>>>>     Hi,
>>>>
>>>>     could you explain why 250 MB is too much? Disk space these days is
>>>>     ultra cheap
>>>           Hi Christoph.
>>>
>>>           Container images allow (are meant to) contain only the necessary
>>>     files needed to run the process that will be run when the image is
>>>     run. As such, any additional file poses two main problems:
>>>
>>>     * Disk space is cheap. Bandwidth not so much. Time to start a
>>>
>>>     * Security analysis. Unneeded files (specially binaries, but not
>>     Another concern is the impact of image rebuilds as dependencies are
>>     updated. Tianon (a primary maintainer of the docker images) has noted
>>     that they limit frequency of the debian base containers, because every
>>     rebuild of the base container triggers an avalance of downstream
>>     rebuilds. CNPG was doing daily rebuilds for awhile, and every time any
>>     python dependency was updated you'd get a new image - boto3 was
>>     notorious for very frequent updates. So with a different image version
>>     for every day, a single server running multiple copies of postgres might
>>     easily end up with multiple image versions on the server as copies are
>>     slowly updated.
>
>         I see this as a symptom of a different, bigger issue: that
>     package versions, and all transitive dependencies, should be
>     version pinned when building container images. I haven't seen too
>     many examples of taking the effort to do this. But it's the only
>     way to have a way to re-run building images and guarantee outputs
>     that are reproducible. Once you have this in place, you can decide
>     how and when you upgrade which versions.
>
>
> I'm guessing most container builders are just not interested in doing 
> that much work. It's easier to just "always upgrade", but as noted 
> that comes with a whole different set of problems. It's only really 
> feasible if you manage to first reduce the set of dependencies 
> substantially.

     Yes, it comes with a whole set of problems. The main one, other
than upgrades, is that you may end up with inconsistent environments:
cases where not all deployed images are the same because some
dependencies have different versions. This may also lead to different
CVEs being present on different servers. This is far from ideal, and it
is a problem that is becoming more and more visible.

     While container builders may not be interested in doing all this 
work, I think that it should be done regardless. And over time, it will 
be done more and more. When security and supply-chain attacks are a 
serious concern, precise knowledge of your dependencies is key.

>
>
>         Actually, even version pinning is not enough, unless the
>     package system guarantees that a version of a package is strictly
>     immutable (and AFAIK this is usually not the case). So digest
>     pinning is essentially required.
>
>
> Debian (as this was talking about it) is actually doing a very good 
> job ot that these days, though they're not there all the way. But 
> https://tests.reproducible-builds.org/debian/reproducible.htmlshows 
> they're doing really well.

     Debian is doing a great job on the reproducibility of their package
builds. However, AFAIK a given package version can be updated with
different content --and that's why a service like
https://snapshot.debian.org exists.


     Álvaro

-- 

Alvaro Hernandez


-----------
OnGres



* Re: deb package sizes
  2025-01-09 08:53 deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 09:07 ` Re: deb package sizes Christoph Berg <[email protected]>
  2025-01-09 16:06   ` Re: deb package sizes Álvaro Hernández <[email protected]>
  2025-01-09 17:08     ` Re: deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 22:40       ` Re: deb package sizes Álvaro Hernández <[email protected]>
  2025-01-10 09:52         ` Re: deb package sizes Magnus Hagander <[email protected]>
  2025-01-10 11:32           ` Re: deb package sizes Álvaro Hernández <[email protected]>
@ 2025-01-10 12:17             ` Christoph Berg <[email protected]>
  0 siblings, 0 replies; 10+ messages in thread

From: Christoph Berg @ 2025-01-10 12:17 UTC (permalink / raw)
  To: Álvaro Hernández <[email protected]>; +Cc: Magnus Hagander <[email protected]>; Jeremy Schneider <[email protected]>; [email protected]

Re: Álvaro Hernández
>     Debian is doing a great job towards reproducibility of the build efforts
> of their packages. However, AFAIK a given package version can be updated
> with a different content --and that's why a service like
> https://snapshot.debian.org exists.

That will never happen, new packages always have new version/revision numbers.
Same on apt.postgresql.org.

Christoph






* Re: deb package sizes
  2025-01-09 08:53 deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 09:07 ` Re: deb package sizes Christoph Berg <[email protected]>
  2025-01-09 16:06   ` Re: deb package sizes Álvaro Hernández <[email protected]>
  2025-01-09 17:08     ` Re: deb package sizes Jeremy Schneider <[email protected]>
  2025-01-09 22:40       ` Re: deb package sizes Álvaro Hernández <[email protected]>
  2025-01-10 09:52         ` Re: deb package sizes Magnus Hagander <[email protected]>
@ 2025-01-21 09:26           ` Cédric Villemain <[email protected]>
  1 sibling, 0 replies; 10+ messages in thread

From: Cédric Villemain @ 2025-01-21 09:26 UTC (permalink / raw)
  To: Magnus Hagander <[email protected]>; Álvaro Hernández <[email protected]>; +Cc: Jeremy Schneider <[email protected]>; Christoph Berg <[email protected]>; [email protected]


On 10/01/2025 10:52, Magnus Hagander wrote:
> On Thu, Jan 9, 2025 at 11:40 PM Álvaro Hernández <[email protected]> wrote:
>
>
>
>     On 9/1/25 18:08, Jeremy Schneider wrote:
>>     On Thu, 9 Jan 2025 17:06:57 +0100
>>     Álvaro Hernández <[email protected]> wrote:
>>
>>>     On 9/1/25 10:07, Christoph Berg wrote:
>>>>     Re: Jeremy Schneider
>>>>>     I'm wondering if there might be any support for providing a
>>>>>     "postgresql-slim" package on PGDG which excludes llvm and python? I
>>>>>     think this might almost cut the total install size in half, and I
>>>>>     think there might be many users who would value having the option.
>>>>>       
>>>>     Hi,
>>>>
>>>>     could you explain why 250 MB is too much? Disk space these days is
>>>>     ultra cheap
>>>           Hi Christoph.
>>>
>>>           Container images allow (are meant to) contain only the necessary
>>>     files needed to run the process that will be run when the image is
>>>     run. As such, any additional file poses two main problems:
>>>
>>>     * Disk space is cheap. Bandwidth not so much. Time to start a
>>>
>>>     * Security analysis. Unneeded files (specially binaries, but not
>>     Another concern is the impact of image rebuilds as dependencies are
>>     updated. Tianon (a primary maintainer of the docker images) has noted
>>     that they limit frequency of the debian base containers, because every
>>     rebuild of the base container triggers an avalanche of downstream
>>     rebuilds. CNPG was doing daily rebuilds for a while, and every time any
>>     python dependency was updated you'd get a new image - boto3 was
>>     notorious for very frequent updates. So with a different image version
>>     for every day, a single server running multiple copies of postgres might
>>     easily end up with multiple image versions on the server as copies are
>>     slowly updated.
>
>         I see this as a symptom of a different, bigger issue: that
>     package versions, and all transitive dependencies, should be
>     version pinned when building container images. I haven't seen too
>     many examples of taking the effort to do this. But it's the only
>     way to be able to re-run image builds and guarantee reproducible
>     outputs. Once you have this in place, you can decide
>     how and when you upgrade which versions.
>
>
> I'm guessing most container builders are just not interested in doing 
> that much work. It's easier to just "always upgrade", but as noted 
> that comes with a whole different set of problems. It's only really 
> feasible if you manage to first reduce the set of dependencies 
> substantially.
>
>
>         Actually, even version pinning is not enough, unless the
>     package system guarantees that a version of a package is strictly
>     immutable (and AFAIK this is usually not the case). So digest
>     pinning is essentially required.
>
>
> Debian (as this was talking about it) is actually doing a very good 
> job of that these days, though they're not there all the way. But 
> https://tests.reproducible-builds.org/debian/reproducible.html shows 
> they're doing really well.
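A minimal sketch of the digest pinning mentioned above, assuming GNU coreutils `sha256sum` and a `.deb` already downloaded into the build context: the build continues only if the file's SHA-256 matches a digest recorded in advance, so a file re-published under the same version string would be rejected. The helper name `verify_pinned` and the file name and digest in the usage comment are hypothetical placeholders.

```sh
#!/bin/sh
# Hypothetical helper: succeed only if FILE's SHA-256 equals EXPECTED.
# Assumes GNU coreutils sha256sum (uses --status and "-" for stdin).
verify_pinned() {
  file=$1
  expected=$2
  # sha256sum -c reads lines of the form "<digest>  <path>" (two spaces)
  printf '%s  %s\n' "$expected" "$file" | sha256sum -c --status -
}

# Usage (placeholder file name and digest):
#   verify_pinned ./postgresql-pkg.deb "<sha256 recorded at pin time>" \
#     || { echo "digest mismatch, refusing to install" >&2; exit 1; }
```

Version pinning alone only says which version to ask for; checking the digest is what detects a different file behind the same name.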


Also on debian.net: https://amd64.reproduce.debian.net/#postgresql-17 
for a "non-fancy" webpage.


There was a talk on this very topic recently, at MiniDebConf Toulouse (by kpcyrd):

https://toulouse2024.mini.debconf.org/talks/4-reproducible-builds-rebuilding-what-is-distributed-fro...

"Since about a month we’ve also been rebuilding trying to exactly match 
the builds being distributed via ftp.d.o - this talk will describe the 
setup and the lessons learned so far, and why the results currently are 
what they are (spoiler: less <30% reproducible) and what we can do to 
fix that."

And rebuilderd is surely of interest for people willing to work on 
reproducible builds: https://github.com/kpcyrd/rebuilderd
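For readers new to the topic, a minimal sketch of the comparison at the heart of these rebuild efforts, assuming you already have the distributed `.deb` and an independently rebuilt one on disk (both paths are hypothetical): a build counts as reproducible only if the two files are byte-for-byte identical. This shows only the comparison step, not how rebuilderd performs the rebuild itself.

```sh
#!/bin/sh
# Hypothetical helper: report whether a rebuilt artifact matches the
# distributed one bit for bit, printing both SHA-256 digests for the record.
check_reproducible() {
  distributed=$1
  rebuilt=$2
  sha256sum "$distributed" "$rebuilt"
  # cmp -s is silent and exits 0 only when the files are byte-identical
  if cmp -s "$distributed" "$rebuilt"; then
    echo "reproducible"
  else
    echo "NOT reproducible"
    return 1
  fi
}
```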

  

---
Cédric Villemain +33 6 20 30 22 52
https://www.Data-Bene.io
PostgreSQL Support, Expertise, Training, R&D



* Re: deb package sizes
  2025-01-09 08:53 deb package sizes Jeremy Schneider <[email protected]>
@ 2025-01-09 15:43 ` Álvaro Hernández <[email protected]>
  1 sibling, 0 replies; 10+ messages in thread

From: Álvaro Hernández @ 2025-01-09 15:43 UTC (permalink / raw)
  To: Jeremy Schneider <[email protected]>; [email protected]



On 9/1/25 9:53, Jeremy Schneider wrote:
> Hello, I hope I found a good mailing list for this topic?
>
> Recently, I've been spending some time looking at the official Postgres
> docker images. https://hub.docker.com/_/postgres/

     Hi Jeremy.

     Nitpicking a bit, but I'd call these the "Official Docker Postgres 
images": they are official from Docker's perspective, not the PostgreSQL 
project's. I mention this for general awareness, since not everybody 
realizes it.

> With docker images, we like to get the container images to be as
> minimal and small as possible.

     Agreed.

>   I have spent a little time looking at the
> make-up of the official docker images from a size perspective, which is
> driven by debian package sizes.

     In my opinion, "system packages" (deb, rpm, etc) are not 
necessarily the best way to compose container images. They are designed 
for "systems", and usually contain many files that may not be needed on 
a container.

> Before adding any PGDG postgres packages or dependencies, our base OS
> container image is 74MB and includes about 88 debian packages.

     Something to consider here is using Distroless 
(https://github.com/GoogleContainerTools/distroless), which is a bit of a 
misnomer as it's really based on Debian too.

> We install only 5 PGDG postgres packages: postgresql,
> postgresql-client, postgresql-client-common, postgresql-common and
> libpq5. The "common" packages are tiny, libpq is 1MB, client is 10MB
> and the postgresql package itself is 60MB.
>
> What's more interesting is all of the additional dependencies that the
> postgresql package pulls in: an extra 53 debian packages that are over
> 250MB in total size.
>
> The biggest size contributors are libllvm & libz3 (143MB), libperl &
> perl-modules (45MB total) and libicu (36MB). These three things alone
> make up 64% of the total postgres-specific bytes.

     While the results are not too different from your analysis, I'd do 
it from the layers that compose the image itself. Here's a simple way to 
do it:

$ docker history --no-trunc --format '{{ .Size }} {{ .CreatedBy }}' \
    postgres | egrep '^[0-9]+(\.[0-9]+)?MB' | cut -b 1-72

330MB RUN /bin/sh -c set -ex;   export PYTHONDONTWRITEBYTECODE=1; dpkg
3.61MB RUN /bin/sh -c set -eux;  apt-get update;  apt-get install -y --n
26.9MB RUN /bin/sh -c set -eux;  if [ -f /etc/dpkg/dpkg.cfg.d/docker ];
4.27MB RUN /bin/sh -c set -eux;  savedAptMark="$(apt-mark showmanual)";
10.8MB RUN /bin/sh -c set -ex;  apt-get update;  apt-get install -y --no
85.2MB # debian.sh --arch 'amd64' out/ 'bookworm' '@1734912000'

(see the attached non-truncated version for completeness)

     "Base" image is 85MB, Postgres plus dependencies is 330MB (which 
you distilled in more detail), and then there's another 27MB in 
locales and 11MB in additional tools (gnupg, less).
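A small sketch for turning the `docker history` output above into a per-layer share of the total, assuming the input has already gone through the `egrep` filter in that command, so every line starts with a size ending in "MB"; layers reported in kB or GB are therefore not counted. The function name `sum_layers` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: total the MB-sized layers and print each layer's share.
# Expects lines like "330MB RUN /bin/sh -c ..." on stdin.
sum_layers() {
  awk '
    {
      size = $1
      sub(/MB$/, "", size)          # "330MB" -> "330"
      sizes[NR] = size + 0
      rest = $0
      sub(/^[^ ]+ /, "", rest)      # drop the size column, keep the command
      cmds[NR] = substr(rest, 1, 40)
      total += sizes[NR]
    }
    END {
      for (i = 1; i <= NR; i++)
        printf "%7.2fMB %5.1f%%  %s\n", sizes[i], 100 * sizes[i] / total, cmds[i]
      printf "%7.2fMB total\n", total
    }'
}

# Usage:
#   docker history --no-trunc --format '{{ .Size }} {{ .CreatedBy }}' postgres \
#     | egrep '^[0-9]+(\.[0-9]+)?MB' | sum_layers
```

Fed the six layer lines listed above, this reports a total of 460.78MB, with the 330MB Postgres layer at about 71.6% of it.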

     Also note that Docker's official Postgres image doesn't always just 
install binaries from PGDG: on architectures other than amd64, arm64, 
ppc64el and s390x it compiles from the PGDG source packages instead (e.g. see 
https://github.com/docker-library/postgres/blob/cb049360d9a316e429740d47431e0d6fa129d11a/17/bookworm...).
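A small sketch of the architecture switch in the 330MB `RUN` layer from the listing attached to this message, assuming the `case "$dpkgArch"` statement shown there is the whole decision: on amd64, arm64, ppc64el and s390x the image writes a binary `deb` line for apt.postgresql.org, and on any other architecture it writes a `deb-src` line and builds the packages with `apt-get source --compile`. The function name `pgdg_apt_kind` is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper mirroring the image's case "$dpkgArch" branch:
# which kind of PGDG apt line gets written for a given architecture.
pgdg_apt_kind() {
  case "$1" in
    amd64 | arm64 | ppc64el | s390x)
      echo "deb" ;;       # prebuilt binaries installed from apt.postgresql.org
    *)
      echo "deb-src" ;;   # source packages, built with apt-get source --compile
  esac
}

# Usage on a Debian system:
#   pgdg_apt_kind "$(dpkg --print-architecture)"
```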

> I'm wondering if there might be any support for providing a
> "postgresql-slim" package on PGDG which excludes llvm and python? I
> think this might almost cut the total install size in half, and I think
> there might be many users who would value having the option.
>
> Even though ICU is a larger package, I would argue for still
> including it in a "slim" build. Because of the drama around glibc
> collation I view ICU as especially important to make available.
>
> Interested to know others' thoughts about having a slimmer package.

     +1

     I believe there should be a place for slimmer, or even better, 
user-configurable Postgres images. Different use cases need different 
containers. The Postgres-on-Testcontainers use case needs few or no 
additional features, while a production setup may require different 
additional tools. Similarly, different environments (ICU / not ICU, sets 
of locales, parallel query or not) may require different images. Having 
a choice here would be of great benefit.


     Álvaro

-- 

Alvaro Hernandez


-----------
OnGres


Attachments:

  [text/plain] postgres-image.layers.size.txt (3.7K, 2-postgres-image.layers.size.txt)
330MB RUN /bin/sh -c set -ex;   export PYTHONDONTWRITEBYTECODE=1;   dpkgArch="$(dpkg --print-architecture)";  aptRepo="[ signed-by=/usr/local/share/keyrings/postgres.gpg.asc ] http://apt.postgresql.org/pub/repos/apt/ bookworm-pgdg main $PG_MAJOR";  case "$dpkgArch" in   amd64 | arm64 | ppc64el | s390x)    echo "deb $aptRepo" > /etc/apt/sources.list.d/pgdg.list;    apt-get update;    ;;   *)    echo "deb-src $aptRepo" > /etc/apt/sources.list.d/pgdg.list;       savedAptMark="$(apt-mark showmanual)";       tempDir="$(mktemp -d)";    cd "$tempDir";       apt-get update;    apt-get install -y --no-install-recommends dpkg-dev;    echo "deb [ trusted=yes ] file://$tempDir ./" > /etc/apt/sources.list.d/temp.list;    _update_repo() {     dpkg-scanpackages . > Packages;     apt-get -o Acquire::GzipIndexes=false update;    };    _update_repo;       nproc="$(nproc)";    export DEB_BUILD_OPTIONS="nocheck parallel=$nproc";    apt-get build-dep -y postgresql-common pgdg-keyring;    apt-get source --compile postgresql-common pgdg-keyring;    _update_repo;    apt-get build-dep -y "postgresql-$PG_MAJOR=$PG_VERSION";    apt-get source --compile "postgresql-$PG_MAJOR=$PG_VERSION";          apt-mark showmanual | xargs apt-mark auto > /dev/null;    apt-mark manual $savedAptMark;       ls -lAFh;    _update_repo;    grep '^Package: ' Packages;    cd /;    ;;  esac;   apt-get install -y --no-install-recommends postgresql-common;  sed -ri 's/#(create_main_cluster) .*$/\1 = false/' /etc/postgresql-common/createcluster.conf;  apt-get install -y --no-install-recommends   "postgresql-$PG_MAJOR=$PG_VERSION"  ;   rm -rf /var/lib/apt/lists/*;   if [ -n "$tempDir" ]; then   apt-get purge -y --auto-remove;   rm -rf "$tempDir" /etc/apt/sources.list.d/temp.list;  fi;   find /usr -name '*.pyc' -type f -exec bash -c 'for pyc; do dpkg -S "$pyc" &> /dev/null || rm -vf "$pyc"; done' -- '{}' +;   postgres --version # buildkit
3.61MB RUN /bin/sh -c set -eux;  apt-get update;  apt-get install -y --no-install-recommends   libnss-wrapper   xz-utils   zstd  ;  rm -rf /var/lib/apt/lists/* # buildkit
26.9MB RUN /bin/sh -c set -eux;  if [ -f /etc/dpkg/dpkg.cfg.d/docker ]; then   grep -q '/usr/share/locale' /etc/dpkg/dpkg.cfg.d/docker;   sed -ri '/\/usr\/share\/locale/d' /etc/dpkg/dpkg.cfg.d/docker;   ! grep -q '/usr/share/locale' /etc/dpkg/dpkg.cfg.d/docker;  fi;  apt-get update; apt-get install -y --no-install-recommends locales; rm -rf /var/lib/apt/lists/*;  echo 'en_US.UTF-8 UTF-8' >> /etc/locale.gen;  locale-gen;  locale -a | grep 'en_US.utf8' # buildkit
4.27MB RUN /bin/sh -c set -eux;  savedAptMark="$(apt-mark showmanual)";  apt-get update;  apt-get install -y --no-install-recommends ca-certificates wget;  rm -rf /var/lib/apt/lists/*;  dpkgArch="$(dpkg --print-architecture | awk -F- '{ print $NF }')";  wget -O /usr/local/bin/gosu "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch";  wget -O /usr/local/bin/gosu.asc "https://github.com/tianon/gosu/releases/download/$GOSU_VERSION/gosu-$dpkgArch.asc";  export GNUPGHOME="$(mktemp -d)";  gpg --batch --keyserver hkps://keys.openpgp.org --recv-keys B42F6819007F00F88E364FD4036A9C25BF357DD4;  gpg --batch --verify /usr/local/bin/gosu.asc /usr/local/bin/gosu;  gpgconf --kill all;  rm -rf "$GNUPGHOME" /usr/local/bin/gosu.asc;  apt-mark auto '.*' > /dev/null;  [ -z "$savedAptMark" ] || apt-mark manual $savedAptMark > /dev/null;  apt-get purge -y --auto-remove -o APT::AutoRemove::RecommendsImportant=false;  chmod +x /usr/local/bin/gosu;  gosu --version;  gosu nobody true # buildkit
10.8MB RUN /bin/sh -c set -ex;  apt-get update;  apt-get install -y --no-install-recommends   gnupg   less  ;  rm -rf /var/lib/apt/lists/* # buildkit
85.2MB # debian.sh --arch 'amd64' out/ 'bookworm' '@1734912000'



end of thread, other threads:[~2025-01-21 09:26 UTC | newest]

Thread overview: 10+ messages (download: mbox mbox.gz follow: Atom feed)
2025-01-09 08:53 deb package sizes Jeremy Schneider <[email protected]>
2025-01-09 09:07 ` Christoph Berg <[email protected]>
2025-01-09 16:06   ` Álvaro Hernández <[email protected]>
2025-01-09 17:08     ` Jeremy Schneider <[email protected]>
2025-01-09 22:40       ` Álvaro Hernández <[email protected]>
2025-01-10 09:52         ` Magnus Hagander <[email protected]>
2025-01-10 11:32           ` Álvaro Hernández <[email protected]>
2025-01-10 12:17             ` Christoph Berg <[email protected]>
2025-01-21 09:26           ` Cédric Villemain <[email protected]>
2025-01-09 15:43 ` Álvaro Hernández <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox