Re: Estimating HugePages Requirements?

From: Don Seiler <[email protected]>
To: Justin Pryzby <[email protected]>
Cc: P C <[email protected]>
Cc: Magnus Hagander <[email protected]>
Cc: Julien Rouhaud <[email protected]>
Cc: Tom Lane <[email protected]>
Cc: pgsql-admin <[email protected]>
Subject: Re: Estimating HugePages Requirements?
Date: Mon, 14 Jun 2021 09:16:39 -0500
Message-ID: <CAHJZqBAZ+SYR4jZ-Jy5nHYwUP3vYF+UjPGKwCR+gZm0z8vyoag@mail.gmail.com> (raw)
In-Reply-To: <[email protected]>
References: <CAHJZqBBLHFNs6it-fcJ6LEUXeC5t73soR3h50zUSFpg7894qfQ@mail.gmail.com>
	<CAOBaU_ZHtRhG+MvoT6=HpbMoK=JnsJHBMmxQ-XLSVoh6fFqJHQ@mail.gmail.com>
	<CABUevExXvoPvLN70CznmQfbjwxnrdXo9gXxZwGpBoUhjtFi3Ng@mail.gmail.com>
	<[email protected]>
	<CABUevEzn7fZmZ-8m4ph0wBa1bqz32ghWh-vjhf+Vh0OjW9EdQw@mail.gmail.com>
	<[email protected]>
	<CABUevEzwS3zG3X6dpSZTy-u1LoJ8kRS_A493mFdMii3_AhX-Ew@mail.gmail.com>
	<CADrzpjFQ8awR62Y0GC7K=ohtnBeAL06jkuMHqh6neCF3H89jMw@mail.gmail.com>
	<CAHJZqBATodSGXZ2vD_4efKdmAdaN0ucP=m93KL7Xmf5jqNzvYw@mail.gmail.com>
	<[email protected]>

On Thu, Jun 10, 2021 at 7:23 PM Justin Pryzby <[email protected]> wrote:

> On Wed, Jun 09, 2021 at 10:55:08PM -0500, Don Seiler wrote:
> > On Wed, Jun 9, 2021, 21:03 P C <[email protected]> wrote:
> >
> > > I agree, it's confusing for many, and that confusion arises from the fact
> > > that you usually talk of shared_buffers in MB or GB whereas hugepages have
> > > to be configured in units of 2MB. But once they understand, they realize
> > > it's pretty simple.
> > >
> > > Don, we have experienced the same not just with postgres but also with
> > > oracle. I haven't been able to get to the root of it, but what we usually
> > > do is add another 100-200 pages, and that works for us. If the SGA or
> > > shared_buffers is high, e.g. 96GB, then we add 250-500 pages. Those few
> > > hundred MBs may be wasted (because the moment you configure hugepages, the
> > > operating system considers it as used and does not use it any more) but
> > > nowadays, servers have 64 or 128GB RAM easily and wasting that 500MB to
> > > 1GB does not really hurt.
> >
> > I don't have a problem with the math, just wanted to know if it was
> > possible to better estimate what the actual requirements would be at
> > deployment time. My fallback will probably be what you did: just pad with
> > an extra 512MB by default.
>
> It's because the huge allocation isn't just shared_buffers, but also
> wal_buffers:
>
> | The amount of shared memory used for WAL data that has not yet been written to disk.
> | The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, ...
>
> .. and other stuff:
>
> src/backend/storage/ipc/ipci.c
>         /*
>          * Size of the Postgres shared-memory block is estimated via
>          * moderately-accurate estimates for the big hogs, plus 100K for the
>          * stuff that's too small to bother with estimating.
>          *
>          * We take some care during this phase to ensure that the total size
>          * request doesn't overflow size_t.  If this gets through, we don't
>          * need to be so careful during the actual allocation phase.
>          */
>         size = 100000;
>         size = add_size(size, PGSemaphoreShmemSize(numSemas));
>         size = add_size(size, SpinlockSemaSize());
>         size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
>                                                  sizeof(ShmemIndexEnt)));
>         size = add_size(size, dsm_estimate_size());
>         size = add_size(size, BufferShmemSize());
>         size = add_size(size, LockShmemSize());
>         size = add_size(size, PredicateLockShmemSize());
>         size = add_size(size, ProcGlobalShmemSize());
>         size = add_size(size, XLOGShmemSize());
>         size = add_size(size, CLOGShmemSize());
>         size = add_size(size, CommitTsShmemSize());
>         size = add_size(size, SUBTRANSShmemSize());
>         size = add_size(size, TwoPhaseShmemSize());
>         size = add_size(size, BackgroundWorkerShmemSize());
>         size = add_size(size, MultiXactShmemSize());
>         size = add_size(size, LWLockShmemSize());
>         size = add_size(size, ProcArrayShmemSize());
>         size = add_size(size, BackendStatusShmemSize());
>         size = add_size(size, SInvalShmemSize());
>         size = add_size(size, PMSignalShmemSize());
>         size = add_size(size, ProcSignalShmemSize());
>         size = add_size(size, CheckpointerShmemSize());
>         size = add_size(size, AutoVacuumShmemSize());
>         size = add_size(size, ReplicationSlotsShmemSize());
>         size = add_size(size, ReplicationOriginShmemSize());
>         size = add_size(size, WalSndShmemSize());
>         size = add_size(size, WalRcvShmemSize());
>         size = add_size(size, PgArchShmemSize());
>         size = add_size(size, ApplyLauncherShmemSize());
>         size = add_size(size, SnapMgrShmemSize());
>         size = add_size(size, BTreeShmemSize());
>         size = add_size(size, SyncScanShmemSize());
>         size = add_size(size, AsyncShmemSize());
> #ifdef EXEC_BACKEND
>         size = add_size(size, ShmemBackendArraySize());
> #endif
>
>         /* freeze the addin request size and include it */
>         addin_request_allowed = false;
>         size = add_size(size, total_addin_request);
>
>         /* might as well round it off to a multiple of a typical page size */
>         size = add_size(size, 8192 - (size % 8192));
>
> BTW, I think it'd be nice if this were a NOTICE:
> | elog(DEBUG1, "mmap(%zu) with MAP_HUGETLB failed, huge pages disabled: %m", allocsize);
>

Great detail. I did some trial and error around just a few variables
(shared_buffers, wal_buffers, max_connections) and came up with a formula
that seems to be "good enough" for at least a rough default estimate.

The pseudo-code is basically:

ceiling((shared_buffers + 200 + (25 * shared_buffers/1024) +
10*(max_connections-100)/200 + wal_buffers-16)/2)

This assumes that all values are in MB and that wal_buffers is set
explicitly rather than left at the default of -1. The final division by 2
converts MB into a count of 2MB huge pages. I decided to default
wal_buffers to 16MB in our environments, since that's what -1 should
resolve to, based on the description in the documentation, for an instance
with shared_buffers of the sizes in our deployments.

This formula did come up a little short (2MB) when I had a low
shared_buffers value of 2GB. Raising that starting 200 value to something
like 250 would take care of that. Otherwise, the limited testing I did,
based on different values we see across our production deployments, worked.
Please let me know what you folks think. I know I'm ignoring a lot of other
factors, especially given what Justin recently shared.
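
For anyone who wants to try the numbers, here is a minimal Python sketch of the formula above. The function name, the parameter names, and the base_mb knob (the starting 200 value) are only illustrative, and it assumes 2MB huge pages with every memory value given in MB. It is a rough estimate under those assumptions, not a statement of what PostgreSQL actually allocates.

```python
import math


def estimate_huge_pages(shared_buffers_mb, max_connections,
                        wal_buffers_mb=16, base_mb=200):
    """Rough count of 2MB huge pages, following the formula above.

    All memory values are in MB. Names are illustrative only;
    base_mb is the fixed starting term (200 in the formula).
    """
    total_mb = (
        shared_buffers_mb
        + base_mb
        + 25 * shared_buffers_mb / 1024        # 25MB per GB of shared_buffers
        + 10 * (max_connections - 100) / 200   # 10MB per 200 connections past 100
        + wal_buffers_mb - 16                  # wal_buffers relative to 16MB
    )
    # Divide by 2 to convert MB into 2MB pages, rounding up.
    return math.ceil(total_mb / 2)


print(estimate_huge_pages(8192, 100))               # 4296
print(estimate_huge_pages(2048, 100, base_mb=250))  # 1174
```

The second call uses the larger 250 starting value mentioned for the 2GB case.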

The remaining trick for me now is to calculate this in Chef, since the
shared_buffers and wal_buffers attributes are strings with the unit ("MB")
in them rather than plain numerical values. I'm thinking of changing those
attributes to plain numbers and assuming/requiring MB to make the
calculations easier.
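
The alternative to changing the attributes would be to parse the unit out of the string. A rough sketch of that idea follows, in Python rather than Chef's Ruby purely for illustration. The helper name to_mb and the set of accepted units (kB, MB, GB, TB) are my assumptions; it deliberately rejects unitless values and -1 instead of guessing what they mean.

```python
import re

# Multipliers to convert each accepted unit into MB.
_UNIT_TO_MB = {"kB": 1 / 1024, "MB": 1, "GB": 1024, "TB": 1024 * 1024}


def to_mb(value):
    """Convert a string such as '8GB' or '16MB' into a number of MB.

    Hypothetical helper: raises ValueError for anything that is not
    a non-negative integer followed by one of the accepted units.
    """
    match = re.fullmatch(r"\s*(\d+)\s*(kB|MB|GB|TB)\s*", value)
    if match is None:
        raise ValueError(f"unsupported memory setting: {value!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_TO_MB[unit]


print(to_mb("8GB"))   # 8192
print(to_mb("16MB"))  # 16
```

Requiring plain MB numbers in the attributes avoids all of this, which is why I lean that way.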

-- 
Don Seiler
www.seiler.us
