From: Bossart, Nathan <[email protected]>
To: Magnus Hagander <[email protected]>
To: Michael Paquier <[email protected]>
Cc: Andres Freund <[email protected]>
Cc: Mark Dilger <[email protected]>
Cc: Don Seiler <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: Estimating HugePages Requirements?
Date: Fri, 27 Aug 2021 18:16:01 +0000
Message-ID: <[email protected]> (raw)
In-Reply-To: <CABUevEwGHcsmp4rgZdcbinZOdEs_cSCabihDvRty=0zz1H95kw@mail.gmail.com>
References: <CAHJZqBBLHFNs6it-fcJ6LEUXeC5t73soR3h50zUSFpg7894qfQ@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<[email protected]>
<CABUevEwGHcsmp4rgZdcbinZOdEs_cSCabihDvRty=0zz1H95kw@mail.gmail.com>
On 8/27/21, 7:41 AM, "Magnus Hagander" <[email protected]> wrote:
> On Fri, Aug 27, 2021 at 8:46 AM Michael Paquier <[email protected]> wrote:
>> On Wed, Aug 11, 2021 at 11:23:52PM +0000, Bossart, Nathan wrote:
>> > While testing this new option, I noticed that you can achieve similar
>> > results today with the following command, although this one will
>> > actually try to create the shared memory, too.
>>
>> That may not be the best option.
>
> I would say that can be a disastrous option.
>
> First of all it would probably not work if you already have something
> running -- especially when using huge pages. And if it does work, in
> that or other scenarios, it can potentially have significant impact on
> a running cluster to suddenly allocate many GB of more memory...

The v3 patch actually didn't work if the server was already running.
I removed that restriction in v4.

>> > IMO the new option is still handy, but I can see the argument that it
>> > might not be necessary.
>>
>> A separate option looks handy. Wouldn't it be better to document it
>> in postgres-ref.sgml then?
>
> I'd say a lot more than just handy. I don't think the workaround is
> really all that useful.

I added some documentation in v4.

Nathan

Attachments:
[application/octet-stream] v4-0001-introduce-option-for-retreiving-shmem-size.patch (13.5K, 2-v4-0001-introduce-option-for-retreiving-shmem-size.patch)
From dff1d2b57a65ada412386b3dce2f1cf6d8409cac Mon Sep 17 00:00:00 2001
From: Nathan Bossart <[email protected]>
Date: Fri, 27 Aug 2021 17:54:26 +0000
Subject: [PATCH v4 1/1] introduce option for retreiving shmem size
---
doc/src/sgml/ref/postgres-ref.sgml | 17 +++++
doc/src/sgml/runtime.sgml | 17 ++---
src/backend/bootstrap/bootstrap.c | 25 +++++--
src/backend/main/main.c | 6 +-
src/backend/storage/ipc/ipci.c | 142 ++++++++++++++++++++++---------------
src/include/bootstrap/bootstrap.h | 2 +-
src/include/storage/ipc.h | 1 +
7 files changed, 132 insertions(+), 78 deletions(-)
diff --git a/doc/src/sgml/ref/postgres-ref.sgml b/doc/src/sgml/ref/postgres-ref.sgml
index 4aaa7abe1a..da20fb559a 100644
--- a/doc/src/sgml/ref/postgres-ref.sgml
+++ b/doc/src/sgml/ref/postgres-ref.sgml
@@ -280,6 +280,23 @@ PostgreSQL documentation
</listitem>
</varlistentry>
+ <varlistentry>
+ <term><option>--output-shmem</option></term>
+ <listitem>
+ <para>
+ Prints the amount of shared memory required for the current
+ configuration and exits. This can be used on a running server.
+ This must be the first argument on the command line.
+ </para>
+
+ <para>
+ This option is useful for determining the number of huge pages needed
+ for the server. For more information, see
+ <xref linkend="linux-huge-pages"/>.
+ </para>
+ </listitem>
+ </varlistentry>
+
<varlistentry>
<term><option>-p <replaceable class="parameter">port</replaceable></option></term>
<listitem>
diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index f1cbc1d9e9..535eacadb2 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -1442,17 +1442,14 @@ export PG_OOM_ADJUST_VALUE=0
with <varname>CONFIG_HUGETLBFS=y</varname> and
<varname>CONFIG_HUGETLB_PAGE=y</varname>. You will also have to configure
the operating system to provide enough huge pages of the desired size.
- To estimate the number of huge pages needed, start
- <productname>PostgreSQL</productname> without huge pages enabled and check
- the postmaster's anonymous shared memory segment size, as well as the
- system's default and supported huge page sizes, using the
- <filename>/proc</filename> and <filename>/sys</filename> file systems.
+ To estimate the number of huge pages needed, use the
+ <command>postgres</command> command to determine the amount of shared memory
+ needed, and use the <filename>/proc</filename> and <filename>/sys</filename>
+ file systems to find the system's default and supported huge page sizes.
This might look like:
<programlisting>
-$ <userinput>head -1 $PGDATA/postmaster.pid</userinput>
-4170
-$ <userinput>pmap 4170 | awk '/rw-s/ && /zero/ {print $2}'</userinput>
-6490428K
+$ <userinput>postgres --output-shmem -D $PGDATA</userinput>
+6646198272 bytes
$ <userinput>grep ^Hugepagesize /proc/meminfo</userinput>
Hugepagesize: 2048 kB
$ <userinput>ls /sys/kernel/mm/hugepages</userinput>
@@ -1463,7 +1460,7 @@ hugepages-1048576kB hugepages-2048kB
either 2MB or 1GB with <xref linkend="guc-huge-page-size"/>.
Assuming <literal>2MB</literal> huge pages,
- <literal>6490428</literal> / <literal>2048</literal> gives approximately
+ <literal>6646198272</literal> / <literal>2097152</literal> gives approximately
<literal>3169.154</literal>, so in this example we need at
least <literal>3170</literal> huge pages. A larger setting would be
appropriate if other programs on the machine also need huge pages.
diff --git a/src/backend/bootstrap/bootstrap.c b/src/backend/bootstrap/bootstrap.c
index 48615c0ebc..5bb491da8b 100644
--- a/src/backend/bootstrap/bootstrap.c
+++ b/src/backend/bootstrap/bootstrap.c
@@ -198,9 +198,13 @@ CheckerModeMain(void)
* the current configuration, particularly the passed in options pertaining
* to shared memory sizing, options work (or at least do not cause an error
* up to shared memory creation).
+ *
+ * When output_shmem is true, startup is done only far enough to calculate the
+ * amount of shared memory required for the current configuration. The result
+ * of this calculation is printed.
*/
void
-BootstrapModeMain(int argc, char *argv[], bool check_only)
+BootstrapModeMain(int argc, char *argv[], bool check_only, bool output_shmem)
{
int i;
char *progname = argv[0];
@@ -214,10 +218,11 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
/* Set defaults, to be overridden by explicit options below */
InitializeGUCOptions();
- /* an initial --boot or --check should be present */
+ /* an initial --boot, --check, or --output-shmem should be present */
Assert(argc > 1
&& (strcmp(argv[1], "--boot") == 0
- || strcmp(argv[1], "--check") == 0));
+ || strcmp(argv[1], "--check") == 0
+ || strcmp(argv[1], "--output-shmem") == 0));
argv++;
argc--;
@@ -317,21 +322,29 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
checkDataDir();
ChangeToDataDir();
- CreateDataDirLockFile(false);
+ /*
+ * We do not create the lock file when --output-shmem is specified so
+ * that it can be used while the server is running.
+ */
+ if (!output_shmem)
+ CreateDataDirLockFile(false);
SetProcessingMode(BootstrapProcessing);
IgnoreSystemIndexes = true;
InitializeMaxBackends();
- CreateSharedMemoryAndSemaphores();
+ if (output_shmem)
+ printf("%zu bytes\n", CalculateShmemSize(NULL));
+ else
+ CreateSharedMemoryAndSemaphores();
/*
* XXX: It might make sense to move this into its own function at some
* point. Right now it seems like it'd cause more code duplication than
* it's worth.
*/
- if (check_only)
+ if (check_only || output_shmem)
{
SetProcessingMode(NormalProcessing);
CheckerModeMain();
diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 3a2a0d598c..c141ae3d1c 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -182,9 +182,11 @@ main(int argc, char *argv[])
*/
if (argc > 1 && strcmp(argv[1], "--check") == 0)
- BootstrapModeMain(argc, argv, true);
+ BootstrapModeMain(argc, argv, true, false);
+ else if (argc > 1 && strcmp(argv[1], "--output-shmem") == 0)
+ BootstrapModeMain(argc, argv, false, true);
else if (argc > 1 && strcmp(argv[1], "--boot") == 0)
- BootstrapModeMain(argc, argv, false);
+ BootstrapModeMain(argc, argv, false, false);
#ifdef EXEC_BACKEND
else if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
SubPostmasterMain(argc, argv);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 3e4ec53a97..b225b1ee70 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -75,6 +75,87 @@ RequestAddinShmemSpace(Size size)
total_addin_request = add_size(total_addin_request, size);
}
+/*
+ * CalculateShmemSize
+ * Calculates the amount of shared memory and number of semaphores needed.
+ *
+ * If num_semaphores is not NULL, it will be set to the number of semaphores
+ * required.
+ *
+ * Note that this function freezes the additional shared memory request size
+ * from loadable modules.
+ */
+Size
+CalculateShmemSize(int *num_semaphores)
+{
+ Size size;
+ int numSemas;
+
+ /* Compute number of semaphores we'll need */
+ numSemas = ProcGlobalSemas();
+ numSemas += SpinlockSemas();
+
+ /* Return the number of semaphores if requested by the caller */
+ if (num_semaphores)
+ *num_semaphores = numSemas;
+
+ /*
+ * Size of the Postgres shared-memory block is estimated via moderately-
+ * accurate estimates for the big hogs, plus 100K for the stuff that's too
+ * small to bother with estimating.
+ *
+ * We take some care to ensure that the total size request doesn't overflow
+ * size_t. If this gets through, we don't need to be so careful during the
+ * actual allocation phase.
+ */
+ size = 100000;
+ size = add_size(size, PGSemaphoreShmemSize(numSemas));
+ size = add_size(size, SpinlockSemaSize());
+ size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
+ sizeof(ShmemIndexEnt)));
+ size = add_size(size, dsm_estimate_size());
+ size = add_size(size, BufferShmemSize());
+ size = add_size(size, LockShmemSize());
+ size = add_size(size, PredicateLockShmemSize());
+ size = add_size(size, ProcGlobalShmemSize());
+ size = add_size(size, XLOGShmemSize());
+ size = add_size(size, CLOGShmemSize());
+ size = add_size(size, CommitTsShmemSize());
+ size = add_size(size, SUBTRANSShmemSize());
+ size = add_size(size, TwoPhaseShmemSize());
+ size = add_size(size, BackgroundWorkerShmemSize());
+ size = add_size(size, MultiXactShmemSize());
+ size = add_size(size, LWLockShmemSize());
+ size = add_size(size, ProcArrayShmemSize());
+ size = add_size(size, BackendStatusShmemSize());
+ size = add_size(size, SInvalShmemSize());
+ size = add_size(size, PMSignalShmemSize());
+ size = add_size(size, ProcSignalShmemSize());
+ size = add_size(size, CheckpointerShmemSize());
+ size = add_size(size, AutoVacuumShmemSize());
+ size = add_size(size, ReplicationSlotsShmemSize());
+ size = add_size(size, ReplicationOriginShmemSize());
+ size = add_size(size, WalSndShmemSize());
+ size = add_size(size, WalRcvShmemSize());
+ size = add_size(size, PgArchShmemSize());
+ size = add_size(size, ApplyLauncherShmemSize());
+ size = add_size(size, SnapMgrShmemSize());
+ size = add_size(size, BTreeShmemSize());
+ size = add_size(size, SyncScanShmemSize());
+ size = add_size(size, AsyncShmemSize());
+#ifdef EXEC_BACKEND
+ size = add_size(size, ShmemBackendArraySize());
+#endif
+
+ /* freeze the addin request size and include it */
+ addin_request_allowed = false;
+ size = add_size(size, total_addin_request);
+
+ /* might as well round it off to a multiple of a typical page size */
+ size = add_size(size, 8192 - (size % 8192));
+
+ return size;
+}
/*
* CreateSharedMemoryAndSemaphores
@@ -102,65 +183,8 @@ CreateSharedMemoryAndSemaphores(void)
Size size;
int numSemas;
- /* Compute number of semaphores we'll need */
- numSemas = ProcGlobalSemas();
- numSemas += SpinlockSemas();
-
- /*
- * Size of the Postgres shared-memory block is estimated via
- * moderately-accurate estimates for the big hogs, plus 100K for the
- * stuff that's too small to bother with estimating.
- *
- * We take some care during this phase to ensure that the total size
- * request doesn't overflow size_t. If this gets through, we don't
- * need to be so careful during the actual allocation phase.
- */
- size = 100000;
- size = add_size(size, PGSemaphoreShmemSize(numSemas));
- size = add_size(size, SpinlockSemaSize());
- size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
- sizeof(ShmemIndexEnt)));
- size = add_size(size, dsm_estimate_size());
- size = add_size(size, BufferShmemSize());
- size = add_size(size, LockShmemSize());
- size = add_size(size, PredicateLockShmemSize());
- size = add_size(size, ProcGlobalShmemSize());
- size = add_size(size, XLOGShmemSize());
- size = add_size(size, CLOGShmemSize());
- size = add_size(size, CommitTsShmemSize());
- size = add_size(size, SUBTRANSShmemSize());
- size = add_size(size, TwoPhaseShmemSize());
- size = add_size(size, BackgroundWorkerShmemSize());
- size = add_size(size, MultiXactShmemSize());
- size = add_size(size, LWLockShmemSize());
- size = add_size(size, ProcArrayShmemSize());
- size = add_size(size, BackendStatusShmemSize());
- size = add_size(size, SInvalShmemSize());
- size = add_size(size, PMSignalShmemSize());
- size = add_size(size, ProcSignalShmemSize());
- size = add_size(size, CheckpointerShmemSize());
- size = add_size(size, AutoVacuumShmemSize());
- size = add_size(size, ReplicationSlotsShmemSize());
- size = add_size(size, ReplicationOriginShmemSize());
- size = add_size(size, WalSndShmemSize());
- size = add_size(size, WalRcvShmemSize());
- size = add_size(size, PgArchShmemSize());
- size = add_size(size, ApplyLauncherShmemSize());
- size = add_size(size, SnapMgrShmemSize());
- size = add_size(size, BTreeShmemSize());
- size = add_size(size, SyncScanShmemSize());
- size = add_size(size, AsyncShmemSize());
-#ifdef EXEC_BACKEND
- size = add_size(size, ShmemBackendArraySize());
-#endif
-
- /* freeze the addin request size and include it */
- addin_request_allowed = false;
- size = add_size(size, total_addin_request);
-
- /* might as well round it off to a multiple of a typical page size */
- size = add_size(size, 8192 - (size % 8192));
-
+ /* Compute the size of the shared-memory block */
+ size = CalculateShmemSize(&numSemas);
elog(DEBUG3, "invoking IpcMemoryCreate(size=%zu)", size);
/*
diff --git a/src/include/bootstrap/bootstrap.h b/src/include/bootstrap/bootstrap.h
index 7d3b78e374..8e420574f5 100644
--- a/src/include/bootstrap/bootstrap.h
+++ b/src/include/bootstrap/bootstrap.h
@@ -32,7 +32,7 @@ extern Form_pg_attribute attrtypes[MAXATTR];
extern int numattr;
-extern void BootstrapModeMain(int argc, char *argv[], bool check_only) pg_attribute_noreturn();
+extern void BootstrapModeMain(int argc, char *argv[], bool check_only, bool output_shmem) pg_attribute_noreturn();
extern void closerel(char *name);
extern void boot_openrel(char *name);
diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index 753a6dd4d7..80e191d407 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -77,6 +77,7 @@ extern void check_on_shmem_exit_lists_are_empty(void);
/* ipci.c */
extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;
+extern Size CalculateShmemSize(int *num_semaphores);
extern void CreateSharedMemoryAndSemaphores(void);
#endif /* IPC_H */
--
2.16.6
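
As an illustration of the arithmetic described in the runtime.sgml hunk above, here is a minimal sketch (not part of the patch) that turns the byte count printed by --output-shmem into a huge page count by rounding up. The helper name huge_pages_needed is hypothetical, and the page size is assumed to have been read from /proc/meminfo and converted to bytes beforehand.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not part of the patch: number of huge pages needed
 * to hold shmem_bytes, rounded up to the next whole page.
 *
 * Returns 0 if huge_page_bytes is 0, so the caller can treat that as
 * "page size unknown" instead of dividing by zero.
 */
static uint64_t
huge_pages_needed(uint64_t shmem_bytes, uint64_t huge_page_bytes)
{
	if (huge_page_bytes == 0)
		return 0;

	/* Ceiling division without risking overflow in the numerator. */
	return shmem_bytes / huge_page_bytes +
		(shmem_bytes % huge_page_bytes != 0 ? 1 : 0);
}
```

With the example figures from the documentation hunk, huge_pages_needed(6646198272, 2097152) gives 3170 for 2MB pages, matching the text. The same byte count with 1GB pages (1073741824 bytes) gives 7.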