MIME-Version: 1.0
References: <CAL0m=zXXmKKdh0zNseVmMZ2qfWH-093sToyKgOGCjiPWikz3Xg@mail.gmail.com>
In-Reply-To: <CAL0m=zXXmKKdh0zNseVmMZ2qfWH-093sToyKgOGCjiPWikz3Xg@mail.gmail.com>
From: Vijaykumar Jain <vijaykumarjain.github@gmail.com>
Date: Wed, 30 Oct 2024 17:09:30 +0530
Message-ID: <CAM+6J94oXYSx1XB5XF-DQ=SjFYgXdW2xWb2LCL7PBx5qjCYEBg@mail.gmail.com>
Subject: Re: Random memory related errors on live postgres 14.13 instance on
 Ubuntu 22.04 LTS
To: Ian J Cottee <ian@cottee.org>
Cc: pgsql-general@lists.postgresql.org
Content-Type: multipart/alternative; boundary="00000000000078b3f40625b02892"
Archived-At: <https://www.postgresql.org/message-id/CAM%2B6J94oXYSx1XB5XF-DQ%3DSjFYgXdW2xWb2LCL7PBx5qjCYEBg%40mail.gmail.com>
Precedence: bulk

--00000000000078b3f40625b02892
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Wed, 30 Oct 2024 at 13:04, Ian J Cottee <ian@cottee.org> wrote:

> Hello everyone, I’ve been using postgres for over 25 years now and never
> had any major issues which were not caused by my own stupidity. In the last
> 24 hours however I’ve had a number of issues on one client's server which I
> assume are a bug in postgres or a possible hardware issue (they are running
> on a Linode) but I need some clarification and would welcome advice on how
> to proceed. I will also forward this mail to Linode support to ask them to
> check for any memory issues they can detect.
>
>
>
> This particular Postgres is running on Ubuntu LTS 22.04 and has the
> following version information:
>
>
>
> ```
>
> PostgreSQL 14.13 (Ubuntu 14.13-0ubuntu0.22.04.1) on x86_64-pc-linux-gnu,
> compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
>
> ```
>
>
> The quick summary is that over a 24 hour period I had the following errors
> appear in the postgres logs at different times causing the system processes
> to restart:
>
>
>    - stuck spinlock detected
>    - free(): corrupted unsorted chunks
>    - double free or corruption (!prev)
>    - corrupted size vs. prev_size
>    - corrupted double-linked list
>    - *** stack smashing detected ***: terminated
>    - Segmentation fault
>
>
>

Are you running a PostgreSQL build compiled from source?
The output of pg_config should show the build details.

Are there any extensions installed? If so, can you list them?
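
Something along these lines in psql should show them. This is only a sketch: it assumes you can connect as a superuser, and since extensions are installed per database it would need to be run in each database.

```sql
-- Server version string, as already quoted in the original report.
SELECT version();

-- Extensions installed in the current database.
SELECT extname, extversion
FROM pg_extension
ORDER BY extname;

-- Libraries loaded into the server at startup.
SHOW shared_preload_libraries;
```

Anything listed in shared_preload_libraries is loaded into the server processes, so a misbehaving C extension there is one possible source of heap corruption.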

If you have access to the source packages, see
Getting a stack trace of a running PostgreSQL backend on Linux/BSD -
PostgreSQL wiki
<https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD>
Can you generate a stack trace from the process that is crashing or, if it
dumped core, a backtrace from the core dump?

Would it be possible to share the actual logs, both PostgreSQL and kernel,
from around the time of the incident?

Do you have access to any core dumps these crashes may have generated? I
think SIGABRT and segmentation faults would generate one, provided core
dumps are enabled.

Do you collect system stats? Around the time of the crashes, do you see any
abnormal I/O, CPU, or memory usage, or unusual locks held in PostgreSQL?


> Here’s the more detailed breakdown.
>
>
>
> On Monday evening this week, the following event occurred on the server
>
>
>
> ```
>
> 2024-10-28 18:12:47.145 GMT [575437] xxx@xxx PANIC: stuck spinlock
> detected at LWLockWaitListLock,
> ./build/../src/backend/storage/lmgr/lwlock.c:913
>
> ```
>
>
>
I think a backtrace here would help show which part of the call stack led
to this. On its own, this does not look like a bug.


> Followed by:
>
>
>
> ```
>
> 2024-10-28 18:12:47.249 GMT [1880289] LOG: terminating any other active
> server processes
>
> 2024-10-28 18:12:47.284 GMT [1880289] LOG: all server processes
> terminated; reinitializing
>
> ```
>
> And eventually
>
>
> ```
>
> 2024-10-28 18:12:48.474 GMT [575566] xxx@xxx FATAL: the database system
> is in recovery mode
>
> 2024-10-28 18:12:48.476 GMT [575550] LOG: database system was not properly
> shut down; automatic recovery in progress
>
> 2024-10-28 18:12:48.487 GMT [575550] LOG: redo starts at DD/405E83A8
>
> 2024-10-28 18:12:48.487 GMT [575550] LOG: invalid record length at
> DD/405EF818: wanted 24, got 0
>
> 2024-10-28 18:12:48.487 GMT [575550] LOG: redo done at DD/405EF7E0 system
> usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>
> 2024-10-28 18:12:48.515 GMT [1880289] LOG: database system is ready to
> accept connections
>
> ```
>
> This wasn’t noticed by myself or any users as they tend to all be finished
> by 17:30.  However later,
>
>
>


> ```
>
> 2024-10-28 20:27:15.258 GMT [611459] xxx@xxx LOG: unexpected EOF on
> client connection with an open transaction
>
> 2024-10-28 21:01:05.934 GMT [620373] xxx@xxxx LOG: unexpected EOF on
> client connection with an open transaction
>
> free(): corrupted unsorted chunks
>

It all looks like memory corruption or some kind of leak. Perhaps valgrind
could give more details if it is a leak.


> 2024-10-28 21:15:02.203 GMT [1880289] LOG: server process (PID 623803) was
> terminated by signal 6: Aborted
>
> 2024-10-28 21:15:02.204 GMT [1880289] LOG: terminating any other active
> server processes
>
> ```
>
>
>
> This time it could not recover and I didn’t notice until early the next
> morning whilst doing some routine checks.
>
>
>
> ```
>
> 2024-10-28 21:15:03.643 GMT [623807] LOG: database system was not properly
> shut down; automatic recovery in progress
>
> 2024-10-28 21:15:03.655 GMT [623807] LOG: redo starts at DD/47366740
>
> 2024-10-28 21:15:03.663 GMT [623807] LOG: invalid record length at
> DD/475452A0: wanted 24, got 0
>
> 2024-10-28 21:15:03.663 GMT [623807] LOG: redo done at DD/47545268 system
> usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
>
> 2024-10-28 21:15:03.682 GMT [623829] xxx@xxx FATAL: the database system
> is in recovery mode
>
> double free or corruption (!prev)
>
> 2024-10-28 21:15:03.832 GMT [1880289] LOG: startup process (PID 623807)
> was terminated by signal 6: Aborted
>
> 2024-10-28 21:15:03.832 GMT [1880289] LOG: aborting startup due to startup
> process failure
>
> 2024-10-28 21:15:03.835 GMT [1880289] LOG: database system is shut down
>
> ```
>
>
>
> When I noticed in the morning it was able to start without an issue. From
> googling it appeared to be a memory issue and I wondered if the problem was
> sorted now the server process had stopped completely and restarted. The
> problem was not sorted although all the above errors were recovered from
> automatically without any input from myself or the client’s noticing.
>
>
>
> ```
>
> corrupted size vs. prev_size
>
> 2024-10-29 09:55:24.417 GMT [894747] LOG: background worker "parallel
> worker" (PID 947642) was terminated by signal 6: Aborted
>
> ```
>
>
>
> ```
>
> corrupted double-linked list
>
> 2024-10-29 13:14:28.322 GMT [894747] LOG: background worker "parallel
> worker" (PID 1019071) was terminated by signal 6: Aborted
>
> ```
>
>
>
> ```
>
> *** stack smashing detected ***: terminated
>
> 2024-10-28 15:24:30.331 GMT [1880289] LOG: background worker "parallel
> worker" (PID 528630) was terminated by signal 6: Aborted
>
> ```
>
>
>
> ```
>
> 2024-10-28 15:40:26.617 GMT [1880289] LOG: background worker "parallel
> worker" (PID 533515) was terminated by signal 11: Segmentation fault
>
> 2024-10-28 15:40:26.617 GMT [1880289] DETAIL: Failed process was running:
> SELECT "formula_line".id FROM "formul\
>
> ```
>
>

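I don't know why it would crash with SIGABRT rather than SIGKILL if this
were indeed a memory leak and not a bug. The OOM killer sends SIGKILL,
whereas these aborts come from glibc's own heap consistency checks, so I am
not sure tuning memory overcommit would be of use here.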

What is the peak concurrency, and what is work_mem set to? I am wondering
about excessive work_mem combined with too many concurrent processes
holding locks for a long time.
It should not crash even if that is the case, but I am just asking.
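
To make that concrete, here is a rough sketch of what I would look at in psql. It assumes a role that can see all sessions in pg_stat_activity (a superuser, or a member of pg_monitor).

```sql
-- Current memory and connection limits.
SHOW work_mem;
SHOW max_connections;

-- How many client backends are in each state right now.
SELECT state, count(*) AS sessions
FROM pg_stat_activity
WHERE backend_type = 'client backend'
GROUP BY state
ORDER BY sessions DESC;

-- Lock requests that are currently waiting.
SELECT pid, locktype, mode, relation::regclass AS relation
FROM pg_locks
WHERE NOT granted;
```

These only show the situation at the moment they are run, so they would need to be sampled around peak time to be useful.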

Lastly, is it possible to run a memory check on the machine, just to rule
out bad memory? Whether this is running on a VM or on bare metal, were there
any hardware errors around that time?

Most likely this looks like a hardware issue. We used to see things like
this on bare-metal servers: they happened only occasionally at first and
then more frequently, until we moved away from that setup.

Also, does it happen only when the optimiser picks a plan involving
parallel workers for a query?
If you set max_parallel_workers_per_gather to 0, so that nothing is
parallelized, do you still see the issue?
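
A sketch of how to try that server-wide, assuming superuser access. Note that ALTER SYSTEM writes to postgresql.auto.conf, so remember to undo it afterwards.

```sql
-- Disable parallel query plans server-wide.
ALTER SYSTEM SET max_parallel_workers_per_gather = 0;
SELECT pg_reload_conf();

-- Check the value seen by this session (the reload may take a moment).
SHOW max_parallel_workers_per_gather;

-- To revert later:
-- ALTER SYSTEM RESET max_parallel_workers_per_gather;
-- SELECT pg_reload_conf();
```

If the crashes continue with parallelism off, a cause specific to parallel query becomes less likely.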

These are just some thoughts; if they are not useful, please ignore them.


> Best regards
>
>
> Ian Cottee


-- 
Thanks,
Vijay

Open to work
Resume - Vijaykumar Jain <https://github.com/cabecada>

--00000000000078b3f40625b02892--