MIME-Version: 1.0
References: 
 <CAHnOmadn1UB-t-=Umd_TSEZ=kw48=ecX3EnesABxbPdboB-ZUQ@mail.gmail.com>
 <32ad0fda77629362dbdc90136e6d5f667d496e01.camel@cybertec.at>
 <CAHnOmafAW_Dqc8NEkmi=HOOMp3xf1DZdtOXauU2c1N5hq9BnVw@mail.gmail.com>
 <CAL9MbytuyCsNoKK4Aus8zsXYiBz+G+YAm+Gvoa1Pz+SvGO7fBA@mail.gmail.com>
In-Reply-To: 
 <CAL9MbytuyCsNoKK4Aus8zsXYiBz+G+YAm+Gvoa1Pz+SvGO7fBA@mail.gmail.com>
From: Антон Глушаков
 <a.glushakov86@gmail.com>
Date: Wed, 21 May 2025 12:06:54 +0300
Message-ID: 
 <CAHnOmacs_JB_wjA9g=PZyuH5jtbFb_qkf-dS7h7H4bkS4nTQeg@mail.gmail.com>
Subject: Re: query hangs out
To: ikramuddin <ikram.amani815@gmail.com>, pgsql-admin@lists.postgresql.org
Content-Type: multipart/alternative; boundary="000000000000f3fdff0635a1b0ee"
Archived-At: 
 <https://www.postgresql.org/message-id/CAHnOmacs_JB_wjA9g%3DPZyuH5jtbFb_qkf-dS7h7H4bkS4nTQeg%40mail.gmail.com>
Precedence: bulk

--000000000000f3fdff0635a1b0ee
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

The problem is with only one table.

In the end, I determined that the problem is on page 5 of the table (I ran
SELECTs filtered by ctid until one of them hung).
Then I deleted rows from the table by ctid, going through (5,0) to (5,100),
until I found the problematic row.
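The original report (quoted below) says a hung query could only be cleared with kill -9 and a restart. That makes every probe that hangs expensive, so it can help to bisect over block numbers rather than step through the table. The sketch below is a hypothetical helper, not what was actually run here. It assumes PostgreSQL 14 or later, where a predicate like ctid >= '(lo,0)' AND ctid < '(hi,0)' can use a TID Range Scan that reads only blocks lo to hi-1. A plain sequential scan would read every page, so every probe would hang. Running each query and deciding whether it hung is left to the caller-supplied probe function.

```python
def tid_range_query(table: str, lo: int, hi: int) -> str:
    """Build a count(*) restricted to heap blocks lo .. hi-1 (hypothetical helper)."""
    return (f"SELECT count(*) FROM {table} "
            f"WHERE ctid >= '({lo},0)' AND ctid < '({hi},0)';")


def find_bad_page(probe, n_pages: int) -> tuple:
    """Bisect [0, n_pages) for the single block that makes a scan hang.

    probe(lo, hi) must return True if the query over blocks lo .. hi-1 hangs.
    Assumes exactly one bad block and that probe(0, n_pages) hangs.
    Returns (bad_block, number_of_probes_used).
    """
    lo, hi, probes = 0, n_pages, 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        probes += 1
        if probe(lo, mid):
            hi = mid   # the bad block is in the lower half
        else:
            lo = mid   # otherwise it must be in the upper half
    return lo, probes
```

For a 30 MB table (3840 blocks of 8 kB) this needs at most 12 probes, because 2^12 = 4096 is the first power of two not below 3840.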

Tuple contents via pageinspect:
# SELECT * FROM heap_page_items(get_raw_page('"InboxState"', 5)) where lp = 51;
-[ RECORD 1 ]------------------------------------------------------------
lp          | 51
lp_off      | 3760
lp_flags    | 1
lp_len      | 100
t_xmin      | 136269917
t_xmax      | 66664135
t_field3    | 0
t_ctid      | (47,13)
t_infomask2 | 8203
t_infomask  | 4929
t_hoff      | 32
t_bits      | 1111011000000000
t_oid       |
t_data      | \x3e8a7c00000000000100000090877a16b4b308dd9460898784c4af2dab692693d29bdf78bcf5153401000000fd55f20f44ec08dd9460f88969b943ab3cd8020000000000
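The header fields above can be decoded by hand. The sketch below uses flag values copied from PostgreSQL's src/include/access/htup_details.h, as I read that file, so they are worth double-checking against the server's version. The function names are my own. Inside the database, pageinspect's heap_tuple_infomask_flags() (PostgreSQL 13 and later) does the same job.

For this tuple, t_infomask 4929 decodes to HEAP_HASNULL, HEAP_XMAX_EXCL_LOCK, HEAP_XMIN_COMMITTED, HEAP_XMIN_INVALID and HEAP_XMAX_IS_MULTI. The XMIN_COMMITTED and XMIN_INVALID bits together mean xmin is frozen. HEAP_XMAX_IS_MULTI means t_xmax 66664135 is a MultiXactId rather than an ordinary transaction ID. t_infomask2 8203 is HEAP_KEYS_UPDATED plus 11 attributes.

```python
# Bit values as defined in PostgreSQL's src/include/access/htup_details.h.
INFOMASK_FLAGS = {
    0x0001: "HEAP_HASNULL",
    0x0002: "HEAP_HASVARWIDTH",
    0x0004: "HEAP_HASEXTERNAL",
    0x0008: "HEAP_HASOID_OLD",
    0x0010: "HEAP_XMAX_KEYSHR_LOCK",
    0x0020: "HEAP_COMBOCID",
    0x0040: "HEAP_XMAX_EXCL_LOCK",
    0x0080: "HEAP_XMAX_LOCK_ONLY",
    0x0100: "HEAP_XMIN_COMMITTED",
    0x0200: "HEAP_XMIN_INVALID",
    0x0400: "HEAP_XMAX_COMMITTED",
    0x0800: "HEAP_XMAX_INVALID",
    0x1000: "HEAP_XMAX_IS_MULTI",
    0x2000: "HEAP_UPDATED",
    0x4000: "HEAP_MOVED_OFF",
    0x8000: "HEAP_MOVED_IN",
}
INFOMASK2_FLAGS = {
    0x2000: "HEAP_KEYS_UPDATED",
    0x4000: "HEAP_HOT_UPDATED",
    0x8000: "HEAP_ONLY_TUPLE",
}
HEAP_NATTS_MASK = 0x07FF    # low 11 bits of t_infomask2 hold the attribute count
HEAP_XMIN_FROZEN = 0x0300   # XMIN_COMMITTED | XMIN_INVALID together mean "xmin frozen"


def decode_infomask(mask: int) -> list:
    """Names of the t_infomask bits that are set, lowest bit first."""
    return [name for bit, name in INFOMASK_FLAGS.items() if mask & bit]


def decode_infomask2(mask2: int) -> tuple:
    """(number of attributes, names of the t_infomask2 flag bits that are set)."""
    flags = [name for bit, name in INFOMASK2_FLAGS.items() if mask2 & bit]
    return mask2 & HEAP_NATTS_MASK, flags


def xmin_is_frozen(mask: int) -> bool:
    return (mask & HEAP_XMIN_FROZEN) == HEAP_XMIN_FROZEN


def null_attnums(t_bits: str, natts: int) -> list:
    """1-based attribute numbers that are NULL; pageinspect prints '0' for a NULL."""
    return [i + 1 for i, c in enumerate(t_bits[:natts]) if c == "0"]
```

With t_bits 1111011000000000 and 11 attributes, null_attnums reports attributes 5, 8, 9, 10 and 11 as NULL. That is consistent with HEAP_HASNULL being set.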


I couldn't delete it the standard way (delete from "InboxState" where
ctid = '(5,51)'): that also hangs.

But I was able to freeze it with pg_surgery:

# select heap_force_freeze('"InboxState"'::regclass, ARRAY['(5, 51)']::tid[]);

Output after the freeze:
digitalarchive=# SELECT * FROM heap_page_items(get_raw_page('"InboxState"', 5)) where lp = 51;
-[ RECORD 1 ]------------------------------------------------------------
lp          | 51
lp_off      | 3760
lp_flags    | 1
lp_len      | 100
t_xmin      | 2
t_xmax      | 0
t_field3    | 0
t_ctid      | (5,51)
t_infomask2 | 11
t_infomask  | 2817
t_hoff      | 32
t_bits      | 1111011000000000
t_oid       |
t_data      | \x3e8a7c00000000000100000090877a16b4b308dd9460898784c4af2dab692693d29bdf78bcf5153401000000fd55f20f44ec08dd9460f88969b943ab3cd8020000000000
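The before and after values are consistent with heap_force_freeze resetting every visibility field of the tuple header while leaving the data untouched. The sketch below models that in Python, based on my reading of contrib/pg_surgery/heap_surgery.c. It is an illustration, not the extension's code. The function name is hypothetical, and it omits the xvac handling that applies only to tuples carrying the old HEAP_MOVED bits.

```python
# Bit values from src/include/access/htup_details.h.
HEAP_XACT_MASK = 0xFFF0      # all visibility-related t_infomask bits
HEAP_XMIN_FROZEN = 0x0300    # HEAP_XMIN_COMMITTED | HEAP_XMIN_INVALID
HEAP_XMAX_INVALID = 0x0800
HEAP_KEYS_UPDATED = 0x2000   # t_infomask2 bit
HEAP_HOT_UPDATED = 0x4000    # t_infomask2 bit
FROZEN_TRANSACTION_ID = 2
INVALID_TRANSACTION_ID = 0


def model_force_freeze(header: dict, blkno: int, offnum: int) -> dict:
    """Return a copy of a tuple header with its visibility information reset.

    Models (does not reproduce) pg_surgery's heap_force_freeze; xvac handling
    for HEAP_MOVED tuples is deliberately left out.
    """
    out = dict(header)
    out["t_ctid"] = (blkno, offnum)            # tuple points at itself again
    out["t_xmin"] = FROZEN_TRANSACTION_ID      # inserter treated as committed long ago
    out["t_xmax"] = INVALID_TRANSACTION_ID     # no deleter/locker, no MultiXactId
    out["t_infomask"] = ((header["t_infomask"] & ~HEAP_XACT_MASK)
                         | HEAP_XMIN_FROZEN | HEAP_XMAX_INVALID)
    out["t_infomask2"] = header["t_infomask2"] & ~(HEAP_HOT_UPDATED | HEAP_KEYS_UPDATED)
    return out


before = {"t_xmin": 136269917, "t_xmax": 66664135, "t_field3": 0,
          "t_ctid": (47, 13), "t_infomask2": 8203, "t_infomask": 4929}
after = model_force_freeze(before, 5, 51)
```

Applied to the header from the first pageinspect output, the model reproduces the second one: t_xmin 2 (FrozenTransactionId), t_xmax 0, t_ctid (5,51), t_infomask 2817 and t_infomask2 11. In particular, the MultiXactId in t_xmax is gone.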


After that, queries on the table started working normally.
Note that there are absolutely no errors in the PostgreSQL logs; data
checksums are enabled, and there are no checksum errors either.

It seems that this is a bug.
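Here is a rough consistency check on the numbers in the original report quoted below (pg_multixact at about 900 MB in several thousand files, datminmxid = 1). t_infomask 4929 above has the HEAP_XMAX_IS_MULTI bit (0x1000) set, so t_xmax 66664135 is a MultiXactId. The sketch assumes the default 8 kB block size, 32 pages per SLRU segment file, and 4-byte entries in pg_multixact/offsets, as in the PostgreSQL releases current at the time of this thread. Under those assumptions, it estimates how many offsets segment files are needed to keep every MultiXactId from datminmxid up to that one. It covers the offsets area only; the members area has a different layout and is not modeled. The function names are hypothetical.

```python
BLCKSZ = 8192                  # default block size (assumption)
SLRU_PAGES_PER_SEGMENT = 32    # pages per pg_multixact segment file
OFFSET_BYTES = 4               # bytes per offsets entry (assumption: 32-bit MultiXactOffset)

OFFSETS_PER_PAGE = BLCKSZ // OFFSET_BYTES                        # 2048 multixacts per page
OFFSETS_PER_SEGMENT = OFFSETS_PER_PAGE * SLRU_PAGES_PER_SEGMENT  # 65536 per file
SEGMENT_BYTES = BLCKSZ * SLRU_PAGES_PER_SEGMENT                  # 256 kB per file


def offsets_segments(oldest_multi: int, newest_multi: int) -> int:
    """Offsets segment files spanning MultiXactIds oldest..newest (no wraparound)."""
    first = oldest_multi // OFFSETS_PER_SEGMENT
    last = newest_multi // OFFSETS_PER_SEGMENT
    return last - first + 1


def segments_for_size(total_bytes: int) -> int:
    """How many full-size SLRU segment files fit in total_bytes."""
    return total_bytes // SEGMENT_BYTES
```

For the range 1 to 66664135, offsets alone would need at least 1018 files (about 254 MiB). By comparison, 900 MiB of full 256 kB segments would be about 3600 files, which is in line with the "several thousand files" in the report.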

On Wed, 21 May 2025 at 02:52, ikramuddin <ikram.amani815@gmail.com> wrote:

> Is it slow only for this table, or for other tables too? If the issue is
> limited to this single table, check when it started to happen, i.e. after
> creating an index or whatever other change you made; go back to that point
> and the query should run fine.
>
> On Tue, 20 May 2025 at 9:14 PM, Антон Глушаков <a.glushakov86@gmail.com>
> wrote:
>
>> Thanks for the advice.
>> I tried removing all indexes and constraints from the table; it did not
>> help.
>> I have a copy of the data (from before the TRUNCATE), so I can test any
>> hypothesis.
>>
>> On Tue, 20 May 2025 at 18:25, Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>>
>>> On Tue, 2025-05-20 at 16:48 +0300, Антон Глушаков wrote:
>>> > I have encountered some very strange behavior.
>>> > Any query on one specific table (even a simple count(*)) hangs dead.
>>> > It is a small 30MB table with 3 indexes and no special data types;
>>> > everything is standard, out-of-the-box vanilla Postgres. I waited more
>>> > than 24 hours and the query did not complete.
>>> >
>>> > Similarly, VACUUM on the table hangs.
>>> > Only kill -9 with a full restart helps.
>>> >
>>> > I got a backtrace, and based on it I examined the pg_multixact
>>> > directory, which at the time of the problem had swelled to 900MB and
>>> > contained several thousand files.
>>> > I ruled out long-running and idle transactions, as well as prepared
>>> > statements.
>>> >
>>> > The workaround in the end was this: truncate the table (that
>>> > succeeded), then VACUUM FREEZE each database, after which the files in
>>> > pg_multixact disappeared.
>>> >
>>> > What could it be? The vacuum/freeze/multixact settings are default.
>>> > At the same time, pg_database.datminmxid = 1.
>>> > Could the hang be related to the many old files in pg_multixact?
>>> > (Judging by the backtrace, yes.)
>>>
>>> I can't say for certain, but I have seen cases like that where index
>>> corruption sent processes into an endless loop. Next time you could try
>>> to rebuild the indexes.
>>>
>>> Yours,
>>> Laurenz Albe
>>>
>>

--000000000000f3fdff0635a1b0ee--