public inbox for [email protected]
Re: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY
3+ messages / 3 participants
* Re: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY
@ 2026-03-18 13:10 Jim Jones <[email protected]>
2026-03-18 14:54 ` RE: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY Ana Almeida <[email protected]>
0 siblings, 1 reply; 3+ messages in thread
From: Jim Jones @ 2026-03-18 13:10 UTC
To: Ana Almeida <[email protected]>; [email protected] <[email protected]>
Hi Ana
On 18/03/2026 10:54, Ana Almeida wrote:
> 2026-03-17 18:19:55.244 WAT [2261667] LOG: server process (PID 2382873)
> was terminated by signal 11: Segmentation fault
>
> 2026-03-17 18:19:55.244 WAT [2261667] DETAIL: Failed process was
> running: REINDEX TABLE CONCURRENTLY sibs.purchases;
>
> 2026-03-17 18:19:55.244 WAT [2261667] LOG: terminating any other active
> server processes
>
> 2026-03-17 18:19:55.257 WAT [2261667] LOG: all server processes
> terminated; reinitializing
>
> 2026-03-17 18:19:55.354 WAT [2382972] LOG: database system was
> interrupted; last known up at 2026-03-17 18:18:58 WAT
>
> 2026-03-17 18:19:55.449 WAT [2382972] LOG: database system was not
> properly shut down; automatic recovery in progress
>
> 2026-03-17 18:19:55.457 WAT [2382972] LOG: redo starts at 310/8142BEC0
>
> 2026-03-17 18:19:56.352 WAT [2382972] LOG: invalid record length at
> 310/988BAA18: expected at least 24, got 0
>
> 2026-03-17 18:19:56.352 WAT [2382972] LOG: redo done at 310/988BA9E0
> system usage: CPU: user: 0.28 s, system: 0.34 s, elapsed: 0.89 s
>
> 2026-03-17 18:19:56.360 WAT [2382973] LOG: checkpoint starting:
> end-of-recovery immediate wait
I was unable to reproduce the bug. Could you share a bit more data on
the table and indexes that caused the system crash?
Best, Jim
* RE: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY
From: Ana Almeida @ 2026-03-18 14:54 UTC
To: Jim Jones <[email protected]>; [email protected] <[email protected]>; +Cc: Nuno Azevedo <[email protected]>
Hello Jim,
I didn’t notice that the error showed the schema and table name. For confidentiality reasons, could you please not share the schema and table name if this is released as a bug?
Here is the information:
                                                        Table "myschema.mytable"
       Column       |            Type             | Collation | Nullable | Default | Storage  | Compression | Stats target | Description
--------------------+-----------------------------+-----------+----------+---------+----------+-------------+--------------+-------------
 id                 | bigint                      |           | not null |         | plain    |             |              |
 axxxxxx            | character varying(32)       |           | not null |         | extended |             |              |
 bxx                | text                        |           | not null |         | extended |             |              |
 cxxxxxxx           | text                        |           | not null |         | extended |             |              |
 dxxxxxxxx          | text                        |           |          |         | extended |             |              |
 lag_val            | text                        |           |          |         | extended |             |              |
 exxxxxxxxxx        | text                        |           |          |         | extended |             |              |
 fxxxxxxxxxxxxx     | text                        |           |          |         | extended |             |              |
 gxxxxxxxxxxxx      | text                        |           |          |         | extended |             |              |
 hxxxxxx            | numeric                     |           | not null |         | main     |             |              |
 ixxxxxxxxxxxxxx    | numeric                     |           |          |         | main     |             |              |
 jxxxxxxxxxxxxxx    | numeric                     |           |          |         | main     |             |              |
 kxxxxxx            | integer                     |           |          |         | plain    |             |              |
 lxxxxxxxxxxxx      | integer                     |           | not null |         | plain    |             |              |
 mxxxxxxxxxxxxxx    | timestamp without time zone |           |          |         | plain    |             |              |
 nxxxxxxxxxxxxx     | timestamp without time zone |           |          |         | plain    |             |              |
 oxxxxxxxxxxxx      | timestamp without time zone |           |          |         | plain    |             |              |
 pxxxxxxxxxxx       | timestamp without time zone |           | not null |         | plain    |             |              |
 qr_mydb_id         | bigint                      |           |          |         | plain    |             |              |
 qxxxxxx            | character varying(100)      |           |          |         | extended |             |              |
Indexes:
    "mytable_pkey" PRIMARY KEY, btree (id)
    "idx_lag_val" btree (lag_val)
    "idx_mytable_qr_mydb" btree (qr_mydb_id)
Foreign-key constraints:
    "fk__mytable__qr_mydb" FOREIGN KEY (qr_mydb_id) REFERENCES myschema.qr_mydb(id)
Access method: heap
Options: autovacuum_enabled=true, toast.autovacuum_enabled=true
One more note: earlier, the same REINDEX command also failed with the error below. The database didn’t crash that time, but the reindex failed. After that, we recreated the table.
ERROR: could not open file "base/179146/184526.4" (target block 808464432): previous segment is only 99572 blocks
We haven’t been able to reproduce either error since.
Regards,
Ana Almeida
* Re: Segmentation fault in PostgreSQL 17.7 during REINDEX TABLE CONCURRENTLY
From: Tomas Vondra @ 2026-03-18 23:42 UTC
To: Ana Almeida <[email protected]>; Jim Jones <[email protected]>; [email protected] <[email protected]>; +Cc: Nuno Azevedo <[email protected]>
On 3/18/26 15:54, Ana Almeida wrote:
> Hello Jim,
>
> I didn’t notice that the error showed the schema and table name. For
> confidentiality reasons, could you please not share the schema and table
> name if this is released as a bug?
>
> Here is the information:
>
> [quoted table definition trimmed; identical to the one in the previous message]
>
> Just another note, before we also had the error below in the same
> reindex command. The database didn’t crash when that error happened but
> the reindex failed. After that, we recreated the table.
>
>
>
> ERROR: could not open file "base/179146/184526.4" (target block
> 808464432): previous segment is only 99572 blocks
>
So what was the sequence of events, exactly? You got this "could not
open file" error during REINDEX CONCURRENTLY, you recreated the table
and then it crashed on some later REINDEX CONCURRENTLY?
How did you recreate the table? Did you reload it from a backup or
something else?
>
> We haven’t been able to reproduce the errors again.
>
That suggests it might have been some sort of data corruption, but it's
just a guess. Have you checked the server log for any messages
suggesting e.g. storage / memory issues or something like that?
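A side observation on the earlier "could not open file" error, offered as a sketch and not a diagnosis: the target block number 808464432 is 0x30303030, i.e. four ASCII '0' characters. That pattern would be consistent with a block pointer having been overwritten by text data, although nothing in this thread confirms it. The snippet below also works out which segment file that block would belong to, assuming a default build with 8 kB blocks and 1 GB segment files (131072 blocks per segment). The constant and function names are illustrative, not PostgreSQL identifiers.

```python
import struct

# Assumption: default build with 8 kB blocks and 1 GB segment files.
BLOCK_SIZE = 8192
BLOCKS_PER_SEGMENT = (1024 ** 3) // BLOCK_SIZE  # 131072


def block_as_ascii(block_number: int) -> str:
    """Render a 32-bit block number as its four raw bytes, decoded as ASCII."""
    raw = struct.pack("<I", block_number)
    return raw.decode("ascii", errors="replace")


def segment_for_block(block_number: int) -> tuple[int, int]:
    """Return (segment number, block offset within that segment)."""
    return divmod(block_number, BLOCKS_PER_SEGMENT)


target_block = 808464432      # "target block" from the error message
last_segment_blocks = 99572   # "previous segment is only 99572 blocks"

print(hex(target_block))                    # 0x30303030
print(repr(block_as_ascii(target_block)))   # '0000'
print(segment_for_block(target_block))      # (6168, 12336)

# The error names segment .4 and says the previous one (.3) is not full,
# so the relation apparently ends in segment 3.
total_blocks = 3 * BLOCKS_PER_SEGMENT + last_segment_blocks
print(total_blocks)                         # 492788
```

Under those assumptions the relation holds roughly 492788 blocks, while the requested block would sit in segment 6168, which looks more like a garbage block number than a merely truncated file.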
Per the backtrace you shared in the previous message, the segfault
happened here:
#0 0x00000000005d67a8 validate_index_callback (postgres)
#1 0x00000000005738bd btvacuumpage (postgres)
#2 0x0000000000573d8a btvacuumscan (postgres)
#3 0x0000000000573f00 btbulkdelete (postgres)
...
That is very heavily exercised code, so I'm somewhat skeptical a bug
would go unnoticed for very long. It's possible, of course. But
validate_index_callback doesn't do all that much - it just writes the
TID value to a tuplesort / temporary file.
It seems you have the core saved in a file:
> Storage: /var/lib/systemd/coredump/core.postgres.26.0a32...
Can you try getting a better backtrace using gdb? It might
tell us if there's a bogus pointer or something like that. Or maybe not,
chances are the compiler optimized away some of the variables, but it's
worth a try.
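A rough sketch of one way to pull a fuller backtrace out of a core file non-interactively, written in Python only to keep the example self-contained. It assumes gdb and the PostgreSQL debug symbols are installed, and that the core has first been extracted to a plain file (systemd-coredump often stores cores compressed; `coredumpctl dump` with `--output` can write one out). The function names and the placeholder paths are hypothetical.

```python
import shlex
import subprocess


def gdb_backtrace_command(binary_path: str, core_path: str) -> list[str]:
    """Build a gdb batch invocation that prints a full backtrace."""
    return [
        "gdb",
        "--batch",                 # run the -ex commands, then exit
        "-ex", "bt full",          # backtrace including local variables
        "-ex", "info registers",   # register contents at the crash
        binary_path,
        core_path,
    ]


def collect_backtrace(binary_path: str, core_path: str) -> str:
    """Run gdb and return its combined stdout and stderr as text."""
    result = subprocess.run(
        gdb_backtrace_command(binary_path, core_path),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout + result.stderr


# Placeholder paths: substitute the actual postgres binary and core file.
cmd = gdb_backtrace_command("/path/to/postgres", "/path/to/core")
print(shlex.join(cmd))
```

Calling `collect_backtrace()` with the real paths would return the text to attach to a reply. Frame #0 (validate_index_callback) and its arguments would be the first place to look for a bogus pointer, if the compiler left them visible.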
regards
--
Tomas Vondra