MIME-Version: 1.0
From: Max Madden <maxmmadden@gmail.com>
Date: Tue, 10 Jun 2025 16:37:03 +0100
Message-ID: <CAD1FGCT2sYrP_70RTuo56QTizyc+J3wJdtn2gtO3VttQFpdMZg@mail.gmail.com>
Subject: Logical Replication Memory Allocation Error - "invalid memory alloc
 request size"
To: pgsql-general@lists.postgresql.org
Content-Type: multipart/alternative; boundary="0000000000008dd12b063739785c"
Archived-At: <https://www.postgresql.org/message-id/CAD1FGCT2sYrP_70RTuo56QTizyc%2BJ3wJdtn2gtO3VttQFpdMZg%40mail.gmail.com>
Precedence: bulk

--0000000000008dd12b063739785c
Content-Type: text/plain; charset="UTF-8"

Hello,

I'm encountering a consistent issue with PostgreSQL 15 logical replication
and would appreciate any guidance on debugging or resolving this problem.

*Setup:*
- Source: PostgreSQL 15.x
- Target: PostgreSQL 15.x
- Replication: Logical replication using publication/subscription (pgoutput)
- Tables: 3 tables (details below)

*Table Details:*
- Table 1: ~1,300 records, 7 columns, no large objects
- Table 2: ~100,000 records, 7 columns, no large objects
- Table 3: ~100,000 records, 17 columns, no large objects

*Problem:*

The initial snapshot and data copy complete successfully for all tables.
However, anywhere from 5 minutes to 2 hours after the initial sync, the
subscription consistently fails with memory allocation errors like:

```
2025-06-10 14:14:56.800 UTC [299] ERROR: could not receive data from WAL stream: ERROR: invalid memory alloc request size 1238451248
2025-06-10 14:14:56.805 UTC [1] LOG: background worker "logical replication worker" (PID 299) exited with exit code 1
```

This occurs whether I replicate all 3 tables together or individually.
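
To compare the requested sizes across failures, a small script along these
lines can pull them out of the subscriber log. This is only a sketch: the
helper name `alloc_sizes` and the sample text are made up for illustration,
and it assumes the message wording matches the log excerpt above.

```python
import re

# Matches the size reported in the error message from the log excerpt.
ALLOC_RE = re.compile(r"invalid memory alloc request size (\d+)")


def alloc_sizes(log_text: str) -> list[int]:
    """Return every requested allocation size found in the log text."""
    # Collapse whitespace so a message wrapped across lines still matches.
    flat = " ".join(log_text.split())
    return [int(size) for size in ALLOC_RE.findall(flat)]


# Sample input copied from the error above, including the line wrap.
sample = """2025-06-10 14:14:56.800 UTC [299] ERROR: could not receive data from WAL
stream: ERROR: invalid memory alloc request size 1238451248"""

print(alloc_sizes(sample))
```

Collecting the sizes this way makes it easier to see whether they cluster
around particular values or vary freely between failures.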

My initial hypothesis is that large transactions are creating WAL segments
that exceed memory limits when sent to the subscriber. However, I haven't
been able to confirm this or identify the actual cause.

*Questions:*
1. What's the best approach to debug this memory allocation issue?
2. Are there specific PostgreSQL settings I should check?
3. How can I identify if large transactions are indeed the root cause?

*Additional Context:*
- This happens consistently across multiple replication attempts
- The requested size in the error varies but is always greater than 1 GB
- No custom logical replication settings currently applied
- Subscriber machine has 256 GB of RAM and runs Ubuntu 20.04
- Can recreate it on different machines
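
As a sanity check on the "> 1GB" observation, the reported size can be
compared against PostgreSQL's general per-allocation cap. This is a minimal
sketch under one assumption I have not verified for this code path: that the
error is raised by the standard allocator limit, MaxAllocSize (0x3fffffff,
one byte under 1 GiB). The function name `exceeds_alloc_limit` is just
illustrative.

```python
# Assumption: the error comes from PostgreSQL's general allocation cap,
# MaxAllocSize = 0x3fffffff (1 GiB - 1 byte), not from a setting I control.
MAX_ALLOC_SIZE = 0x3FFFFFFF

# Size taken from the error message in the log excerpt above.
requested = 1238451248


def exceeds_alloc_limit(size: int, limit: int = MAX_ALLOC_SIZE) -> bool:
    """True if a single allocation of `size` bytes is over the cap."""
    return size > limit


print(exceeds_alloc_limit(requested))
print(f"{requested / 2**30:.2f} GiB requested, cap is {MAX_ALLOC_SIZE} bytes")
```

If that assumption holds, the failure is a single oversized allocation
request rather than the machine running out of RAM, which would fit with it
happening on a host with 256 GB.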

I should also mention that we're operating in a managed environment on
DigitalOcean, which means we don't have direct access to the WAL logs on
the publisher node. This is why the log information above is limited. I
understand this constraint makes it more difficult to provide help, but I
would really appreciate any insights or suggestions you might have.

Thanks,

Max


--0000000000008dd12b063739785c--