public inbox for [email protected]
help / color / mirror / Atom feedCan postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row
3+ messages / 2 participants
[nested] [flat]
* Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row
@ 2025-07-29 22:37 Akashkiran Shivakumar <[email protected]>
2025-08-10 03:00 ` Re: Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row Akashkiran Shivakumar <[email protected]>
2025-08-11 15:37 ` Re: Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row Greg Sabino Mullane <[email protected]>
0 siblings, 2 replies; 3+ messages in thread
From: Akashkiran Shivakumar @ 2025-07-29 22:37 UTC (permalink / raw)
To: [email protected]
Hello,
I have a postgres database (major version 13) and am doing CDC by using a
replication slot with pgoutput. In our data lake, we see that there are
multiple updates (3 in this case) happening to the same row as part of the
same transaction. This doesn't make sense if we look at them as separate
updates. The expectation was that the row was updated once and a single CDC
record was pushed out. I haven't completely ruled out whether multiple
updates happened in that transaction but I wanted to ask the community if
it is possible that a single update statement could spill over as multiple
CDC update records by pgoutput / postgres ?
If yes, could you possibly point to the testcases or code where this might
happen? Any blogs or suggestions are welcome. LMK if you need further
information
P.S: Each update seen in the data lake changes at least one field in the
row. The entire row with all the columns are pushed out for the 3 updates
seen in the data lake. From the application perspective, it makes sense if
the 3 updates were merged into a single update record
Regards,
Akashkiran
^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row
2025-07-29 22:37 Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row Akashkiran Shivakumar <[email protected]>
@ 2025-08-10 03:00 ` Akashkiran Shivakumar <[email protected]>
1 sibling, 0 replies; 3+ messages in thread
From: Akashkiran Shivakumar @ 2025-08-10 03:00 UTC (permalink / raw)
To: [email protected]
Can someone take a look at this ? Any help would be appreciated
Regards,
Akashkiran
On Tue, Jul 29, 2025 at 3:37 PM Akashkiran Shivakumar <
[email protected]> wrote:
> Hello,
> I have a postgres database (major version 13) and am doing CDC by using a
> replication slot with pgoutput. In our data lake, we see that there are
> multiple updates (3 in this case) happening to the same row as part of the
> same transaction. This doesn't make sense if we look at them as separate
> updates. The expectation was that the row was updated once and a single CDC
> record was pushed out. I haven't completely ruled out whether multiple
> updates happened in that transaction but I wanted to ask the community if
> it is possible that a single update statement could spill over as multiple
> CDC update records by pgoutput / postgres ?
>
> If yes, could you possibly point to the testcases or code where this might
> happen? Any blogs or suggestions are welcome. LMK if you need further
> information
>
> P.S: Each update seen in the data lake changes at least one field in the
> row. The entire row with all the columns are pushed out for the 3 updates
> seen in the data lake. From the application perspective, it makes sense if
> the 3 updates were merged into a single update record
>
> Regards,
> Akashkiran
>
^ permalink raw reply [nested|flat] 3+ messages in thread
* Re: Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row
2025-07-29 22:37 Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row Akashkiran Shivakumar <[email protected]>
@ 2025-08-11 15:37 ` Greg Sabino Mullane <[email protected]>
1 sibling, 0 replies; 3+ messages in thread
From: Greg Sabino Mullane @ 2025-08-11 15:37 UTC (permalink / raw)
To: Akashkiran Shivakumar <[email protected]>; +Cc: [email protected]
>
> possible that a single update statement could spill over as multiple CDC
> update records by pgoutput / postgres ?
Basically, no. If I understand your questions correctly, the answer seems
to point to your application making multiple updates, which will lead to
multiple UPDATE records in the WAL, and thus multiple UPDATEs via pgoutput.
But the basic mapping will still be one heap update = one WAL entry = one
pgoutput update. (It would be good to get familiar with the pg_waldump
program and examine the WAL yourself to get an idea of the exact changes
that are generated for various situations.) While there are cases in which
a single SQL update can generate multiple WAL entries (e.g. indexes,
TOAST), each individual update is always going to generate at least one WAL
record. Due to the way that Postgres works, each update gets put into WAL
immediately - Postgres cannot merge updates that happen to occur in the
same transaction. Nor does pgoutput combine those. So even though at the
end of the transaction a single row gets changed at the logical level,
there can be quite a number of changes at the WAL level. If you want to
smush together pgoutput's updates into a single call, you are welcome to do
so, of course - as long as it's in the same transaction. But I'd not do
that, as you are now messing with the order of events as they happened on
your primary. In other words, there may have been a reason it was updated
col by col on the primary, and if that bothers you, the primary is the
place to fix it, not stitching it together after the fact.
Cheers,
Greg
^ permalink raw reply [nested|flat] 3+ messages in thread
end of thread, other threads:[~2025-08-11 15:37 UTC | newest]
Thread overview: 3+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2025-07-29 22:37 Can postgres replication slot using pgoutput release multiple CDC records for a single update to a particular row Akashkiran Shivakumar <[email protected]>
2025-08-10 03:00 ` Akashkiran Shivakumar <[email protected]>
2025-08-11 15:37 ` Greg Sabino Mullane <[email protected]>
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox