From: jiye <[email protected]>
To: Etsuro Fujita <[email protected]>
Cc: David G. Johnston <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re:Re: Re: FDW connection drops with "Connection timed out" during async append query due to TCP receive buffer filling up
Date: Fri, 3 Apr 2026 11:13:19 +0800 (CST)
Message-ID: <[email protected]> (raw)
In-Reply-To: <CAPmGK157wTfup6Jqt7K1p1=S0kGoBRSeQ1M50fLt93tcF1NQ5w@mail.gmail.com>
References: <[email protected]>
	<CAKFQuwYcEvfna4HUkG9VkX0cR_xbRRhT_0qDaAHhrJ4UX0EW-g@mail.gmail.com>
	<[email protected]>
	<CAPmGK157wTfup6Jqt7K1p1=S0kGoBRSeQ1M50fLt93tcF1NQ5w@mail.gmail.com>

We have successfully reproduced this issue and gained a clearer understanding of its root cause. The application uses a cursor to fetch partial results in batches, with a delay between consecutive fetch operations. When the interval between two batches exceeds the tcp_user_timeout threshold, the connection is terminated unexpectedly.

In my analysis, during cursor-based queries, applications typically retrieve results in partial batches. If the number of rows fetched in a single batch is smaller than the number of rows scanned from the local table, the executor is unable to proceed with fetching rows from the foreign table. While we have attempted workarounds such as adjusting the fetch size, tuning TCP buffer parameters, and modifying the tcp_user_timeout value, these measures only mitigate the symptoms without addressing the underlying problem.
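
To illustrate the timing described above, here is a toy Python model (not real TCP, and not postgres_fdw code). It assumes the remote sender is blocked whenever its pending result does not fit in the local receive buffer, and that the local side drains the socket only when the application fetches a batch. The function name and all parameters are hypothetical, chosen only for this sketch.

```python
def simulate_cursor_fetch(result_bytes, recv_buffer_bytes,
                          drained_per_fetch_bytes, fetch_gap_s,
                          tcp_user_timeout_s):
    """Toy model of a sender stalled by a full receive buffer.

    Returns ("ok", finish_time) or ("timed out", failure_time).
    All names and the blocking rule are assumptions for illustration.
    """
    if drained_per_fetch_bytes <= 0:
        raise ValueError("drained_per_fetch_bytes must be positive")

    buffered = 0              # bytes sitting unread in the receive buffer
    remaining = result_bytes  # bytes the sender still has to transmit
    now = 0.0

    while remaining > 0 or buffered > 0:
        # The sender instantly fills whatever buffer space is free.
        sent = min(remaining, recv_buffer_bytes - buffered)
        buffered += sent
        remaining -= sent

        # The sender is stalled if it still has data but the buffer is full.
        blocked = remaining > 0 and buffered == recv_buffer_bytes
        if blocked and fetch_gap_s > tcp_user_timeout_s:
            # The stall outlasts the timeout before the next fetch drains data.
            return ("timed out", now + tcp_user_timeout_s)

        # The application waits, then fetches its next batch.
        now += fetch_gap_s
        buffered -= min(buffered, drained_per_fetch_bytes)

    return ("ok", now)


if __name__ == "__main__":
    # 10 s between batches, 9 s timeout, result far larger than the buffer.
    print(simulate_cursor_fetch(1_000_000, 64_000, 16_000, 10.0, 9.0))
    # Same gap, but the whole result fits in the buffer (narrow rows).
    print(simulate_cursor_fetch(32_000, 64_000, 16_000, 10.0, 9.0))
```

In this model, a 10-second gap with a 9-second timeout drops the connection, while a result that fits entirely in the buffer never blocks the sender. That is consistent with the observation that the issue does not reproduce when the fetched rows are narrow.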

To achieve a fundamental resolution, I propose two potential solutions:

1. Alternate Row Fetching: Modify the executor to alternately retrieve rows from the local table and the foreign table, ensuring balanced data flow between the two data sources.
2. Asynchronous Tuple Storage: Implement a tuple storage mechanism to asynchronously cache results from the foreign table. This would allow the executor to fetch foreign table results into the storage buffer independently, preventing TCP window exhaustion and decoupling the dependency between local and foreign data retrieval.
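
Both ideas can be sketched in Python as follows. This is only an illustration of the control flow, not executor code: `alternate_rows` and `ForeignTupleStore` are hypothetical names, plain Python iterables stand in for the local and foreign scans, and a background thread with an in-memory queue stands in for whatever asynchronous mechanism and tuple storage the executor would actually use.

```python
import queue
import threading


def alternate_rows(local_rows, foreign_rows):
    """Proposal 1 (sketch): take one row from each source in turn,
    so that neither source is left unread for long."""
    sources = [iter(local_rows), iter(foreign_rows)]
    while sources:
        for src in list(sources):
            try:
                yield next(src)
            except StopIteration:
                sources.remove(src)  # this source is exhausted


_DONE = object()  # sentinel marking the end of the foreign result


class ForeignTupleStore:
    """Proposal 2 (sketch): drain the foreign source into a local store
    independently of how fast the consumer reads from it."""

    def __init__(self, foreign_rows):
        self._rows = queue.Queue()  # unbounded in-memory stand-in
        self._thread = threading.Thread(
            target=self._drain, args=(foreign_rows,), daemon=True)
        self._thread.start()

    def _drain(self, foreign_rows):
        # Keep reading from the foreign side regardless of the consumer.
        for row in foreign_rows:
            self._rows.put(row)
        self._rows.put(_DONE)

    def wait_until_drained(self):
        """Block until the foreign source has been fully read."""
        self._thread.join()

    def __iter__(self):
        while True:
            row = self._rows.get()
            if row is _DONE:
                return
            yield row


if __name__ == "__main__":
    print(list(alternate_rows(["l1", "l2", "l3"], ["f1"])))
    store = ForeignTupleStore(iter(range(5)))
    store.wait_until_drained()  # foreign side is fully read before any fetch
    print(list(store))
```

The first approach needs no extra storage but ties the read rate of the foreign side to that of the local side. The second frees the remote sender regardless of how slowly the application fetches, at the cost of holding the foreign rows locally.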

At 2026-03-11 16:01:02, "Etsuro Fujita" <[email protected]> wrote:
>On Wed, Mar 11, 2026 at 3:25 PM jiye <[email protected]> wrote:
>> Sorry, I made a mistake about the tcp_user_timeout configuration. Our app sets it to 9000 (9 seconds), but it still errors out even with 9000 - it just takes a little longer to error.
>> And about this point:
>>   => I don’t actually know whether or if “buffer filling up” is accurate or relevant here.  It doesn’t seem that way.  You haven’t demonstrated that scenario here, just a timeout being reached.
>>   First, I captured a tcpdump trace, and the "TCP windows full" packets in it seem to demonstrate the "tcp buffer filling up" condition.
>>   Second, if the fetched rows are not sufficiently wide, the issue does not reproduce.
>>
>> So I suspect that the reason for this connection timeout is that the TCP buffer is full.
>
>I think this problem is not with async execution, but with your
>environment; if the root cause of it is “TCP windows full”, I think it
>might fix it to 1) retrieve only needed columns from the remote server
>and 2) decrease the fetch_size option for postgres_fdw.
>
>Best regards,
>Etsuro Fujita

