public inbox for [email protected]help / color / mirror / Atom feed
Re: Long running query causing XID limit breach 8+ messages / 4 participants [nested] [flat]
* Re: Long running query causing XID limit breach @ 2024-05-29 08:08 Laurenz Albe <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: Laurenz Albe @ 2024-05-29 08:08 UTC (permalink / raw) To: yudhi s <[email protected]>; +Cc: sud <[email protected]>; pgsql-general <[email protected]> On Wed, 2024-05-29 at 01:34 +0530, yudhi s wrote: > > The only way you can have no delay in replication AND no canceled queries is > > if you use two different standby servers with different settings for > > "max_standby_streaming_delay". One of the server is for HA, the other for > > your long-running queries. > > When you suggest having different max_standby_streaming_delay for first replica > (say 10 sec for High availability) and second replica(say -1 for long running queries). > Do you also suggest keeping "hot_feedback_standby" as "OFF" for all the three > instances i.e. master and both the replicas? The parameter is ignored on the master. It needs to be off on the standby that is running long queries. For the other standby it probably doesn't matter if you are not running any queries on it. I would leave "hot_standby_feedback = off" there as well. Actually, I would set "hot_standby = off" on the standby that is only used for HA. > Also OP has added a few other parameters as below, do you think these should be needed? > I think the master and first replica should have the same set up because in case > of any disaster to master the first replica should be able to take the place of master. > > Master/Primary First Replica/Standby for High Availability Second Replica for Reporting > hot_standby_feedback=ON hot_standby_feedback=ON hot_standby_feedback=OFF > max_standby_streaming_delay=10 sec max_standby_streaming_delay=10 sec max_standby_streaming_delay=-1 (Infinite) > statement_timeout = "2hrs" statement_timeout="2hrs" No statement_timeout i.e. infinite > idle_in_transaction_session_timeout=10minutes idle_in_transaction_session_timeout=10minutes No idle_in_transaction_session_timeout i.e. infinite > autovacuum_freeze_max_age=100M autovacuum_freeze_max_age=100M autovacuum_freeze_max_age=100M > Log_autovacuum_min_duration=0 Log_autovacuum_min_duration=0 Log_autovacuum_min_duration=0 - I would leave "hot_standby_feedback" off everywhere. - "max_standby_streaming_delay" should be -1 on the reporting standby and very low or 0 on the HA standby. It doesn't matter on the primary. - "statement_timeout" should be way lower on the first two nodes. - "idle_in_transaction_session_timeout" is good. - I would leave "autovacuum_freeze_max_age" at the default setting but 100 million is ok too. Yours, Laurenz Albe ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: Long running query causing XID limit breach @ 2024-06-05 06:25 sud <[email protected]> parent: Laurenz Albe <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: sud @ 2024-06-05 06:25 UTC (permalink / raw) To: Laurenz Albe <[email protected]>; +Cc: yudhi s <[email protected]>; pgsql-general <[email protected]> Hello Laurenz, Thank you so much.This information was really helpful for us understanding the working of these parameters. One follow up question i have , as we are setting one of the standby/replica with value idle_in_transaction_session_timeout=-1 which can cause the WAL's to be heavily backlogged in a scenario where we have a query running for very long time on that instance. So in that case will there be chances of instance restart and if that can be avoided anyway? And the plan is to set these system parameters with different values in writer/read replica , so in that case if we apply the "alter system" command on the primary , won't the WAL going to apply those same commands forcibly on reader instance making those same as the writer instance configuration( but we want the reader replica configuration to be different from writer)? Appreciate your guidance. On Wed, May 29, 2024 at 1:38 PM Laurenz Albe <[email protected]> wrote: > On Wed, 2024-05-29 at 01:34 +0530, yudhi s wrote: > > > The only way you can have no delay in replication AND no canceled > queries is > > > if you use two different standby servers with different settings for > > > "max_standby_streaming_delay". One of the server is for HA, the other > for > > > your long-running queries. > > > > When you suggest having different max_standby_streaming_delay for first > replica > > (say 10 sec for High availability) and second replica(say -1 for long > running queries). > > Do you also suggest keeping "hot_feedback_standby" as "OFF" for all the > three > > instances i.e. master and both the replicas? > > The parameter is ignored on the master. > It needs to be off on the standby that is running long queries. > For the other standby it probably doesn't matter if you are not running any > queries on it. I would leave "hot_standby_feedback = off" there as well. > > Actually, I would set "hot_standby = off" on the standby that is only used > for HA. > > > - I would leave "hot_standby_feedback" off everywhere. > - "max_standby_streaming_delay" should be -1 on the reporting standby and > very > low or 0 on the HA standby. It doesn't matter on the primary. > - "statement_timeout" should be way lower on the first two nodes. > - "idle_in_transaction_session_timeout" is good. > - I would leave "autovacuum_freeze_max_age" at the default setting but 100 > million > is ok too. > > Yours, > Laurenz Albe > ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: Long running query causing XID limit breach @ 2024-06-05 07:08 Simon Elbaz <[email protected]> parent: sud <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: Simon Elbaz @ 2024-06-05 07:08 UTC (permalink / raw) To: sud <[email protected]>; +Cc: Laurenz Albe <[email protected]>; yudhi s <[email protected]>; pgsql-general <[email protected]> Hi, I am following this very interesting thread. From the documentation https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-T..., the 0 value will disable the timeout (not -1). On Wed, Jun 5, 2024 at 8:25 AM sud <[email protected]> wrote: > Hello Laurenz, > > Thank you so much.This information was really helpful for us > understanding the working of these parameters. > > One follow up question i have , as we are setting one of the > standby/replica with value idle_in_transaction_session_timeout=-1 which can > cause the WAL's to be heavily backlogged in a scenario where we have a > query running for very long time on that instance. So in that case will > there be chances of instance restart and if that can be avoided anyway? > > And the plan is to set these system parameters with different values in > writer/read replica , so in that case if we apply the "alter system" > command on the primary , won't the WAL going to apply those same commands > forcibly on reader instance making those same as the writer instance > configuration( but we want the reader replica configuration to be different > from writer)? > > Appreciate your guidance. > > On Wed, May 29, 2024 at 1:38 PM Laurenz Albe <[email protected]> > wrote: > >> On Wed, 2024-05-29 at 01:34 +0530, yudhi s wrote: >> > > The only way you can have no delay in replication AND no canceled >> queries is >> > > if you use two different standby servers with different settings for >> > > "max_standby_streaming_delay". One of the server is for HA, the >> other for >> > > your long-running queries. >> > >> > When you suggest having different max_standby_streaming_delay for first >> replica >> > (say 10 sec for High availability) and second replica(say -1 for long >> running queries). >> > Do you also suggest keeping "hot_feedback_standby" as "OFF" for all >> the three >> > instances i.e. master and both the replicas? >> >> The parameter is ignored on the master. >> It needs to be off on the standby that is running long queries. >> For the other standby it probably doesn't matter if you are not running >> any >> queries on it. I would leave "hot_standby_feedback = off" there as well. >> >> Actually, I would set "hot_standby = off" on the standby that is only used >> for HA. >> >> >> - I would leave "hot_standby_feedback" off everywhere. >> - "max_standby_streaming_delay" should be -1 on the reporting standby and >> very >> low or 0 on the HA standby. It doesn't matter on the primary. >> - "statement_timeout" should be way lower on the first two nodes. >> - "idle_in_transaction_session_timeout" is good. >> - I would leave "autovacuum_freeze_max_age" at the default setting but >> 100 million >> is ok too. >> >> Yours, >> Laurenz Albe >> > ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: Long running query causing XID limit breach @ 2024-06-05 07:39 sud <[email protected]> parent: Simon Elbaz <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: sud @ 2024-06-05 07:39 UTC (permalink / raw) To: Simon Elbaz <[email protected]>; Laurenz Albe <[email protected]>; +Cc: yudhi s <[email protected]>; pgsql-general <[email protected]> On Wed, 5 Jun, 2024, 12:39 pm Simon Elbaz, <[email protected]> wrote: > Hi, > > I am following this very interesting thread. > > From the documentation > https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-T..., > the 0 value will disable the timeout (not -1). > > > > On Wed, Jun 5, 2024 at 8:25 AM sud <[email protected]> wrote: > >> Hello Laurenz, >> >> Thank you so much.This information was really helpful for us >> understanding the working of these parameters. >> >> One follow up question i have , as we are setting one of the >> standby/replica with value idle_in_transaction_session_timeout=-1 which can >> cause the WAL's to be heavily backlogged in a scenario where we have a >> query running for very long time on that instance. So in that case will >> there be chances of instance restart and if that can be avoided anyway? >> >> And the plan is to set these system parameters with different values in >> writer/read replica , so in that case if we apply the "alter system" >> command on the primary , won't the WAL going to apply those same commands >> forcibly on reader instance making those same as the writer instance >> configuration( but we want the reader replica configuration to be different >> from writer)? >> >> Appreciate your guidance. >> >> > My apologies. I was meant to say setting up "max_standby_streaming_delay" To -1. Which means unlimited lag. > ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: Long running query causing XID limit breach @ 2024-06-05 10:22 Laurenz Albe <[email protected]> parent: sud <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: Laurenz Albe @ 2024-06-05 10:22 UTC (permalink / raw) To: sud <[email protected]>; Simon Elbaz <[email protected]>; +Cc: yudhi s <[email protected]>; pgsql-general <[email protected]> On Wed, 2024-06-05 at 13:09 +0530, sud wrote: > > > One follow up question i have , as we are setting one of the standby/replica > > > with value idle_in_transaction_session_timeout=-1 which can cause the WAL's > > > to be heavily backlogged in a scenario where we have a query running for very > > > long time on that instance. So in that case will there be chances of instance > > > restart and if that can be avoided anyway? > > My apologies. I was meant to say setting up "max_standby_streaming_delay" To -1. > Which means unlimited lag. There should never be a restart unless you perform one or the standby crashes. If you mean that you want to avoid a crash caused by a full disk on the standby, the answer is probably "no". Make sure that you have enough disk space and use monitoring. Yours, Laurenz Albe ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: Long running query causing XID limit breach @ 2024-06-05 19:21 yudhi s <[email protected]> parent: Laurenz Albe <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: yudhi s @ 2024-06-05 19:21 UTC (permalink / raw) To: Laurenz Albe <[email protected]>; +Cc: sud <[email protected]>; Simon Elbaz <[email protected]>; pgsql-general <[email protected]> On Wed, Jun 5, 2024 at 3:52 PM Laurenz Albe <[email protected]> wrote: > > There should never be a restart unless you perform one or the standby > crashes. > If you mean that you want to avoid a crash caused by a full disk on the > standby, > the answer is probably "no". Make sure that you have enough disk space and > use monitoring. > > Yours, > Laurenz Albe > Is this because OP initially mentioned its RDS postgres, so in that case there is storage space restriction on 64TB(and 128TB in case of aurora postgres). So I believe this storage space combines data + WAL , so in that case as you mentioned, appropriate monitoring needs to be put in place. Or else in the worst case scenario, if the storage consumption hit that hard limit , then there will be instance restart or crash? ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: Long running query causing XID limit breach @ 2024-06-08 09:21 sud <[email protected]> parent: yudhi s <[email protected]> 0 siblings, 1 reply; 8+ messages in thread From: sud @ 2024-06-08 09:21 UTC (permalink / raw) To: yudhi s <[email protected]>; +Cc: Laurenz Albe <[email protected]>; Simon Elbaz <[email protected]>; pgsql-general <[email protected]> On Thu, Jun 6, 2024 at 12:52 AM yudhi s <[email protected]> wrote: > On Wed, Jun 5, 2024 at 3:52 PM Laurenz Albe <[email protected]> > wrote: > >> >> There should never be a restart unless you perform one or the standby >> crashes. >> If you mean that you want to avoid a crash caused by a full disk on the >> standby, >> the answer is probably "no". Make sure that you have enough disk space >> and >> use monitoring. >> >> Yours, >> Laurenz Albe >> > > Is this because OP initially mentioned its RDS postgres, so in that case > there is storage space restriction on 64TB(and 128TB in case of aurora > postgres). So I believe this storage space combines data + WAL , so in that > case as you mentioned, appropriate monitoring needs to be put in place. > Or else in the worst case scenario, if the storage consumption hit that > hard limit , then there will be instance restart or crash? > Thank You so much Laurenz and Yudhi. Yes its RDS and as you mentioned there does exist a space limitation of ~64TB but as Laurenz mentioned the only time the second standby may crash would be probably because of the storage space saturation and thus we need to have appropriate monitoring in place to find this and get alerted beforehand. And also a monitoring to see how much WAL gets generated per hour/day to get an idea of the usage. I am not sure how to do it , but will check on this. ^ permalink raw reply [nested|flat] 8+ messages in thread
* Re: Long running query causing XID limit breach @ 2024-06-13 07:43 yudhi s <[email protected]> parent: sud <[email protected]> 0 siblings, 0 replies; 8+ messages in thread From: yudhi s @ 2024-06-13 07:43 UTC (permalink / raw) To: sud <[email protected]>; Laurenz Albe <[email protected]>; +Cc: Simon Elbaz <[email protected]>; pgsql-general <[email protected]> On Sat, Jun 8, 2024 at 2:51 PM sud <[email protected]> wrote: > > Thank You so much Laurenz and Yudhi. > > Yes its RDS and as you mentioned there does exist a space limitation of > ~64TB but as Laurenz mentioned the only time the second standby may crash > would be probably because of the storage space saturation and thus we need > to have appropriate monitoring in place to find this and get alerted > beforehand. And also a monitoring to see how much WAL gets generated per > hour/day to get an idea of the usage. I am not sure how to do it , but will > check on this. > Not exactly related but just for our information, While going through the "aurora postgres" database docs in regards to similar concepts which are getting discussed here, I am finding some interesting stuff. https://aws.amazon.com/blogs/database/manage-long-running-read-queries-on-amazon-aurora-postgresql-c... *Cancel the conflicting query on the reader node if the conflict lasts longer than max_standby_streaming_delay (maximum 30 seconds). This is different from Amazon RDS or self-managed PostgreSQL. With Amazon RDS or self-managed PostgreSQL, the instance has its own physical copy of the database, and you’re able to set the parameter max_standby_streaming_delay as high as you want to prevent query cancellation.If the conflicting query can’t cancel in time, or if multiple long-running queries are causing the replication lag to go beyond 60 seconds, Aurora restarts the reader node to ensure it’s not lagging far behind the primary node.* So if i get it correct it means, even if hot_standby_feedback is set to OFF, the constraints of max_standby_streaming_delay (30 seconds) and the 60-second replication lag limit applies. And thus Aurora may cancel long-running queries or restart reader nodes to maintain synchronization even if it just runs for >60seconds. So it's really odd but does that mean , by no way you can guarantee a query to run >60 seconds on read replica in aurora postgres? ^ permalink raw reply [nested|flat] 8+ messages in thread
end of thread, other threads:[~2024-06-13 07:43 UTC | newest] Thread overview: 8+ messages (download: mbox mbox.gz follow: Atom feed) -- links below jump to the message on this page -- 2024-05-29 08:08 Re: Long running query causing XID limit breach Laurenz Albe <[email protected]> 2024-06-05 06:25 ` sud <[email protected]> 2024-06-05 07:08 ` Simon Elbaz <[email protected]> 2024-06-05 07:39 ` sud <[email protected]> 2024-06-05 10:22 ` Laurenz Albe <[email protected]> 2024-06-05 19:21 ` yudhi s <[email protected]> 2024-06-08 09:21 ` sud <[email protected]> 2024-06-13 07:43 ` yudhi s <[email protected]>
This inbox is served by agora; see mirroring instructions for how to clone and mirror all data and code used for this inbox