Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pMN8m-0000tN-E6 for pgsql-hackers@arkaria.postgresql.org; Mon, 30 Jan 2023 05:55:40 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1pMN8l-0000l3-3E for pgsql-hackers@arkaria.postgresql.org; Mon, 30 Jan 2023 05:55:39 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pMN8k-0000ku-Or for pgsql-hackers@lists.postgresql.org; Mon, 30 Jan 2023 05:55:38 +0000 Received: from mail-lf1-x133.google.com ([2a00:1450:4864:20::133]) by magus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1pMN8i-0004um-53 for pgsql-hackers@lists.postgresql.org; Mon, 30 Jan 2023 05:55:38 +0000 Received: by mail-lf1-x133.google.com with SMTP id x40so17360243lfu.12 for ; Sun, 29 Jan 2023 21:55:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=SdmoqqnI5oaacPXgBFBD3LlmuGlWaIYhfQtPNmc/pYM=; b=P+sq698F5i6TAXxnXPMsucCs+3BNUamvno5qBus3SMtWIvJxqxjr+T7U5gDm24fctH 3q4dZKGNkfVRJxURLckpr+9uTJecnio6+sHbzhGtZswGWKdca2nz6VL8I9YaSnVCunco 1hC8piUPMZezEWGv1ycZcwc4pQQbgNFhtdd674WZA5OaaHstsGeTSY6C++Tum1qfeEOa Fj+KugUzAoD03NS/HUP5ngF+Q+IZSnWEm08xKNQ/z7c9JSsZmqkrOs6d9oCLH215DaKR kFpB3jXAgIly7r4ol1CAp/bWaBaI3HBnXb6v8Cxy6EmL1/1m7S+9AJA0oL0upUQI12R7 n3Bw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=SdmoqqnI5oaacPXgBFBD3LlmuGlWaIYhfQtPNmc/pYM=; b=cYAaZg5rzckDCet4hLxHkyExnLdRRz5A3sQoE1dmGYyPyrb/BSCtHxw1Y0E1tGnuFp jQ1Yk/cPo6HlrJXxTThD6AqHeEcAIlUjuIcbx5C/HuvB/L1QdX9xifq3A3+8Yz6J9OyR re/sDcLkDGSLJFaVMPx/73Bc04MHW4Zuzrx5VwWPWtxVstP9ALBN26w7jC2xITWtKP2U DW26kM9dsdHOwcN60lHft+MP0OMYZ0iW+h/UTb8OvpgOgqVF0fgotoYe+duoh//uDWa9 HROXhoPayJ7O8pV0WSzUL4/w/GRI7JZv3MCoALnwoHqSbEtmbl8pLCo2FTX7WtH623gX HUMA== X-Gm-Message-State: AFqh2kpftgvM6zd2vamyig0p+HmO/x2fyDJi2o5pzqp135TCHKL1xzGw gL0AmmAQIb8V6XBSe8v0TGZtcrbmP3dU1jinLyc= X-Google-Smtp-Source: AMrXdXuxkGpvoqedkiTaC4PnIUzmC518j5y1FjXszYDgv259lniNPk4ryRu7ElUOj8BL+/mFtGhVzpDvgxAx5yHbaWU= X-Received: by 2002:a05:6512:308a:b0:4d1:3e32:5436 with SMTP id z10-20020a056512308a00b004d13e325436mr6162869lfd.215.1675058134841; Sun, 29 Jan 2023 21:55:34 -0800 (PST) MIME-Version: 1.0 References: <20220805.114916.994654810780821553.horikyota.ntt@gmail.com> <20220809.161236.1486509314201074910.horikyota.ntt@gmail.com> In-Reply-To: From: Bharath Rupireddy Date: Mon, 30 Jan 2023 11:25:23 +0530 Message-ID: Subject: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication To: SATYANARAYANA NARLAPURAM , Bruce Momjian , Andrey Borodin Cc: Kyotaro Horiguchi , Laurenz Albe , PostgreSQL Hackers Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Tue, Nov 29, 2022 at 10:45 PM SATYANARAYANA NARLAPURAM wrote: > > On Tue, Nov 29, 2022 at 8:42 AM SATYANARAYANA NARLAPURAM wrote: >> >> On Tue, Nov 29, 2022 at 8:29 AM Bruce Momjian wrote: >>> >>> On Tue, Nov 29, 2022 at 08:14:10AM -0800, SATYANARAYANA NARLAPURAM wrote: >>> > 2. Process proc die immediately when a backend is waiting for sync >>> > replication acknowledgement, as it does today, however, upon restart, >>> > don't open up for business (don't accept ready-only connections) >>> > unless the sync standbys have caught up. >>> > >>> > Are you planning to block connections or queries to the database? It would be >>> > good to allow connections and let them query the monitoring views but block the >>> > queries until sync standby have caught up. Otherwise, this leaves a monitoring >>> > hole. In cloud, I presume superusers are allowed to connect and monitor (end >>> > customers are not the role members and can't query the data). The same can't be >>> > true for all the installations. Could you please add more details on your >>> > approach? >>> >>> I think ALTER SYSTEM should be allowed, particularly so you can modify >>> synchronous_standby_names, no? >> >> Yes, Change in synchronous_standby_names is expected in this situation. IMHO, blocking all the connections is not a recommended approach. > > How about allowing superusers (they can still read locally committed data) and users part of pg_monitor role? I started to spend time on this feature again. Thanks all for your comments so far. Per latest comments, it looks like we're mostly okay to emit a warning and ignore query cancel interrupts while waiting for sync replication ACK. For proc die, it looks like the suggestion was to process it immediately and upon next restart, don't allow user connections unless all sync standbys were caught up. However, we need to be able to allow replication connections from standbys so that they'll be able to stream the needed WAL and catch up with primary, allow superuser or users with pg_monitor role to connect to perform ALTER SYSTEM to remove the unresponsive sync standbys if any from the list or disable sync replication altogether or monitor for flush lsn/catch up status. And block all other connections. Note that replication, superuser and users with pg_monitor role connections are allowed only after the server reaches a consistent state not before that to not read any inconsistent data. The trickiest part of doing the above is how we detect upon restart that the server received proc die while waiting for sync replication ACK. One idea might be to set a flag in the control file before the crash. Second idea might be to write a marker file (although I don't favor this idea); presence indicates that the server was waiting for sync replication ACK before the crash. However, we may not detect all sorts of crashes in a backend when it is waiting for sync replication ACK to do any of these two ideas. Therefore, this may not be a complete solution. Third idea might be to just let the primary wait for sync standbys to catch up upon restart irrespective of whether it was crashed or not while waiting for sync replication ACK. While this idea works well without having to detect all sorts of crashes, the primary may not come up if any unresponsive standbys are present (currently, the primary continues to be operational for read-only queries at least irrespective of whether sync standbys have caught up or not). Thoughts? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com