Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1osdIb-0003DL-Eu for pgsql-hackers@arkaria.postgresql.org; Wed, 09 Nov 2022 05:06:53 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1osdIZ-0002fR-VN for pgsql-hackers@arkaria.postgresql.org; Wed, 09 Nov 2022 05:06:51 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1osdIZ-0002fG-Jz for pgsql-hackers@lists.postgresql.org; Wed, 09 Nov 2022 05:06:51 +0000 Received: from mail-lf1-x132.google.com ([2a00:1450:4864:20::132]) by makus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1osdIX-0003g5-DR for pgsql-hackers@lists.postgresql.org; Wed, 09 Nov 2022 05:06:50 +0000 Received: by mail-lf1-x132.google.com with SMTP id be13so24089221lfb.4 for ; Tue, 08 Nov 2022 21:06:49 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=ydVDuespFnKrUkZMF9BObNO0yMewuqaWa2MiY1r2jn4=; b=Hma1MqqXqY2KLJzSbx+oVLA6rlRejolJ+Uw4e3l5/tTd1joryQvlmjGpb6oxnR6TqR TkSBFfKaSDhFEeupt7wHckD+bWCmt9x4AekM147B0nSR1ZR4Ah7JJX3shyvbD2dJhl0Z 0tsilHzb3SwzS89BNNUukbOoCBY5rR9/e8wBv+mGEHClOcHvo/LNZk+im2emHk1Db5Fx 29PaZ5VOjRHc0Bd7B4I0SqkRclzuVTQ13rB1CSIoN5XzpxykWH0bAp+O2agcQ1WatHCs JQwq06nYwna6FwSeXVsbLvVpQ8SHSi0aL8sdKFcvjZrpKrVjt5AObN/by9S1CWYUtCdr ox3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=ydVDuespFnKrUkZMF9BObNO0yMewuqaWa2MiY1r2jn4=; b=zcjW5RebPLSTZ0jj14YLQA3jGaqpzrgb9oV3+MZ0q01IN6kPCRVV0qBGSvugDhJHRx 8hwrQumftF+zRXDDKCXPBOKfwjBT6r55FaoK6zIPXzvZ7J4NOmtwW8+oeirjNoDerQO+ 6nDh7d5vHsYvHHLnf7v/y9ppvgfpDBN9M/hXjAts8i+aYy6k0rwjbOyQYtipPpnqZhSd h0RTHuEM+4uBLZyu1gdwichfOWhqC/uuWdfioOb0dGGQJymIkb6XEWPZcNZ1k/LZToTn iBwhtIM16MvIrUZP2zdSFXHbF3TJ+fGJDYW7onUzUyhphDAuJjza0PM9/mv0F3qynRFv rnUg== X-Gm-Message-State: ACrzQf3SyX67/h2tRE+jlRtPAveJWRl6apCfvlRWz7b1+bv73sym5HNT tBzLj6IzwV5wl3j+KVQFC6aNsKcRB71j8NS1ph4= X-Google-Smtp-Source: AMsMyM4XTXo3TrKZZDfqZDOTb3QOcSS0kTJjFePGP5ALxKR21vSC/qnMjsgLZPtoinXKMmEFjDKO/n7eDxjAqBvDMx0= X-Received: by 2002:a05:6512:210b:b0:4a2:361d:da22 with SMTP id q11-20020a056512210b00b004a2361dda22mr19312626lfr.561.1667970407806; Tue, 08 Nov 2022 21:06:47 -0800 (PST) MIME-Version: 1.0 References: <9290b55b6ae2b04e002ca9dadadd1cca09461482.camel@cybertec.at> <20220805.114916.994654810780821553.horikyota.ntt@gmail.com> <20220809.161236.1486509314201074910.horikyota.ntt@gmail.com> In-Reply-To: From: Andrey Borodin Date: Tue, 8 Nov 2022 21:06:36 -0800 Message-ID: Subject: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication To: Bruce Momjian Cc: Bharath Rupireddy , Kyotaro Horiguchi , Laurenz Albe , PostgreSQL Hackers , SATYANARAYANA NARLAPURAM Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Thu, Sep 29, 2022 at 3:53 PM Bruce Momjian wrote: > > So, what happens when an insufficient number of synchronous replicas > reply? It's a failover. > Sessions hang because the synchronous behavior cannot be > guaranteed. We then _allow_ query cancel so the user or administrator > can get out of the hung sessions and perhaps modify > synchronous_standby_names. Administrators should not modify synchronous_standby_names. Administrator must shoot this not in the head. > I have always felt this has to be done at the server level, meaning when > a synchronous_standby_names replica is not responding after a certain > timeout, the administrator must be notified by calling a shell command > defined in a GUC and all sessions will ignore the replica. Standbys are expelled from the waitlist according to quorum rules. I'd propose not to invent more quorum rules involving shell scripts. The Administrator expressed what number of standbys can be offline by setting synchronous_standby_names. They actively asked for hanging queries in case of insufficient standbys. We have reserved administrator connections for the case when all connection slots are used by hanging queries. Best regards, Andrey Borodin.