Date: Mon, 25 Apr 2022 09:48:13 -0700
From: Nathan Bossart
To: Bharath Rupireddy
Cc: PostgreSQL Hackers, SATYANARAYANA NARLAPURAM
Subject: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
Message-ID: <20220425164813.GA2890928@nathanxps13>

On Mon, Apr 25, 2022 at 07:51:03PM +0530, Bharath Rupireddy wrote:
> With synchronous replication typically all the transactions (txns)
> first locally get committed, then streamed to the sync standbys and
> the backend that generated the transaction will wait for ack from sync
> standbys.
> While waiting for ack, it may happen that the query or the
> txn gets canceled (QueryCancelPending is true) or the waiting backend
> is asked to exit (ProcDiePending is true). In either of these cases,
> the wait for ack gets canceled and leaves the txn in an inconsistent
> state (as in the client thinks that the txn would have replicated to
> sync standbys) - "The transaction has already committed locally, but
> might not have been replicated to the standby.". Upon restart after
> the crash or in the next txn after the old locally committed txn was
> canceled, the client will be able to see the txns that weren't
> actually streamed to sync standbys. Also, if the client fails over to
> one of the sync standbys after the crash (either by choice or because
> of automatic failover management after crash), the locally committed
> txns on the crashed primary would be lost which isn't good in a true
> HA solution. This topic has come up a few times recently [0] [1] [2].
> Thoughts?

I think this will require a fair amount of discussion. I'm personally in
favor of just adding a GUC that can be enabled to block canceling
synchronous replication waits, but I know folks have concerns with that
approach. When I looked at this stuff previously [2], it seemed possible
to handle the other data loss scenarios (e.g., forcing failover whenever
the primary shuts down, turning off restart_after_crash). However, I'm
not wedded to this approach.

[0] https://postgr.es/m/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru
[1] https://postgr.es/m/cac4b9df-92c6-77aa-687b-18b86cb13728%40stratox.cz
[2] https://postgr.es/m/FDE157D7-3F35-450D-B927-7EC2F82DB1D6%40amazon.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
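To make the trade-off discussed above concrete, here is a minimal C sketch. It is a toy simulation, not PostgreSQL source. The names `wait_for_sync_ack`, `Interrupts`, `standby_acks`, and `block_cancel` are hypothetical. `block_cancel` stands in for the kind of opt-in GUC suggested in the reply, not an existing setting. The sketch models a backend that has already committed locally and is polling for the standby's acknowledgment. The two interrupt flags mirror QueryCancelPending and ProcDiePending from the quoted message.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical outcome of a simulated wait for a sync standby's ack. */
typedef enum
{
    WAIT_ACKED,          /* standby acknowledged at least commit_lsn */
    WAIT_CANCELED,       /* wait abandoned; txn is committed locally only */
    WAIT_STILL_WAITING   /* simulation ran out of steps without an ack */
} WaitResult;

/* Simulated interrupt flags (hypothetical stand-ins for backend globals). */
typedef struct
{
    bool query_cancel_pending;   /* stands in for QueryCancelPending */
    bool proc_die_pending;       /* stands in for ProcDiePending */
} Interrupts;

/*
 * Toy model of a backend waiting for synchronous replication after a
 * local commit. standby_acks[i] is the LSN the standby has acknowledged
 * at polling step i. block_cancel is a hypothetical stand-in for an
 * opt-in GUC: when false, a pending cancel or die request ends the wait
 * early; when true, only the standby's acknowledgment ends it.
 */
static WaitResult
wait_for_sync_ack(uint64_t commit_lsn, const uint64_t *standby_acks,
                  int nsteps, const Interrupts *intr, bool block_cancel)
{
    for (int i = 0; i < nsteps; i++)
    {
        /* The standby has caught up to this transaction's commit record. */
        if (standby_acks[i] >= commit_lsn)
            return WAIT_ACKED;

        /* Without the hypothetical GUC, an interrupt abandons the wait. */
        if (!block_cancel &&
            (intr->query_cancel_pending || intr->proc_die_pending))
        {
            /* Detail text taken from the message quoted in this thread. */
            fprintf(stderr, "WARNING: The transaction has already committed "
                            "locally, but might not have been replicated to "
                            "the standby.\n");
            return WAIT_CANCELED;
        }
        /* With block_cancel set, pending interrupts are ignored here. */
    }
    return WAIT_STILL_WAITING;
}
```

With `block_cancel` false, a pending cancel ends the wait before the standby catches up, which is the locally-committed-but-not-replicated case described in the quoted message. With it true, the cancel is ignored and only the standby's acknowledgment ends the wait, at the cost of a backend that stays blocked while the standby is unreachable.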