From: Bharath Rupireddy
Date: Thu, 4 Aug 2022 13:42:02 +0530
Subject: Re: An attempt to avoid locally-committed-but-not-replicated-to-standby-transactions in synchronous replication
To: Andrey Borodin
Cc: Dilip Kumar, Laurenz Albe, PostgreSQL Hackers, SATYANARAYANA NARLAPURAM
In-Reply-To: <4F070B19-51EC-4A05-A111-6001A961F991@yandex-team.ru>
References: <9290b55b6ae2b04e002ca9dadadd1cca09461482.camel@cybertec.at> <763B5AF0-1C9E-4796-9639-F969A2E66189@yandex-team.ru> <11FF616C-C78C-41AA-A823-E3D4E745ACE5@yandex-team.ru> <4F070B19-51EC-4A05-A111-6001A961F991@yandex-team.ru>

On Mon, Jul 25, 2022 at 4:20 PM Andrey Borodin wrote:
>
> > On 25 July 2022, at 14:29, Bharath Rupireddy wrote:
> >
> > Hm, after thinking for a while, I tend to agree with the above
> > approach - meaning, query cancel interrupt processing can be
> > completely disabled in SyncRepWaitForLSN() and proc die interrupts
> > processed immediately. This approach requires no GUC, as opposed
> > to the proposed v1 patch upthread.
> GUC was proposed here[0] to maintain compatibility with previous behaviour. But I think that having no GUC here is fine too. If we do not allow cancelation of unreplicated backends, of course.
>
> >>
> >> And yes, we need additional complexity - but in some other place. A transaction can also be locally committed in the presence of a server crash. But this is another difficult problem. A crashed server must not allow data queries until the LSN of the timeline end is successfully replicated to synchronous_standby_names.
> >
> > Hm, that needs to be done anyways. How about doing as proposed
> > initially upthread [1]? Also, quoting the idea here [2].
> >
> > Thoughts?
> >
> > [1] https://www.postgresql.org/message-id/CALj2ACUrOB59QaE6=jF2cFAyv1MR7fzD8tr4YM5+OwEYG1SNzA@mail.gmail.com
> > [2] 2) Wait for sync standbys to catch up upon restart after the crash or
> > in the next txn after the old locally committed txn was canceled. One
> > way to achieve this is to let the backend, that's making the first
> > connection, wait for sync standbys to catch up in ClientAuthentication
> > right after successful authentication. However, I'm not sure this is
> > the best way to do it at this point.
> >
>
> I think ideally the startup process should not allow read-only connections in CheckRecoveryConsistency() until WAL is replicated to quorum at least up until the new timeline LSN.

Unless I'm missing something, we can't do it in
CheckRecoveryConsistency(). The walsenders (required for sending the
remaining WAL to the sync standbys to achieve quorum) can only be
started after the server reaches a consistent state; after all,
walsenders are specialized backends.

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/rds/postgresql/