From: shveta malik
Date: Mon, 12 Jan 2026 11:31:02 +0530
Subject: Re: [Patch] add new parameter to pg_replication_origin_session_setup
To: "Hayato Kuroda (Fujitsu)"
Cc: Amit Kapila, "Zhijie Hou (Fujitsu)", Doruk Yilmaz, "pgsql-hackers@lists.postgresql.org", shveta malik

On Fri, Jan 9, 2026 at 4:58 PM Hayato Kuroda (Fujitsu) wrote:
>
> Dear Amit, Shveta,
>
> > > >
> > > > Thanks Hou-San and Kuroda-San.
> > > >
> > > > What should be the expected behavior when Session1 resets the origin
> > > > (changing acquired_pid from its own PID to 0), while Session2 is
> > > > already connected to the origin and Session3 also attempts to reuse
> > > > the same origin?
> > > >
> > > > Currently it asserts:
> > > >
> > > > Session1:
> > > > select pg_replication_origin_create('origin');
> > > > SELECT pg_replication_origin_session_setup('origin');
> > > >
> > > > Session2:
> > > > SELECT pg_replication_origin_session_setup('origin',48028);
> > > >
> > > > Session1:
> > > > SELECT pg_replication_origin_session_reset();
> > > >
> > > > Session3:
> > > > SELECT pg_replication_origin_session_setup('origin');
> > > > This asserts at:
> > > > TRAP: failed Assert("session_replication_state->refcount == 0"), File:
> > > > "origin.c", Line: 1231, PID: 48037
> > > >
>
> FYI, this happened because v1 assumed refcount was 0 if acquired_by was 0.
> But your proposed scenario met it.
>
> > > I checked the behavior on HEAD. Session3 is able to set up the origin
> > > and sets its own PID in acquired_pid. But it is unclear to me which
> > > PID should be recorded in acquired_pid - Session2’s PID, since it set
> > > up the origin earlier, or Session3’s PID. Or does this even make any
> > > difference?
>
> To clarify, I think the behavior on HEAD is not correct. The backend should
> acquire the active origin if it expressly specifies the PID. Otherwise, users
> may acquire unintentionally and advance it.

I agree.

>
> > Can we address these problems by prohibiting leader worker to reset
> > when pa workers are still associated with the origin?
> > The way for leader to know if pa workers are associated with origin is by checking
> > following condition: acquired_by == MyProcpid AND refcount > 1.
>
> I think it's okay. IIUC, the idea is to avoid that active origin has invalid
> acquired_by attribute. The replication origin was extended to support parallel
> apply of logical replication, and it is reasonable to force the same approach.
> Attached 0001 implemented that.
>
> One concern with the implementation is that acquired_by can be zero if the process
> exits without releasing the origin; this can happen if the first acquired process
> exits while the second is still using it.
> Regarding our logical replication, it won't be problematic because the leader
> worker ensures all parallel workers finish before it exits.
>
> To address the issue, I propose that another process should not be able to
> acquire the origin if the acquired_by of the active origin is 0. The problem
> should be resolved within the SQL interface, since it occurs there.
>

+1.

Please find a few comments:

1)
+ /*
+  * The replication origin cannot be reset if the replication origin is
+  * firstly acquired by this backend and other processes are actively using
+  * now. This can cause acquired_by to be zero and active replication origin
+  * might be dropped.
+  */
+ if (session_replication_state->acquired_by == MyProcPid &&
+     session_replication_state->refcount > 1)
+     ereport(ERROR,
+             (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+              errmsg_plural("another process is acquiring the replication origin",
+                            "other processes are acquiring the replication origin",
+

Since the user is not aware of the internal acquired_by logic, the error might
not make much sense to them: it does not explain why one session is able to
reset while another is not.
Shall we make it:

ERROR: cannot reset replication origin "origin_name" while it is still shared by other processes
DETAIL: the current session is the first process for this replication origin, and other processes are sharing it.
HINT: ensure this replication origin is reset in all other processes first.

2)
When the first session leaves while the second session is still using the
origin, the third session is able to drop the origin, which is not right. I
think replorigin_state_clear() needs a change. The
'if (state->acquired_by != 0)' check should be replaced by
'if (state->refcount > 0)'.

Patch 001 had the correct changes in replorigin_state_clear(); IMO we still
need those.

3)
When the first session leaves while the second session is still using the
origin, the third session is now correctly unable to join it. It gives the
error:

postgres=# SELECT pg_replication_origin_session_setup('origin');
ERROR: replication origin with ID 1 is already active for another process

The error is not very informative given that sharing is now allowed.
Shall it be:

ERROR: replication origin "origin_name" cannot be acquired while it is still in use
DETAIL: the process that first acquired this origin exited without releasing it.
HINT: wait until all processes sharing this origin have released it.
Or
HINT: ensure this replication origin is reset in all other processes first.

4)
+ /* Number of backend that is currently using this origin. */

Number of backend that is --> Count of backends that are

thanks
Shveta
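
For illustration only, the rules discussed in comments 1) to 3) can be modeled
as a small standalone C sketch. It is not taken from the patch and does not use
any PostgreSQL internals: OriginState, OriginResult, origin_setup(),
origin_reset(), origin_backend_exit() and origin_drop() are hypothetical names,
and only acquired_by and refcount correspond to fields mentioned in this
thread. The sketch assumes that a caller of origin_reset() or
origin_backend_exit() currently holds the origin.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for the shared replication state. */
typedef struct OriginState
{
    int acquired_by;    /* PID of the first acquirer; 0 if none or it left */
    int refcount;       /* count of backends currently using the origin */
} OriginState;

typedef enum OriginResult
{
    ORIGIN_OK,
    ORIGIN_ERR_IN_USE,      /* origin is active: cannot acquire or drop */
    ORIGIN_ERR_SHARED,      /* first acquirer cannot reset while shared */
    ORIGIN_ERR_PID_MISMATCH /* requested PID is not the first acquirer */
} OriginResult;

/*
 * pid_to_join == 0 means "acquire an unused origin"; otherwise the caller
 * asks to share the origin first acquired by pid_to_join.
 */
OriginResult
origin_setup(OriginState *st, int my_pid, int pid_to_join)
{
    if (pid_to_join == 0)
    {
        /* Comment 3: refuse while anyone uses it, even if acquired_by is 0. */
        if (st->refcount > 0)
            return ORIGIN_ERR_IN_USE;
        st->acquired_by = my_pid;
        st->refcount = 1;
        return ORIGIN_OK;
    }

    if (st->refcount == 0 || st->acquired_by != pid_to_join)
        return ORIGIN_ERR_PID_MISMATCH;
    st->refcount++;
    return ORIGIN_OK;
}

/* Comment 1: the first acquirer may not reset while others share the origin. */
OriginResult
origin_reset(OriginState *st, int my_pid)
{
    assert(st->refcount > 0);
    if (st->acquired_by == my_pid && st->refcount > 1)
        return ORIGIN_ERR_SHARED;
    if (st->acquired_by == my_pid)
        st->acquired_by = 0;
    st->refcount--;
    return ORIGIN_OK;
}

/* A backend exiting without an explicit reset; this can leave acquired_by 0. */
void
origin_backend_exit(OriginState *st, int my_pid)
{
    assert(st->refcount > 0);
    if (st->acquired_by == my_pid)
        st->acquired_by = 0;
    st->refcount--;
}

/* Comment 2: the drop check looks at refcount, not acquired_by. */
OriginResult
origin_drop(const OriginState *st)
{
    return (st->refcount > 0) ? ORIGIN_ERR_IN_USE : ORIGIN_OK;
}
```

In this model, once the first session exits while a second one is still
attached, acquired_by is 0 but refcount is 1, so both a new plain setup and a
drop are refused until the last user resets.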