Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vvAgj-006tbl-2U for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Feb 2026 08:56:10 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.96) (envelope-from ) id 1vvAgh-005DCO-2E for pgsql-hackers@arkaria.postgresql.org; Wed, 25 Feb 2026 08:56:07 +0000 Received: from makus.postgresql.org ([2001:4800:3e1:1::229]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.96) (envelope-from ) id 1vvAgh-005DCG-1B for pgsql-hackers@lists.postgresql.org; Wed, 25 Feb 2026 08:56:07 +0000 Received: from mail-wm1-x32f.google.com ([2a00:1450:4864:20::32f]) by makus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.98.2) (envelope-from ) id 1vvAgb-000000015pj-06sx for pgsql-hackers@lists.postgresql.org; Wed, 25 Feb 2026 08:56:06 +0000 Received: by mail-wm1-x32f.google.com with SMTP id 5b1f17b1804b1-4836f363d0dso53535505e9.3 for ; Wed, 25 Feb 2026 00:56:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=cybertec.at; s=google; t=1772009761; x=1772614561; darn=lists.postgresql.org; h=message-id:date:content-id:mime-version:comments:references :in-reply-to:subject:cc:to:from:from:to:cc:subject:date:message-id :reply-to; bh=wcMDD8xeQ42AIda3Lvpl2nYr//ssEeU63KTwm4OtlEw=; b=otxQ10XORD9k60XOgu+z7SaBixrb2GeJIeOh7wpGVAExsYjWB1xbXNJ13NQ7J05zqr sGRegDYfW/b2603l+NyM8nzX6uMuF97WNTQrVhS4uvGX4mQWonYkQe/0Gw/cpD3j1KXE n5mcQOjSHnaEkcrFG1LLU6P1D/B+x9w8qCIpODp0V9JqNAGDVbVxxUBKutbZJU92+pew 9ssqbxO1hZg+szKyLgAHhPykgH7I3OI/HAW21GAM1cEO/ZARpa9KX0QyVhEY5YJRuc/X JPk1TnL/s/AK2N+qeQLk/JYwx1EnYW9l1TkOOCwmsbQWdDJPZl5xEW4lgULNPM8hYH6C cF0Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1772009761; x=1772614561; h=message-id:date:content-id:mime-version:comments:references :in-reply-to:subject:cc:to:from:x-gm-gg:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=wcMDD8xeQ42AIda3Lvpl2nYr//ssEeU63KTwm4OtlEw=; b=NEHfP2DkrWh8GKjWq07z9KC2kghGz3zeTLv9FXgW/oXbpkP2LiyCoSWRxENcPATZbQ utjQjPh9b+QJLr9+7GeAzmbnSDvRZnesYtK2QiAp3I/WjTJCCRxnUj0YzKi3fz2Kp/RJ x9pg4bfmh6eXOZHQ5qwZ2zbF4HnlBTO6ktmEG4jfkqKQqQQMcVvYRhsodPAlUo2TQKsY Dvw7/SLSU345tlQaq8sBoUA6iK9PQ9s2xe6Ce7JUK8r+iexmwCW/3qhmRUrb/Xu90Ugb JCFQax+wJwEpx7iisyXDXJJOCjSS/xhGHy1MO0qvtGCHj21PYdAN02GcF/6xFrhYHrD6 XLIQ== X-Forwarded-Encrypted: i=1; AJvYcCU/LXtRagd/fU64FAu74s5ejHr2VFSrb0PIDZFXl291tKJbM7dcUlDcYxgtxmADlc9AgrHu84Pj8WOhAFtr@lists.postgresql.org X-Gm-Message-State: AOJu0YxkB/6tZIZ0M5+iSlItQrYfBfwViggYLUNnzv+tImRs/B7IcQYb 3HlVrPTTrq/nDTYb1YWWZcDtWHEK4sTxiNTosZMcAP5LvvOTURHF+/Ik7UshjEYKnSU= X-Gm-Gg: ATEYQzzld/ktFsH7NMFYZqAiqXfSFff0TS31dGSArDtzt7zQhwxDdTwo/CcmfyQUN05 hbfocE+xaaJ8f6vYLhb/ODfs586tBNbxyuJn4p3IPrMKdXYkynaZwbs+OJx9Xb1ZJh2fVcQp6W3 vYbYJ7sgXleywc3jA5huRI63dmI5ndmv0DQ92b/rBgOcJMdqXUMlkOMWmWpKS+UCWBMnyUWbWV3 Vk1XgeHCuwK/Hex0LlRFBSZrPz0NDlhu/1nxW1CVpSNBPkNKxHcktl7+xf9rM123t+2OCOWtLUa AsmpywqBMBdqwABT8f/6fRR1RIcSEytcINhQ9AslD89U94uhfTC8OR3wC5DLx7D9r0e5++ofK5X B5PcpnB0a0P4nAQzd63EauxcOLzKMJQIoM0TQzkxdEICFk1eO1jGm5OwlOXuzd+DI/1g1u7gTPK P5GuiICXdmYYpXorpjyveD7o425XVWZs5LIlZF X-Received: by 2002:a05:600c:620f:b0:483:129e:b573 with SMTP id 5b1f17b1804b1-483bef49d6fmr27747215e9.18.1772009760674; Wed, 25 Feb 2026 00:56:00 -0800 (PST) Received: from localhost (109-81-168-142.rct.o2.cz. [109.81.168.142]) by smtp.gmail.com with ESMTPSA id 5b1f17b1804b1-483bfb776a3sm9490705e9.7.2026.02.25.00.56.00 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 Feb 2026 00:56:00 -0800 (PST) From: Antonin Houska To: Alvaro Herrera cc: Mihail Nikalayeu , Pg Hackers , Robert Treat Subject: Re: Adding REPACK [concurrently] In-reply-to: <202602241757.6ac3iss2u4vo@alvherre.pgsql> References: <202602241757.6ac3iss2u4vo@alvherre.pgsql> Comments: In-reply-to Alvaro Herrera message dated "Tue, 24 Feb 2026 19:29:17 +0100." X-Mailer: MH-E 8.6+git; nmh 1.8; GNU Emacs 28.3 MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-ID: <9114.1772009759.1@localhost> Date: Wed, 25 Feb 2026 09:55:59 +0100 Message-ID: <9116.1772009759@localhost> List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk Alvaro Herrera wrote: > On 2026-Feb-23, Alvaro Herrera wrote: > > Looking at this function in pgoutput_repack.c: > > > +/* Store concurrent data change. */ > > +static void > > +store_change(LogicalDecodingContext *ctx, ConcurrentChangeKind kind, > > + HeapTuple tuple) > > +{ > > [...] we have this: > > > + size = VARHDRSZ + SizeOfConcurrentChange; > > + > > + /* > > + * ReorderBufferCommit() stores the TOAST chunks in its private memory > > + * context and frees them after having called apply_change(). Therefore > > + * we need flat copy (including TOAST) that we eventually copy into the > > + * memory context which is available to decode_concurrent_changes(). > > + */ > > + if (HeapTupleHasExternal(tuple)) > > + { > > + /* > > + * toast_flatten_tuple_to_datum() might be more convenient but we > > + * don't want the decompression it does. > > + */ > > + tuple = toast_flatten_tuple(tuple, dstate->tupdesc); > > + flattened = true; > > + } > > + > > + size += tuple->t_len; > > + if (size >= MaxAllocSize) > > + elog(ERROR, "Change is too big."); > > + > > + /* Construct the change. */ > > + change_raw = (char *) palloc0(size); > > + SET_VARSIZE(change_raw, size); In 0005 ("Use background worker to do logical decoding"), the function is a bit simpler because here the decoding worker uses temporary file to send the data to the REPACKing backend, rather than tuplestore. sharedtuplestore.h would also work but I think we do not need its functionality, and AFAICS it always writes the data into a file anyway (i.e. it does not use memory even if the amount of data is small). (Perhaps 0004 should use the file too, in case 0005 does not make it into PG 19.) > I wonder if this isn't problematic with large tuples. If a row has some > very wide columns, each of which individually is less than 1 GB, then it > might happen that the sum of their sizes exceeds 1 GB, causing palloc() > to complain and abort the whole repack operation. This wouldn't be very > nice, so I think we need to address it somehow. I agree. > I think we need some new APIs to avoid all this copying. It appears > that it all starts with reorderbuffer doing something unhelpful with the > memory context of the TOAST chunks. Maybe we should address this by > "fixing" reorderbuffer so that it doesn't do this, instead of playing so > many games to cope. What I see is that reorderbuffer.c collects the TOAST pointers (ReorderBufferToastAppendChunk) and then, before passing the tuple to the output plugin, it copies the TOAST chunks referenced by the tuple to memory and replaces the "on-disk" TOAST pointers in the tuple with "external indirect" ones, pointing to the in-memory TOAST chunks (ReorderBufferToastReplace). For REPACK, I suggest a variant of toast_flatten_tuple() that writes the output to a file, and a corresponding function that reads it while allocating separate chunks of memory for the individual TOASTed attributes - the restored tuple would reference the chunks using the "external indirect" TOAST pointers, as if it had been processed by ReorderBufferToastReplace(). Does that make sense to you? -- Antonin Houska Web: https://www.cybertec-postgresql.com