Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t5LLU-005UKd-GY for pgsql-hackers@arkaria.postgresql.org; Mon, 28 Oct 2024 08:43:28 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1t5LLS-004kMk-D8 for pgsql-hackers@arkaria.postgresql.org; Mon, 28 Oct 2024 08:43:26 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1t5LLS-004kLt-1h for pgsql-hackers@lists.postgresql.org; Mon, 28 Oct 2024 08:43:26 +0000 Received: from mail-ej1-x635.google.com ([2a00:1450:4864:20::635]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1t5LLP-003Nqb-Pm for pgsql-hackers@postgresql.org; Mon, 28 Oct 2024 08:43:25 +0000 Received: by mail-ej1-x635.google.com with SMTP id a640c23a62f3a-a9a977d6cc7so284760366b.3 for ; Mon, 28 Oct 2024 01:43:23 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1730105002; x=1730709802; darn=postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Pdq/0ZcUMhKve5WMz8KNRTw6L6ny+pdbAdAxcAFCpZE=; b=Ig9EkZ9RAyNwTA1BXVWvXLEf9/WzsP6g8l51fZkWv5oynjSikX/+NFPQS8FdaEicF4 ady/uEBIbDHndWeIslP2jdhSqbvFAc44DTzORnZiCB/mJl46PCKuGYVHTqJX6RGmEGEh kPbNk4LhrRaCH8knl3qQEJ46iqNwua/CDx/X3lmhTXxcO/HMLAHdf9jsYXXkE0ynHHyj KdpbXdFfomlW0Qw7BCKSu4ierFRAIH3Gt4jtZZ7mOiQun6/OYfzGUGirVOrGs8yzNkul j+r32Z4g51hid//YfCYQhh3AxxNa73I49LFNZDX4gZQTl2uJoQ+919x3bPCf5PlROvyu ILng== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1730105002; x=1730709802; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Pdq/0ZcUMhKve5WMz8KNRTw6L6ny+pdbAdAxcAFCpZE=; b=T0fKkp+1v1n5nmeA7ch+3HOEYPu69MjfvnzEHwtWOSHKajD47p2yCEyXofOOXrVVfv TIwiFaqDcAS7Udar9EhraBl8JNl9SsUQ6Fs7fVB/NHJumXu/m1ilqyOVXSez0LrtORUq RXKhNbfTF5tfA3lkyOUm00ba2RFr/x9pX8AKD3xvG8c1beFNNGaOxlhXg6DHV7Ower6r cDVr0hZvg9LwShw0jNcP/Ko45JGKQHBNv/UfTZ7bZOIeGVPHK97sUFgRyOik3a8JamyJ L8Dpl4YbYDErsgdlZ7CHVwDHtS+XpH8P4bVBg2aH60ekJ6AUIUBDwX4yKv5+98uLT0Tl ncOQ== X-Forwarded-Encrypted: i=1; AJvYcCXrUncIvxo40gvHd2JsJnB/tCnWn8AfItYw1whQ+ix4nnZvZF0uUtCL9g/7/GvE7u+fJULWt+Tx8Wk9IomV@postgresql.org X-Gm-Message-State: AOJu0YzP1teklZcqWMEx1c2BQII+eEtkmWLOlBP6uq240eAx38aRfCoi B3ceC5ote5baltYYJUIgyeSlHWO5ZzLX1RM9eYH8Kra3aQ5I7joTtu8N15mpWiTVBFPITs553zU LzblCj6uW7s8tM60VvzttttxvGoY= X-Google-Smtp-Source: AGHT+IHjdiDMwtG+IdWZRC594hhmWp8fswQS04/P++mi7egPAtaEv9qdcrEpp9oeLjHzF/enaB01Xher8T9kgLShRME= X-Received: by 2002:a05:6402:4584:b0:5cb:728e:926b with SMTP id 4fb4d7f45d1cf-5cbbf8c2eacmr8907948a12.17.1730105001384; Mon, 28 Oct 2024 01:43:21 -0700 (PDT) MIME-Version: 1.0 References: <20230603223824.o7iyochli2dwwi7k@alap3.anarazel.de> <6be6f58815dc0844fbe058edf56b4e735a6efc1c.camel@j-davis.com> <2280bf7241119bb88cbe0fe5eb36490cbd04c0c0.camel@j-davis.com> <12f1a2d8dd3b6305c0354f1c701f44b7be5e54eb.camel@j-davis.com> <8633171cb034aafc260fdf37df04b6c779aa1e2f.camel@j-davis.com> <229c4f7219ed164088dadc935df21e1cf125e191.camel@j-davis.com> <23a29125a2d07f96d49f97c31fcdb09a7f0ff6c1.camel@j-davis.com> In-Reply-To: From: Jingtang Zhang Date: Mon, 28 Oct 2024 16:43:09 +0800 Message-ID: Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM To: Bharath Rupireddy Cc: pgsql-hackers@lists.postgresql.org, PostgreSQL-development Content-Type: multipart/alternative; boundary="00000000000008282a0625857620" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000008282a0625857620 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi! Glad to see update in this thread. Little question about v24 0002 patch: would it be better to move the implementation of TableModifyIsMultiInsertsSupported to somewhere for table AM level? Seems it is a common function for future use, not a specific one for matview. --- Regards, Jingtang Bharath Rupireddy =E4=BA=8E2024=E5= =B9=B410=E6=9C=8826=E6=97=A5=E5=91=A8=E5=85=AD 21:31=E5=86=99=E9=81=93=EF=BC=9A > Hi, > > Thanks for looking into this. > > On Thu, Aug 29, 2024 at 12:29=E2=80=AFPM Jeff Davis w= rote: > > > > I believe we need the branching in the caller anyway: > > > > 1. If there is a BEFORE row trigger with a volatile function, the > > visibility rules[1] mean that the function should see changes from all > > the rows inserted so far this command, which won't work if they are > > still in the buffer. > > > > 2. Similarly, for an INSTEAD OF row trigger, the visibility rules say > > that the function should see all previous rows inserted. > > > > 3. If there are volatile functions in the target list or WHERE clause, > > the same visibility semantics apply. > > > > 4. If there's a "RETURNING ctid" clause, we need to either come up with > > a way to return the tuples after flushing, or we need to use the > > single-tuple path. (Similarly in the future when we support UPDATE ... > > RETURNING, as Matthias pointed out.) > > > > If we need two paths in each caller anyway, it seems cleaner to just > > wrap the check for tuple_modify_buffer_insert in > > table_modify_buffer_enabled(). > > > > We could perhaps use a one path and then force a batch size of one or > > something, which is an alternative, but we have to be careful not to > > introduce a regression (and it still requires a solution for #4). > > I chose to branch in the caller e.g. if there's a volatile function > SELECT query of REFRESH MATERIALIZED VIEW, the caller goes > table_tuple_insert() path, else multi-insert path. > > I am posting the new v24 patch set organized as follows: 0001 > introducing the new table AM, 0002 optimizing CTAS, CMV and RMV, 0003 > using the new table AM for COPY ... FROM. I, for now, discarded the > INSERT INTO ... SELECT and Logical Replication Apply patches, the idea > is to take the basic stuff forward. > > I reworked structure names, members and function names, reworded > comments, addressed review comments in the v24 patches. Please have a > look. > > -- > Bharath Rupireddy > PostgreSQL Contributors Team > RDS Open Source Databases > Amazon Web Services: https://aws.amazon.com > --00000000000008282a0625857620 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi! Glad to see update in this thread.

Little = question about v24 0002 patch: would it be better to move the
implementa= tion of TableModifyIsMultiInsertsSupported to somewhere for table AM
lev= el? Seems it is a common function for future use, not a specific one for
matview.

---

Regards, Jingtang

Bharath R= upireddy <bhar= ath.rupireddyforpostgres@gmail.com> =E4=BA=8E2024=E5=B9=B410=E6=9C= =8826=E6=97=A5=E5=91=A8=E5=85=AD 21:31=E5=86=99=E9=81=93=EF=BC=9A
=
Hi,

Thanks for looking into this.

On Thu, Aug 29, 2024 at 12:29=E2=80=AFPM Jeff Davis <pgsql@j-davis.com> wrote:
>
> I believe we need the branching in the caller anyway:
>
> 1. If there is a BEFORE row trigger with a volatile function, the
> visibility rules[1] mean that the function should see changes from all=
> the rows inserted so far this command, which won't work if they ar= e
> still in the buffer.
>
> 2. Similarly, for an INSTEAD OF row trigger, the visibility rules say<= br> > that the function should see all previous rows inserted.
>
> 3. If there are volatile functions in the target list or WHERE clause,=
> the same visibility semantics apply.
>
> 4. If there's a "RETURNING ctid" clause, we need to eith= er come up with
> a way to return the tuples after flushing, or we need to use the
> single-tuple path. (Similarly in the future when we support UPDATE ...=
> RETURNING, as Matthias pointed out.)
>
> If we need two paths in each caller anyway, it seems cleaner to just > wrap the check for tuple_modify_buffer_insert in
> table_modify_buffer_enabled().
>
> We could perhaps use a one path and then force a batch size of one or<= br> > something, which is an alternative, but we have to be careful not to > introduce a regression (and it still requires a solution for #4).

I chose to branch in the caller e.g. if there's a volatile function
SELECT query of REFRESH MATERIALIZED VIEW, the caller goes
table_tuple_insert() path, else multi-insert path.

I am posting the new v24 patch set organized as follows: 0001
introducing the new table AM, 0002 optimizing CTAS, CMV and RMV, 0003
using the new table AM for COPY ... FROM. I, for now, discarded the
INSERT INTO ... SELECT and Logical Replication Apply patches, the idea
is to take the basic stuff forward.

I reworked structure names, members and function names, reworded
comments, addressed review comments in the v24 patches. Please have a
look.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
--00000000000008282a0625857620--