Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sih6h-0086fT-9D for pgsql-hackers@arkaria.postgresql.org; Mon, 26 Aug 2024 21:18:35 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sih6e-00FGta-PY for pgsql-hackers@arkaria.postgresql.org; Mon, 26 Aug 2024 21:18:33 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sih6e-00FGtR-8L for pgsql-hackers@lists.postgresql.org; Mon, 26 Aug 2024 21:18:33 +0000 Received: from mail-pl1-x62b.google.com ([2607:f8b0:4864:20::62b]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sih6X-001eEU-E8 for pgsql-hackers@postgresql.org; Mon, 26 Aug 2024 21:18:32 +0000 Received: by mail-pl1-x62b.google.com with SMTP id d9443c01a7336-202318c4f45so48733295ad.0 for ; Mon, 26 Aug 2024 14:18:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=j-davis-com.20230601.gappssmtp.com; s=20230601; t=1724707104; x=1725311904; darn=postgresql.org; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:from:to:cc:subject :date:message-id:reply-to; bh=CUkmOVNYqGrn0L/fuDPx3VqntaeeRLQzJjvw/uoOTA8=; b=jeWC6F203AUtPXbvGA/z/uwhv1o8FkUIEJk6NIsF/3/vxrF5q43xu4gpKBWnG55UZ6 2APsmDk08bQnJI3OszvrPB6vRztsLH2hbRt64oS5CcM5tYGXDNxKhRnMkJNGA5zsJ5/o ORKISsm+YhiWMLgkbszaKTJihc++uYy0+Z7mdPtM17JXHeSHzAvwAx8qr1A07xmT+jtF /k0S9hNl+W4WzONpyIGHKehegneZKhSAuidcz7Hwx/6gD+dm7DU3L591XMG1o4L3Mp4v vrc5ZjuMOHP6RNdE99c/JeZzFWYc/y4/DT/QXMH+x81RZy3sac6MVQR7Qr5AKeCLM1d1 4PzQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1724707104; x=1725311904; h=mime-version:user-agent:content-transfer-encoding:references :in-reply-to:date:cc:to:from:subject:message-id:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=CUkmOVNYqGrn0L/fuDPx3VqntaeeRLQzJjvw/uoOTA8=; b=DIYvticWxN6aC7szGDMjkm749yPnQCjXyR1TKDV+Jd/xM+6NxkfuYKyEwbcalg7hZ0 oMsxlLnoL2jlIZwXyLiIB7f5xkRYmK2ZZu6TdfynS1389SwMItTM+/F1f3Aky8oaxoPr FC9us0cLxLMKGCsU9E16jLEOA+1Xt5i0UBAbZDkRfv4z3rNlOnqHQeTqffurxZxCaxzo 06iM4qXyTGV4dF6fNbeSdjEmkRQGP526aIYrTDze8LSPovnXCNUXHFG6nGv5ChJ9daOX 3d6H3LxA8mAcgW76Dz+Ab6WYvOT/Rj/3a7v1l4KLl+Q8j8uoRCyBxiaJ8We/1giIyRw8 Nj3Q== X-Forwarded-Encrypted: i=1; AJvYcCUB0pSZBo0bes097gjfpEMkrG3aHRIB6pIO7aN8x3G+GLerTONJFZDSxLifrI2MAEJxAmJjlAYm/fcyxljj@postgresql.org X-Gm-Message-State: AOJu0YwFbEoV5kkDF/psQ35UsXcDQY/OfAQfgS1zZ35sboGt2rbyK9S2 RFliubIlVIBVtUluOL6iv1+L+WwO8/ZeSEp7CscPhBNeC9azM6EbqFrCYj9HijzEPwkmjnyi/18 = X-Google-Smtp-Source: AGHT+IFotKpHSjiGr554nc7hxQ641ebcGvDbHDwO4FvRQHCCZkV3b6AvH0z7OMkmRgDxkjy5ptnVHg== X-Received: by 2002:a17:903:41cd:b0:1fd:a5a2:5838 with SMTP id d9443c01a7336-204def2db0fmr9140955ad.6.1724707104097; Mon, 26 Aug 2024 14:18:24 -0700 (PDT) Received: from jeff-laptop.lan (c-76-102-242-158.hsd1.ca.comcast.net. [76.102.242.158]) by smtp.gmail.com with ESMTPSA id d9443c01a7336-203855dd985sm72022125ad.164.2024.08.26.14.18.22 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 26 Aug 2024 14:18:23 -0700 (PDT) Message-ID: Subject: Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM From: Jeff Davis To: Bharath Rupireddy Cc: Masahiko Sawada , PostgreSQL-development , Andres Freund , Dilip Kumar , Luc Vlaming , Justin Pryzby , Michael Paquier , Matthias van de Meent , Alexander Korotkov Date: Mon, 26 Aug 2024 14:18:21 -0700 In-Reply-To: References: <20230603223824.o7iyochli2dwwi7k@alap3.anarazel.de> <6be6f58815dc0844fbe058edf56b4e735a6efc1c.camel@j-davis.com> <2280bf7241119bb88cbe0fe5eb36490cbd04c0c0.camel@j-davis.com> <12f1a2d8dd3b6305c0354f1c701f44b7be5e54eb.camel@j-davis.com> <8633171cb034aafc260fdf37df04b6c779aa1e2f.camel@j-davis.com> <229c4f7219ed164088dadc935df21e1cf125e191.camel@j-davis.com> Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable User-Agent: Evolution 3.44.4-0ubuntu2 MIME-Version: 1.0 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, 2024-08-26 at 11:09 +0530, Bharath Rupireddy wrote: > On Wed, Jun 5, 2024 at 12:42=E2=80=AFPM Bharath Rupireddy > wrote: > >=20 > > Please find the v22 patches with the above changes. >=20 > Please find the v23 patches after rebasing 0005 and adapting 0004 for > 9758174e2e. Thank you. 0001 API design: * Remove TableModifyState.modify_end_callback. * This patch means that we will either remove or deprecate TableAmRoutine.multi_insert and finish_bulk_insert. Are there any strong opinions about maintaining support for multi-insert, or should we just remove it outright and force any new AMs to implement the new APIs to maintain COPY performance? * Why do we need a separate "modify_flags" and "options"? Can't we just combine them into TABLE_MODIFY_* flags? Alexander, you had some work in this area as well, such b1484a3f19. I believe 0001 covers this use case in a different way: rather than giving complete responsibility to the AM to insert into the indexes, the caller provides a callback and the AM is responsible for calling it at the time the tuples are flushed. Is that right? The design has been out for a while, so unless others have suggestions, I'm considering the major design points mostly settled and I will move forward with something like 0001 (pending implementation issues). Note: I believe this API will extend naturally to updates and deletes, as well. 0001 implementation issues: * We need default implementations for AMs that don't implement the new APIs, so that the AM will still function even if it only defines the single-tuple APIs. If we need to make use of the AM's multi_insert method (I'm not sure we do), then the default methods would need to handle that as well. (I thought a previous version had these default implementations -- is there a reason they were removed?) * I am confused about how the heap implementation manages state and resets it. mistate->mem_cxt is initialized to a new memory context in heap_modify_begin, and then re-initialized to another new memory context in heap_modify_buffer_insert. Then the mistate->mem_cxt is also used as a temp context for executing heap_multi_insert, and it gets reset before calling the flush callback, which still needs the slots. * Why materialize the slot at copyfrom.c:1308 if the slot is going to be copied anyway (which also materializes it; see tts_virtual_copyslot()) at heapam.c:2710? * After correcting the memory issues, can you get updated performance numbers for COPY? Regards, Jeff Davis