Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nQ942-0000DF-Hy for pgsql-hackers@arkaria.postgresql.org; Fri, 04 Mar 2022 14:37:50 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1nQ941-0002u6-Fc for pgsql-hackers@arkaria.postgresql.org; Fri, 04 Mar 2022 14:37:49 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nQ941-0002tx-4J for pgsql-hackers@lists.postgresql.org; Fri, 04 Mar 2022 14:37:49 +0000 Received: from mail-vs1-xe33.google.com ([2607:f8b0:4864:20::e33]) by magus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1nQ93z-0001EA-3A for pgsql-hackers@postgresql.org; Fri, 04 Mar 2022 14:37:48 +0000 Received: by mail-vs1-xe33.google.com with SMTP id q9so9267965vsg.2 for ; Fri, 04 Mar 2022 06:37:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=fuSBDBwrmyUvgpLYBRsa5U7j+QDKW+56AmgIvdHIoBs=; b=iLP7iF/LGbehLV+OEb4wP3zd/NlsQABYgdvJMrYodOLHoPHbJKtDpfUzKbkMNuhtBI SN7Pmv+9Yh0euBQ8JtRrUSrjTzG8jIwK70lQSghuKxZHwRdpCE+gFSpY9/EQvMByCg2y 9gcB27z9G5Xk4/5bVORoIkbzi8aKKK7oq8Z2RgDizkAozBRUXM5ey5Dkqq8WO+IiRllM xcemhT3Tf8qwGSCMTvsnaqNortxVvvDfoL+ftH15Ej8upVfdcPG1uaml/5uWX+s4FUnA i8F3Eb+PANH9cD0k5q/KVpNabTHvjLQahAGRncDLmJs0q5ps/8AmLw11ZBi9mGaBwbrw t7oA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=fuSBDBwrmyUvgpLYBRsa5U7j+QDKW+56AmgIvdHIoBs=; b=5OowmfboZlpPwIrChmTieAgdIyNk/c6hfAcuOwSzPByPPvOPSCOdlWCmbVLXLw5hF7 11BW/370TjZa6/wF+4Evfh01NhRE8Dw87wUd+x1Dg7FLkxpOSyMAhQUGVV5WeHQtpEpL 5ExphBNcx9TGQQlCUgbmnM67PaSD0AcOXQ4B/fbooYhQsZffqJK87m1zDxa/FDfPYbK0 8Kc0zcYq+iCZ1lecEdO7hMtewT2k+ZGqJHv6r/HyFhkO8vPEUayTxo3WkJM3YBxtVr5A bRrXKaOjiDLfifPtbDgVZdUPxH9ClcmBP9c9NZ0TDdbQv6uhm7LiqBusYL2BHWTzQDx0 rjvA== X-Gm-Message-State: AOAM5306OJPoIX9I9LyyUeF0jOPEwjuWv5zrrE9JsYdjFXs8onrCIo5Q BH70GECP2LAyLhr0AEbiOWf05RFop9BshNhepwUB7feY0zaeWw== X-Google-Smtp-Source: ABdhPJzaUDs383AXuR88EPzPEUNqGfvm3NL4t2ZL1bmPsA6MtJu2QQZFDCO/pZVwJTxOBuTH4FrVI4fyTGFwhpCSx8g= X-Received: by 2002:a05:6102:4428:b0:31e:7166:a322 with SMTP id df40-20020a056102442800b0031e7166a322mr16006479vsb.2.1646404665015; Fri, 04 Mar 2022 06:37:45 -0800 (PST) MIME-Version: 1.0 References: <20201217050522.GU30237@telsasoft.com> <20201217204442.GX30237@telsasoft.com> <20201218175439.GA30237@telsasoft.com> <20201221074725.GF30237@telsasoft.com> <20201225023958.GW30237@telsasoft.com> <96eaa813-4ad6-e80a-04a4-cc8082d356dd@swarm64.com> <508af801-6356-d36b-1867-011ac6df8f55@swarm64.com> In-Reply-To: From: Matthias van de Meent Date: Fri, 4 Mar 2022 15:37:32 +0100 Message-ID: Subject: Re: New Table Access Methods for Multi and Single Inserts To: Bharath Rupireddy Cc: Dilip Kumar , Luc Vlaming , Justin Pryzby , PostgreSQL-development , Andres Freund , Paul Guo , Jeff Davis Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk On Mon, 19 Apr 2021 at 06:52, Bharath Rupireddy wrote: > > On Mon, Apr 5, 2021 at 9:49 AM Bharath Rupireddy > wrote: > > > > On Wed, Mar 10, 2021 at 10:21 AM Bharath Rupireddy > > wrote: > > > Attaching the v4 patch set. Please review it further. > > > > Attaching v5 patch set after rebasing onto the latest master. > > Another rebase due to conflicts in 0003. Attaching v6 for review. I recently touched the topic of multi_insert, and I remembered this patch. I had to dig a bit to find it, but as it's still open I've added some comments: > diff --git a/src/include/access/heapam.h b/src/include/access/heapam.h > +#define MAX_BUFFERED_TUPLES 1000 > +#define MAX_BUFFERED_BYTES 65535 It looks like these values were copied over from copyfrom.c, but are now expressed in the context of the heapam. As these values are now heap-specific (as opposed to the TableAM-independent COPY infrastructure), shouldn't we instead optimize for heap page insertions? That is, I suggest using a multiple (1 or more) of MaxHeapTuplesPerPage for _TUPLES, and that same / a similar multiple of BLCKSZ for MAX_BUFFERED_BYTES. > TableInsertState->flushed > TableInsertState->mi_slots I don't quite like the current storage-and-feedback mechanism for flushed tuples. The current assumptions in this mechanism seem to be that 1.) access methods always want to flush all available tuples at once, 2.) access methods want to maintain the TupleTableSlots for all inserted tuples that have not yet had all triggers handled, and 3.) we need access to the not-yet-flushed TupleTableSlots. I think that that is not a correct set of assumptions; I think that only flushed tuples need to be visible to the tableam-agnostic code; and that tableams should be allowed to choose which tuples to flush at which point; as long as they're all flushed after a final multi_insert_flush. Examples: A heap-based access method might want bin-pack tuples and write out full pages only; and thus keep some tuples in the buffers as they didn't fill a page; thus having flushed only a subset of the current buffered tuples. A columnstore-based access method might not want to maintain the TupleTableSlots of original tuples, but instead use temporary columnar storage: TupleTableSlots are quite large when working with huge amounts of tuples; and keeping lots of tuple data in memory is expensive. As such, I think that this should be replaced with a TableInsertState->mi_flushed_slots + TableInsertState->mi_flushed_len, managed by the tableAM, in which only the flushed tuples are visible to the AM-agnostic code. This is not much different from how the implementation currently works; except that ->mi_slots now does not expose unflushed tuples; and that ->flushed is replaced by an integer value of number of flushed tuples. A further improvement (in my opinion) would be the change from a single multi_insert_flush() to a signalling-based multi_insert_flush: It is not unreasonable for e.g. a columnstore to buffer tens of thousands of inserts; but doing so in TupleTableSlots would introduce a high memory usage. Allowing for batched retrieval of flushed tuples would help in memory usage; which is why multiple calls to multi_insert_flush() could be useful. To handle this gracefully, we'd probably add TIS->mi_flush_remaining, where each insert adds one to mi_flush_remaining; and each time mi_flushed_slots has been handled mi_flush_remaining is decreased by mi_flushed_len by the handler code. Once we're done inserting into the table, we keep calling multi_insert_flush until no more tuples are being flushed (and error out if we're still waiting for flushes but no new flushed tuples are returned). - Matthias