Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kpr6X-000346-H3 for pgsql-hackers@arkaria.postgresql.org; Thu, 17 Dec 2020 11:05:53 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1kpr6U-0006nE-6r for pgsql-hackers@arkaria.postgresql.org; Thu, 17 Dec 2020 11:05:50 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kpr6T-0006n6-UM for pgsql-hackers@lists.postgresql.org; Thu, 17 Dec 2020 11:05:50 +0000 Received: from mail-pl1-x636.google.com ([2607:f8b0:4864:20::636]) by magus.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.92) (envelope-from ) id 1kpr6Q-0004A6-JF for pgsql-hackers@postgresql.org; Thu, 17 Dec 2020 11:05:49 +0000 Received: by mail-pl1-x636.google.com with SMTP id q4so9234981plr.7 for ; Thu, 17 Dec 2020 03:05:46 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=gzQmrh0xO96QZJTlamy2B/skVn1cXTJ0q7UPQi+R9VQ=; b=gnSSM1JiIwANjKBJicwLdd85TnIyI3V6qorgph3EyGlNlkPe7E+Q1LGInVGGs7oBoo 4hzXH660pA6rh1fz2L7aIk8hFDvm246QOYxV4A1vfOvwcuolXlvrm9Vbe8amcfPRPW7e M88F3txcidUl3ywby2BusUTsYGHtexlzZePdl1EjLq24dH0TWVHqiQKlza9BqRNvbRVi 6T9kWy7CyEQIQt7YD55iQXRI+Tl5fuPIR1UKzDOAHiZMos4jhqnQB3bgpLu6/LMTAwcE t548q0sqEGLWUTQJAO+nCNqdFqyBdhwBnAoarvQ/O+zm79j6snNcaBnBghGSD2Cp1sGH W4VA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=gzQmrh0xO96QZJTlamy2B/skVn1cXTJ0q7UPQi+R9VQ=; b=f8dS+0GqXOaMOIg33RrfjpbjhGxsEXsaxpGmq7adfPkC6i91iK5+4qH9RF1tCjX8Jy qtL6A/jIlpL6m/Jf56NpIySAFUsz/jL+BW+SNXDu6Lq9PGwek5JKDiwoZz4f0PA6v0+S QdG0yxnbWDtXzBRQik9ZWyMislf4eU7d/EAFAEv462sRdDGoFBeXSfdggKO46wlkQM5/ LuQCyEuFOoSji/L8m96S+LE2lVRAAozNl121nHrqZnIRJog5tCK2xeu4dymamdFy9I7y JEkHLcwRCL0D5BHE7jAKs5JSEeHzkfccju561VDSA5zf4+m7x4NQMD7Dkq7YnSwDsQFG trAw== X-Gm-Message-State: AOAM532H1LmRmhrnSrVFpDWqkAAw+q+227adS8hqkxL0ORAsh/Sk12og bjRZPU2OAD7PYk8xBfBRHhc05ywpqh1zDruVB5M= X-Google-Smtp-Source: ABdhPJy0JGawisCEGu8N3avgoV/+EMu8El7MGg/JLWFrMGzyY47+gawFKpaVIX0BL3IfXQIcdd3UNnO+5rcXtUXfo54= X-Received: by 2002:a17:902:7292:b029:dc:ac9:25b5 with SMTP id d18-20020a1709027292b02900dc0ac925b5mr10910274pll.2.1608203144326; Thu, 17 Dec 2020 03:05:44 -0800 (PST) MIME-Version: 1.0 References: <20201217050522.GU30237@telsasoft.com> In-Reply-To: <20201217050522.GU30237@telsasoft.com> From: Bharath Rupireddy Date: Thu, 17 Dec 2020 16:35:33 +0530 Message-ID: Subject: Re: New Table Access Methods for Multi and Single Inserts To: Justin Pryzby Cc: PostgreSQL-development , Andres Freund , Luc Vlaming , Paul Guo Content-Type: text/plain; charset="UTF-8" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Precedence: bulk Thanks a lot for taking a look at the patches. On Thu, Dec 17, 2020 at 10:35 AM Justin Pryzby wrote: > Typos: > > + * 1) Specify is_multi as true, then multi insert state is allcoated. > => allocated > + * dropped, short-lived memory context is delted and mistate is freed up. > => deleted > + * 2) Currently, GetTupleSize() handles the existing heap, buffer, minmal and > => minimal > + /* Mulit insert state if requested, otherwise NULL. */ > => multi > + * Buffer the input slots and insert the tuples from the buffered slots at a > => *one* at a time ? > + * Compute the size of the tuple only if mi_max_size i.e. the total tuple size > => I guess you mean max_size > > This variable could use a better name: > +CopyMulitInsertFlushBuffers(List **mirri, .. > mirri is fine for a local variable like an element of a struture/array, or a > loop variable, but not for a function parameter which is an "List" of arbitrary > pointers. > > I think this comment needs to be updated (again) for the removal of the Info > structure. > - * CopyMultiInsertBuffer items stored in CopyMultiInsertInfo's > + * multi insert buffer items stored in CopyMultiInsertInfo's > > There's some superfluous whitespace (and other) changes there which make the > patch unnecessarily long. I will correct them and post the next version of the patch set. Before that, I would like to have the discussion and thoughts on the APIs and their usefulness. > I think the COPY patch should be 0002 (or maybe merged into 0001). I can make it as a 0002 patch. > You made the v2 insert interface a requirement for all table AMs. > Should it be optional, and fall back to simple inserts if not implemented ? I tried to implement the APIs mentioned by Andreas here in [1]. I just used v2 table am APIs in existing table_insert places to show that it works. Having said that, if you notice, I moved the bulk insert allocation and deallocation to the new APIs table_insert_begin() and table_insert_end() respectively, which make them tableam specific. Currently, the bulk insert state is outside and independent of tableam. I think we should not make bulk insert state allocation and deallocation tableam specific. Thoughts? [1] - https://www.postgresql.org/message-id/20200924024128.kyk3r5g7dnu3fxxx%40alap3.anarazel.de > For CTAS, I think we need to consider Paul's idea here. > https://www.postgresql.org/message-id/26C14A63-CCE5-4B46-975A-57C1784B3690%40vmware.com IMO, if we were to allow those raw insert APIs to perform parallel inserts, then we would be reimplementing the existing table_insert or table_mulit_insert API by having some sort of shared memory for coordinating among workers and so on, may be in some other way. Yes, we could avoid all the existing locking and shared buffers with those raw insert APIs, I also feel that we can now do that with the existing insert APIs for unlogged tables and bulk insert state. To me, the raw insert APIs after implementing them for the parallel inserts, they would look like the existing insert APIs for unlogged tables and with bulk insert state. Thoughts? Please have a look at [1] for detailed comment. [1] https://www.postgresql.org/message-id/CALj2ACX0u%3DQvB7GHLEqeVYwvs2eQS7%3D-cEuem7ZaF%3Dp%2BqZ0ikA%40mail.gmail.com > Conceivably, tableam should support something like that for arbitrary AMs > ("insert into a new table for which we have exclusive lock"). I think that AM > method should also be optional. It should be possible to implement a minimal > AM without implementing every available optimization, which may not apply to > all AMs, anyway. I could not understand this point well. Maybe more thoughts help me here. With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com