Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kzEeX-0006YX-CJ for pgsql-hackers@arkaria.postgresql.org; Tue, 12 Jan 2021 08:03:46 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.92) (envelope-from ) id 1kzEeV-0000iA-AU for pgsql-hackers@arkaria.postgresql.org; Tue, 12 Jan 2021 08:03:43 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kzEeU-0000i3-SS for pgsql-hackers@lists.postgresql.org; Tue, 12 Jan 2021 08:03:43 +0000 Received: from mail-le1ger01on0051.outbound.protection.outlook.de ([51.5.72.51] helo=GER01-LEJ-obe.outbound.protection.outlook.de) by magus.postgresql.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1kzEeQ-0001AR-C8 for pgsql-hackers@postgresql.org; Tue, 12 Jan 2021 08:03:42 +0000 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=Jglu5zUg9Lca2aBtWdgZV2y0/SUaCH6MDeCdsDyGPs6/fS/6UqNMz6lRyf2Fqvl/amg83mzzQcYwwUSMhLIqUHmmZ+Mv8235tGpD34jggRHGQxmGFubhMJ+V1xXtn2rAK7u1GwfEFN4dhNBUqR3Iq/tbJbTcj8I4aRYTNzMJCiGsuzKGqN/rnhZlIMKQwv4IyZMQMTvxNpoG+/GjfY6JG3Z7RW4HM02pmFUwLK5eHK5MTX40KV1wQENpD/hxlaIQuslUvG+nuf2QtRn5WtAXTK/8gzee0EgJi+hdB9PNi/KEEroo4Wu4JJH9oMthfP+S6KnfHFtLLE1kBzPhoA7e6Q== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=YFOlaIFKwbqvhfCpXUeiVFMAVn20AwWmdEIe3H9v3Jw=; b=P/M2oa2h9e2N3XaNsH4denMULKFtcQbuIp3fvEic5H/4Xqo0TC+VAz6IjIfRr4s4hHee7FtulRZjMgGWJfLF3j2bhsULZmqu6II5QtiMSqm/6KkziYp/FalWf4SWqQHidbpZJCLDXAasYpJPw9Vhzxm/YP9QcWcY2ghsea57x4hVG8aTIOtY2Uu8ufO7JivfwegymO7ndKcAeqZfKHEWTcMzZCgjRLEpe9PHFOUyvT3yTJXBW3FiWOBGN2jGYP7h4PGK7pdSa8sbkNSI+YhHok9GfM8p4wqmkiHZUtVzNvy3kbioLHQNaa+Dm01JjGC4hqvhKvsieXfcmgRCb734pw== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=swarm64.com; dmarc=pass action=none header.from=swarm64.com; dkim=pass header.d=swarm64.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=swarm64hive.onmicrosoft.de; s=selector1-swarm64hive-onmicrosoft-de; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=YFOlaIFKwbqvhfCpXUeiVFMAVn20AwWmdEIe3H9v3Jw=; b=HqT6krN8UBOvet7AGSyVfYwW3FNL6X0P01xLyeRmh6rBLQ4N/ASQqwwjjUNqyZrKEN4DcOYL+ipNLQQTEc1v+Xr2G9xSGftZRc5nd0Yt+NteIyo/fGlCKT2S8i+7MjAENiOBowb0vgvkP/dlLf8DoDZAOzD8ROmRkWyAF6vweS8= Authentication-Results: j-davis.com; dkim=none (message not signed) header.d=none;j-davis.com; dmarc=none action=none header.from=swarm64.com; Received: from FRXPR01MB0456.DEUPRD01.PROD.OUTLOOK.DE (2a01:4180:c010:22::14) by FRXPR01MB0872.DEUPRD01.PROD.OUTLOOK.DE (2a01:4180:c010:1f::13) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3763.7; Tue, 12 Jan 2021 08:03:36 +0000 Received: from FRXPR01MB0456.DEUPRD01.PROD.OUTLOOK.DE ([fe80::1031:13bd:9e32:7655]) by FRXPR01MB0456.DEUPRD01.PROD.OUTLOOK.DE ([fe80::1031:13bd:9e32:7655%4]) with mapi id 15.20.3763.009; Tue, 12 Jan 2021 08:03:36 +0000 Subject: Re: New Table Access Methods for Multi and Single Inserts To: Bharath Rupireddy Cc: Justin Pryzby , PostgreSQL-development , Andres Freund , Paul Guo , Jeff Davis References: <20201217050522.GU30237@telsasoft.com> <20201217204442.GX30237@telsasoft.com> <20201218175439.GA30237@telsasoft.com> <20201221074725.GF30237@telsasoft.com> <20201225023958.GW30237@telsasoft.com> <96eaa813-4ad6-e80a-04a4-cc8082d356dd@swarm64.com> <508af801-6356-d36b-1867-011ac6df8f55@swarm64.com> From: Luc Vlaming Message-ID: Date: Tue, 12 Jan 2021 09:03:33 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.5.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit X-Originating-IP: [213.61.129.43] X-ClientProxiedBy: LEJPR01CA0051.DEUPRD01.PROD.OUTLOOK.DE (2a01:4180:c012:1::19) To FRXPR01MB0456.DEUPRD01.PROD.OUTLOOK.DE (2a01:4180:c010:22::14) MIME-Version: 1.0 X-MS-Exchange-MessageSentRepresentingType: 1 Received: from [10.1.0.5] (213.61.129.43) by LEJPR01CA0051.DEUPRD01.PROD.OUTLOOK.DE (2a01:4180:c012:1::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.3763.2 via Frontend Transport; Tue, 12 Jan 2021 08:03:35 +0000 X-MS-PublicTrafficType: Email X-MS-Office365-Filtering-Correlation-Id: d5eb569a-9ed4-42fe-bda1-08d8b6d09029 X-MS-TrafficTypeDiagnostic: FRXPR01MB0872: X-Microsoft-Antispam-PRVS: X-MS-Oob-TLC-OOBClassifiers: OLM:10000; X-MS-Exchange-SenderADCheck: 1 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: Gw5xrDzNH1hYU35IM1DO2y9sWqUvuK+Fu0iE4Mz9SM/vQ6h2kMBfbKqNWVu4DFCXujjzPU9VLqciNcV5vxHWcPqeR7CDiSwbEG9e0R53FWl4bTDYd/hDSdNKj1eMFzb5OEiav4NUEv0ro6g5lggAqjSimUCUEJIEHYn7ofFZHh/DCzRn8jsPSpc6JtZbCJWyrX/X3TGdFUJqE7QEtPqLjDqQXb2JcOyIKqMlgiWSoHGpvpWOVQ54arJ8O8HVKPqgmXSy3mE9QF1wUI6TyegGXtmWXWusQgKDqE27EcSlbdT8aQr+vfttr+E5goAus+ygH9rYkpzN8O5O3V8YTjZp0mHL1lQzwtP9BhI3v8+SfCV7aO+PZ7ikSZw/rZkQKmNld0kl6LR+6Mb/Z7AucXkbVoIXGVArEo/glmY6yyPR3oRj7zQOCFz/9RSshsPYR4hJQJeEeud6Ev7f4+JDr919Va8gI04e6tw35kRQR3BdMfw= X-Forefront-Antispam-Report: CIP:255.255.255.255;CTRY:;LANG:en;SCL:1;SRV:;IPV:NLI;SFV:NSPM;H:FRXPR01MB0456.DEUPRD01.PROD.OUTLOOK.DE;PTR:;CAT:NONE;SFS:(366004)(136003)(376002)(39830400003)(346002)(396003)(54906003)(16576012)(5660300002)(83380400001)(956004)(8936002)(2616005)(66556008)(8676002)(86362001)(966005)(52116002)(55236004)(26005)(36756003)(16526019)(31686004)(186003)(53546011)(316002)(2906002)(31696002)(478600001)(66946007)(6916009)(4326008)(66476007)(43740500002);DIR:OUT;SFP:1101; X-MS-Exchange-AntiSpam-MessageData: =?utf-8?B?Q2hnekNHSFFheHJxdGk2OXJRbmRPWldGd091ZVZmM3pIUjZ3ck1YMksydWR4?= =?utf-8?B?TUtiZTNtK1FHWExuak5vTDAzZFJYYmthdGpvOGNBUFdlZ1RPSmhRSHFsOGVP?= =?utf-8?B?cnVHQmtRZ0NSaTdSaXlxTy9Vb0tMN0Z3UXU5bS9hM3lHMEgxTFhmdWhRYkdW?= =?utf-8?B?TXA0VW12M2dGMjUydDY0aEJwMG1TQjR6NXREajd2OXVRcmRvUUF3VnkrN3Bk?= =?utf-8?B?WjFQWFdUbm5IVjVqRFVFU2Nrb3o2Q1dXTUtKMVZtTitIQWQ5VVhrcFhHZkk2?= =?utf-8?B?anVLZVhYRDAyQ0dxK0RoaHh1VE5aRXd0aU1wZHhLbXkrRWltTHpreEdiUmFv?= =?utf-8?B?bEV2d2dUQ0Vtb21WK1J2azYwdEZway81QVd5d0JmMUV3OGNkOG12NnlHSmt6?= =?utf-8?B?T0Y5bnRHdEZ2Y2VOSG9FQU1PalB0cEI4MUF4dXE5SE9DMXl4aURBVXdtV3JQ?= =?utf-8?B?VzVxd1FvWHB3dGR0OXZCTitrZWRpcUdoN0JjcEpmNWRCT2o4ZEVCS1V3d3FC?= =?utf-8?B?NTExQzVNNURmME5SaGlwbXl6ZGRxYnE2TUFURmxlSnJOVjhoaFphWWI5QUMv?= =?utf-8?B?azRsYndPNWlINlRMeTFsUXA2anJlV2ZTK0hrRDJvMXRMTy9kU3o4d1FQWjFk?= =?utf-8?B?WC8yaXdRdUpHdDFOMmI0ZlNYai8wK1NEa1g5Qm1RdEh1amxRZG96cTFlQW41?= =?utf-8?B?eVJnVytHdDhoM1JkRFpiZFlOcjdVdTEzcHdRK2dNOXhHZ1M4S20weDlUVDdp?= =?utf-8?B?SUh6dHk3czJ5ZGs4UnVURmI0QmY1bDgxOFFEWUwxTDdmWVJsSUhnWWxsUTZs?= =?utf-8?B?eHJReVB4bkg5RUFicE9vMWM5RklvcW5sMUkrc0p6UVRxOWtUckdPenRGU3Zr?= =?utf-8?B?OVlMTzRRQkxFZFN5bEhSMmExdnN6VGw2MUYzRUwzeDVCUmxBSXg0RnYrUFVX?= =?utf-8?B?TExwL2RrZ3NRbm5qOENQTmYvM2l0SXZZdDhuQk1SUmFNbm9oRmVTYSt1MnBG?= =?utf-8?B?L2pORnVkb2pIK0JLMWEzOHluZXBpM0dxQjh4d2hZaVFOMEllcGdhNDFWRzJx?= =?utf-8?B?b3NxV3NobUR4cnRQNVZYTHhYbHB4NCtaanhxbzFFNUxBdmorRk1hckI3eTRT?= =?utf-8?B?OUMxdDY0ZFBJZzlPQ0NXSVorWmhTM1VseXFRUnpsNENmOVd2WEtOUXZPdmdr?= =?utf-8?B?UmJneDB1bThMay9wNGtnQTQxWi9HSjE3bXJIeXYxU2RERFV3Y1dpcmhlUWdn?= =?utf-8?B?a3BGa1FkanYwMVo5enU1enlrd3VidXljSTg5anZDcjkvQWhZNFJaKzdadkxE?= =?utf-8?Q?hxF1lIH8SkWwfhlR6xeurD/fDYUE2UCPdW?= X-OriginatorOrg: swarm64.com X-MS-Exchange-CrossTenant-AuthSource: FRXPR01MB0456.DEUPRD01.PROD.OUTLOOK.DE X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 12 Jan 2021 08:03:36.0232 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 6a3a733d-0e05-4774-8f9e-4fc6a12face6 X-MS-Exchange-CrossTenant-Network-Message-Id: d5eb569a-9ed4-42fe-bda1-08d8b6d09029 X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: A7j1LlBE/hW9LC0TI6frp4foHHuRu80lbw/qeETruEW5IN5stbv1k1jLugrF1LyK X-MS-Exchange-Transport-CrossTenantHeadersStamped: FRXPR01MB0872 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Precedence: bulk On 06-01-2021 14:06, Bharath Rupireddy wrote: > On Wed, Jan 6, 2021 at 12:56 PM Luc Vlaming wrote: >> The main reason for me for wanting a single API is that I would like the >> decision of using single or multi inserts to move to inside the tableam. >> For e.g. a heap insert we might want to put the threshold at e.g. 100 >> rows so that the overhead of buffering the tuples is actually >> compensated. For other tableam this logic might also be quite different, >> and I think therefore that it shouldn't be e.g. COPY or CTAS deciding >> whether or not multi inserts should be used. Because otherwise the thing >> we'll get is that there will be tableams that will ignore this flag and >> do their own thing anyway. I'd rather have an API that gives all >> necessary information to the tableam and then make the tableam do "the >> right thing". >> >> Another reason I'm suggesting this API is that I would expect that the >> begin is called in a different place in the code for the (multiple) >> inserts than the actual insert statement. >> To me conceptually the begin and end are like e.g. the executor begin >> and end: you prepare the inserts with the knowledge you have at that >> point. I assumed (wrongly?) that during the start of the statement one >> knows best how many rows are coming; and then the actual insertion of >> the row doesn't have to deal anymore with multi/single inserts, choosing >> when to buffer or not, because that information has already been given >> during the initial phase. One of the reasons this is appealing to me is >> that e.g. in [1] there was discussion on when to switch to a multi >> insert state, and imo this should be up to the tableam. > > Agree that whether to go with the multi or single inserts should be > completely left to tableam implementation, we, as callers of those API > just need to inform whether we expect single or multiple rows, and it > should be left to tableam implementation whether to actually go with > buffering or single inserts. ISTM that it's an elegant way of making > the API generic and abstracting everything from the callers. What I > wonder is how can we know in advance the expected row count that we > need to pass in to heap_insert_begin()? IIUC, we can not estimate the > upcoming rows in COPY, Insert Into Select, or Refresh Mat View or some > other insert queries? Of course, we can look at the planner's > estimated row count for the selects in COPY, Insert Into Select or > Refresh Mat View after the planning, but to me that's not something we > can depend on and pass in the row count to the insert APIs. > > When we don't know the expected row count, why can't we(as callers of > the APIs) tell the APIs something like, "I'm intending to perform > multi inserts, so if possible and if you have a mechanism to buffer > the slots, do it, otherwise insert the tuples one by one, or else do > whatever you want to do with the tuples I give it you". So, in case of > COPY we can ask the API for multi inserts and call heap_insert_begin() > and heap_insert_v2(). > I thought that when it is available (because of planning) it would be nice to pass it in. If you don't know you could pass in a 1 for doing single inserts, and e.g. -1 or max-int for streaming. The reason I proposed it is so that tableam's have as much knowledge as posisble to do the right thing. is_multi does also work of course but is just somewhat less informative. What to me seemed somewhat counterintuitive is that with the proposed API it is possible to say is_multi=true and then still call heap_insert_v2 to do a single insert. > Given the above explanation, I still feel bool is_multi would suffice. > > Thoughts? > > On dynamically, switching from single to multi inserts, this can be > done by heap_insert_v2 itself. The way I think it's possible is that, > say we have some threshold row count 1000(can be a macro) after > inserting those many tuples, heap_insert_v2 can switch to buffering > mode. For that I thought it'd be good to use the expected row count, but yeah dynamically switching also works and might work better if the expected row counts are usually off. > > Thoughts? > >> Which would make the code something like: >> >> void >> heap_multi_insert_v2(TableInsertState *state, TupleTableSlot *slot) >> { >> TupleTableSlot *batchslot; >> HeapMultiInsertState *mistate = (HeapMultiInsertState *)state->mistate; >> Size sz; >> >> Assert(mistate && mistate->slots); >> >> if (mistate->slots[mistate->cur_slots] == NULL) >> mistate->slots[mistate->cur_slots] = >> table_slot_create(state->rel, NULL); >> >> batchslot = mistate->slots[mistate->cur_slots]; >> >> ExecClearTuple(batchslot); >> ExecCopySlot(batchslot, slot); >> >> /* >> * Calculate the tuple size after the original slot is copied, because the >> * copied slot type and the tuple size may change. >> */ >> sz = GetTupleSize(batchslot, mistate->max_size); >> >> Assert(sz > 0); >> >> mistate->cur_slots++; >> mistate->cur_size += sz; >> >> if (mistate->cur_slots >= mistate->max_slots || >> mistate->cur_size >= mistate->max_size) >> heap_multi_insert_flush(state); >> } > > I think clearing tuples before copying the slot as you suggested may > work without the need of clear_slots flag. ok, cool :) > >> >>> > Also, why do we want to do ExecClearTuple() anyway? Isn't >>> > it good enough that the next call to ExecCopySlot will effectively clear >>> > it out? >>> >>> For virtual, heap, minimal tuple slots, yes ExecCopySlot slot clears the >>> slot before copying. But, for buffer heap slots, the >>> tts_buffer_heap_copyslot does not always clear the destination slot, see >>> below. If we fall into else condition, we might get some issues. And >>> also note that, once the slot is cleared in ExecClearTuple, it will not >>> be cleared again in ExecCopySlot because TTS_SHOULDFREE(slot) will be >>> false. That is why, let's have ExecClearTuple as is. >>> >> I had no idea the buffer heap slot doesn't unconditionally clear out the >> slot :( So yes lets call it unconditionally ourselves. See also >> suggestion above. > > Yeah, we will clear the tuple slot before copy to be on the safer side. > ok >>> /* >>> * If the source slot is of a different kind, or is a buffer slot >>> that has >>> * been materialized / is virtual, make a new copy of the tuple. >>> Otherwise >>> * make a new reference to the in-buffer tuple. >>> */ >>> if (dstslot->tts_ops != srcslot->tts_ops || >>> TTS_SHOULDFREE(srcslot) || >>> !bsrcslot->base.tuple) >>> { >>> MemoryContext oldContext; >>> >>> ExecClearTuple(dstslot); >>> } >>> else >>> { >>> Assert(BufferIsValid(bsrcslot->buffer)); >>> >>> tts_buffer_heap_store_tuple(dstslot, bsrcslot->base.tuple, >>> bsrcslot->buffer, false); >>> >>> > - flushed -> why is this a stored boolean? isn't this indirectly encoded >>> > by cur_slots/cur_size == 0? >>> >>> Note that cur_slots is in HeapMultiInsertState and outside of the new >>> APIs i.e. in TableInsertState, mistate is a void pointer, and we can't >>> really access the cur_slots. I mean, we can access but we need to be >>> dereferencing using the tableam kind. Instead of doing all of that, to >>> keep the API cleaner, I chose to have a boolean in the TableInsertState >>> which we can see and use outside of the new APIs. Hope that's fine. >>> >> So you mean the flushed variable is actually there to tell the user of >> the API that they are supposed to call flush before end? Why can't the >> end call flush itself then? I guess I completely misunderstood the >> purpose of table_multi_insert_flush being public. I had assumed it is >> there to from the usage site indicate that now would be a good time to >> flush, e.g. because of a statement ending or something. I had not >> understood this is a requirement that its always required to do >> table_multi_insert_flush + table_insert_end. >> IMHO I would hide this from the callee, given that you would only really >> call flush yourself when you immediately after would call end, or are >> there other cases where one would be required to explicitly call flush? > > We need to know outside the multi_insert API whether the buffered > slots in case of multi inserts are flushed. Reason is that if we have > indexes or after row triggers, currently we call ExecInsertIndexTuples > or ExecARInsertTriggers on the buffered slots outside the API in a > loop after the flush. > > If we agree on removing heap_multi_insert_v2 API and embed that logic > inside heap_insert_v2, then we can do this - pass the required > information and the functions ExecInsertIndexTuples and > ExecARInsertTriggers as callbacks so that, whether or not > heap_insert_v2 choses single or multi inserts, it can callback these > functions with the required information passed after the flush. We can > add the callback and required information into TableInsertState. But, > I'm not quite sure, we would make ExecInsertIndexTuples and > ExecARInsertTriggers. And in > > If we don't want to go with callback way, then at least we need to > know whether or not heap_insert_v2 has chosen multi inserts, if yes, > the buffered slots array, and the number of current buffered slots, > whether they are flushed or not in the TableInsertState. Then, > eventually, we might need all the HeapMultiInsertState info in the > TableInsertState. > To me the callback API seems cleaner, that on heap_insert_begin we can pass in a callback that is called on every flushed slot, or only on multi-insert flushes. Is there a reason it would only be done for multi-insert flushes or can it be generic? > Thoughts? > > With Regards, > Bharath Rupireddy. > EnterpriseDB: http://www.enterprisedb.com > Hi, Replied inline. Kind regards, Luc