Hi,
On Sun, Apr 6, 2025 at 8:55 PM Jingtang Zhang <mrdrivingduck@gmail.com> wrote:
>
> It was quite a while since I last looked at the patch. I've tested it again,
> and still get regression on patched version where a table has many columns.
> And it is totally CPU-bounded on tts_virtual_copyslot.
>
> Unpatched version:
> 1 col:
> Time: 8909.714 ms (00:08.910)
> Time: 8803.579 ms (00:08.804)
> Time: 8600.415 ms (00:08.600)
> 32 cols:
> Time: 12911.699 ms (00:12.912)
> Time: 13543.491 ms (00:13.543)
> Time: 13325.368 ms (00:13.325)
>
> Patched version:
> 1 col:
> Time: 3532.841 ms (00:03.533)
> Time: 3598.223 ms (00:03.598)
> Time: 3515.858 ms (00:03.516)
> 32 cols:
> Time: 35647.724 ms (00:35.648)
> Time: 35596.233 ms (00:35.596)
> Time: 35669.106 ms (00:35.669)
>
Hm, maybe I didn't choose the best way to measure performance. Can you
please share how you do it?
> I've tested your patch with tuplestore and found the regression does not exist
> anymore, but I haven't look deep inside it.
>
> Patched version (with tuplestore):
> 1 col:
> Time: 3500.502 ms (00:03.501)
> Time: 3486.886 ms (00:03.487)
> Time: 3514.233 ms (00:03.514)
> 32 cols:
> Time: 10375.391 ms (00:10.375)
> Time: 10248.256 ms (00:10.248)
> Time: 10248.289 ms (00:10.248)
>
> It seems to be a good idea if there is no other issue with your patch.
As far as I understand, the use of multi inserts for queries like
"INSERT INTO ... SELECT FROM" is not discussed here anymore due to the
fact that in such cases we will have to take into account the volatile
functions and ROW triggers.
I've been thinking about this for a while and made a patch as an
experiment. The principles that the patch works on are listed below.
1)
Since performance decreases for single INSERTs (within a multi inserts
mechanism), I designed this feature as an option for the table. Thus,
if the user knows that he will perform a lot of inserts on the table,
he can specify "WITH (append_optimized=true)".
2)
The availability of volatile functions is monitored during the
construction of a subtree for a ModifyTable node. I'm not that
familiar with the query plan construction mechanism, but it seems to
me that this way we can track any occurrence of volatile functions.
Of course, most volatile functions don't stop us from using multi
inserts, but checking each such function would take a very long time,
so the very fact of having a volatile function is enough for us to
abandon multi-inserts.
3)
Default expressions of the target table are also checked for volatile
functions. The same rules apply to them as in (2). As an exception, I
allowed the use of SERIAL in the column data type, since this is a
fairly common use case.
4)
If the target table contains any ROW triggers, we don't use multi insert.
5)
Patch also contains a regression test. This is a "sandbox" where you
can do some experiments with append-optimized tables.
I hope that patch (targeted on 'master' branch,
2c7bd2ba507e273f2d7fe1b2f6d30775ed4f3c09) will be useful for this
thread.
--
Best regards,
Daniil Davydov
Hi all,
I picked this work up again and implemented the full 5-patch series.
The series is structured as follows:
I also reran performance testing locally on my machine:
I compared the unpatched baseline against the current patched series for CTAS, CMV, RMV, and INSERT INTO ... SELECT.
Workload | 10K Before | 10K After | 10K Improv | 100K Before | 100K After | 100K Improv | 1M Before | 1M After | 1M Improv |
|---|---|---|---|---|---|---|---|---|---|
CTAS | 1.60 | 1.17 | +26.9% | 9.19 | 5.70 | +38.0% | 105.61 | 73.28 | +30.6% |
CMV | 2.11 | 1.77 | +16.1% | 10.28 | 6.20 | +39.7% | 110.10 | 79.64 | +27.7% |
RMV | 1.62 | 1.19 | +26.5% | 9.91 | 5.43 | +45.2% | 106.04 | 69.57 | +34.4% |
INSERT … SELECT | 1.72 | 0.89 | +48.3% | 15.39 | 7.46 | +51.5% | 228.24 | 84.66 | +62.9% |
Table 2 — Average (ms)
Workload | 10K Before | 10K After | 10K Improv | 100K Before | 100K After | 100K Improv | 1M Before | 1M After | 1M Improv |
|---|---|---|---|---|---|---|---|---|---|
CTAS | 1.65 | 1.26 | +24.1% | 9.37 | 5.82 | +37.9% | 104.81 | 74.31 | +29.1% |
CMV | 2.34 | 1.82 | +22.2% | 10.32 | 6.25 | +39.4% | 110.32 | 80.50 | +27.0% |
RMV | 1.92 | 1.21 | +36.9% | 9.86 | 5.49 | +44.3% | 106.79 | 69.53 | +34.9% |
INSERT … SELECT | 1.69 | 0.90 | +46.8% | 15.45 | 7.39 | +52.2% | 210.62 | 85.19 | +59.6% |
These numbers look ok to me to continue the discussion with the current design and implementation.
One point worth calling out explicitly: the INSERT INTO ... SELECT support in patch 5 is intentionally limited, as described above. It is not intended to claim broad executor coverage yet.
One further improvement still seems possible: making the heap implementation cache raw HeapTuple bytes directly instead of maintaining buffered slot arrays. I looked at that direction, but did not include it in this series because it felt like a larger scope change than what I wanted for v1.
At this point, I think the current series is in reasonable shape and I’d really appreciate review on both the API shape and the caller adoptions.
Thanks,
Haibo