From: Adrian Klaver <[email protected]>
To: [email protected] <[email protected]>
To: Merlin Moncure <[email protected]>
To: Laurenz Albe <[email protected]>
Cc: Pgsql-general <[email protected]>
Subject: Re: Is there any limit on the number of rows to import using copy command
Date: Thu, 24 Jul 2025 17:48:53 -0700
Message-ID: <[email protected]> (raw)
In-Reply-To: <[email protected]>
References: <[email protected]>
<[email protected]>
<CAHyXU0x1uW_349ONE+35KT87Ua-dX8-QaZ5Sj6eENTMwsX=okw@mail.gmail.com>
<[email protected]>
<[email protected]>
<[email protected]>
On 7/24/25 16:59, [email protected] wrote:
> 1. Testcase. Created a new database, modified the triggers (split into
> three), populated the required master data and lookup tables. Then
> transferred 86420 records. Checked whether all 86420 records were
> inserted in table1 and whether the trigger created the required
> records in table2. Yes, it did.
>
> 2. In the test case above, the total time taken to insert 86420 records
> is 1.15 min only. Earlier (before splitting the triggers) we waited
> for more than 1.5 hrs first time and 2.5 hrs second time with no records
> inserted.
>
> 3. Regarding moving the logic to a procedure. Won't the trigger work?
> Will it be a burden for 86420 records? It works if we insert a few
> thousand records. After splitting the trigger function, it works for
> 86420 records. Are triggers too much overhead for handling even 100000
> records? In the production system, the same (single) trigger is working
> with 3 million records. There might be better alternatives to triggers,
> but triggers should also work. IMHO.
Reread this post from Laurenz Albe, earlier in the thread:
https://www.postgresql.org/message-id/de08fd016dd9c630f65c52b80292550e0bcdea4c.camel%40cybertec.at
>
> 4. Staging tables. Yes, I have done that in another case, where there
> was a need to add data / transform for few more columns. It worked like
> a charm. In this case, since there was no need for any other
> calculations (transformation), and with just column to column matching,
> I thought copy command will do.
There is a transformation: you are moving data to another table. That is
overhead, especially if the triggers are not optimized.
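
A minimal sketch of the staging-table approach Merlin suggested, to show
the shape of it rather than a finished implementation. Assumptions: the
column names (id, amount, table1_id, doubled) are hypothetical since the
real schema is not in the thread, and sqlite3 from the Python standard
library stands in for Postgres only so the example runs without a
server. On Postgres the staging load would be a COPY.

```python
import sqlite3


def load_via_staging(conn, rows):
    """Load rows into a staging table, then fill table1 and table2 set-wise."""
    cur = conn.cursor()
    # Hypothetical columns; the thread does not show the real table layout.
    cur.execute("CREATE TEMP TABLE staging (id INTEGER, amount INTEGER)")
    # Stand-in for "COPY staging FROM STDIN" on Postgres.
    cur.executemany("INSERT INTO staging (id, amount) VALUES (?, ?)", rows)
    # One set-based statement per target table instead of one trigger
    # call per inserted row.
    cur.execute("INSERT INTO table1 (id, amount) SELECT id, amount FROM staging")
    cur.execute(
        "INSERT INTO table2 (table1_id, doubled) "
        "SELECT id, amount * 2 FROM staging"
    )
    conn.commit()
    cur.execute("DROP TABLE staging")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, amount INTEGER)")
    conn.execute("CREATE TABLE table2 (table1_id INTEGER, doubled INTEGER)")
    load_via_staging(conn, [(1, 10), (2, 20), (3, 30)])
    print(conn.execute("SELECT COUNT(*) FROM table2").fetchone()[0])
```

The point of the design is that the work the trigger does per row is
done once per load, where the planner can use joins and indexes.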
>
> Before splitting the trigger into three, we tried
> 1. Transferring data using DataWindow / PowerBuilder (that's the tool
> we use to develop our front end). With the same single trigger, it took
> few hours (more than 4 hours, exact time not noted down) to transfer the
> same 86420 records. (Datawindow fires insert statements for every
> row). Works, but the time taken is not acceptable.
Row-by-row INSERTs are going to be slow, especially if the tool is doing
a commit for each row, which I suspect it is. Check the Postgres logs.
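
To illustrate the difference, here is a rough sketch of the two
patterns. Assumptions: the function names and the (id, amount) columns
are hypothetical, and sqlite3 is used only so the code runs without a
Postgres server. With a Postgres driver the placeholder style differs
(%s instead of ?), but the commit pattern is the same.

```python
import sqlite3

INSERT_SQL = "INSERT INTO table1 (id, amount) VALUES (?, ?)"


def insert_commit_per_row(conn, rows):
    """What I suspect the GUI tool does: one transaction per row."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()  # every commit has to be made durable on its own


def insert_single_commit(conn, rows):
    """Send all rows, then commit once at the end."""
    conn.executemany(INSERT_SQL, rows)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER, amount INTEGER)")
    insert_single_commit(conn, [(i, i * 10) for i in range(1000)])
    print(conn.execute("SELECT COUNT(*) FROM table1").fetchone()[0])
```

Both functions store the same rows; the second one pays the commit cost
once instead of once per row.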
>
> 2. Next, we split the larger csv file into 8, with each file containing
> 10,000 records and the last one 16420 records. Copy command
> worked. Works, but the time taken to split the file is not acceptable.
> We wrote a batch file to split the larger csv file. We felt a batch
> file makes it easier to automate the whole process using PowerBuilder.
I find most GUI tools create extra steps and overhead. My preference is
for simpler tools, e.g. using the Python csv module to batch/stream rows
that the Python psycopg2 Postgres driver can insert or copy into the
database.
For COPY support in the current driver version (psycopg 3), see:
https://www.psycopg.org/psycopg3/docs/basic/copy.html
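
Something along these lines, as a sketch only. It assumes a psycopg 3
cursor (cursor.copy() and Copy.write_row() are from the page linked
above), a CSV file with a header line, and hypothetical column names
(id, amount). No file splitting is needed since the rows are streamed.

```python
import csv


def copy_csv(cur, csv_lines, table="table1", columns=("id", "amount")):
    """Stream CSV rows into `table` using COPY ... FROM STDIN.

    `cur` is expected to be a psycopg 3 cursor; `csv_lines` is an open
    text file or any iterable of lines. Table and column names are
    interpolated into the SQL, so they must be trusted constants.
    """
    reader = csv.reader(csv_lines)
    next(reader, None)  # skip the header line (assumed to be present)
    sql = f"COPY {table} ({', '.join(columns)}) FROM STDIN"
    count = 0
    with cur.copy(sql) as copy:
        for row in reader:
            # Values are sent as text; Postgres parses them per column type.
            copy.write_row(row)
            count += 1
    return count
```

Usage would look roughly like this, with the connection string and file
name as placeholders:

    with psycopg.connect("dbname=test") as conn, conn.cursor() as cur:
        with open("data.csv", newline="") as f:
            copy_csv(cur, f)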
>
> 3. What we observed here is that the insert statement succeeds and the
> copy command fails if the records exceed a certain number. We haven't
> yet determined the exact number of rows at which the copy command fails.
>
> Will do further works after my return from a holiday.
>
> Happiness Always
> BKR Sivaprakash
>
>
>
> On Thursday 24 July, 2025 at 08:18:07 pm IST, Adrian Klaver
> <[email protected]> wrote:
>
>
> On 7/24/25 05:18, [email protected] <mailto:[email protected]>
> wrote:
> > Thanks Merlin, Adrian, Laurenz
> >
> > As a testcase, I split the trigger function into three, one each for
> > insert, update, delete, each called from a separate trigger.
> >
> > IT WORKS!.
>
> It worked before, it just slowed down as your cases got bigger. You need
> to provide more information on what test case you used and how you
> define worked.
>
> >
> > Shouldn't we have one trigger function for all the three trigger
> > events? Is it prohibited for bulk insert like this?
>
> No. Triggers are overhead and they add to the processing that needs to
> be done for moving the data into the table. Whether that is an issue is
> a case by case determination.
>
> >
> > I tried this in PGAdmin only, will complete the testing from the program
> > which we are developing, after my return from holiday.
>
> From Merlin Moncure's post:
>
> "* reconfiguring your logic to a procedure can be a better idea; COPY
> your data into some staging tables (perhaps temp, and indexed), then
> write to various tables with joins, upserts, etc."
>
> I would suggest looking into implementing the above.
>
>
> >
> > Happiness Always
> > BKR Sivaprakash
>
> >
>
>
>
> --
> Adrian Klaver
> [email protected] <mailto:[email protected]>
>
--
Adrian Klaver
[email protected]