From: Adrian Klaver <[email protected]>
To: [email protected] <[email protected]>
To: Merlin Moncure <[email protected]>
To: Laurenz Albe <[email protected]>
Cc: Pgsql-general <[email protected]>
Subject: Re: Is there any limit on the number of rows to import using copy command
Date: Thu, 24 Jul 2025 17:48:53 -0700
Message-ID: <[email protected]>
In-Reply-To: <[email protected]>
References: <[email protected]>
	<[email protected]>
	<CAHyXU0x1uW_349ONE+35KT87Ua-dX8-QaZ5Sj6eENTMwsX=okw@mail.gmail.com>
	<[email protected]>
	<[email protected]>
	<[email protected]>

On 7/24/25 16:59, [email protected] wrote:
> 1.  Testcase.  Created a new database, modified the triggers (split into 
> three), populated required master data, lookup tables.  Then transferred 
> 86420 records. Checked whether all the 86420 records inserted in table1 
> and also whether the trigger created the required records in table2.  
>   Yes, it created.
> 
> 2.  In the test case above, the total time taken to insert 86420 records 
> is 1.15 min only.   Earlier (before splitting the triggers) we waited 
> for more than 1.5 hrs first time and 2.5 hrs second time with no records 
> inserted.
> 
> 3.  Regarding moving the logic to procedure.  Won't the trigger work?  
> Will it be a burden for 86420 records?  It's working, if we insert few 
> thousand records.  After split of trigger function, it's working for 
> 86420 records.  Are triggers overhead for handling even 100000 records?  
> In production system, the same (single) trigger is working with 3 
> millions of records.  There might be better alternatives to triggers, 
> but triggers should also work.  IMHO.

Reread this post, in the thread, from Laurenz Albe:

https://www.postgresql.org/message-id/de08fd016dd9c630f65c52b80292550e0bcdea4c.camel%40cybertec.at

> 
> 4.  Staging tables.  Yes, I have done that in another case, where there 
> was a need to add data / transform for few more columns.  It worked like 
> a charm.  In this case, since there was no need for any other 
> calculations (transformation), and with just column to column matching, 
> I thought copy command will do.

There is a transformation: you are moving data to another table. That is 
overhead, especially if the triggers are not optimized.
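
As a rough illustration of the staging-table idea suggested elsewhere in this thread, the sketch below copies the CSV into a temporary table and then fills both tables with set-based statements. It assumes a psycopg 3 connection and that the per-row trigger logic feeding table2 has been moved out of the trigger. The staging table name and the columns (id, amount, table1_id) are hypothetical; only table1 and table2 come from the thread.

```python
from typing import IO

# Hypothetical schema: table1/table2 are named in the thread, but the
# staging table name and the columns id, amount, table1_id are invented.
STAGING_DDL = """
    CREATE TEMP TABLE staging_table1
    (LIKE table1 INCLUDING DEFAULTS)
    ON COMMIT DROP
"""
COPY_SQL = "COPY staging_table1 FROM STDIN (FORMAT csv, HEADER true)"
LOAD_STEPS = [
    # Set-based replacements for what the row-level trigger used to do.
    "INSERT INTO table1 SELECT * FROM staging_table1",
    "INSERT INTO table2 (table1_id, amount) "
    "SELECT id, amount FROM staging_table1",
]


def load_via_staging(conn, csv_file: IO[str], block_size: int = 65536) -> None:
    """COPY a CSV file into a temp table, then populate the real tables.

    `conn` is assumed to be a psycopg 3 connection (psycopg.connect(...))
    in its default non-autocommit mode, so every step runs in one
    transaction and the temp table is dropped at commit.
    """
    with conn.cursor() as cur:
        cur.execute(STAGING_DDL)
        with cur.copy(COPY_SQL) as copy:
            # Stream the file in blocks instead of reading it all at once.
            while True:
                block = csv_file.read(block_size)
                if not block:
                    break
                copy.write(block)
        for statement in LOAD_STEPS:
            cur.execute(statement)
    conn.commit()
```

Whether this beats the trigger depends on what the trigger does; the point is only that the second table is filled by one statement rather than once per copied row.
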

> 
> Before splitting the trigger into three, we tried
> 1.  Transferring data using DataWindow / PowerBuilder (that's the tool 
> we use to develop our front end).  With the same single trigger, it took 
> few hours (more than 4 hours, exact time not noted down) to transfer the 
> same 86420 records.  (Datawindow fires insert statements for every 
> row).  Works, but the time taken is not acceptable.

Row-by-row INSERTs are going to be slow, especially if the tool is doing a 
commit for each one, which I suspect it is. Check the Postgres logs.
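
To make the difference concrete, here is a minimal sketch of the two patterns. It uses Python's built-in sqlite3 module purely as a runnable stand-in for Postgres, and the table layout (id, name) is hypothetical. What DataWindow actually sends is not known from this thread, so the first function is only a guess at its behaviour.

```python
import sqlite3
from typing import Iterable, Tuple

Row = Tuple[int, str]
INSERT_SQL = "INSERT INTO table1 (id, name) VALUES (?, ?)"


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in database with a hypothetical table1."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE table1 (id INTEGER PRIMARY KEY, name TEXT)")
    return conn


def insert_commit_per_row(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """One INSERT and one COMMIT per row: a transaction for every row."""
    for row in rows:
        conn.execute(INSERT_SQL, row)
        conn.commit()


def insert_one_transaction(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """All rows inserted inside a single transaction, committed once."""
    conn.executemany(INSERT_SQL, rows)
    conn.commit()
```

Both functions leave the same rows in the table; they differ only in how many transactions the server has to complete.
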

> 
> 2.  Next, we split the larger csv file into 8, with each file containing 
> 10,000 records and the last one with 16420 records.  Copy command 
> worked.  Works, but the time taken to split the file not acceptable.  We 
> wrote a batch file to split the larger csv file.  We felt batch file is 
> easier to automate the whole process using PowerBuilder.

I find most GUI tools create extra steps and overhead. My preference is for 
simpler tools, e.g. using the Python csv module to batch/stream rows that the 
Python psycopg Postgres driver can insert or copy into the database.
See:

https://www.psycopg.org/psycopg3/docs/basic/copy.html
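
A minimal sketch of that approach, assuming the psycopg 3 API described at the link above (cursor.copy() and write_row()). The function names, the batch size and the column names in the usage comment are hypothetical. It removes the need to split the CSV file on disk, since batching happens while the file is being read.

```python
import csv
from itertools import islice
from typing import Iterable, Iterator, List


def read_batches(lines: Iterable[str], batch_size: int) -> Iterator[List[List[str]]]:
    """Yield lists of at most batch_size parsed CSV rows."""
    reader = csv.reader(lines)
    while True:
        batch = list(islice(reader, batch_size))
        if not batch:
            return
        yield batch


def copy_batches(conn, copy_sql: str, batches: Iterable[List[List[str]]]) -> int:
    """Send each batch through COPY ... FROM STDIN and commit per batch.

    `conn` is assumed to be a psycopg 3 connection (psycopg.connect(...)).
    Values are passed as strings; the server converts them to column types.
    Returns the number of rows sent.
    """
    total = 0
    for batch in batches:
        with conn.cursor() as cur:
            with cur.copy(copy_sql) as copy:
                for row in batch:
                    copy.write_row(row)
        conn.commit()
        total += len(batch)
    return total


# Hypothetical usage (col_a and col_b are placeholder column names):
#
#   import psycopg
#   with psycopg.connect("dbname=test") as conn, open("data.csv", newline="") as f:
#       copy_batches(conn, "COPY table1 (col_a, col_b) FROM STDIN",
#                    read_batches(f, 10_000))
```

Committing per batch means a failure part-way through leaves earlier batches in place; committing once at the end is the alternative if the load should be all-or-nothing.
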

> 
> 3.  What we observed here, is insert statement succeeds and copy command 
> fails, if the records exceed a certain no.  Haven't arrived the exact 
> number of rows when the copy command fails.
> 
> Will do further works after my return from a holiday.
> 
> Happiness Always
> BKR Sivaprakash
> 
> 
> 
> On Thursday 24 July, 2025 at 08:18:07 pm IST, Adrian Klaver 
> <[email protected]> wrote:
> 
> 
> On 7/24/25 05:18, [email protected] <mailto:[email protected]> 
> wrote:
>  > Thanks Merlin, adrain, Laurenz
>  >
>  > As a testcase, I split the trigger function into three, one each for
>  > insert, update, delete, each called from a separate trigger.
>  >
>  > IT WORKS!.
> 
> It worked before, it just slowed down as your cases got bigger. You need
> to provide more information on what test case you used and how you
> define worked.
> 
>  >
>  > Shouldn't we have one trigger function for all the three trigger
>  > events?  Is it prohibited for bulk insert like this?
> 
> No. Triggers are overhead and they add to the processing that need to be
> done for moving the data into the table. Whether that is an issue is a
> case by case determination.
> 
>  >
>  > I tried this in PGAdmin only, will complete the testing from the program
>  > which we are developing, after my return from holiday.
> 
>  From Merlin Moncure's post:
> 
> "* reconfiguring your logic to a procedure can be a better idea; COPY
> your data into some staging tables (perhaps temp, and indexed), then
> write to various tables with joins, upserts, etc."
> 
> I would suggest looking into implementing the above.
> 
> 
>  >
>  > Happiness Always
>  > BKR Sivaprakash
> 
>  >
> 
> 
> 
> -- 
> Adrian Klaver
> [email protected] <mailto:[email protected]>
> 


-- 
Adrian Klaver
[email protected]





