public inbox for [email protected]  
Difference between Bulk Load (Multiple inserts or single inserts) and COPY
6+ messages / 3 participants

* Difference between Bulk Load (Multiple inserts or single inserts) and COPY
@ 2019-11-19 18:55  PG Doc comments form <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: PG Doc comments form @ 2019-11-19 18:55 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/10/sql-copy.html
Description:

Hello, 

My name is Mayank. I am a Ph.D. student.

I experimented with bulk load (INSERT statements) and COPY.
Loading with COPY was very fast.
However, after COPYing data from a CSV file into a PostgreSQL table, one of
the first four queries took a long time to execute.
This one slow query took so long that the total time would have been lower
if I had used a normal bulk load.
All other queries took the same time as they did against a table loaded
with the bulk load method.

So I want to know the exact reason for this behavior with COPY.
How exactly do the two methods differ? The only difference I could identify
from the documentation was row security.
It does not mention anything about indexing. For example, in a bulk load,
are indexes built (or constraints checked) while the data is loaded,
and in COPY is that done afterwards, so that the query slows down while
indexes are being created?

Please reply with more details or send a link where I can read about it in
depth.
Thanks.
Mayank.
[email protected]



* Re: Difference between Bulk Load (Multiple inserts or single inserts) and COPY
@ 2019-11-19 22:55  Laurenz Albe <[email protected]>
  parent: PG Doc comments form <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: Laurenz Albe @ 2019-11-19 22:55 UTC (permalink / raw)
  To: [email protected]; [email protected]

On Tue, 2019-11-19 at 18:55 +0000, PG Doc comments form wrote:
> I experimented with Bulk load and COPY.
> Loading in COPY was very fast. 
> However, after COPYing data from a CSV file to PostgreSQL Table. The query
> execution took lot of time for 1 of the first 4 queries.
> Only this slow query was taking so much time, that even if I had used normal
> bulk load, it would have been faster in total.
> Then all other Query executions took equal time as it took while querying a
> table after the Bulk data load method.
> 
> So, I want to know the exact reason what's the issue with COPY.
> How exactly they differ? The only thing from the document I could identify
> was row security.
> But it did not mention anything about indexing. Like, in Bulk load, do
> indices(or constraint checks) are created with data loading?
> & in COPY it's done after? so when indices are being created that query
> slows down??
> 
> Please reply soon with more details or send a link where I can read it in
> depth.

That cannot be answered without knowing the exact statements and the
table definitions.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com







* Difference between Bulk Load (Multiple inserts or single inserts) and COPY
@ 2019-11-22 09:33  PG Doc comments form <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: PG Doc comments form @ 2019-11-22 09:33 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/10/sql-copy.html
Description:

Hello,

> I experimented with Bulk load and COPY.
> Loading in COPY was very fast.
> However, after COPYing data from a CSV file to PostgreSQL Table. The query
> execution took lot of time for 1 of the first 4 queries.
> Only this slow query was taking so much time, that even if I had used normal
> bulk load, it would have been faster in total.
> Then all other Query executions took equal time as it took while querying a
> table after the Bulk data load method.
>
> So, I want to know the exact reason what's the issue with COPY.
> How exactly they differ? The only thing from the document I could identify
> was row security.
> But it did not mention anything about indexing. Like, in Bulk load, do
> indices(or constraint checks) are created with data loading?
> & in COPY it's done after? so when indices are being created that query
> slows down??

*Added details*

"Table & Query details"
I have one table with three attributes:
TableName { Column1 varchar(300), Column2 varchar(300), Column3 varchar(300) };
I haven't created any primary keys or foreign keys. There are no other constraints.
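
For readers following along, the table and the two load methods discussed in this thread might look like the sketch below. The table name `triples`, the lowercase column names, the sample values, and the file path are placeholders that do not appear in the thread; only the three varchar(300) columns and the absence of keys and constraints come from the message.

```sql
-- Hypothetical names: the thread gives only "TableName" and "Column1..3".
CREATE TABLE triples (
    column1 varchar(300),
    column2 varchar(300),
    column3 varchar(300)
);

-- Bulk load with a multi-row INSERT (one statement, many VALUES rows).
INSERT INTO triples (column1, column2, column3) VALUES
    ('a', 'b', 'c'),
    ('d', 'e', 'f');

-- Load with COPY. The path is a placeholder; a server-side COPY reads the
-- file as the database server process, not as the client.
COPY triples (column1, column2, column3)
    FROM '/path/to/data.csv'
    WITH (FORMAT csv);
```

In PostgreSQL both statements write ordinary heap rows into the same table, and any indexes that exist are maintained while the statement runs rather than by a separate step afterwards.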

Data set size: 150MB / 1M records

Queries:
Select count(*) from Table;
Select count(distinct( Column1, Column2 , Column3 )) from Table;
Select Column1, Column2, Column3 from Table as T1, Table as T2, Table as T3
where T1.Column1=T2.Column3 and T1.Column1='xyz';
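
One way to see which of these queries is slow, and why, is to run them under EXPLAIN. The sketch below assumes a hypothetical table `triples` with lowercase column names, since the thread does not give the real ones. Qualifying the select list with `t1` is also an assumption, because the message leaves those columns unqualified.

```sql
-- Assumed table; the thread does not give real names.
CREATE TABLE IF NOT EXISTS triples (
    column1 varchar(300),
    column2 varchar(300),
    column3 varchar(300)
);

-- ANALYZE executes the query and reports actual times; BUFFERS adds page counts.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM triples;

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(DISTINCT (column1, column2, column3)) FROM triples;

-- Plain EXPLAIN only plans the query without running it. t3 has no join
-- condition, so every matching (t1, t2) pair is combined with every row of t3.
EXPLAIN
SELECT t1.column1, t1.column2, t1.column3
FROM triples AS t1, triples AS t2, triples AS t3
WHERE t1.column1 = t2.column3
  AND t1.column1 = 'xyz';  -- string literals take single quotes
```

Comparing the output after a COPY load with the output after an INSERT load would show whether the plans or the buffer counts differ between the two.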

Please let me know how bulk load and COPY differ in these situations:
1) Does the internal representation differ after data is loaded using bulk
load vs. COPY?
2) What if I had added keys and constraints; are they checked later? That is,
the load is reported as complete, but indexes are still being created or
constraints checked in the background.
3) Could the reason be that some other process (which one?) is running in
the background during query execution? I query the data as soon as the load
with COPY is complete.
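
Question 3 can be checked from a second session while the slow query runs. The following is a minimal sketch, assuming PostgreSQL 10 (the version of the documentation page cited above) and a hypothetical table name `triples`. An autovacuum worker running an automatic ANALYZE on the freshly loaded table is one kind of activity that could show up here, though nothing in the thread confirms that it did.

```sql
-- Sessions and background workers other than the current one.
-- The backend_type column is available from PostgreSQL 10 on.
SELECT pid, backend_type, state, query
FROM pg_stat_activity
WHERE pid <> pg_backend_pid();

-- When the table was last vacuumed or analyzed automatically.
-- 'triples' is a placeholder for the real table name.
SELECT relname, n_live_tup, n_mod_since_analyze,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname = 'triples';
```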



* Re: Difference between Bulk Load (Multiple inserts or single inserts) and COPY
@ 2019-12-01 00:55  Bruce Momjian <[email protected]>
  parent: Laurenz Albe <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Bruce Momjian @ 2019-12-01 00:55 UTC (permalink / raw)
  To: Laurenz Albe <[email protected]>; +Cc: [email protected]; [email protected]

On Tue, Nov 19, 2019 at 11:55:44PM +0100, Laurenz Albe wrote:
> On Tue, 2019-11-19 at 18:55 +0000, PG Doc comments form wrote:
> > I experimented with Bulk load and COPY.
> > Loading in COPY was very fast. 
> > However, after COPYing data from a CSV file to PostgreSQL Table. The query
> > execution took lot of time for 1 of the first 4 queries.
> > Only this slow query was taking so much time, that even if I had used normal
> > bulk load, it would have been faster in total.
> > Then all other Query executions took equal time as it took while querying a
> > table after the Bulk data load method.
> > 
> > So, I want to know the exact reason what's the issue with COPY.
> > How exactly they differ? The only thing from the document I could identify
> > was row security.
> > But it did not mention anything about indexing. Like, in Bulk load, do
> > indices(or constraint checks) are created with data loading?
> > & in COPY it's done after? so when indices are being created that query
> > slows down??
> > 
> > Please reply soon with more details or send a link where I can read it in
> > depth.
> 
> That cannot be answered without knowing the exact statements and the
> table definitions.

I wonder if it is the overhead of rewriting all the rows to set the
per-row HEAP_XMIN_COMMITTED bit.  Unfortunately, I don't know a way to
test this hypothesis.
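
A possible way to probe this, offered as a sketch rather than a confirmed diagnosis: compare the buffer counts of the first and second scans after a plain COPY, then repeat with COPY ... FREEZE. The table name `triples`, its columns, and the file path are placeholders not taken from the thread. The assumption is that setting hint bits on first read shows up as "dirtied" pages in the BUFFERS output.

```sql
-- Assumed table; the thread does not give real names.
CREATE TABLE IF NOT EXISTS triples (
    column1 varchar(300),
    column2 varchar(300),
    column3 varchar(300)
);

-- Run 1: plain COPY, then scan the table twice.
TRUNCATE triples;
COPY triples FROM '/path/to/data.csv' WITH (FORMAT csv);
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM triples;  -- first read
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM triples;  -- second read

-- Run 2: COPY with FREEZE. FREEZE requires that the table was created or
-- truncated in the same transaction, hence the explicit BEGIN ... COMMIT.
BEGIN;
TRUNCATE triples;
COPY triples FROM '/path/to/data.csv' WITH (FORMAT csv, FREEZE true);
COMMIT;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM triples;  -- first read
```

If the first read in run 1 reports many dirtied shared buffers and the later reads report few or none, that would be consistent with the hypothesis; if the counts look the same, the cause is more likely elsewhere.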

-- 
  Bruce Momjian  <[email protected]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +







* Difference between Bulk Load (Multiple inserts or single inserts) and COPY
@ 2019-12-05 15:39  PG Doc comments form <[email protected]>
  0 siblings, 1 reply; 6+ messages in thread

From: PG Doc comments form @ 2019-12-05 15:39 UTC (permalink / raw)
  To: [email protected]; +Cc: [email protected]

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/10/sql-copy.html
Description:

Hello,

> I experimented with Bulk load and COPY.
> Loading in COPY was very fast.
> However, after COPYing data from a CSV file to PostgreSQL Table. The query
> execution took lot of time for 1 of the first 4 queries.
> Only this slow query was taking so much time, that even if I had used normal
> bulk load, it would have been faster in total.
> Then all other Query executions took equal time as it took while querying a
> table after the Bulk data load method.
>
> So, I want to know the exact reason what's the issue with COPY.
> How exactly they differ? The only thing from the document I could identify
> was row security.
> But it did not mention anything about indexing. Like, in Bulk load, do
> indices(or constraint checks) are created with data loading?
> & in COPY it's done after? so when indices are being created that query
> slows down??

*Added details*

"Table & Query details"
I have one table with three attributes:
TableName { Column1 varchar(300), Column2 varchar(300), Column3 varchar(300) };
I haven't created any primary keys or foreign keys. There are no other constraints.

Data set size: 150MB / 1M records

Queries:
Select count(*) from Table;
Select count(distinct( Column1, Column2 , Column3 )) from Table;
Select Column1, Column2, Column3 from Table as T1, Table as T2, Table as T3
where T1.Column1=T2.Column3 and T1.Column1='xyz';

Please let me know how bulk load and COPY differ in these situations:
1) Does the internal representation differ after data is loaded using bulk
load vs. COPY?
2) What if I had added keys and constraints; are they checked later? That is,
the load is reported as complete, but indexes are still being created or
constraints checked in the background.
3) Could the reason be that some other process (which one?) is running in
the background during query execution? I query the data as soon as the load
with COPY is complete.



* Re: Difference between Bulk Load (Multiple inserts or single inserts) and COPY
@ 2019-12-21 18:24  Bruce Momjian <[email protected]>
  parent: PG Doc comments form <[email protected]>
  0 siblings, 0 replies; 6+ messages in thread

From: Bruce Momjian @ 2019-12-21 18:24 UTC (permalink / raw)
  To: [email protected]; [email protected]


This is not a documentation question.  For assistance, please join the
appropriate mailing list and post your question:

	http://www.postgresql.org/community

You can also try the #postgresql IRC channel on irc.freenode.net.  See
the PostgreSQL FAQ for more information.

---------------------------------------------------------------------------

On Thu, Dec  5, 2019 at 03:39:24PM +0000, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
> 
> Page: https://www.postgresql.org/docs/10/sql-copy.html
> Description:
> 
> Hello,
> 
> > I experimented with Bulk load and COPY.
> > Loading in COPY was very fast.
> > However, after COPYing data from a CSV file to PostgreSQL Table. The query
> > execution took lot of time for 1 of the first 4 queries.
> > Only this slow query was taking so much time, that even if I had used normal
> > bulk load, it would have been faster in total.
> > Then all other Query executions took equal time as it took while querying a
> > table after the Bulk data load method.
> >
> > So, I want to know the exact reason what's the issue with COPY.
> > How exactly they differ? The only thing from the document I could identify
> > was row security.
> > But it did not mention anything about indexing. Like, in Bulk load, do
> > indices(or constraint checks) are created with data loading?
> > & in COPY it's done after? so when indices are being created that query
> > slows down??
> 
> *Added details*
> 
> "Table & Query details"
> I have 1 Table is there having 3 attributes:
> TableName{ Column1 Varchar300,  Column2 Varchar300,  Column3 Varchar300};
> I haven't created any primary keys or FKs. No other constraints.
> 
> Data set size: 150MB / 1M records
> 
> Queries:
> Select count(*) from Table;
> Select count(distinct( Column1, Column2 , Column3 )) from Table;
> Select Column1, Column2, Column3 from Table as T1, Table as T2,  Table as T3
> where T1. Column1=T2.Column3 and T1. Column1="xyz";
> 
> Please let me know, how Bulk load vs. COPY different in both situations
> 1) Do the internal representation differs after data is loaded using Bulk
> vs. COPY?
> 2) what if I have added Keys and Constraints, are they checked later? Means
> loading is shown completed but in background it's creating indices/checking
> constraints.
> 3) Can it be the reason that some other process(which?) is running in
> background during query execution ? as I query the data as soon as the load
> after COPY is complete.


-- 
  Bruce Momjian  <[email protected]>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +









Thread overview: 6+ messages
2019-11-19 18:55 Difference between Bulk Load (Multiple inserts or single inserts) and COPY PG Doc comments form <[email protected]>
2019-11-19 22:55 ` Laurenz Albe <[email protected]>
2019-12-01 00:55   ` Bruce Momjian <[email protected]>
2019-11-22 09:33 Difference between Bulk Load (Multiple inserts or single inserts) and COPY PG Doc comments form <[email protected]>
2019-12-05 15:39 Difference between Bulk Load (Multiple inserts or single inserts) and COPY PG Doc comments form <[email protected]>
2019-12-21 18:24 ` Bruce Momjian <[email protected]>
