public inbox for [email protected]  
Re: Load a csv or a avro?
2+ messages / 2 participants

From: sud @ 2024-07-06 20:09 UTC
  To: Adrian Klaver <[email protected]>; +Cc: pgsql-general <[email protected]>

On Fri, Jul 5, 2024 at 8:24 PM Adrian Klaver <[email protected]>
wrote:

> On 7/5/24 02:08, sud wrote:
> > Hello all,
> >
> > It's a Postgres database. We have the option of getting messages from
> > another system as files in csv and/or avro format, to load into our
> > Postgres database. The volume will be 300 million messages per day
> > across many files in batches.
>
> Are you dumping the entire contents of each file or are you pulling a
> portion of the data out?
>
>
>
Yes, all the fields in the file have to be loaded into the columns of the
tables in Postgres. But how does that matter when deciding whether we should
ask the outside system for the data in .csv or .avro format, given that it
ends up in rows and columns in the Postgres database either way? My
understanding was that, regardless of anything else, the .csv load will
always be faster because the data is already stored in row and column
format, whereas with the .avro file the parser has to do additional work to
turn it into rows and columns or map it to the columns of the database
table. Is my understanding correct here?



From: Adrian Klaver @ 2024-07-07 15:14 UTC
  To: sud <[email protected]>; +Cc: pgsql-general <[email protected]>

On 7/6/24 13:09, sud wrote:
> On Fri, Jul 5, 2024 at 8:24 PM Adrian Klaver <[email protected]> wrote:
>
> > On 7/5/24 02:08, sud wrote:
> > > Hello all,
> > >
> > > It's a Postgres database. We have the option of getting messages from
> > > another system as files in csv and/or avro format, to load into our
> > > Postgres database. The volume will be 300 million messages per day
> > > across many files in batches.
> >
> > Are you dumping the entire contents of each file or are you pulling a
> > portion of the data out?
> 
> 
> 
> Yes, all the fields in the file have to be loaded into the columns of the
> tables in Postgres. But how does that matter when deciding whether we
> should ask the outside system for the data in .csv or .avro format, given
> that it ends up in rows and columns in the Postgres database either way?
> My understanding was that, regardless of anything else, the .csv load will
> always be faster because the data is already stored in row and column
> format, whereas with the .avro file the parser has to do additional work
> to turn it into rows and columns or map it to the columns of the database
> table. Is my understanding correct here?

If you are going to load every row, and every column of each row, then
COPY from CSV in Postgres would be your best choice.
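
As a rough illustration of what a COPY-based CSV load could look like from a
client, here is a minimal sketch. It assumes Python with a psycopg2
connection, a CSV file whose header row names match the table's columns, and
an unqualified table name. The helper names build_copy_sql and load_csv are
hypothetical and not something from this thread.

```python
import csv


def build_copy_sql(table, columns):
    """Build a COPY ... FROM STDIN statement for a CSV file with a header row.

    Assumption: `table` is an unqualified table name (no schema prefix).
    """
    # Double-quote identifiers so mixed-case or reserved names are accepted.
    quoted_table = '"' + table.replace('"', '""') + '"'
    quoted_cols = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
    return (
        f"COPY {quoted_table} ({quoted_cols}) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)"
    )


def load_csv(conn, fileobj, table):
    """Stream one CSV file into `table` with a single COPY.

    Assumption: `conn` is a psycopg2 connection and `fileobj` is a seekable
    text-mode file whose first line is the column header.
    """
    # Read the header to get the column list, then rewind so COPY sees
    # the whole file (HEADER true tells Postgres to skip the first line).
    header = next(csv.reader(fileobj))
    fileobj.seek(0)

    sql = build_copy_sql(table, header)
    with conn.cursor() as cur:
        # psycopg2's copy_expert sends the file contents as COPY input.
        cur.copy_expert(sql, fileobj)
    conn.commit()
    return sql
```

It could be called as load_csv(conn, open("batch_0001.csv", newline=""),
"messages"), where the file and table names are placeholders. Each file goes
in as one COPY, so the rows never pass through per-row INSERT statements.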

-- 
Adrian Klaver
[email protected]






