public inbox for [email protected]  
Load a csv or a avro?
3+ messages / 3 participants

* Load a csv or a avro?
@ 2024-07-05 09:08 sud <[email protected]>
  2024-07-05 10:02 ` Re: Load a csv or a avro? Josef Šimánek <[email protected]>
  2024-07-05 13:03 ` Re: Load a csv or a avro? Ron Johnson <[email protected]>
  0 siblings, 2 replies; 3+ messages in thread

From: sud @ 2024-07-05 09:08 UTC
  To: pgsql-general <[email protected]>

Hello all,

This is a PostgreSQL database. We have the option of receiving messages
from another system as CSV and/or Avro files to load into our PostgreSQL
database. The volume will be 300 million messages per day, delivered in
batches across many files.

My question is: which format should we choose for faster data loading
performance? And are there any other aspects we should consider apart
from loading performance?



* Re: Load a csv or a avro?
  2024-07-05 09:08 Load a csv or a avro? sud <[email protected]>
@ 2024-07-05 10:02 ` Josef Šimánek <[email protected]>
  1 sibling, 0 replies; 3+ messages in thread

From: Josef Šimánek @ 2024-07-05 10:02 UTC
  To: sud <[email protected]>; +Cc: pgsql-general <[email protected]>

On Fri, Jul 5, 2024 at 11:08, sud <[email protected]> wrote:
>
> Hello all,
>
> This is a PostgreSQL database. We have the option of receiving messages from another system as CSV and/or Avro files to load into our PostgreSQL database. The volume will be 300 million messages per day, delivered in batches across many files.
>
> My question is: which format should we choose for faster data loading performance? And are there any other aspects we should consider apart from loading performance?

We are able to load ~300 million rows per day using CSV and the COPY
functions (https://www.postgresql.org/docs/current/libpq-copy.html#LIBPQ-COPY-SEND).
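
As a rough illustration of the CSV-plus-COPY approach, here is a minimal
Python sketch. It assumes a psycopg2 connection (the thread does not say
which client is used), and the table and column names in the usage note are
hypothetical. It builds a COPY ... FROM STDIN statement and streams an
in-memory CSV buffer through it in a single round of COPY.

```python
import csv
import io


def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_copy_sql(table, columns):
    """Build a COPY ... FROM STDIN statement for CSV input with a header row."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return (
        f"COPY {quote_ident(table)} ({cols}) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)"
    )


def rows_to_csv(columns, rows):
    """Serialize rows into an in-memory CSV buffer, header row first."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(columns)
    writer.writerows(rows)
    buf.seek(0)  # rewind so the COPY call reads from the start
    return buf


def copy_rows(conn, table, columns, rows):
    """Send all rows in one COPY.

    Assumption: conn is a psycopg2 connection, whose cursors provide
    copy_expert(sql, file).
    """
    with conn.cursor() as cur:
        cur.copy_expert(build_copy_sql(table, columns), rows_to_csv(columns, rows))
    conn.commit()
```

With a hypothetical table, a call might look like
copy_rows(conn, "messages", ["id", "payload"], [(1, "hello")]). For large
files it would likely be better to pass the open file straight to
copy_expert instead of building the buffer in memory.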


* Re: Load a csv or a avro?
  2024-07-05 09:08 Load a csv or a avro? sud <[email protected]>
@ 2024-07-05 13:03 ` Ron Johnson <[email protected]>
  1 sibling, 0 replies; 3+ messages in thread

From: Ron Johnson @ 2024-07-05 13:03 UTC
  To: pgsql-general

On Fri, Jul 5, 2024 at 5:08 AM sud <[email protected]> wrote:

> Hello all,
>
> This is a PostgreSQL database. We have the option of receiving messages
> from another system as CSV and/or Avro files to load into our PostgreSQL
> database. The volume will be 300 million messages per day, delivered in
> batches across many files.
>
> My question is: which format should we choose for faster data loading
> performance?
>

What application will be loading the data? If psql, then go with CSV;
COPY is *really* efficient.

If the PG tables are already mapped to the Avro format, then maybe Avro
will be faster.

> And are there any other aspects we should consider apart from
> loading performance?
>

If all the data comes in at night, drop as many indices as possible before
loading.
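
One way to script the drop-and-recreate step is sketched below in Python.
It reads index definitions from the pg_indexes view and replays them after
the load. The psycopg2-style connection, the `load` callback, and the `keep`
parameter are assumptions made for illustration. Indexes that back primary
key or unique constraints cannot be removed with DROP INDEX, so they are
expected to be listed in `keep`.

```python
# Query against the standard pg_indexes view; indexdef holds a ready-to-run
# CREATE INDEX statement for each index.
PG_INDEXES_QUERY = (
    "SELECT indexname, indexdef FROM pg_indexes "
    "WHERE schemaname = %s AND tablename = %s"
)


def plan_index_rebuild(schema, index_rows, keep=()):
    """Return (drop_statements, create_statements) for the given indexes.

    index_rows: iterable of (indexname, indexdef) tuples from pg_indexes.
    keep: index names to leave in place, e.g. those backing constraints.
    """
    drops, creates = [], []
    for name, definition in index_rows:
        if name in keep:
            continue
        drops.append(f'DROP INDEX "{schema}"."{name}"')
        creates.append(definition)
    return drops, creates


def load_without_indexes(conn, schema, table, load, keep=()):
    """Drop indexes, run the load, recreate the indexes, then commit.

    Assumptions: conn is a psycopg2-style connection and load is a
    caller-supplied function that takes a cursor and performs the bulk load.
    """
    with conn.cursor() as cur:
        cur.execute(PG_INDEXES_QUERY, (schema, table))
        drops, creates = plan_index_rebuild(schema, cur.fetchall(), keep)
        for stmt in drops:
            cur.execute(stmt)
        load(cur)
        for stmt in creates:
            cur.execute(stmt)
    conn.commit()
```

Everything here runs in one transaction, so a failed load rolls the index
drops back as well. Whether that is preferable to committing between the
steps depends on how long the table can stay locked.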

Load each file in as few DB connections as possible: the most efficient
binary format won't do you any good if you open and close a connection for
each and every row.
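
To make the connection-reuse point concrete, here is a small Python sketch
that pushes a whole batch of CSV files through one connection and one
cursor, committing once per file. It assumes a psycopg2-style connection
with copy_expert; the function name and the COPY statement passed in are
hypothetical.

```python
def load_files(conn, copy_sql, paths):
    """Load every CSV file in paths over a single, already-open connection.

    Assumption: conn is a psycopg2-style connection whose cursors provide
    copy_expert(sql, file). Returns the number of files loaded.
    """
    loaded = 0
    with conn.cursor() as cur:  # one cursor reused for the whole batch
        for path in paths:
            # newline="" leaves line endings untouched for the CSV parser
            with open(path, "r", encoding="utf-8", newline="") as f:
                cur.copy_expert(copy_sql, f)
            conn.commit()  # one commit per file, not per row
            loaded += 1
    return loaded
```

The connection is opened once by the caller and closed once after the batch,
so connection setup cost is paid per batch rather than per row or per file.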

