Load a csv or a avro? (3+ messages, 3 participants)
* Load a csv or a avro?
From: sud @ 2024-07-05 09:08 UTC
To: pgsql-general <[email protected]>

Hello all,

This is a PostgreSQL database. We have the option of receiving files in CSV and/or Avro format from another system to load into our PostgreSQL database. The volume will be 300 million messages per day, across many files, in batches.

My question is: which format should we choose for faster data loading performance? And are there any other aspects we should consider apart from loading performance?

* Re: Load a csv or a avro?
From: Josef Šimánek @ 2024-07-05 10:02 UTC
To: sud <[email protected]>; +Cc: pgsql-general <[email protected]>

On Fri, Jul 5, 2024 at 11:08, sud <[email protected]> wrote:
> Hello all,
>
> Its postgres database. We have option of getting files in csv and/or in avro format messages from another system to load it into our postgres database. The volume will be 300million messages per day across many files in batches.
>
> My question was, which format should we chose in regards to faster data loading performance ? and if any other aspects to it also should be considered apart from just loading performance?

We are able to load ~300 million rows per day using CSV and the COPY functions (https://www.postgresql.org/docs/current/libpq-copy.html#LIBPQ-COPY-SEND).
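
For readers who want to try the CSV + COPY route from a client program, here is a minimal sketch in Python. It is not from the thread. It assumes a psycopg 3 style connection object (one whose cursors provide `copy()`), CSV files that each start with a header row, and hypothetical table and column names supplied by the caller. The helper names `quote_ident`, `copy_sql` and `load_csv_files` are made up for illustration.

```python
from typing import Iterable, Sequence


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_sql(table: str, columns: Sequence[str]) -> str:
    """Build a COPY ... FROM STDIN statement for CSV input with a header row."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return (
        f"COPY {quote_ident(table)} ({cols}) "
        "FROM STDIN WITH (FORMAT csv, HEADER true)"
    )


def load_csv_files(
    conn,
    table: str,
    columns: Sequence[str],
    paths: Iterable[str],
    chunk_size: int = 1 << 20,
) -> int:
    """Stream every file through COPY on one already-open connection.

    `conn` is assumed to behave like a psycopg 3 connection: its cursors
    offer `copy(statement)` as a context manager with a `write()` method.
    Returns the number of files sent.
    """
    statement = copy_sql(table, columns)
    loaded = 0
    with conn.cursor() as cur:
        for path in paths:
            # One COPY per file; the file is read in chunks, never whole.
            with open(path, "rb") as src, cur.copy(statement) as copy:
                while chunk := src.read(chunk_size):
                    copy.write(chunk)
            loaded += 1
    # A single commit for the whole batch of files.
    conn.commit()
    return loaded
```

With a real connection this would be called as, for example, `load_csv_files(conn, "events", ["id", "payload"], batch_paths)`, where `events`, its columns and `batch_paths` are placeholders. Committing once per batch rather than once per file is a design choice of this sketch, not a recommendation from the thread.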

* Re: Load a csv or a avro?
From: Ron Johnson @ 2024-07-05 13:03 UTC
To: pgsql-general

On Fri, Jul 5, 2024 at 5:08 AM sud <[email protected]> wrote:
> Hello all,
>
> Its postgres database. We have option of getting files in csv and/or in avro format messages from another system to load it into our postgres database. The volume will be 300million messages per day across many files in batches.
>
> My question was, which format should we chose in regards to faster data loading performance ?

What application will be loading the data? If psql, then go with CSV; COPY is *really* efficient. If the PG tables are already mapped to the Avro format, then maybe Avro will be faster.

> and if any other aspects to it also should be considered apart from just loading performance?

If all the data comes in at night, drop as many indices as possible before loading.

Load each file in as few DB connections as possible: the most efficient binary format won't do you any good if you open and close a connection for each and every row.
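
The "drop indices, load, recreate" advice can be sketched as follows. This is an illustration under assumptions, not something posted in the thread: `conn` is any DB-API style connection reused for the whole load, `load` is a caller-supplied function that does the actual loading on that same connection, and `indexes` is a list of (schemaname, indexname, indexdef) rows such as those in the `pg_indexes` view. The function name `load_with_indexes_dropped` is hypothetical. Indexes that back primary key or unique constraints cannot be removed with a plain DROP INDEX and are not handled here, so the caller should leave them out of the list.

```python
from typing import Callable, Sequence, Tuple

IndexRow = Tuple[str, str, str]  # (schemaname, indexname, indexdef)


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def load_with_indexes_dropped(
    conn,
    indexes: Sequence[IndexRow],
    load: Callable[[], None],
) -> None:
    """Drop the given indexes, run load(), then recreate the indexes.

    The same connection is used throughout, so no connection is opened
    or closed per file or per row.
    """
    # 1. Drop the indexes the caller selected.
    with conn.cursor() as cur:
        for schema, name, _definition in indexes:
            cur.execute(
                f"DROP INDEX IF EXISTS {quote_ident(schema)}.{quote_ident(name)}"
            )
    conn.commit()

    try:
        # 2. Bulk load while the table has fewer indexes to maintain.
        load()
        conn.commit()
    except Exception:
        # Clear the failed transaction so the indexes can still be rebuilt.
        conn.rollback()
        raise
    finally:
        # 3. Recreate the indexes from their saved definitions,
        #    whether or not the load succeeded.
        with conn.cursor() as cur:
            for _schema, _name, definition in indexes:
                cur.execute(definition)
        conn.commit()
```

The index definitions must be captured before step 1 runs, for example with a query against `pg_indexes` filtered on the target table, because they are no longer available from the catalog once the indexes are dropped.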