From: Ron Johnson <[email protected]>
To: pgsql-general <[email protected]>
Subject: Re: Load a csv or a avro?
Date: Sat, 6 Jul 2024 16:23:17 -0400
Message-ID: <CANzqJaD4ndk9QH-47uKTwd=HvCcWdNd6WdWHXX4b4en86LKz1A@mail.gmail.com>
In-Reply-To: <CAD=mzVXSeb3dG-=aHfpOo2oP7Od7CmnhdVc2+pDCRdSysBa-6A@mail.gmail.com>
References: <CAD=mzVUo9UpTw7F_8HKDK19ZmO6tE6Cfa4T-7i1J_QGfi6NpOw@mail.gmail.com>
	<[email protected]>
	<CAD=mzVXSeb3dG-=aHfpOo2oP7Od7CmnhdVc2+pDCRdSysBa-6A@mail.gmail.com>

On Sat, Jul 6, 2024 at 4:10 PM sud <[email protected]> wrote:

> On Fri, Jul 5, 2024 at 8:24 PM Adrian Klaver <[email protected]>
> wrote:
>
>> On 7/5/24 02:08, sud wrote:
>> > Hello all,
>> >
>> > It's a Postgres database. We have the option of getting files in CSV
>> > and/or Avro format from another system to load into our Postgres
>> > database. The volume will be 300 million messages per day across many
>> > files in batches.
>>
>> Are you dumping the entire contents of each file, or are you pulling a
>> portion of the data out?
>>
>>
>>
> Yes, all the fields in the file have to be loaded to the columns in the
> tables in postgres.
>

But you didn't say *which* columns or *which* tables.

If one row of CSV input must be split into multiple tables, then it might
be pretty slow.
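
As a rough illustration of that splitting cost, here is a minimal Python sketch, assuming a hypothetical layout in which each CSV row carries both order-level and item-level fields. The column names (order_id, customer, sku, qty) and the two target tables are invented for the example; the thread never describes the real schema. Every input row has to be parsed and rewritten before anything can be bulk-loaded, which is work a one-to-one load avoids.

```python
import csv
import io

# Hypothetical layout: each input row holds order fields plus one item's fields.
ORDER_COLS = ["order_id", "customer"]
ITEM_COLS = ["order_id", "sku", "qty"]


def split_for_copy(csv_text):
    """Split one wide CSV into two per-table CSV buffers (one per target table)."""
    orders_buf, items_buf = io.StringIO(), io.StringIO()
    orders_out = csv.writer(orders_buf, lineterminator="\n")
    items_out = csv.writer(items_buf, lineterminator="\n")
    # Tracks orders already emitted; at very large volumes this set itself
    # becomes a memory cost, which is part of why splitting can be slow.
    seen_orders = set()
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["order_id"] not in seen_orders:
            seen_orders.add(row["order_id"])
            orders_out.writerow([row[c] for c in ORDER_COLS])
        items_out.writerow([row[c] for c in ITEM_COLS])
    return orders_buf.getvalue(), items_buf.getvalue()


sample = "order_id,customer,sku,qty\n1,alice,A-1,2\n1,alice,B-7,1\n2,bob,A-1,5\n"
orders_csv, items_csv = split_for_copy(sample)
```

Each resulting buffer could then be handed to a separate bulk load, one per table.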


> But how will that matter here for deciding whether we should ask the
> outside system for the data in .csv or .avro format to load into the
> Postgres database in row and column format? Again, my understanding was
> that, irrespective of anything else, the .csv file load will always be
> faster because the data is already stored in row and column format,
> whereas for the .avro file the parser has to do additional work to turn it
> into row and column format or map it to the columns of the database table.
> Is my understanding correct here?
>

Yes and no.  It all depends on how well each input row maps to a PostgreSQL
table.

Bottom line: you want an absolute answer, but we can't give you an absolute
answer, since we don't know what the input data looks like, and we don't
know what the PostgreSQL tables look like.

An Avro file *might* be faster to load than CSV, or it might be horribly
slower.

And you might incompetently program a CSV importer so that it's horribly
slow.
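
For the simple case where a CSV file maps one-to-one onto a single table, the usual fast path is one COPY command rather than one INSERT per row. The sketch below only builds the COPY statement text from the file's header line; the table name "events" and its columns are hypothetical, and actually running the statement would need a database driver, which is left out here.

```python
import csv
import io


def quote_ident(name):
    """Double-quote an SQL identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def copy_statement(table, csv_text):
    """Build a COPY command whose column list is taken from the CSV header."""
    header = next(csv.reader(io.StringIO(csv_text)))
    cols = ", ".join(quote_ident(name) for name in header)
    # HEADER true tells COPY to skip the first line of the input.
    return f"COPY {quote_ident(table)} ({cols}) FROM STDIN WITH (FORMAT csv, HEADER true)"


# "events" and its columns are made up for illustration.
stmt = copy_statement("events", "id,payload,created_at\n1,hello,2024-07-06\n")
```

Note that quoting identifiers makes them case-sensitive, so the header names must match the table's column names exactly.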

We can't give absolute answers without knowing more details than the
ambiguous generalities in your emails.

