From: Kashif Zeeshan
Date: Fri, 5 Jul 2024 14:57:37 +0500
Subject: Re: Load a csv or a avro?
To: sud
Cc: pgsql-general

Hi

There are different data formats available. Here are a few points on their performance implications:

1. CSV: It's easy to use and widely supported, but it can be slower to load because of parsing overhead.
2. Binary: It's faster to load, but it isn't human-readable.
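The two points above can be made concrete with a small Python sketch. It assumes "Binary" means PostgreSQL's own `COPY ... WITH (FORMAT binary)` format (Avro files would first have to be converted, since `COPY` only reads text, CSV and its own binary format). It uses a hypothetical two-column table (`id int4`, `name text`) with made-up rows, assumes UTF-8 text, and does not handle NULLs. The CSV side shows the per-field work: scanning for delimiters and quotes, then converting text to typed values. The binary side writes each value length-prefixed in its binary representation, so a reader consumes fixed-size length words instead of tokenizing.

```python
import csv
import io
import struct

# Hypothetical sample rows for an assumed table (id int4, name text).
ROWS = [(1, "alice"), (2, "bob, jr.")]

# CSV path
def to_csv(rows) -> str:
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

def parse_csv(text: str) -> list[tuple[int, str]]:
    """Every field arrives as text: the reader scans for delimiters and
    quotes, then each value is converted to its target type."""
    return [(int(row[0]), row[1]) for row in csv.reader(io.StringIO(text))]

# PostgreSQL binary COPY path
# File header: 11-byte signature, then 32-bit flags and 32-bit extension length.
SIGNATURE = b"PGCOPY\n\xff\r\n\x00"

def to_pg_binary(rows) -> bytes:
    """Values are length-prefixed and already in binary form
    (int4 = 4-byte big-endian, text = UTF-8 bytes). NULLs are not handled."""
    out = bytearray(SIGNATURE)
    out += struct.pack("!ii", 0, 0)                 # flags, header extension length
    for id_, name in rows:
        out += struct.pack("!h", 2)                 # field count for this tuple
        out += struct.pack("!ii", 4, id_)           # int4: length word, then value
        data = name.encode("utf-8")
        out += struct.pack("!i", len(data)) + data  # text: length word, then bytes
    out += struct.pack("!h", -1)                    # trailer
    return bytes(out)

def from_pg_binary(blob: bytes) -> list[tuple[int, str]]:
    """Decoding reads fixed-size length words; no delimiter scanning.
    This sketch assumes every tuple has exactly the two fields above."""
    if blob[:11] != SIGNATURE:
        raise ValueError("not a binary COPY stream")
    _flags, ext_len = struct.unpack_from("!ii", blob, 11)
    pos = 19 + ext_len                              # skip any header extension
    rows = []
    while True:
        (nfields,) = struct.unpack_from("!h", blob, pos)
        pos += 2
        if nfields == -1:                           # trailer reached
            return rows
        _len, id_ = struct.unpack_from("!ii", blob, pos)
        pos += 8
        (n,) = struct.unpack_from("!i", blob, pos)
        pos += 4
        rows.append((id_, blob[pos:pos + n].decode("utf-8")))
        pos += n
```

In practice a stream like this would be sent to `COPY tablename FROM STDIN WITH (FORMAT binary)` through a client driver (`tablename` is a placeholder). Whether binary actually beats CSV for a given workload is worth measuring on a sample batch, since the binary format is also more type- and version-specific.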

Hope this helps.

Regards
Kashif Zeeshan

On Fri, Jul 5, 2024 at 2:08 PM sud <suds1434@gmail.com> wrote:

> Hello all,
>
> It's a Postgres database. We have the option of receiving messages as CSV and/or Avro files from another system, to load into our Postgres database. The volume will be 300 million messages per day, across many files, in batches.
>
> My question is: which format should we choose for faster data loading? And are there any other aspects we should consider apart from loading performance?
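The volume in the question translates into a sustained load rate, which is the figure either format has to keep up with. A small Python sketch of the arithmetic, assuming messages arrive evenly across the day; `required_rate` is a hypothetical helper and the 4-hour load window is an illustrative assumption, not something from the thread:

```python
# Hypothetical helper: average sustained load rate for a daily volume.
# Assumes arrivals are spread evenly; real batches will be burstier.
MESSAGES_PER_DAY = 300_000_000  # figure from the question

def required_rate(messages_per_day: int, window_hours: float = 24.0) -> float:
    """Rows per second needed to load one day's volume within the window."""
    return messages_per_day / (window_hours * 3600)

print(f"{required_rate(MESSAGES_PER_DAY):,.0f} rows/s over 24 h")   # 3,472
print(f"{required_rate(MESSAGES_PER_DAY, 4):,.0f} rows/s over 4 h")  # 20,833
```

Roughly 3,472 rows per second on average is the floor; if files arrive in bursts or must be loaded in a shorter window, the peak rate is what should drive the CSV-versus-binary comparison.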
