Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sPfyw-005kbc-AZ for pgsql-general@arkaria.postgresql.org; Fri, 05 Jul 2024 10:15:58 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sPfyt-008VSY-5u for pgsql-general@arkaria.postgresql.org; Fri, 05 Jul 2024 10:15:55 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sPfys-008VSQ-Qz for pgsql-general@lists.postgresql.org; Fri, 05 Jul 2024 10:15:55 +0000 Received: from mail-lj1-x22f.google.com ([2a00:1450:4864:20::22f]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sPfyr-000XJM-6w for pgsql-general@lists.postgresql.org; Fri, 05 Jul 2024 10:15:54 +0000 Received: by mail-lj1-x22f.google.com with SMTP id 38308e7fff4ca-2ee910d6a9dso11135991fa.1 for ; Fri, 05 Jul 2024 03:15:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720174551; x=1720779351; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=Lo9B4bL9AnTMB41iEzqGyGDpL8K9nUBvhEIom8KmF50=; b=YDeBgHvWPBW1r0zkUI3LBUJEHF52JkYrcSY2naOR9wEyzWLyTHonIBXdaHx2ka/Iyo QpmfzHNDolah/SXdEgxnjqsoEBUHDFD/FnyzhZlF1slqoMLojsO/FZQKSa2/RRViaRrN mkGXESgJ9fZR0/eKhNC1y8j4EBsW0gDB2E3rkdKzCPW3BIURCXlQykwzaqmJ3jlb6kL5 Ahcpnl8L1SdnvRxY1ChLeNt1B6ybQfXR6ZHVi1QQdnWkTwVGb/Gq4BEUHA9XYteRXiN2 H0xmvVmh0HWstq0bTRGQVV81ObkvVfaJZ/FVzks5Ds/7WJ1ygUPj8jqmY6oBZwTt/H+j Q9tg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720174551; x=1720779351; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=Lo9B4bL9AnTMB41iEzqGyGDpL8K9nUBvhEIom8KmF50=; b=mNK3nbVGCcRKc0rwyazobwrxApIz1XK36SMaozaiykZ2pk0+4DiCJarwD454in79uf YukYDVwwGmf37pXYh8QR/rL25a6NsnBprHC1CWxP0MpQ1GdsxU3Q95TGN/KzC8AHMnhZ 4e9HXGQJCHbh/Wd47FgfY8k/RVcLKwRI88LjACll8DeWNCHNR4JyYgT9jmWscy4WNnBQ zIZuJDBg2hfUkX/Z5P3rEgJbAiF7VOC18rlHuBh9+3qIP+QJSWW32e601m5CP+VbJKBR NTs/eB8TDPnnF75yXZ473YPYyLLgzWGoJz/SrtHetxKEjXPdd9CvXjn0FKBTIZ/VoyhE AWiQ== X-Forwarded-Encrypted: i=1; AJvYcCUps3oqXfpKU9GyRYJ4zaiQ+uUOGK7KK05D+JiUOWeZe0ux81XW4wTkdgag79rjkiSeoRfUhrBjqgREB1pvEsw5dovGhQFwyi28TWWigD0XtCSd X-Gm-Message-State: AOJu0YyOpFM2CODcVe6DM7iyv0DUm3rJggU1jnYg4cI806GVADyb7tI7 BcEsopQykskuvDH/3NVbYqRqdVZ35hERLaUtEF4zLNW+2K2NNEiBBlhsVpKXw2ncLjTz2xbXp00 dPOFNAA0YK5ivcJ0XswLT+QnTaog= X-Google-Smtp-Source: AGHT+IH0xler3YIeBqI4V8QyBHJKvSpVO35q7PIVCKQdStAOLgnbPldxE7Mg9hNuO3jEYuphIzZbgXMmN5KDwI6B0mw= X-Received: by 2002:a2e:8ec2:0:b0:2ed:5c34:4082 with SMTP id 38308e7fff4ca-2ee8ed62983mr28509141fa.8.1720174550506; Fri, 05 Jul 2024 03:15:50 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Muhammad Ikram Date: Fri, 5 Jul 2024 15:15:28 +0500 Message-ID: Subject: Re: Load a csv or a avro? To: =?UTF-8?B?Sm9zZWYgxaBpbcOhbmVr?= Cc: sud , pgsql-general Content-Type: multipart/alternative; boundary="00000000000008e4c0061c7d5971" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000008e4c0061c7d5971 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, Performance Considerations Avro files are smaller due to compression so needing less I/O time. whereas CSV files are simpler but larger in size so read/write will need more time. COPY command works very well with CSV files whereas ETL process is required for handling Avro. Regards, Muhammad Ikram On Fri, Jul 5, 2024 at 3:03=E2=80=AFPM Josef =C5=A0im=C3=A1nek wrote: > p=C3=A1 5. 7. 2024 v 11:08 odes=C3=ADlatel sud napsa= l: > > > > Hello all, > > > > Its postgres database. We have option of getting files in csv and/or in > avro format messages from another system to load it into our postgres > database. The volume will be 300million messages per day across many file= s > in batches. > > > > My question was, which format should we chose in regards to faster data > loading performance ? and if any other aspects to it also should be > considered apart from just loading performance? > > We are able to load ~300 million rows per one day using CSV and COPY > functions ( > https://www.postgresql.org/docs/current/libpq-copy.html#LIBPQ-COPY-SEND). > > > --=20 Muhammad Ikram --00000000000008e4c0061c7d5971 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

Performance Considerations

=
=C2=A0 =C2=A0 Avro files are smaller due to compression so needing les= s I/O time. whereas CSV files are simpler but larger in size so read/write = will need more time.
=C2=A0 =C2=A0 COPY command works very well w= ith CSV files whereas ETL process is required for handling Avro.
=
Regards,
Muhammad Ikram


On Fri, Jul 5, 2024 at 3:03=E2=80=AFPM Josef =C5=A0im=C3=A1nek <= ;josef.simanek@gmail.com>= wrote:
p=C3=A1 = 5. 7. 2024 v 11:08 odes=C3=ADlatel sud <suds1434@gmail.com> napsal:
>
> Hello all,
>
> Its postgres database. We have option of getting files in csv and/or i= n avro format messages from another system to load it into our postgres dat= abase. The volume will be 300million messages per day across many files in = batches.
>
> My question was, which format should we chose in regards to faster dat= a loading performance ? and if any other aspects to it also should be consi= dered apart from just loading performance?

We are able to load ~300 million rows per one day using CSV and COPY
functions (https://www.postgr= esql.org/docs/current/libpq-copy.html#LIBPQ-COPY-SEND).




--
Muhammad Ikram

--00000000000008e4c0061c7d5971--