Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sQBjS-008Mjd-Bt for pgsql-general@arkaria.postgresql.org; Sat, 06 Jul 2024 20:10:06 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sQBjR-005zNf-0h for pgsql-general@arkaria.postgresql.org; Sat, 06 Jul 2024 20:10:05 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sQBjQ-005zNW-MV for pgsql-general@lists.postgresql.org; Sat, 06 Jul 2024 20:10:04 +0000 Received: from mail-vk1-xa2d.google.com ([2607:f8b0:4864:20::a2d]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sQBjO-000m9w-H4 for pgsql-general@lists.postgresql.org; Sat, 06 Jul 2024 20:10:04 +0000 Received: by mail-vk1-xa2d.google.com with SMTP id 71dfb90a1353d-4e1c721c040so851299e0c.3 for ; Sat, 06 Jul 2024 13:10:02 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720296601; x=1720901401; darn=lists.postgresql.org; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:from:to:cc:subject:date:message-id:reply-to; bh=PI427krzN/sg3h4t81XVUpRujYcnMNS/Je61YaYUkBY=; b=YFGz6z0skuAkj3dLwEcKdvJ9JK12ufoM14JkTj6i/cW1q+DHZVXgzPYEuJJLktdsxy a5E6SLF/4cstLMqtshOpC1yV8B2MHzd7efHshNZiikGaE5urPxHk75qdwI2SrhGA7z4q saW8jmo1EJi93d7p6fdzkIBxloUgH9z7wUgNoh3E7I2fWq4RxvbCV98wyh4qIe3wHVCK jHaw+4ksbRUMaQR+B5jDqidy6wSJVCm5YT3ouGeqEHStSicv0sq3tzTRESTSg3x3/Bey 9FtHdxZg3QT29CHBXXxKrLcHJeckZl0eGD+LAEAVI244DS0Vu+yGHfECV+z4fxyMsnd/ XwJw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720296601; x=1720901401; h=cc:to:subject:message-id:date:from:in-reply-to:references :mime-version:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=PI427krzN/sg3h4t81XVUpRujYcnMNS/Je61YaYUkBY=; b=UNvL0Om/1WDuE+KLURNFFQLMouGqHFE2Jc/1qCEKXOWpXJ3IfSI+YSxTOCJBMOCkuN ZP0wtlQItxtOQGhMOAxPG4Z0vNTrvexBbINZh2k254cxBuJNrJChyUTn53Z/aXvh/877 F20Xp4pDgbGYLxP4FKA31zcSGyc/xsKeeYrfvJRnxXF0LrCPk3XTtwFvg9PpSBzkDseb i/iCeQDxIiqHqnEyIUhH8A9gN60I4aC7smEkJhFMtkO8tIeG9S9avxH6G08r+7lIkzox VQaReYleK17nEH0WD5H3urGYLDztM9BWgtMtqV9C8lt+/HMHDPzW1ba8hxKiw9XY7K81 oQJQ== X-Gm-Message-State: AOJu0YxsVywLyIpB5AKFVIz03kPKB9YtbAxE75Jq1BeMiLR7jU3LzldC 0kwxC0VqqgzL6XyTsOiRBQRkynAbn9Lbv11FChHL3jja689l9TpKiqqWRo5AbAgXiQcq+JA/Rxp N5lh67aiAfPgs+BrJvQW8TcJD3VAkbA== X-Google-Smtp-Source: AGHT+IGdPVInjMSgKSSq1+rXOmJ8fXvCRBVLYaQOEPs3XDqRskYbld2VsNcevzjp61C9LFyUj1ha4QDmc58gXT1MNW4= X-Received: by 2002:a05:6122:4891:b0:4ef:6618:fd5c with SMTP id 71dfb90a1353d-4f2f3edf456mr9673197e0c.3.1720296600148; Sat, 06 Jul 2024 13:10:00 -0700 (PDT) MIME-Version: 1.0 References: <11d5753c-578e-41a2-af17-6de956f03058@aklaver.com> In-Reply-To: <11d5753c-578e-41a2-af17-6de956f03058@aklaver.com> From: sud Date: Sun, 7 Jul 2024 01:39:48 +0530 Message-ID: Subject: Re: Load a csv or a avro? To: Adrian Klaver Cc: pgsql-general Content-Type: multipart/alternative; boundary="000000000000c2a018061c99c314" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000c2a018061c99c314 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, Jul 5, 2024 at 8:24=E2=80=AFPM Adrian Klaver wrote: > On 7/5/24 02:08, sud wrote: > > Hello all, > > > > Its postgres database. We have option of getting files in csv and/or in > > avro format messages from another system to load it into our postgres > > database. The volume will be 300million messages per day across many > > files in batches. > > Are dumping the entire contents of each file or are you pulling a > portion of the data out? > > > Yes, all the fields in the file have to be loaded to the columns in the tables in postgres. But how will that matter here for deciding if we should ask the data in .csv or .avro format from the outside system to load into the postgres database in row and column format? Again my understanding was that irrespective of anything , the .csv file load will always faster as because the data is already stored in row and column format as compared to the .avro file in which the parser has to perform additional job to make it row and column format or map it to the columns of the database table. Is my understanding correct here? --000000000000c2a018061c99c314 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Fri, Jul 5, 2024 at 8:24=E2=80=AFPM Ad= rian Klaver <adrian.klaver@= aklaver.com> wrote:
On 7/5/24 02:08, sud wrote:
> Hello all,
>
> Its postgres database. We have option of getting files in csv and/or i= n
> avro format messages from another system to load it into our postgres =
> database. The volume will be 300million messages per day across many <= br> > files in batches.

Are dumping the entire contents of each file or are you pulling a
portion of the data out?



Yes, all the fields in the file have t= o be loaded to the columns in the tables in postgres. But how will that mat= ter here for deciding if we should ask the=C2=A0data in .csv or .avro forma= t from the outside system to load into the postgres database in=C2=A0row an= d column format? Again my understanding=C2=A0was that irrespective=C2=A0of = anything , the .csv file load will always faster as because the data is alr= eady=C2=A0stored in row and column format as compared=C2=A0to the .avro fil= e in which the parser has to perform additional=C2=A0job to make it row and= column format or map it to the columns of the database table. Is my unders= tanding=C2=A0correct here?
--000000000000c2a018061c99c314--