From: sud
Date: Fri, 5 Jul 2024 14:38:25 +0530
Subject: Load a csv or a avro?
To: pgsql-general

Hello all,

It's a Postgres database. Another system can provide the data to us as files in CSV format and/or as Avro-format messages, and we need to load it into our Postgres database. The volume will be 300 million messages per day, delivered in batches across many files.

My question is: which format should we choose for faster data-loading performance? And are there other aspects we should consider apart from loading performance alone?
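For scale, the stated volume of 300 million messages per day works out to roughly 3,472 messages per second if loading were spread evenly over 24 hours. Because the files arrive in batches, the rate actually required is higher whenever loading must finish in a shorter window. The 8-hour window below is a purely hypothetical figure, used only for illustration.

```python
# Average ingest rate implied by the stated daily volume.
# Assumption: the first figure spreads the load evenly over 24 hours;
# real batch windows are shorter, so the required rate is higher.
MESSAGES_PER_DAY = 300_000_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def required_rate(messages: int, window_seconds: int) -> float:
    """Messages per second needed to load `messages` within `window_seconds`."""
    return messages / window_seconds


avg_rate = required_rate(MESSAGES_PER_DAY, SECONDS_PER_DAY)
print(f"{avg_rate:,.0f} messages/s averaged over a full day")

# Hypothetical example: all batches must be loaded within an 8-hour window.
peak_rate = required_rate(MESSAGES_PER_DAY, 8 * 60 * 60)
print(f"{peak_rate:,.0f} messages/s within an 8-hour window")
```

Whichever format is chosen, the loader has to sustain at least the rate that matches the real batch window, not just the daily average.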