Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sPibB-005xfF-VI for pgsql-general@arkaria.postgresql.org; Fri, 05 Jul 2024 13:03:37 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sPib9-009hTZ-Tw for pgsql-general@arkaria.postgresql.org; Fri, 05 Jul 2024 13:03:36 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sPib9-009hTQ-IL for pgsql-general@lists.postgresql.org; Fri, 05 Jul 2024 13:03:36 +0000 Received: from mail-oa1-x32.google.com ([2001:4860:4864:20::32]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sPib4-000YW5-9z for pgsql-general@postgresql.org; Fri, 05 Jul 2024 13:03:35 +0000 Received: by mail-oa1-x32.google.com with SMTP id 586e51a60fabf-25e3d8d9f70so489011fac.2 for ; Fri, 05 Jul 2024 06:03:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1720184607; x=1720789407; darn=postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=SNXBwjmYtnSu+Zq/VlW6JfYiVJPFhzAzOyZe9olwdxk=; b=CzmTO5gbmWc2ppme1AFOZR4gOVfR79pIj0kEWpLpUPqY+oejPnMDqNjcAE2ghydFZa TRlzuBmhrhrq1TGwrS8kpTSLaijZ387xkyN05FJk3dW0LDXf4wXZxYKZVgTPKDD/Vn5U e1rEK7mZkVey/VBkcvRWH+Cw4qZorH43CJJ/w6/s4S0Mixf91qZ6gctaDOLc6B/MkRWx TICnzeyOC0hS6NwA9fmKLeoi5sh4EKjfX/VIBXeXeGv6fYDEofnEhJnwUbUJrZvsvdIx JZlbkKEbDUv1hbaYr0PrO5tlomlhSZVVasjR2Z9qICLyUYpZmKq5rzS59/63vlezfIfx RIKA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1720184607; x=1720789407; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=SNXBwjmYtnSu+Zq/VlW6JfYiVJPFhzAzOyZe9olwdxk=; b=JwWpG/nb45Gzjuh2w6EclSaPjWk1cmivErQ3TE0+fXqxBq0qbSYB8w6UK1DtoUYFOh gN+iZpuD/5ZPNF7e8ipEVrus0JzVnlPCGVA6cJS6GRKV+j9jBCNo37Vumo6IDx8uPOdW ks/V8KwfaMcZiJjCgQr2ooCljTv1fs6/p5mU/cfCbHLrGrrWqdXazYOaAf5WNM5bVLdM CkzW5aaLwiT8XMmAK4uxmxF6HcOgT+kT/zDyD9QXqhvA7GWY1nd/u+ZMg89FAVLKWPcI GlIPxasRLb2SMDxIF0WJbdQq990+iZ5XtPc/24KL08CfhU6H3MdWYsvoO0l7wmtmkwEW FcvQ== X-Gm-Message-State: AOJu0YxO/xMRKitalt6DFfS5egOKIsO5JQLmR08ZeDeTyHvr6judkUKB sbODVgqYA8ye/IlteeEBQ9VxuJlBJ4Y6PC82z3n6CaqJearLiAZMeu5/sCdiJi8zR3881CW0yU7 Bh79I/SeVDNHW42rNz3EnBOuhpctmIuTV X-Google-Smtp-Source: AGHT+IH/3xz/z8yhlFRw6rd5EBFEJCClksscBQ100mRMXwjwcbuOqXoIqM8uHjntnrfiJ1knyJI1Dha9sz5SAQdMa/U= X-Received: by 2002:a05:6871:5d0:b0:25e:1551:a2e5 with SMTP id 586e51a60fabf-25e2b8c6054mr3408419fac.7.1720184607370; Fri, 05 Jul 2024 06:03:27 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: From: Ron Johnson Date: Fri, 5 Jul 2024 09:03:15 -0400 Message-ID: Subject: Re: Load a csv or a avro? To: pgsql-general Content-Type: multipart/alternative; boundary="00000000000078773f061c7fb002" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --00000000000078773f061c7fb002 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Fri, Jul 5, 2024 at 5:08=E2=80=AFAM sud wrote: > Hello all, > > Its postgres database. We have option of getting files in csv and/or in > avro format messages from another system to load it into our postgres > database. The volume will be 300million messages per day across many file= s > in batches. > > My question was, which format should we chose in regards to faster data > loading performance ? > What application will be loading the data? If psql, then go with CSV; COPY is *really* efficient. If the PG tables are already mapped to the avro format, then maybe avro will be faster. > and if any other aspects to it also should be considered apart from just > loading performance? > If all the data comes in at night, drop as many indices as possible before loading. Load each file in as few DB connections as possible: the most efficient binary format won't do you any good if you open and close a connection for each and every row. --00000000000078773f061c7fb002 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
On Fri, Jul 5, 2024 at 5:08=E2=80=AFAM su= d <suds1434@gmail.com> wrot= e:

Hello all,=C2=A0

Its postgres database. We have option of getting files in cs= v and/or in avro format messages from another system to load it into our po= stgres database. The volume will be 300million messages per day across many= files in batches.

My question was, which format = should we chose in regards to faster data loading performance ?

=

What = application will be loading the data?=C2=A0 =C2=A0If psql, then go with CSV= ; COPY is really=C2=A0efficient.
=C2=A0
If t= he PG tables are already mapped to the avro format, then maybe avro will be= faster.

and if any other aspect= s to it also should be considered apart from just loading performance?


If all the data comes in at ni= ght, drop as many indices as=C2=A0possible before loading.

Load each file in as few DB connections as possible: the most effi= cient binary format won't do you any good if you open and close a conne= ction for each and every row.

--00000000000078773f061c7fb002--