Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbRPE-00Gbng-UR for pgsql-general@arkaria.postgresql.org; Tue, 06 Aug 2024 21:07:44 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1sbRPD-002lZo-60 for pgsql-general@arkaria.postgresql.org; Tue, 06 Aug 2024 21:07:43 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1sbRPC-002lZe-OQ for pgsql-general@lists.postgresql.org; Tue, 06 Aug 2024 21:07:42 +0000 Received: from mail-qv1-xf32.google.com ([2607:f8b0:4864:20::f32]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1sbRPA-003WUS-EL for pgsql-general@lists.postgresql.org; Tue, 06 Aug 2024 21:07:42 +0000 Received: by mail-qv1-xf32.google.com with SMTP id 6a1803df08f44-6bb5a4668faso5794036d6.2 for ; Tue, 06 Aug 2024 14:07:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1722978458; x=1723583258; darn=lists.postgresql.org; h=to:subject:message-id:date:from:mime-version:from:to:cc:subject :date:message-id:reply-to; bh=HYd4G1ZqIXbxvth/YLDb9eD1voK5tk7C0q6s8Xh55lE=; b=PBS74JGS0vuXv7M7roe16O0m9tUCMzvfKXCtTdYbClGo8C8626cgUcsVo5IiYVbC2/ UJkLanW/7ncFY66EhYP9M0vnPZVfRBO5MXoqMl6cHCQnE+o6DeI/3Htq+kcO4Q8NrqT3 yt8kGsmNTJrUE6OpzpX+ouNyWFpavRAYGJdIZDy6XkLqm+inhlhEpv2WRsg2gqGFoI/c irb+8kNNvOouGzfpKaDyCduNQFDY27w1+U95c5ggv8cihYeOf7ujDIe/1eLXdtX16uN2 +ATwulLSBCsscqCmxaLuLV0jUllP3PU1bs019WJdj30bunvzg1Hy1qch9cjJpeAAskBP JaOg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1722978458; x=1723583258; h=to:subject:message-id:date:from:mime-version:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=HYd4G1ZqIXbxvth/YLDb9eD1voK5tk7C0q6s8Xh55lE=; b=PuGH1dIXbgw2e9jJwEesTj/+8rWN6dHGzFbo7D3azSdNfc/qUmDR72T1GD0+OB6q3/ MFE0K3brwCw+8Uoa1giNYjDgLia7xx4kLM10hV4sDsFAAzmxcFP//YvEZZgohOCxghGi COd5Fp4mEjOUWiTFXSK2OwDOcaTv+W4HxhZa89PocyqsXnivojjgSufI703SoXV1j2JF 9kC1sZ6qQ2dU29F6tEi0zXPif2DkWHLjP8ZzPQwuvX9C5WB0cXaHHMWDOTDaapLClEYp zteAyE7mNq07kBfOCSxDz16blwAYD0kr2E4FhFR4NYR+9ujXfLAod7r49ufEiydC7OXy KTqA== X-Gm-Message-State: AOJu0Yzc5nZvk83ZeayIkOmAFb6oLzt1Hx2vWOb/Efzt9RY9qQQLoiRZ MjIsv9mI10T+1EabMi4uldxc3Oz2DnlB2Dq6iYctV86n/W5R6b1crxQpGH2Zaa1fnlPhMBX8Sk0 Kqr3j/1xG09PvIM3EamEzWu5BGauAxgyZ X-Google-Smtp-Source: AGHT+IHdA32r805j70a1ln5duxy8YEXV/ndyE5cyABMqP01yMYsRUdZHxeAaFDyZhf3VNZcOjq4ayAOBxDKE2Pijz3Q= X-Received: by 2002:a05:6214:487:b0:6bb:9c14:8bbb with SMTP id 6a1803df08f44-6bb9c148d7cmr190677036d6.48.1722978458437; Tue, 06 Aug 2024 14:07:38 -0700 (PDT) MIME-Version: 1.0 From: yudhi s Date: Wed, 7 Aug 2024 02:37:24 +0530 Message-ID: Subject: Standard of data storage and transformation To: pgsql-general Content-Type: multipart/alternative; boundary="000000000000f89308061f0a2e9d" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --000000000000f89308061f0a2e9d Content-Type: text/plain; charset="UTF-8" Hi All, We are having a use case in which we are having transaction data for multiple customers in one of the Postgres databases(version 15.4) and we are consuming it from multiple sources(batch file processing, kafka event processing etc). It's currently stored in normalized form postgres with constraints, indexes, partitions defined. This postgres database is holding the transaction data for around a month or so. There are use cases of running online transaction search reports which will be mostly real time reporting and also some daily transaction batch reports based on customers and also month end reports for customers. In target state it will hold Approx. ~400 million transactions/day which can be billions of rows across multiple related parent/child tables. There is another requirement to send these customer transaction data to an olap system which is in a snowflake database and there it will be persisted for many years. The lag between the data in postgres/oltp and in snowflake will be ~1hr. And any reporting api can query postgres for <1 month worth of transaction data and if it needs to scan for >1month worth of transaction data, it will point to the snowflake database. Now the question which we are wondering is , should we send the data as is in normalized table form to snowflake and then there we transform/flatten the data to support the reporting use case or should we first flatten or transform the data in postgres itself and make it as another structure( for example creating materialized views on top of base table) and only then move that data to the snowflake? What is the appropriate standard and downside if we do anything different. Regards Yudhi --000000000000f89308061f0a2e9d Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi All,
We are having a use case in which we are havin= g transaction data for multiple customers in one of the Postgres databases(= version 15.4) and we are consuming it from multiple sources(batch file proc= essing, kafka event processing etc). It's currently stored in normalize= d form=C2=A0postgres with=C2=A0constraints,=C2=A0indexes, partitions define= d. This postgres database is holding the transaction data for around a mont= h or so. There are use cases of running online transaction search reports w= hich will be mostly real time reporting and also some daily transaction bat= ch reports based on customers and also month end reports for customers. In = target state it will hold Approx. ~400 million transactions/day which can b= e billions of rows across multiple related parent/child tables.

There is another requirement to send these customer transaction data= to an olap system which is in a snowflake database and there it will be pe= rsisted for many years. The lag between the data in postgres/oltp and in sn= owflake will be ~1hr. And any reporting api can query postgres for <1 mo= nth worth of transaction data and if it needs to scan for >1month worth = of transaction data, it will point to the snowflake database.

Now th= e question which we are wondering is , should we send the data as is in nor= malized table form to snowflake and then there we transform/flatten the dat= a to support the reporting use case or should we first flatten or transform= the data in postgres itself and make it as another structure( for example = creating materialized views on top of base table) and only then move that d= ata to the snowflake? What is the appropriate standard and downside if we d= o anything different.

Regards
Yudhi
<= /div>
--000000000000f89308061f0a2e9d--