MIME-Version: 1.0
From: yudhi s <learnerdatabase99@gmail.com>
Date: Wed, 7 Aug 2024 02:37:24 +0530
Message-ID: <CAEzWdqfqSt6J-ja--fj5FvO2rG2wvETzJv4oYoPREJ3DbTijBA@mail.gmail.com>
Subject: Standard of data storage and transformation
To: pgsql-general <pgsql-general@lists.postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000f89308061f0a2e9d"
Archived-At: <https://www.postgresql.org/message-id/CAEzWdqfqSt6J-ja--fj5FvO2rG2wvETzJv4oYoPREJ3DbTijBA%40mail.gmail.com>
Precedence: bulk

--000000000000f89308061f0a2e9d
Content-Type: text/plain; charset="UTF-8"

Hi All,
We have a use case in which we hold transaction data for multiple
customers in one of our Postgres databases (version 15.4), ingested from
multiple sources (batch file processing, Kafka event processing, etc.).
It is currently stored in normalized form in Postgres, with constraints,
indexes and partitions defined. This Postgres database holds roughly one
month of transaction data. The reporting use cases are online transaction
search reports (mostly real-time), daily per-customer batch transaction
reports, and month-end reports for customers. In the target state it will
ingest approximately 400 million transactions per day, which can amount to
billions of rows across multiple related parent/child tables.
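As a rough back-of-envelope check of that target volume, here is a small calculation. It assumes a 30-day retention window (the "around a month" above) and counts only parent transaction rows. Child rows multiply the total, and peak ingest rates will exceed the daily average. The function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def sizing(txns_per_day: int, retention_days: int) -> tuple[float, int]:
    """Return (average transactions per second, transactions retained)."""
    return txns_per_day / SECONDS_PER_DAY, txns_per_day * retention_days


# 400 million/day is from the description; 30-day retention is an assumption.
rate, retained = sizing(400_000_000, 30)
print(f"~{rate:,.0f} txn/s on average, ~{retained:,} transactions retained")
# prints: ~4,630 txn/s on average, ~12,000,000,000 transactions retained
```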

There is another requirement to send this customer transaction data to an
OLAP system, a Snowflake database, where it will be persisted for many
years. The lag between the data in Postgres (OLTP) and in Snowflake will
be about one hour. Any reporting API can query Postgres for less than one
month's worth of transaction data; if it needs to scan more than one
month's worth, it will point to the Snowflake database.
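The routing rule above can be sketched as a small function. This is a minimal sketch that assumes a fixed 30-day Postgres retention window and a 1-hour Snowflake lag, both approximating the figures above. The function name and its return shape are hypothetical. It routes on the oldest timestamp a report needs rather than on span length alone, because a short range that lies entirely before the retention window also exists only in Snowflake. It also flags when a Snowflake-bound range reaches into the last hour that may not have replicated yet.

```python
from datetime import datetime, timedelta, timezone

# Assumed values approximating the description above; adjust to the real setup.
PG_RETENTION = timedelta(days=30)   # Postgres keeps "around a month" of data
SNOWFLAKE_LAG = timedelta(hours=1)  # Snowflake trails Postgres by ~1 hour


def route_report_query(start: datetime, end: datetime, now: datetime) -> tuple[str, bool]:
    """Choose a backend for a report covering [start, end).

    Returns (backend, may_miss_recent_rows). Postgres serves any range whose
    oldest timestamp is still inside its retention window; anything older is
    sent to Snowflake, which may not yet hold rows from the last SNOWFLAKE_LAG.
    """
    if start > end:
        raise ValueError("start must not be after end")
    if start >= now - PG_RETENTION:
        return "postgres", False
    return "snowflake", end > now - SNOWFLAKE_LAG


now = datetime(2024, 8, 7, 12, 0, tzinfo=timezone.utc)
print(route_report_query(now - timedelta(days=7), now, now))   # ('postgres', False)
print(route_report_query(now - timedelta(days=90), now, now))  # ('snowflake', True)
```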

Now the question we are wondering about is: should we send the data as
is, in normalized table form, to Snowflake and then transform/flatten it
there to support the reporting use cases, or should we first flatten or
transform the data in Postgres itself into another structure (for example,
by creating materialized views on top of the base tables) and only then
move that data to Snowflake? What is the appropriate standard, and what
are the downsides if we do it differently?
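To make the "flatten first" option concrete, here is a minimal sketch of what flattening a parent/child pair means. It uses Python's built-in sqlite3 module so it runs without a database server. The table and column names (txn, txn_detail, txn_flat) are hypothetical. SQLite has no materialized views, so CREATE TABLE ... AS SELECT stands in for Postgres's CREATE MATERIALIZED VIEW. In Postgres the snapshot would be brought up to date with REFRESH MATERIALIZED VIEW before each export.

```python
import sqlite3

# Hypothetical normalized schema: one parent row per transaction plus
# child detail rows. All table and column names are invented for illustration.
SCHEMA = """
CREATE TABLE txn (
    txn_id      INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    created_at  TEXT    NOT NULL
);
CREATE TABLE txn_detail (
    txn_id  INTEGER NOT NULL REFERENCES txn (txn_id),
    line_no INTEGER NOT NULL,
    amount  NUMERIC NOT NULL,
    PRIMARY KEY (txn_id, line_no)
);
"""

# Flattening: join parent and child into one wide row per detail line.
# CREATE TABLE ... AS SELECT materializes a snapshot, standing in for a
# Postgres materialized view (SQLite does not support those).
FLATTEN = """
CREATE TABLE txn_flat AS
SELECT t.txn_id, t.customer_id, t.created_at, d.line_no, d.amount
FROM txn AS t
JOIN txn_detail AS d ON d.txn_id = t.txn_id;
"""


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a few sample transactions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO txn VALUES (?, ?, ?)",
        [(1, 100, "2024-08-06T10:00:00Z"), (2, 200, "2024-08-06T10:05:00Z")],
    )
    conn.executemany(
        "INSERT INTO txn_detail VALUES (?, ?, ?)",
        [(1, 1, 10), (1, 2, 5), (2, 1, 7)],
    )
    conn.commit()
    return conn


def build_flat_snapshot(conn: sqlite3.Connection) -> list[tuple]:
    """Materialize the flattened table and return its rows in a stable order."""
    conn.executescript(FLATTEN)
    return conn.execute(
        "SELECT txn_id, customer_id, line_no, amount "
        "FROM txn_flat ORDER BY txn_id, line_no"
    ).fetchall()


for row in build_flat_snapshot(make_demo_db()):
    print(row)
```

Note that customer_id 100 appears on both detail rows of transaction 1: flattening repeats parent columns on every child row, so the flattened form is wider and more redundant than the normalized tables it is built from.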

Regards
Yudhi

--000000000000f89308061f0a2e9d--