Design strategy for table with many attributes

public inbox for [email protected]  
help / color / mirror / Atom feed

Design strategy for table with many attributes
5+ messages / 5 participants
[nested] [flat]

* Design strategy for table with many attributes
@ 2024-07-04 19:37 Lok P <[email protected]>
  2024-07-04 19:56 ` Re: Design strategy for table with many attributes David G. Johnston <[email protected]>
  2024-07-04 20:35 ` Re: Design strategy for table with many attributes Kent Dorfman <[email protected]>
  2024-07-04 20:50 ` Re: Design strategy for table with many attributes Ron Johnson <[email protected]>
  2024-07-04 23:29 ` Re: Design strategy for table with many attributes Guyren Howe <[email protected]>
  0 siblings, 4 replies; 5+ messages in thread

From: Lok P @ 2024-07-04 19:37 UTC (permalink / raw)
  To: pgsql-general <[email protected]>

Hello,
In one of the applications we are getting transactions in messages/events
format and also in files and then they are getting parsed and stored into
the relational database. The number of attributes/columns each transaction
has is ~900+. Logically they are part of one single transaction and should
be stored in one table as one single row. There will be ~500million such
transactions each day coming into the system. And there will be approx ~10K
peak write TPS and 5K read TPS in target state. This system has a postgres
database as a "source of truth" or OLTP store. And then data moves to
snowflakes for the olap store.

Initially when the system was designed the number of attributes per
transaction was <100 but slowly the business wants to keep/persist other
attributes too in the current system and the number of columns keep growing.

However, as worked with some database systems , we get few suggestions from
DBA's to not have many columns in a single table. For example in oracle
they say not to go beyond ~255 columns as then row chaining and row
migration type of things are going to hunt us. Also we are afraid
concurrent DMLS on the table may cause this as a contention point. So I
wanted to understand , in such a situation what would be the best design
approach we should use irrespective of databases? Or say, what is the
maximum number of columns per table we should restrict? Should we break the
single transaction into multiple tables like one main table and other
addenda tables with the same primary key to join and fetch the results
wherever necessary?

Regards
Lok

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Design strategy for table with many attributes
  2024-07-04 19:37 Design strategy for table with many attributes Lok P <[email protected]>
@ 2024-07-04 19:56 ` David G. Johnston <[email protected]>
  3 siblings, 0 replies; 5+ messages in thread

From: David G. Johnston @ 2024-07-04 19:56 UTC (permalink / raw)
  To: Lok P <[email protected]>; +Cc: pgsql-general <[email protected]>

On Thu, Jul 4, 2024 at 12:38 PM Lok P <[email protected]> wrote:

>
> Should we break the single transaction into multiple tables like one main
> table and other addenda tables with the same primary key to join and fetch
> the results wherever necessary?
>
>
I would say yes.  Find a way to logically group sets of columns together
and place those groups into separate tables.  I'd also be looking for cases
where multiple columns really should be multiple rows.  This is not
uncommon.

David J.

David J.

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Design strategy for table with many attributes
  2024-07-04 19:37 Design strategy for table with many attributes Lok P <[email protected]>
@ 2024-07-04 20:35 ` Kent Dorfman <[email protected]>
  3 siblings, 0 replies; 5+ messages in thread

From: Kent Dorfman @ 2024-07-04 20:35 UTC (permalink / raw)
  To: Lok P <[email protected]>; pgsql-general <[email protected]>

On 7/4/24 15:37, Lok P wrote:
> Or say, what is the maximum number of columns per table we should 
> restrict? Should we break the single transaction into multiple tables 
> like one main table and other addenda tables with the same primary key 
> to join and fetch the results wherever necessary?

900 columns makes my head hurt badly...

The neophyte will design a table with a separate field for each type of 
phone number that may be encountered.  The experienced designer will 
move all phone numbers to its own table, where each entry/row contains a 
reference link, a "number_type" field, and a field with the actual 
number in it...Three fields in a table that is manageable and can be 
queried/modified without stressing the database server.

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Design strategy for table with many attributes
  2024-07-04 19:37 Design strategy for table with many attributes Lok P <[email protected]>
@ 2024-07-04 20:50 ` Ron Johnson <[email protected]>
  3 siblings, 0 replies; 5+ messages in thread

From: Ron Johnson @ 2024-07-04 20:50 UTC (permalink / raw)
  To: pgsql-generallists.postgresql.org <[email protected]>

On Thu, Jul 4, 2024 at 3:38 PM Lok P <[email protected]> wrote:

> Hello,
> In one of the applications we are getting transactions in messages/events
> format and also in files and then they are getting parsed and stored into
> the relational database. The number of attributes/columns each transaction
> has is ~900+. Logically they are part of one single transaction
>

Nothing out of the ordinary.


> and should be stored in one table as one single row.
>

Says who?


> There will be ~500million such transactions each day coming into the
> system. And there will be approx ~10K peak write TPS and 5K read TPS in
> target state. This system has a postgres database as a "source of truth" or
> OLTP store. And then data moves to snowflakes for the olap store.
>
> Initially when the system was designed the number of attributes per
> transaction was <100 but slowly the business wants to keep/persist other
> attributes too in the current system and the number of columns keep growing.
>
> However, as worked with some database systems , we get few suggestions
> from DBA's to not have many columns in a single table. For example in
> oracle they say not to go beyond ~255 columns as then row chaining and row
> migration type of things are going to hunt us. Also we are afraid
> concurrent DMLS on the table may cause this as a contention point. So I
> wanted to understand , in such a situation what would be the best design
> approach we should use irrespective of databases? Or say, what is the
> maximum number of columns per table we should restrict? Should we break the
> single transaction into multiple tables like one main table and other
> addenda tables with the same primary key to join and fetch the results
> wherever necessary?
>

You need database normalization.  It's a big topic.  Here's a good simple
explanation:
https://popsql.com/blog/normalization-in-sql


^ permalink  raw  reply  [nested|flat] 5+ messages in thread

* Re: Design strategy for table with many attributes
  2024-07-04 19:37 Design strategy for table with many attributes Lok P <[email protected]>
@ 2024-07-04 23:29 ` Guyren Howe <[email protected]>
  3 siblings, 0 replies; 5+ messages in thread

From: Guyren Howe @ 2024-07-04 23:29 UTC (permalink / raw)
  To: Lok P <[email protected]>; +Cc: [email protected] <[email protected]>

Ultimately, the way you should store the data depends on how you will use it. When you retrieve these values, what are the different ways you’ll be using them?

Normalised representations are more flexible, and the pragmatic, performance-based consideration is that all the values in a row are always effectively retrieved together. So if you reasonably often have a query that only accesses the creation date and transaction id, then it will be pretty slow if you are also always retrieving 500 other columns you don’t need.

So: you might often pull all the values *other* than the attributes (creation date, creator, job information, whatever) together. This argues that those values should be in one table, and the attributes in another.

Will you usually be using *all* of the attributes for a particular transaction together in the same operation? It might make sense to store them in eg an array in that case. But this significantly reduces the speed of accessing particular attributes separately.

It is likely that you will want to slice things by particular named attribute across many transactions. This argues for the more normalised form, as does the general principle of doing things in the way that is most general and flexible.

When considering how the data will be used, please consider not only the developers of your current application(s), but also eg data analysts, managers, future applications etc.

The less specific you can be about how you want to use the data, the more you should lean toward fully normalising.

Note also that you can store your data in a normalised and maximally flexible form, but also use triggers or batch jobs to gather various permutations of the data for specific purposes. If you really do, say, both have some major part of your project that uses all the attributes on a given transaction together, but you also have other uses, you may want to store both the normalised/attribute table and the “all the values together” version.

Even if you want to store “all the values together”, it may well be better to use an array, JSON or HStore, rather than having a super-wide table. JSON would eg let you enumerate all the column names (for example) and employ Postgres’s really nice JSON query features.

> On Jul 4, 2024, at 12:37, Lok P <[email protected]> wrote:
> 
> Hello,
> In one of the applications we are getting transactions in messages/events format and also in files and then they are getting parsed and stored into the relational database. The number of attributes/columns each transaction has is ~900+. Logically they are part of one single transaction and should be stored in one table as one single row. There will be ~500million such transactions each day coming into the system. And there will be approx ~10K peak write TPS and 5K read TPS in target state. This system has a postgres database as a "source of truth" or OLTP store. And then data moves to snowflakes for the olap store.
> 
> Initially when the system was designed the number of attributes per transaction was <100 but slowly the business wants to keep/persist other attributes too in the current system and the number of columns keep growing.
> 
> However, as worked with some database systems , we get few suggestions from DBA's to not have many columns in a single table. For example in oracle they say not to go beyond ~255 columns as then row chaining and row migration type of things are going to hunt us. Also we are afraid concurrent DMLS on the table may cause this as a contention point. So I wanted to understand , in such a situation what would be the best design approach we should use irrespective of databases? Or say, what is the maximum number of columns per table we should restrict? Should we break the single transaction into multiple tables like one main table and other addenda tables with the same primary key to join and fetch the results wherever necessary?
> 
> Regards
> Lok

^ permalink  raw  reply  [nested|flat] 5+ messages in thread

end of thread, other threads:[~2024-07-04 23:29 UTC | newest]

Thread overview: 5+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-07-04 19:37 Design strategy for table with many attributes Lok P <[email protected]>
2024-07-04 19:56 ` David G. Johnston <[email protected]>
2024-07-04 20:35 ` Kent Dorfman <[email protected]>
2024-07-04 20:50 ` Ron Johnson <[email protected]>
2024-07-04 23:29 ` Guyren Howe <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox