Re: Design strategy for table with many attributes

public inbox for [email protected]  
help / color / mirror / Atom feed

Re: Design strategy for table with many attributes
4+ messages / 3 participants
[nested] [flat]

* Re: Design strategy for table with many attributes
@ 2024-07-05 05:07 Lok P <[email protected]>
  2024-07-05 05:27 ` Re: Design strategy for table with many attributes David Rowley <[email protected]>
  2024-07-05 05:36 ` Re: Design strategy for table with many attributes David G. Johnston <[email protected]>
  2024-07-05 06:28 ` Re: Design strategy for table with many attributes Lok P <[email protected]>
  0 siblings, 3 replies; 4+ messages in thread

From: Lok P @ 2024-07-05 05:07 UTC (permalink / raw)
  To: David G. Johnston <[email protected]>; +Cc: pgsql-general <[email protected]>

On Fri, Jul 5, 2024 at 1:26 AM David G. Johnston <[email protected]>
wrote:

> On Thu, Jul 4, 2024 at 12:38 PM Lok P <[email protected]> wrote:
>
>>
>> Should we break the single transaction into multiple tables like one main
>> table and other addenda tables with the same primary key to join and fetch
>> the results wherever necessary?
>>
>>
> I would say yes.  Find a way to logically group sets of columns together
> and place those groups into separate tables.  I'd also be looking for cases
> where multiple columns really should be multiple rows.  This is not
> uncommon.
>
> David J.
>
>
Thank you David.

As you said, to logically break this into multiple tables so i believe it
means it should be such that there will be no need to query multiple tables
and join them most of the time for fetching the results. It should just
fetch the results from one table at any point in time.

But do you also suggest keeping those table pieces related to each other
through the same primary key ? Won't there be a problem when we load the
data like say for example , in normal scenario the data load will be to one
table but when we break it to multiple tables it will happen to all the
individual pieces, won't that cause additional burden to the data load?

Also I understand the technical limitation of the max number of columns per
table is ~1600. But should you advise to restrict/stop us to some low
number long before reaching that limit , such that we will not face any
anomalies when we grow in future. And if we should maintain any specific
order in the columns from start to end column in the specific table?

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Design strategy for table with many attributes
  2024-07-05 05:07 Re: Design strategy for table with many attributes Lok P <[email protected]>
@ 2024-07-05 05:27 ` David Rowley <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: David Rowley @ 2024-07-05 05:27 UTC (permalink / raw)
  To: Lok P <[email protected]>; +Cc: David G. Johnston <[email protected]>; pgsql-general <[email protected]>

On Fri, 5 Jul 2024 at 17:07, Lok P <[email protected]> wrote:
> Also I understand the technical limitation of the max number of columns per table is ~1600. But should you advise to restrict/stop us to some low number long before reaching that limit , such that we will not face any anomalies when we grow in future. And if we should maintain any specific order in the columns from start to end column in the specific table?

Something else you may wish to consider, depending on the column types
of your 900+ columns is the possibility that some INSERTs may fail due
to row length while others with shorter variable length values may be
ok.

Here's a quick example with psql:

select 'create table a (' || string_agg('a'||x||' text not null
default $$$$',',') || ')' from generate_series(1,1000)x;
\gexec
insert into a default values;
INSERT 0 1

again but with a larger DEFAULT to make the tuple larger.

select 'create table b (' || string_agg('a'||x||' text not null
default $$hello world$$',',') || ')' from generate_series(1,1000)x;
\gexec
insert into b default values;
ERROR:  row is too big: size 12024, maximum size 8160

There is a paragraph at the bottom of [1] with some warnings about
things relating to this.

The tuple length would be fixed for fixed-length types defined as NOT
NULL. So, if you have that, there should be no such surprises.

David

[1] https://www.postgresql.org/docs/current/limits.html

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Design strategy for table with many attributes
  2024-07-05 05:07 Re: Design strategy for table with many attributes Lok P <[email protected]>
@ 2024-07-05 05:36 ` David G. Johnston <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: David G. Johnston @ 2024-07-05 05:36 UTC (permalink / raw)
  To: Lok P <[email protected]>; +Cc: pgsql-general <[email protected]>

On Thursday, July 4, 2024, Lok P <[email protected]> wrote:

>
> But do you also suggest keeping those table pieces related to each other
> through the same primary key ?
>
>
Yes, everyone row produced from the input data “row” should get the same ID
associated with it - either as an entire PK or a component of a
multi-column PK/unique index.

>
>
>  Won't there be a problem when we load the data like say for example , in
> normal scenario the data load will be to one table but when we break it to
> multiple tables it will happen to all the individual pieces, won't that
> cause additional burden to the data load?
>

Yes, doing this requires additional CPU time to perform the work.  I’d say
IO is hopefully a wash.

>
> Also I understand the technical limitation of the max number of
> columns per table is ~1600. But should you advise to restrict/stop us to
> some low number long before reaching that limit , such that we will not
> face any anomalies when we grow in future.
>

In a row-oriented system wider is worser.  I start evaluation of table
design with that in mind at the fourth column (including the surrogate key
that is usually present, and the natural key - ignoring auditing columns.)

>
>  And if we should maintain any specific order in the columns from start to
> end column in the specific table?
>

There is material out there on micro-optimizing column ordering to match
with alignment boundaries.  I’d the benefit is meaningful but there is a
non-trivial cost to actually setup the testing to verify that what you’ve
figured out is working.  Never actually done it myself.  Though it actually
seems like something someone could/may have written an algorithm for
(though I do not recall ever seeing mention of one.)

David J.

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

* Re: Design strategy for table with many attributes
  2024-07-05 05:07 Re: Design strategy for table with many attributes Lok P <[email protected]>
@ 2024-07-05 06:28 ` Lok P <[email protected]>
  2 siblings, 0 replies; 4+ messages in thread

From: Lok P @ 2024-07-05 06:28 UTC (permalink / raw)
  To: Guyren Howe <[email protected]>; +Cc: pgsql-general <[email protected]>

On Fri, Jul 5, 2024 at 10:45 AM Guyren Howe <[email protected]> wrote:

> On Jul 4, 2024, at 22:07, Lok P <[email protected]> wrote:
>
> If you stick to the principle of grouping columns in a table when you use
> those columns together, you should be good.
>
> Note that you might want to split up the “parent” table if that naturally
> groups its columns together for certain uses. In that case, you could have
> the same pk on all the 1:1 tables you then have. In that case, the pk for
> each of those tables is also the fk.
>

Thank you.

When you said below,

*"Note that you might want to split up the “parent” table if that naturally
groups its columns together for certain uses. In that case, you could have
the same pk on all the 1:1 tables you then have. In that case, the pk for
each of those tables is also the fk."*
Do you mean having a real FK created through DDL and maintaining it or just
assume it and no need to define it for all the pieces/tables. Only keep the
same PK across all the pieces and as we know these are related to the same
transaction and are logically related?

^ permalink  raw  reply  [nested|flat] 4+ messages in thread

end of thread, other threads:[~2024-07-05 06:28 UTC | newest]

Thread overview: 4+ messages (download: mbox mbox.gz follow: Atom feed)
-- links below jump to the message on this page --
2024-07-05 05:07 Re: Design strategy for table with many attributes Lok P <[email protected]>
2024-07-05 05:27 ` David Rowley <[email protected]>
2024-07-05 05:36 ` David G. Johnston <[email protected]>
2024-07-05 06:28 ` Lok P <[email protected]>

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox