Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tPnGr-00FLjQ-Az for pgsql-general@arkaria.postgresql.org; Mon, 23 Dec 2024 18:35:14 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tPnGq-00EVVY-6S for pgsql-general@arkaria.postgresql.org; Mon, 23 Dec 2024 18:35:11 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tPnGp-00EVVP-Mm for pgsql-general@lists.postgresql.org; Mon, 23 Dec 2024 18:35:11 +0000 Received: from ms11p00im-qufo17291601.me.com ([17.58.38.45]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tPnGl-001GOC-NU for pgsql-general@postgresql.org; Mon, 23 Dec 2024 18:35:10 +0000 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=mac.com; s=1a1hai; t=1734978905; bh=bCaQeP9aVTzAznQ1EHL7c51vPqOV4wOBIhJv8b+KSdw=; h=From:Message-Id:Content-Type:Mime-Version:Subject:Date:To: x-icloud-hme; b=GlqRmiO/kaLDCNSkGifGIJpInrU6P8R0IiThxqWgrWsynmUG+IMMKLz9MgLIG7v6Q gSJ+FWa3OY61UgFTpZbSryW3rv1auPfF1N2KkTYqdRnWUePp9KkeTyeYkSSSpPKo9+ mOqMvaVFagSS+hH3XFdLurJ+X2J7H16gzER7uG9a3XmUdSFlWqRIUySVtqpEiM8avQ 3QQiKzQ6l6f4GOQD7dLgC2/PAvV/kRXqCGo5Gojhcjw3HgsBBFDXKwsJAO3qOGm5yA 83RM/7Lq+lAsNPWSP8ihYKfs7IO/LnZA/5oBelXiFpnsjWhicOIjzLRhkxyeCax975 Gc+oq+yLyiwhQ== Received: from smtpclient.apple (ms11p00im-dlb-asmtpmailmevip.me.com [17.57.154.19]) by ms11p00im-qufo17291601.me.com (Postfix) with ESMTPSA id C776E3A00C7; Mon, 23 Dec 2024 18:35:03 +0000 (UTC) From: Arnold Morein Message-Id: Content-Type: multipart/alternative; boundary="Apple-Mail=_E9BEF892-949C-458B-8756-73FFCC98896B" Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3826.200.121\)) Subject: Re: Need help in database design Date: Mon, 23 Dec 2024 12:34:52 -0600 In-Reply-To: Cc: Ron Johnson , pgsql-general To: Divyansh Gupta JNsThMAudy References: <06e1f1ee-74b2-43a2-9a63-da20ae455ae2@aklaver.com> X-Mailer: Apple Mail (2.3826.200.121) X-Proofpoint-GUID: Cm9U045CIIipNOn3mMl3EUJh_6Q7jfIm X-Proofpoint-ORIG-GUID: Cm9U045CIIipNOn3mMl3EUJh_6Q7jfIm X-Proofpoint-Virus-Version: vendor=baseguard engine=ICAP:2.0.272,Aquarius:18.0.1057,Hydra:6.0.680,FMLib:17.12.68.34 definitions=2024-12-23_08,2024-12-23_01,2024-11-22_01 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 phishscore=0 mlxlogscore=999 adultscore=0 clxscore=1011 bulkscore=0 spamscore=0 mlxscore=0 suspectscore=0 malwarescore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.19.0-2308100000 definitions=main-2412230165 List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --Apple-Mail=_E9BEF892-949C-458B-8756-73FFCC98896B Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 I would like to make a suggestion, if I may. Granted, I do not = understand the underlying task at hand, but: A table with multiple columns of the same type smacks of designs that = harken back to the days of mainframes. (STOP THAT!) The data described = is a non-normalized array of integers that is meaningless outside of = code. Table structures should be at least a LITTLE self-descriptive. It is also not flexible (what if you suddenly need t51? how long would = that table space adjustment take in production?) and space is wasted if = not all 50 columns are populated. Use a design that is basically a storage area for name/value pairs: create table dbo.googledocs_tbl ( id long identity primary key, =E2=80=94 easy way to access a single = record owner_id integer/long not null, =E2=80=94 fk to owning parent record = in other table such as user owner_type char(2), =E2=80=94 optional field, identifies the owing = table, makes this table even more generic property_name varchar(n) not null, =E2=80=94 required unique name = for property, not an array reference (t1, t4, t50) =E2=80=94 the names are controlled by the developer but should be = human interpretable which can then be used in queries property_value int4 not null =E2=80=94 the important value in = question ); The fields owner_id, owner_type, property_name become a tertiary key = that can never be changed, are unique and easily accessible via index = lookup. Add a timestamp if need be You could then partition the record by owner_type or owner_id or = whatever else comes to mind. Then you just have to figure out the best way to index this monster for = optimized queries. > On Dec 23, 2024, at 12:31=E2=80=AFPM, Divyansh Gupta JNsThMAudy = wrote: >=20 > Currently I haven't created those columns , I have created addons_json = column which is a JSONB column yet in a discussion weather I should = create or consider only one JSONB column. >=20 >=20 > On Tue, 24 Dec 2024, 12:00=E2=80=AFam Divyansh Gupta JNsThMAudy, = > wrote: >> Range partition can help when you applies filter for a specific range = but in my case I need to apply filter on userid always, however I have = date columns but there is less variation in timestamp which I have = that's why didn't go for range partition. >>=20 >>=20 >> On Mon, 23 Dec 2024, 11:57=E2=80=AFpm Ron Johnson, = > wrote: >>>=20 >>> 1. I bet you'd get better performance using RANGE partitioning. >>> 2. Twenty million rows per userid is a LOT. No subdivisions (like = date range)? >>>=20 >>> On Mon, Dec 23, 2024 at 1:23=E2=80=AFPM Divyansh Gupta JNsThMAudy = > wrote: >>>> Adrian, Please check this out; >>>>=20 >>>> PARTITION BY HASH (userid); >>>> CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF = dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); >>>> ... >>>> CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF = dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83); >>>>=20 >>>>=20 >>>>=20 >>>> On Mon, Dec 23, 2024 at 11:48=E2=80=AFPM Divyansh Gupta JNsThMAudy = > wrote: >>>>> Adrian, the partition is on userid using hash partition with 84 = partitions >>>>>=20 >>>>> Ron, there could be more than 20 Million records possible for a = single userid in that case if I create index on userid only not on other = column the query is taking more than 30 seconds to return the results. >>>>>=20 >>>>>=20 >>>>> On Mon, 23 Dec 2024, 11:40=E2=80=AFpm Ron Johnson, = > wrote: >>>>>> If your queries all reference userid, then you only need indices = on gdid and userid. >>>>>>=20 >>>>>> On Mon, Dec 23, 2024 at 12:49=E2=80=AFPM Divyansh Gupta = JNsThMAudy > wrote: >>>>>>> I have one confusion with this design if I opt to create 50 = columns I need to create 50 index which will work with userid index in = Bitmap on the other hand if I create a JSONB column I need to create a = single index ? >>>>>>>=20 >>>>>>>=20 >>>>>>> On Mon, 23 Dec 2024, 11:10=E2=80=AFpm Ron Johnson, = > wrote: >>>>>>>> Given what you just wrote, I'd stick with 50 separate t* = columns. Simplifies queries, simplifies updates, and eliminates JSONB = conversions. >>>>>>>>=20 >>>>>>>> On Mon, Dec 23, 2024 at 12:29=E2=80=AFPM Divyansh Gupta = JNsThMAudy > wrote: >>>>>>>>> Values can be updated based on customer actions >>>>>>>>>=20 >>>>>>>>> All rows won't have all 50 key value pairs always if I make = those keys into columns the rows might have null value on the other hand = if it is JSONB then the key value pair will not be there >>>>>>>>>=20 >>>>>>>>> Yes in UI customers can search for the key value pairs >>>>>>>>>=20 >>>>>>>>> During data population the key value pair will be empty array = in case of JSONB column or NULL in case of table columns, later when = customer performs some actions that time the key value pairs will = populate and update, based on what action customer performs. >>>>>>>>>=20 >>>>>>>>>=20 >>>>>>>>> On Mon, 23 Dec 2024, 10:51=E2=80=AFpm Divyansh Gupta = JNsThMAudy, > wrote: >>>>>>>>>> Let's make it more understandable, here is the table schema = with 50 columns in it=20 >>>>>>>>>>=20 >>>>>>>>>> CREATE TABLE dbo.googledocs_tbl ( >>>>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 = MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT = NULL, >>>>>>>>>> userid int8 NOT NULL, >>>>>>>>>> t1 int4 NULL, >>>>>>>>>> t2 int4 NULL, >>>>>>>>>> t3 int4 NULL, >>>>>>>>>> t4 int4 NULL, >>>>>>>>>> t5 int4 NULL, >>>>>>>>>> t6 int4 NULL, >>>>>>>>>> t7 int4 NULL, >>>>>>>>>> t8 int4 NULL, >>>>>>>>>> t9 int4 NULL, >>>>>>>>>> t10 int4 NULL, >>>>>>>>>> t11 int4 NULL, >>>>>>>>>> t12 int4 NULL, >>>>>>>>>> t13 int4 NULL, >>>>>>>>>> t14 int4 NULL, >>>>>>>>>> t15 int4 NULL, >>>>>>>>>> t16 int4 NULL, >>>>>>>>>> t17 int4 NULL, >>>>>>>>>> t18 int4 NULL, >>>>>>>>>> t19 int4 NULL, >>>>>>>>>> t20 int4 NULL, >>>>>>>>>> t21 int4 NULL, >>>>>>>>>> t22 int4 NULL, >>>>>>>>>> t23 int4 NULL, >>>>>>>>>> t24 int4 NULL, >>>>>>>>>> t25 int4 NULL, >>>>>>>>>> t26 int4 NULL, >>>>>>>>>> t27 int4 NULL, >>>>>>>>>> t28 int4 NULL, >>>>>>>>>> t29 int4 NULL, >>>>>>>>>> t30 int4 NULL, >>>>>>>>>> t31 int4 NULL, >>>>>>>>>> t32 int4 NULL, >>>>>>>>>> t33 int4 NULL, >>>>>>>>>> t34 int4 NULL, >>>>>>>>>> t35 int4 NULL, >>>>>>>>>> t36 int4 NULL, >>>>>>>>>> t37 int4 NULL, >>>>>>>>>> t38 int4 NULL, >>>>>>>>>> t39 int4 NULL, >>>>>>>>>> t40 int4 NULL, >>>>>>>>>> t41 int4 NULL, >>>>>>>>>> t42 int4 NULL, >>>>>>>>>> t43 int4 NULL, >>>>>>>>>> t44 int4 NULL, >>>>>>>>>> t45 int4 NULL, >>>>>>>>>> t46 int4 NULL, >>>>>>>>>> t47 int4 NULL, >>>>>>>>>> t48 int4 NULL, >>>>>>>>>> t49 int4 NULL, >>>>>>>>>> t50 int4 NULL, >>>>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid), >>>>>>>>>> ); >>>>>>>>>>=20 >>>>>>>>>> Every time when i query I will query it along with userid=20 >>>>>>>>>> Ex : where userid =3D 12345678 and t1 in (1,2,3) and t2 in = (0,1,2) >>>>>>>>>> more key filters if customer applies=20 >>>>>>>>>>=20 >>>>>>>>>> On the other hand if I create a single jsonb column the = schema will look like : >>>>>>>>>>=20 >>>>>>>>>> CREATE TABLE dbo.googledocs_tbl ( >>>>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 = MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT = NULL, >>>>>>>>>> userid int8 NOT NULL, >>>>>>>>>> addons_json jsonb default '{}'::jsonb >>>>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid), >>>>>>>>>> ); >>>>>>>>>>=20 >>>>>>>>>> and the query would be like=20 >>>>>>>>>> where userid =3D 12345678 and ((addons_json @> {t1:1}) or = (addons_json @> {t1:2}) or=C2=A0 <> (addons_json @> {t1:3}) >>>>>>>>>> more key filters if customer applies=C2=A0 >>>>>>>>>>=20 >>>>>>>>>>=20 >>>>>>>>>> <> >>>>>>>>>> On Mon, Dec 23, 2024 at 10:38=E2=80=AFPM David G. Johnston = > wrote: >>>>>>>>>>>=20 >>>>>>>>>>>=20 >>>>>>>>>>> On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy = > wrote: >>>>>>>>>>>>=20 >>>>>>>>>>>> So here my question is considering one JSONB column is = perfect or considering 50 columns will be more optimised. >>>>>>>>>>>>=20 >>>>>>>>>>>=20 >>>>>>>>>>> The relational database engine is designed around the = column-based approach. Especially if the columns are generally = unchanging, combined with using fixed-width data types. >>>>>>>>>>>=20 >>>>>>>>>>> David J. >>>>>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>>=20 >>>>>>>> -- >>>>>>>> Death to , and butter sauce. >>>>>>>> Don't boil me, I'm still alive. >>>>>>>> lobster! >>>>>>=20 >>>>>>=20 >>>>>>=20 >>>>>> -- >>>>>> Death to , and butter sauce. >>>>>> Don't boil me, I'm still alive. >>>>>> lobster! >>>=20 >>>=20 >>>=20 >>> -- >>> Death to , and butter sauce. >>> Don't boil me, I'm still alive. >>> lobster! --Apple-Mail=_E9BEF892-949C-458B-8756-73FFCC98896B Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8 I would like to make a suggestion, if I may. Granted, = I do not understand the underlying task at hand, but:


A table with multiple columns of the same type smacks = of designs that harken back to the days of mainframes. (STOP THAT!) The = data described is a non-normalized array of integers that is meaningless = outside of code. Table structures should be at least a LITTLE = self-descriptive.

It is also not flexible (what if you suddenly need = t51? how long would that table space adjustment take in production?) and = space is wasted if not all 50 columns are populated.

Use a design that is = basically a storage area for name/value pairs:

create table dbo.googledocs_tbl = (
  =   id long identity primary key, =E2=80=94 easy way to access a = single record
  =   owner_id integer/long not null, =E2=80=94 fk to owning = parent record in other table such as user
    owner_type char(2), =E2=80=94 optional field, = identifies the owing table, makes this table even more = generic
  =   property_name varchar(n) not null, =E2=80=94 required = unique name for property, not an array reference (t1, t4, = t50)
    =E2=80=94 the names are controlled by the developer = but should be human interpretable which can then be used in = queries
  =   property_value int4 not null =E2=80=94 the important = value in question
);

The fields owner_id, owner_type, property_name become = a tertiary key that can never be changed, are unique and easily = accessible via index lookup.

Add a timestamp if need be

You could then partition the record by owner_type or = owner_id or whatever else comes to mind.

Then you just have to figure out the = best way to index this monster for optimized = queries.



On Dec 23, = 2024, at 12:31=E2=80=AFPM, Divyansh Gupta JNsThMAudy = <ag1567827@gmail.com> wrote:

Currently I = haven't created those columns , I have created addons_json column which = is a JSONB column yet in a discussion weather I should create or = consider only one JSONB column.


On Tue, 24 Dec 2024, 12:00=E2=80=AFam Divyansh = Gupta JNsThMAudy, <ag1567827@gmail.com> = wrote:

Range = partition can help when you applies filter for a specific range but in = my case I need to apply filter on userid always, however I have date = columns but there is less variation in timestamp which I have that's why = didn't go for range partition.


On = Mon, 23 Dec 2024, 11:57=E2=80=AFpm Ron Johnson, <ronljohnsonjr@gmail.com> = wrote:

1. I bet you'd get better performance using = RANGE partitioning.
2. Twenty million rows per userid is = a LOT.  No subdivisions (like date range)?

On Mon, Dec = 23, 2024 at 1:23=E2=80=AFPM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> = wrote:
Adrian, Please check = this out;

PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 = PARTITION = OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 = PARTITION = OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec = 23, 2024 at 11:48=E2=80=AFPM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> = wrote:

Adrian, the partition = is on userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible = for a single userid in that case if I create index on userid only not on = other column the query is taking more than 30 seconds to return the = results.


On = Mon, 23 Dec 2024, 11:40=E2=80=AFpm Ron Johnson, <ronljohnsonjr@gmail.com> = wrote:
If your queries = all reference userid, then you only need indices on gdid and = userid.

On Mon, Dec 23, 2024 at 12:49=E2=80=AFPM Divyansh = Gupta JNsThMAudy <ag1567827@gmail.com> = wrote:

I have one confusion = with this design if I opt to create 50 columns I need to create 50 index = which will work with userid index in Bitmap on the other hand if I = create a JSONB column I need to create a single index ?


On = Mon, 23 Dec 2024, 11:10=E2=80=AFpm Ron Johnson, <ronljohnsonjr@gmail.com> = wrote:
Given what you = just wrote, I'd stick with 50 separate t* columns.  Simplifies = queries, simplifies updates, and eliminates JSONB = conversions.

On Mon, Dec 23, 2024 at 12:29=E2=80=AFPM Divyansh = Gupta JNsThMAudy <ag1567827@gmail.com> = wrote:

Values can be updated = based on customer actions

All rows won't have all 50 = key value pairs always if I make those keys into columns the rows might = have null value on the other hand if it is JSONB then the key value pair = will not be there

Yes in UI customers can search for = the key value pairs

During data population the key = value pair will be empty array in case of JSONB column or NULL in case = of table columns, later when customer performs some actions that time = the key value pairs will populate and update, based on what action = customer performs.


On = Mon, 23 Dec 2024, 10:51=E2=80=AFpm Divyansh Gupta JNsThMAudy, <ag1567827@gmail.com> = wrote:
Let's make it more = understandable, here is the table schema with 50 columns in = it 

CREATE TABLE dbo.googledocs_tbl (
gdid int8 = GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE = 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
userid = int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
= t3 int4 NULL,
t4 int4 NULL,
t5 int4 NULL,
= t6 int4 NULL,
t7 int4 NULL,
t8 int4 NULL,
= t9 int4 NULL,
t10 int4 NULL,
t11 int4 NULL,
= t12 int4 NULL,
t13 int4 NULL,
t14 int4 NULL,
= t15 int4 NULL,
t16 int4 NULL,
t17 int4 NULL,
= t18 int4 NULL,
t19 int4 NULL,
t20 int4 NULL,
= t21 int4 NULL,
t22 int4 NULL,
t23 int4 NULL,
= t24 int4 NULL,
t25 int4 NULL,
t26 int4 NULL,
= t27 int4 NULL,
t28 int4 NULL,
t29 int4 NULL,
= t30 int4 NULL,
t31 int4 NULL,
t32 int4 NULL,
= t33 int4 NULL,
t34 int4 NULL,
t35 int4 NULL,
= t36 int4 NULL,
t37 int4 NULL,
t38 int4 NULL,
= t39 int4 NULL,
t40 int4 NULL,
t41 int4 NULL,
= t42 int4 NULL,
t43 int4 NULL,
t44 int4 NULL,
= t45 int4 NULL,
t46 int4 NULL,
t47 int4 NULL,
= t48 int4 NULL,
t49 int4 NULL,
t50 int4 NULL,
= CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);

Every = time when i query I will query it along with userid 
Ex : where = userid =3D 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
more key = filters if customer applies 

On the other hand if I create a = single jsonb column the schema will look like :

CREATE TABLE = dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS = IDENTITY( INCREMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 = CACHE 1 NO CYCLE) NOT NULL,
userid int8 NOT NULL,
= addons_json jsonb default '{}'::jsonb
CONSTRAINT = googledocs_tbl_pkey PRIMARY KEY (gdid),
);

and the query would = be like 
where userid =3D 12345678 and ((addons_json @> = {t1:1}) or  (addons_json @> = {t1:2}) or  (addons_json @> = {t1:3})
more key filters if customer = applies 



On Mon, Dec 23, 2024 at 10:38=E2=80=AFPM = David G. Johnston <david.g.johnston@gmail.com> = wrote:


On Mon, Dec 23, 2024, 10:01 Divyansh Gupta = JNsThMAudy <ag1567827@gmail.com> wrote:


So here my question is considering = one JSONB column is perfect or considering 50 columns will be more = optimised.

The relational = database engine is designed around the column-based approach.  = Especially if the columns are generally unchanging, combined with using = fixed-width data types.

David J.



--
Death to <Redacted>, = and butter sauce.
Don't boil me, I'm still = alive.
<Redacted> = lobster!


--
Death to <Redacted>, = and butter sauce.
Don't boil me, I'm still = alive.
<Redacted> = lobster!


--
Death to <Redacted>, = and butter sauce.
Don't boil me, I'm still = alive.
<Redacted> = lobster!

= --Apple-Mail=_E9BEF892-949C-458B-8756-73FFCC98896B--