MIME-Version: 1.0
References: <CAHesJ5LES3aTDf=xp7NOwrADQ_HWC-Spsv7yLu9ZY+zxzZO53A@mail.gmail.com>
 <b450927c-49da-46e5-ad74-bf38ceff166b@aklaver.com> <CAHesJ5+ASNoSNMiC5Ms0Ts=gw7v2_UeBpUT=phujO4yE_XCbEw@mail.gmail.com>
 <CAHesJ5JbkCBZ2f_AvUr8+KWnGPAsobu4zyfnWm8bEeb7X9oqDQ@mail.gmail.com>
 <06e1f1ee-74b2-43a2-9a63-da20ae455ae2@aklaver.com> <CAHesJ5JLzhHiGSBSkJZ7x7rGgHeeByP=wWk1D5GG=x8cJ5YY6Q@mail.gmail.com>
 <CAKFQuwYdpzwcbSdQ8TvZ-nVjPeHVVz+5=bWofCbUK+p_o=axrQ@mail.gmail.com>
 <CAHesJ5+yTenkAxOT8H33Cfe=1b2kSyXGqxFYfYz5fgYAVVvFmw@mail.gmail.com>
 <CAHesJ5KaJ8p7QhB9UUoFEbA87cU7ke4GBMkKR3q2FJPVv9GXyw@mail.gmail.com>
 <CANzqJaB_s8eXCZJvYO9CLvgJNqrshD=G5GgECi1M9=vk-JHjdQ@mail.gmail.com>
 <CAHesJ5LgLi9-uGCk3J9TUkuyttysz3fzTaP+o57EjcBtwDYKZA@mail.gmail.com> <CANzqJaD-MwXzvg97q0iLvAdkf=DnUMOq0Ex2_eNU7sTxEL7bfA@mail.gmail.com>
In-Reply-To: <CANzqJaD-MwXzvg97q0iLvAdkf=DnUMOq0Ex2_eNU7sTxEL7bfA@mail.gmail.com>
From: Divyansh Gupta JNsThMAudy <ag1567827@gmail.com>
Date: Mon, 23 Dec 2024 23:48:01 +0530
Message-ID: <CAHesJ5KtKm9fjhMdR1+cC-M5jW98Sz6sWKbt0mN6SJcfkq9eig@mail.gmail.com>
Subject: Re: Need help in database design
To: Ron Johnson <ronljohnsonjr@gmail.com>
Cc: pgsql-general <pgsql-general@postgresql.org>
Content-Type: multipart/alternative; boundary="000000000000fee3dc0629f40434"
Archived-At: <https://www.postgresql.org/message-id/CAHesJ5KtKm9fjhMdR1%2BcC-M5jW98Sz6sWKbt0mN6SJcfkq9eig%40mail.gmail.com>
Precedence: bulk

--000000000000fee3dc0629f40434
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Adrian, the table is hash-partitioned on userid, with 84 partitions.

Ron, a single userid can have more than 20 million records. In that case,
if I create an index on userid only, and not on the other columns, the
query takes more than 30 seconds to return results.
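
A minimal sketch of one way to narrow that scan, assuming t1 is among the
most frequently filtered keys. The index name is hypothetical, and whether
the planner actually uses the index depends on the data distribution.

```sql
-- Assumption: t1 is a commonly used filter; repeat for other hot keys.
-- Created on the partitioned parent (PostgreSQL 11 or later), so a
-- matching index is built on every partition.
CREATE INDEX googledocs_tbl_userid_t1_idx
    ON dbo.googledocs_tbl (userid, t1);

-- Inspect the real plan and timing for a typical filter combination.
EXPLAIN (ANALYZE, BUFFERS)
SELECT gdid
FROM dbo.googledocs_tbl
WHERE userid = 12345678
  AND t1 IN (1, 2, 3)
  AND t2 IN (0, 1, 2);
```

With userid as the leading column, the index can skip most of the 20
million rows for that user before the remaining filters are applied.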

On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:

> If your queries all reference userid, then you only need indices on gdid
> and userid.
>
> On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <
> ag1567827@gmail.com> wrote:
>
>> One thing confuses me about this design: if I opt for 50 columns, I
>> need to create 50 indexes, which will be combined with the userid index
>> in a bitmap scan. On the other hand, if I create a JSONB column, do I
>> need only a single index?
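
For comparison, a sketch of the single index the JSONB design would rely
on. The index name is hypothetical; jsonb_path_ops is one of the built-in
GIN operator classes for jsonb and supports the @> containment operator.

```sql
-- One GIN index covers containment tests on every key in addons_json.
CREATE INDEX googledocs_tbl_addons_gin_idx
    ON dbo.googledocs_tbl USING gin (addons_json jsonb_path_ops);
```

Note that this index does not include userid, so the planner would still
have to combine it with a separate userid index, much as in the
50-column case.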
>>
>> On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <ronljohnsonjr@gmail.com>
>> wrote:
>>
>>> Given what you just wrote, I'd stick with 50 separate t* columns.
>>> Simplifies queries, simplifies updates, and eliminates JSONB
>>> conversions.
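
To illustrate the point about updates, here is a rough side-by-side
sketch. Each statement targets its own variant of the schema, so they are
alternatives rather than a script; the gdid value 42 is made up.

```sql
-- Column design: a plain assignment to one column.
UPDATE dbo.googledocs_tbl
SET t1 = 2
WHERE userid = 12345678 AND gdid = 42;

-- JSONB design: the whole document is rewritten through jsonb_set,
-- which also creates the key if it is not there yet.
UPDATE dbo.googledocs_tbl
SET addons_json = jsonb_set(addons_json, '{t1}', '2'::jsonb)
WHERE userid = 12345678 AND gdid = 42;
```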
>>>
>>> On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <
>>> ag1567827@gmail.com> wrote:
>>>
>>>> Values can be updated based on customer actions.
>>>>
>>>> Not every row will have all 50 key-value pairs. If I make those keys
>>>> into columns, the rows may hold NULL values; with JSONB, the
>>>> key-value pair will simply be absent.
>>>>
>>>> Yes, customers can search for the key-value pairs in the UI.
>>>>
>>>> During data population, the key-value pairs will be empty in the case
>>>> of the JSONB column, or NULL in the case of table columns. Later,
>>>> when a customer performs some action, the key-value pairs are
>>>> populated and updated based on that action.
>>>>
>>>> On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <
>>>> ag1567827@gmail.com> wrote:
>>>>
>>>>> To make this clearer, here is the table schema with 50 columns:
>>>>>
>>>>> CREATE TABLE dbo.googledocs_tbl (
>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1
>>>>> MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
>>>>> userid int8 NOT NULL,
>>>>> t1 int4 NULL,
>>>>> t2 int4 NULL,
>>>>> t3 int4 NULL,
>>>>> t4 int4 NULL,
>>>>> t5 int4 NULL,
>>>>> t6 int4 NULL,
>>>>> t7 int4 NULL,
>>>>> t8 int4 NULL,
>>>>> t9 int4 NULL,
>>>>> t10 int4 NULL,
>>>>> t11 int4 NULL,
>>>>> t12 int4 NULL,
>>>>> t13 int4 NULL,
>>>>> t14 int4 NULL,
>>>>> t15 int4 NULL,
>>>>> t16 int4 NULL,
>>>>> t17 int4 NULL,
>>>>> t18 int4 NULL,
>>>>> t19 int4 NULL,
>>>>> t20 int4 NULL,
>>>>> t21 int4 NULL,
>>>>> t22 int4 NULL,
>>>>> t23 int4 NULL,
>>>>> t24 int4 NULL,
>>>>> t25 int4 NULL,
>>>>> t26 int4 NULL,
>>>>> t27 int4 NULL,
>>>>> t28 int4 NULL,
>>>>> t29 int4 NULL,
>>>>> t30 int4 NULL,
>>>>> t31 int4 NULL,
>>>>> t32 int4 NULL,
>>>>> t33 int4 NULL,
>>>>> t34 int4 NULL,
>>>>> t35 int4 NULL,
>>>>> t36 int4 NULL,
>>>>> t37 int4 NULL,
>>>>> t38 int4 NULL,
>>>>> t39 int4 NULL,
>>>>> t40 int4 NULL,
>>>>> t41 int4 NULL,
>>>>> t42 int4 NULL,
>>>>> t43 int4 NULL,
>>>>> t44 int4 NULL,
>>>>> t45 int4 NULL,
>>>>> t46 int4 NULL,
>>>>> t47 int4 NULL,
>>>>> t48 int4 NULL,
>>>>> t49 int4 NULL,
>>>>> t50 int4 NULL,
>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid)
>>>>> );
>>>>>
>>>>> Every query will filter on userid, for example:
>>>>> where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
>>>>> plus more key filters if the customer applies them.
>>>>>
>>>>> On the other hand if I create a single jsonb column the schema will
>>>>> look like :
>>>>>
>>>>> CREATE TABLE dbo.googledocs_tbl (
>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1
>>>>> MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
>>>>> userid int8 NOT NULL,
>>>>> addons_json jsonb default '{}'::jsonb,
>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid)
>>>>> );
>>>>>
>>>>> and the query would look like:
>>>>> where userid = 12345678 and ((addons_json @> '{"t1": 1}') or
>>>>> (addons_json @> '{"t1": 2}') or (addons_json @> '{"t1": 3}'))
>>>>> plus more key filters if the customer applies them.
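
As a possible alternative to chaining containment tests with OR, the same
filter can be sketched as one jsonpath predicate. This assumes PostgreSQL
12 or later, where the @? operator is available; it is an illustration,
not a benchmarked recommendation.

```sql
-- Matches rows whose t1 value is 1, 2, or 3.
SELECT gdid
FROM dbo.googledocs_tbl
WHERE userid = 12345678
  AND addons_json @? '$.t1 ? (@ == 1 || @ == 2 || @ == 3)';
```

Rows where the t1 key is absent do not match, which mirrors how the
column version skips NULL values in an IN list.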
>>>>>
>>>>>
>>>>>
>>>>> On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <
>>>>> david.g.johnston@gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <
>>>>>> ag1567827@gmail.com> wrote:
>>>>>>
>>>>>>>
>>>>>>> So my question is whether a single JSONB column is the better
>>>>>>> choice, or whether 50 columns would be more optimised.
>>>>>>>
>>>>>> The relational database engine is designed around the column-based
>>>>>> approach.  Especially if the columns are generally unchanging,
>>>>>> combined with using fixed-width data types.
>>>>>>
>>>>>> David J.
>>>>>>
>>>>>>
>>>
>>> --
>>> Death to <Redacted>, and butter sauce.
>>> Don't boil me, I'm still alive.
>>> <Redacted> lobster!
>>>
>>
>

--000000000000fee3dc0629f40434--