public inbox for [email protected]
help / color / mirror / Atom feedFrom: Divyansh Gupta JNsThMAudy <[email protected]>
To: Ron Johnson <[email protected]>
Cc: pgsql-general <[email protected]>
Subject: Re: Need help in database design
Date: Tue, 24 Dec 2024 00:00:25 +0530
Message-ID: <CAHesJ5LbFjqN0NndYUHKsXX_JssgwM-VPPaYoi28kZvsciFifQ@mail.gmail.com> (raw)
In-Reply-To: <CANzqJaDb901c=fbicfuzXu1kvf1OR+6rtRZvtHOPyGaOD2E99Q@mail.gmail.com>
References: <CAHesJ5LES3aTDf=xp7NOwrADQ_HWC-Spsv7yLu9ZY+zxzZO53A@mail.gmail.com>
<[email protected]>
<CAHesJ5+ASNoSNMiC5Ms0Ts=gw7v2_UeBpUT=phujO4yE_XCbEw@mail.gmail.com>
<CAHesJ5JbkCBZ2f_AvUr8+KWnGPAsobu4zyfnWm8bEeb7X9oqDQ@mail.gmail.com>
<[email protected]>
<CAHesJ5JLzhHiGSBSkJZ7x7rGgHeeByP=wWk1D5GG=x8cJ5YY6Q@mail.gmail.com>
<CAKFQuwYdpzwcbSdQ8TvZ-nVjPeHVVz+5=bWofCbUK+p_o=axrQ@mail.gmail.com>
<CAHesJ5+yTenkAxOT8H33Cfe=1b2kSyXGqxFYfYz5fgYAVVvFmw@mail.gmail.com>
<CAHesJ5KaJ8p7QhB9UUoFEbA87cU7ke4GBMkKR3q2FJPVv9GXyw@mail.gmail.com>
<CANzqJaB_s8eXCZJvYO9CLvgJNqrshD=G5GgECi1M9=vk-JHjdQ@mail.gmail.com>
<CAHesJ5LgLi9-uGCk3J9TUkuyttysz3fzTaP+o57EjcBtwDYKZA@mail.gmail.com>
<CANzqJaD-MwXzvg97q0iLvAdkf=DnUMOq0Ex2_eNU7sTxEL7bfA@mail.gmail.com>
<CAHesJ5KtKm9fjhMdR1+cC-M5jW98Sz6sWKbt0mN6SJcfkq9eig@mail.gmail.com>
<CAHesJ5Kne6MZakdhcQ9Zc-5KhBvhgUt+zUXuX5v6z+zwTY6gLQ@mail.gmail.com>
<CANzqJaDb901c=fbicfuzXu1kvf1OR+6rtRZvtHOPyGaOD2E99Q@mail.gmail.com>
Range partition can help when you applies filter for a specific range but
in my case I need to apply filter on userid always, however I have date
columns but there is less variation in timestamp which I have that's why
didn't go for range partition.
On Mon, 23 Dec 2024, 11:57 pm Ron Johnson, <[email protected]> wrote:
>
> 1. I bet you'd get better performance using RANGE partitioning.
> 2. Twenty million rows per userid is a *LOT*. No subdivisions (like date
> range)?
>
> On Mon, Dec 23, 2024 at 1:23 PM Divyansh Gupta JNsThMAudy <
> [email protected]> wrote:
>
>> Adrian, Please check this out;
>>
>> PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0
>> PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84,
>> remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION
>> OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);
>>
>> On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <
>> [email protected]> wrote:
>>
>>> Adrian, the partition is on userid using hash partition with 84
>>> partitions
>>>
>>> Ron, there could be more than 20 Million records possible for a single
>>> userid in that case if I create index on userid only not on other column
>>> the query is taking more than 30 seconds to return the results.
>>>
>>> On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <[email protected]>
>>> wrote:
>>>
>>>> If your queries all reference userid, then you only need indices on
>>>> gdid and userid.
>>>>
>>>> On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <
>>>> [email protected]> wrote:
>>>>
>>>>> I have one confusion with this design if I opt to create 50 columns I
>>>>> need to create 50 index which will work with userid index in Bitmap on the
>>>>> other hand if I create a JSONB column I need to create a single index ?
>>>>>
>>>>> On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Given what you just wrote, I'd stick with 50 separate t* columns.
>>>>>> Simplifies queries, simplifies updates, and eliminates JSONB conversions.
>>>>>>
>>>>>> On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Values can be updated based on customer actions
>>>>>>>
>>>>>>> All rows won't have all 50 key value pairs always if I make those
>>>>>>> keys into columns the rows might have null value on the other hand if it is
>>>>>>> JSONB then the key value pair will not be there
>>>>>>>
>>>>>>> Yes in UI customers can search for the key value pairs
>>>>>>>
>>>>>>> During data population the key value pair will be empty array in
>>>>>>> case of JSONB column or NULL in case of table columns, later when customer
>>>>>>> performs some actions that time the key value pairs will populate and
>>>>>>> update, based on what action customer performs.
>>>>>>>
>>>>>>> On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Let's make it more understandable, here is the table schema with 50
>>>>>>>> columns in it
>>>>>>>>
>>>>>>>> CREATE TABLE dbo.googledocs_tbl (
>>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE
>>>>>>>> 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
>>>>>>>> userid int8 NOT NULL,
>>>>>>>> t1 int4 NULL,
>>>>>>>> t2 int4 NULL,
>>>>>>>> t3 int4 NULL,
>>>>>>>> t4 int4 NULL,
>>>>>>>> t5 int4 NULL,
>>>>>>>> t6 int4 NULL,
>>>>>>>> t7 int4 NULL,
>>>>>>>> t8 int4 NULL,
>>>>>>>> t9 int4 NULL,
>>>>>>>> t10 int4 NULL,
>>>>>>>> t11 int4 NULL,
>>>>>>>> t12 int4 NULL,
>>>>>>>> t13 int4 NULL,
>>>>>>>> t14 int4 NULL,
>>>>>>>> t15 int4 NULL,
>>>>>>>> t16 int4 NULL,
>>>>>>>> t17 int4 NULL,
>>>>>>>> t18 int4 NULL,
>>>>>>>> t19 int4 NULL,
>>>>>>>> t20 int4 NULL,
>>>>>>>> t21 int4 NULL,
>>>>>>>> t22 int4 NULL,
>>>>>>>> t23 int4 NULL,
>>>>>>>> t24 int4 NULL,
>>>>>>>> t25 int4 NULL,
>>>>>>>> t26 int4 NULL,
>>>>>>>> t27 int4 NULL,
>>>>>>>> t28 int4 NULL,
>>>>>>>> t29 int4 NULL,
>>>>>>>> t30 int4 NULL,
>>>>>>>> t31 int4 NULL,
>>>>>>>> t32 int4 NULL,
>>>>>>>> t33 int4 NULL,
>>>>>>>> t34 int4 NULL,
>>>>>>>> t35 int4 NULL,
>>>>>>>> t36 int4 NULL,
>>>>>>>> t37 int4 NULL,
>>>>>>>> t38 int4 NULL,
>>>>>>>> t39 int4 NULL,
>>>>>>>> t40 int4 NULL,
>>>>>>>> t41 int4 NULL,
>>>>>>>> t42 int4 NULL,
>>>>>>>> t43 int4 NULL,
>>>>>>>> t44 int4 NULL,
>>>>>>>> t45 int4 NULL,
>>>>>>>> t46 int4 NULL,
>>>>>>>> t47 int4 NULL,
>>>>>>>> t48 int4 NULL,
>>>>>>>> t49 int4 NULL,
>>>>>>>> t50 int4 NULL,
>>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
>>>>>>>> );
>>>>>>>>
>>>>>>>> Every time when i query I will query it along with userid
>>>>>>>> Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
>>>>>>>> more key filters if customer applies
>>>>>>>>
>>>>>>>> On the other hand if I create a single jsonb column the schema will
>>>>>>>> look like :
>>>>>>>>
>>>>>>>> CREATE TABLE dbo.googledocs_tbl (
>>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE
>>>>>>>> 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
>>>>>>>> userid int8 NOT NULL,
>>>>>>>> addons_json jsonb default '{}'::jsonb
>>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
>>>>>>>> );
>>>>>>>>
>>>>>>>> and the query would be like
>>>>>>>> where userid = 12345678 and ((addons_json @> {t1:1}) or
>>>>>>>> (addons_json @> {t1:2}) or (addons_json @> {t1:3})
>>>>>>>> more key filters if customer applies
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So here my question is considering one JSONB column is perfect or
>>>>>>>>>> considering 50 columns will be more optimised.
>>>>>>>>>>
>>>>>>>>> The relational database engine is designed around the column-based
>>>>>>>>> approach. Especially if the columns are generally unchanging, combined
>>>>>>>>> with using fixed-width data types.
>>>>>>>>>
>>>>>>>>> David J.
>>>>>>>>>
>>>>>>>>>
>>>>>>
>>>>>> --
>>>>>> Death to <Redacted>, and butter sauce.
>>>>>> Don't boil me, I'm still alive.
>>>>>> <Redacted> lobster!
>>>>>>
>>>>>
>>>>
>>>> --
>>>> Death to <Redacted>, and butter sauce.
>>>> Don't boil me, I'm still alive.
>>>> <Redacted> lobster!
>>>>
>>>
>
> --
> Death to <Redacted>, and butter sauce.
> Don't boil me, I'm still alive.
> <Redacted> lobster!
>
view thread (19+ messages) latest in thread
reply
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Reply to all the recipients using the --to and --cc options:
reply via email
To: [email protected]
Cc: [email protected], [email protected]
Subject: Re: Need help in database design
In-Reply-To: <CAHesJ5LbFjqN0NndYUHKsXX_JssgwM-VPPaYoi28kZvsciFifQ@mail.gmail.com>
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox