Re: Need help in database design

public inbox for [email protected]  
help / color / mirror / Atom feed

From: Ron Johnson <[email protected]>
To: pgsql-general <[email protected]>
Subject: Re: Need help in database design
Date: Mon, 23 Dec 2024 13:29:51 -0500
Message-ID: <CANzqJaCMGz5i2igRYjcm321kkWdn9Uy25U1Y+8QGqZxg4yLB=A@mail.gmail.com> (raw)
In-Reply-To: <CAHesJ5KKgX0abQy5Yi=wdA_gvv3zXmk9rFKcKKR4awoz=B68zA@mail.gmail.com>
References: <CAHesJ5LES3aTDf=xp7NOwrADQ_HWC-Spsv7yLu9ZY+zxzZO53A@mail.gmail.com>
	<[email protected]>
	<CAHesJ5+ASNoSNMiC5Ms0Ts=gw7v2_UeBpUT=phujO4yE_XCbEw@mail.gmail.com>
	<CAHesJ5JbkCBZ2f_AvUr8+KWnGPAsobu4zyfnWm8bEeb7X9oqDQ@mail.gmail.com>
	<[email protected]>
	<CAHesJ5JLzhHiGSBSkJZ7x7rGgHeeByP=wWk1D5GG=x8cJ5YY6Q@mail.gmail.com>
	<CAKFQuwYdpzwcbSdQ8TvZ-nVjPeHVVz+5=bWofCbUK+p_o=axrQ@mail.gmail.com>
	<CAHesJ5+yTenkAxOT8H33Cfe=1b2kSyXGqxFYfYz5fgYAVVvFmw@mail.gmail.com>
	<CAHesJ5KaJ8p7QhB9UUoFEbA87cU7ke4GBMkKR3q2FJPVv9GXyw@mail.gmail.com>
	<CANzqJaB_s8eXCZJvYO9CLvgJNqrshD=G5GgECi1M9=vk-JHjdQ@mail.gmail.com>
	<CAHesJ5LgLi9-uGCk3J9TUkuyttysz3fzTaP+o57EjcBtwDYKZA@mail.gmail.com>
	<CANzqJaD-MwXzvg97q0iLvAdkf=DnUMOq0Ex2_eNU7sTxEL7bfA@mail.gmail.com>
	<CAHesJ5KtKm9fjhMdR1+cC-M5jW98Sz6sWKbt0mN6SJcfkq9eig@mail.gmail.com>
	<CAHesJ5Kne6MZakdhcQ9Zc-5KhBvhgUt+zUXuX5v6z+zwTY6gLQ@mail.gmail.com>
	<CAHesJ5KKgX0abQy5Yi=wdA_gvv3zXmk9rFKcKKR4awoz=B68zA@mail.gmail.com>

Where are the 50 "t* columns?

On Mon, Dec 23, 2024 at 1:26 PM Divyansh Gupta JNsThMAudy <
[email protected]> wrote:

> Ron here is the entire table schema FYI, userid is the mandate column on
> which filter is always applies:
>
> CREATE TABLE dbo.googledocs_tbl (
>
> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1
> MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
>
> userid int8 NOT NULL,
>
> useremail varchar(600) NOT NULL,
>
> title public.citext NULL,
>
> authorname varchar(600) NULL,
>
> authoremail varchar(600) NULL,
>
> updated varchar(300) NOT NULL,
>
> entryid varchar(2000) NOT NULL,
>
> lastmodifiedby varchar(600) NULL,
>
> lastmodifiedbyemail varchar(600) NULL,
>
> "size" varchar(300) NULL,
>
> contenttype varchar(250) NULL,
>
> fileextension varchar(50) NULL,
>
> docfoldername public.citext NULL,
>
> folderresourceid public.citext NULL,
>
> filesize int8 DEFAULT 0 NOT NULL,
>
> retentionstatus int2 DEFAULT 0 NOT NULL,
>
> docfileref int8 NULL,
>
> usid int4 NULL,
>
> archivepath varchar(500) NULL,
>
> createddate timestamp(6) DEFAULT NULL::timestamp without time zone NULL,
>
> zipfilename varchar(100) NULL,
>
> oncreatedat timestamp(6) DEFAULT clock_timestamp() NOT NULL,
>
> onupdateat timestamp(6) DEFAULT clock_timestamp() NOT NULL,
>
> startsnapshot int4 DEFAULT 0 NOT NULL,
>
> currentsnapshot int4 DEFAULT 0 NOT NULL,
>
> dismiss int2 DEFAULT 0 NOT NULL,
>
> checksum varchar NULL,
>
> typeoffile int2 GENERATED ALWAYS AS (
>
> CASE
>
> WHEN authoremail::text = useremail::text THEN 0::smallint
>
> ELSE 1::smallint
>
> END) STORED NOT NULL,
>
> parquetfilename varchar(100) NULL,
>
> metadata_vector tsvector GENERATED ALWAYS AS (to_tsvector('english'::
> regconfig, (((((COALESCE(title::character varying, ''::text::character
> varying)::text || ' '::text) || (COALESCE(docfoldername::text, ''::text)
> || ' '::text)) || (COALESCE(authorname, ''::text::character varying)::text
> || ' '::text)) || (COALESCE(fileextension, ''::text::character varying)::
> text || ' '::text)) || (COALESCE(lastmodifiedby, ''::text::character
> varying)::text || ' '::text)) || COALESCE(contenttype, ''::character
> varying::text::character varying)::text)) STORED NULL,
>
> isfileencrypted int4 DEFAULT 0 NULL,
>
> addons_json jsonb DEFAULT '{}'::jsonb NULL,
>
> CONSTRAINT googledocs_tbl_clone_pkey PRIMARY KEY (gdid, userid),
>
> CONSTRAINT fk_googledocs_tbl_clone_users_tbl FOREIGN KEY (userid)
> REFERENCES dbo.users_tbl(uid) ON DELETE CASCADE
>
> )
>
> PARTITION BY HASH (userid);
>
> On Mon, Dec 23, 2024 at 11:53 PM Divyansh Gupta JNsThMAudy <
> [email protected]> wrote:
>
>> Adrian, Please check this out;
>>
>> PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0
>> PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84,
>> remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION
>> OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);
>>
>> On Mon, Dec 23, 2024 at 11:48 PM Divyansh Gupta JNsThMAudy <
>> [email protected]> wrote:
>>
>>> Adrian, the partition is on userid using hash partition with 84
>>> partitions
>>>
>>> Ron, there could be more than 20 Million records possible for a single
>>> userid in that case if I create index on userid only not on other column
>>> the query is taking more than 30 seconds to return the results.
>>>
>>> On Mon, 23 Dec 2024, 11:40 pm Ron Johnson, <[email protected]>
>>> wrote:
>>>
>>>> If your queries all reference userid, then you only need indices on
>>>> gdid and userid.
>>>>
>>>> On Mon, Dec 23, 2024 at 12:49 PM Divyansh Gupta JNsThMAudy <
>>>> [email protected]> wrote:
>>>>
>>>>> I have one confusion with this design if I opt to create 50 columns I
>>>>> need to create 50 index which will work with userid index in Bitmap on the
>>>>> other hand if I create a JSONB column I need to create a single index ?
>>>>>
>>>>> On Mon, 23 Dec 2024, 11:10 pm Ron Johnson, <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Given what you just wrote, I'd stick with 50 separate t* columns.
>>>>>> Simplifies queries, simplifies updates, and eliminates JSONB conversions.
>>>>>>
>>>>>> On Mon, Dec 23, 2024 at 12:29 PM Divyansh Gupta JNsThMAudy <
>>>>>> [email protected]> wrote:
>>>>>>
>>>>>>> Values can be updated based on customer actions
>>>>>>>
>>>>>>> All rows won't have all 50 key value pairs always if I make those
>>>>>>> keys into columns the rows might have null value on the other hand if it is
>>>>>>> JSONB then the key value pair will not be there
>>>>>>>
>>>>>>> Yes in UI customers can search for the key value pairs
>>>>>>>
>>>>>>> During data population the key value pair will be empty array in
>>>>>>> case of JSONB column or NULL in case of table columns, later when customer
>>>>>>> performs some actions that time the key value pairs will populate and
>>>>>>> update, based on what action customer performs.
>>>>>>>
>>>>>>> On Mon, 23 Dec 2024, 10:51 pm Divyansh Gupta JNsThMAudy, <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Let's make it more understandable, here is the table schema with 50
>>>>>>>> columns in it
>>>>>>>>
>>>>>>>> CREATE TABLE dbo.googledocs_tbl (
>>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE
>>>>>>>> 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
>>>>>>>> userid int8 NOT NULL,
>>>>>>>> t1 int4 NULL,
>>>>>>>> t2 int4 NULL,
>>>>>>>>
>>>>>>>> [snip]

> t46 int4 NULL,
>>>>>>>> t47 int4 NULL,
>>>>>>>> t48 int4 NULL,
>>>>>>>> t49 int4 NULL,
>>>>>>>> t50 int4 NULL,
>>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
>>>>>>>> );
>>>>>>>>
>>>>>>>> Every time when i query I will query it along with userid
>>>>>>>> Ex : where userid = 12345678 and t1 in (1,2,3) and t2 in (0,1,2)
>>>>>>>> more key filters if customer applies
>>>>>>>>
>>>>>>>> On the other hand if I create a single jsonb column the schema will
>>>>>>>> look like :
>>>>>>>>
>>>>>>>> CREATE TABLE dbo.googledocs_tbl (
>>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE
>>>>>>>> 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,
>>>>>>>> userid int8 NOT NULL,
>>>>>>>> addons_json jsonb default '{}'::jsonb
>>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
>>>>>>>> );
>>>>>>>>
>>>>>>>> and the query would be like
>>>>>>>> where userid = 12345678 and ((addons_json @> {t1:1}) or
>>>>>>>> (addons_json @> {t1:2}) or  (addons_json @> {t1:3})
>>>>>>>> more key filters if customer applies
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Dec 23, 2024 at 10:38 PM David G. Johnston <
>>>>>>>> [email protected]> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy <
>>>>>>>>> [email protected]> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> So here my question is considering one JSONB column is perfect or
>>>>>>>>>> considering 50 columns will be more optimised.
>>>>>>>>>>
>>>>>>>>> The relational database engine is designed around the column-based
>>>>>>>>> approach.  Especially if the columns are generally unchanging, combined
>>>>>>>>> with using fixed-width data types.
>>>>>>>>>
>>>>>>>>> David J.
>>>>>>>>>
>>>>>>>>>

-- 
Death to <Redacted>, and butter sauce.
Don't boil me, I'm still alive.
<Redacted> lobster!

view thread (19+ messages)  latest in thread

reply

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Reply to all the recipients using the --to and --cc options:
  reply via email

  To: [email protected]
  Cc: [email protected]
  Subject: Re: Need help in database design
  In-Reply-To: <CANzqJaCMGz5i2igRYjcm321kkWdn9Uy25U1Y+8QGqZxg4yLB=A@mail.gmail.com>

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

This inbox is served by agora; see mirroring instructions
for how to clone and mirror all data and code used for this inbox