Received: from malur.postgresql.org ([217.196.149.56]) by arkaria.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tPnBy-00FL2e-Lr for pgsql-general@arkaria.postgresql.org; Mon, 23 Dec 2024 18:30:11 +0000 Received: from localhost ([127.0.0.1] helo=malur.postgresql.org) by malur.postgresql.org with esmtp (Exim 4.94.2) (envelope-from ) id 1tPnBw-00EKu7-KX for pgsql-general@arkaria.postgresql.org; Mon, 23 Dec 2024 18:30:08 +0000 Received: from magus.postgresql.org ([2a02:c0:301:0:ffff::29]) by malur.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384 (Exim 4.94.2) (envelope-from ) id 1tPnBw-00EKs5-6x for pgsql-general@lists.postgresql.org; Mon, 23 Dec 2024 18:30:08 +0000 Received: from mail-oa1-x33.google.com ([2001:4860:4864:20::33]) by magus.postgresql.org with esmtps (TLS1.3) tls TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256 (Exim 4.94.2) (envelope-from ) id 1tPnBs-001GKl-Fr for pgsql-general@postgresql.org; Mon, 23 Dec 2024 18:30:07 +0000 Received: by mail-oa1-x33.google.com with SMTP id 586e51a60fabf-2a01bcd0143so2980570fac.2 for ; Mon, 23 Dec 2024 10:30:04 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1734978602; x=1735583402; darn=postgresql.org; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :from:to:cc:subject:date:message-id:reply-to; bh=LiuCJ4tbmoNTyVu0hVb3uYsSHurs9zo8h9iBQBhguHQ=; b=mZd4Z+Bj/ycNHqfboatvJfB1+S9YFd4feYmtpSFv0Ms8d/X2IR9LTBMiwAgvB0ivXR gemqtFw0cTq//r5UkgmO4ycYioOYzWUO0CK6nxXQKY5utHxv0sEkwu3upc4QqO3vM5L/ xCQhpL5Rectq5lLTxmWeMmfxASTIOXCMUFbPzvJm5JDF5XicgM6y9VIQs15CsSwGSEDF baHlYYCw4CI2MOVe+JoYbNsuBawxBkbRP1Rj9i1tEBc68zOAA5BAlFq9PNat9iwfPBLg HBhkj6+J7dKSUe3d7CF9tPPmbH2vJOpj1Q9p0AsWzUeyuOaI5dEcCoaT344ITTNl//YR 5bEQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1734978602; x=1735583402; h=to:subject:message-id:date:from:in-reply-to:references:mime-version :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=LiuCJ4tbmoNTyVu0hVb3uYsSHurs9zo8h9iBQBhguHQ=; b=ww4cMFX7rA60Dim0/MJXh5EKWEmM62jzgiF05zJDV/cGTFPSjeLjd134liUB3hB44K NUK3EJYPXlX++8dwqrzz81vaSblWkU4irrJaUBt15kY/nJd+FegHYWngMOerJTkIZAlw wm3Fp81PDr9l9p+Ib1ygDFpgM3UUHdExVShrsXvXNILBasxRXwPE3dg+Z57Ju9Dc+klz /GG568+82+Q7Vw/fmWKkTbqDK9lqqrzy00bADbMFXxChDr0XDomHOq9VvVLMjMOLdHqu gz/qMa6aaa0KC3BM0XnaUhI3+VerirpLGGsBcZpjH/V0RLS68a4tXz6q701naM6s77le ZTBg== X-Gm-Message-State: AOJu0YzRjR+/gieGdqBN3Zb//KIDUca3gbLkh4tOkwlzlTIXGyJMhIMh pqZHi7306I7r09maVzvfOo7Tc1Tuhv8/MhLxoEegbYuMJMuugL98cX2PELo+1T5ixp6Iq/WJJgJ dL9I1vTDiyCwxCmcFtPan7PHMvcB7Lg== X-Gm-Gg: ASbGncv0ojpkzTJRZ3QsvJmGn9P15GKBOGqewV6gSp5wo4tsN90iwBABmAQ39DFpZih A92icDDnRsqSgPnlCLLNyxaxyVxX+bQMgaXo6rm1JVC+WEUAumOgJMswqEIp9wotgbWh5ADv4 X-Google-Smtp-Source: AGHT+IHI2heBFvIk/Hxn8UOfrbpwvZtLZJfJl6IJswTuv6VkdeRYV6wEpm2yRfvtsLnHeCUVRB1jDmNzwbLkSpFfzfc= X-Received: by 2002:a05:6870:3c10:b0:295:ed0a:8061 with SMTP id 586e51a60fabf-2a7fb553bf8mr7273627fac.39.1734978602589; Mon, 23 Dec 2024 10:30:02 -0800 (PST) MIME-Version: 1.0 References: <06e1f1ee-74b2-43a2-9a63-da20ae455ae2@aklaver.com> In-Reply-To: From: Ron Johnson Date: Mon, 23 Dec 2024 13:29:51 -0500 Message-ID: Subject: Re: Need help in database design To: pgsql-general Content-Type: multipart/alternative; boundary="0000000000004ce52c0629f42f7a" List-Id: List-Help: List-Subscribe: List-Post: List-Owner: List-Archive: Archived-At: Precedence: bulk --0000000000004ce52c0629f42f7a Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Where are the 50 "t* columns? On Mon, Dec 23, 2024 at 1:26=E2=80=AFPM Divyansh Gupta JNsThMAudy < ag1567827@gmail.com> wrote: > Ron here is the entire table schema FYI, userid is the mandate column on > which filter is always applies: > > CREATE TABLE dbo.googledocs_tbl ( > > gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALUE 1 > MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL, > > userid int8 NOT NULL, > > useremail varchar(600) NOT NULL, > > title public.citext NULL, > > authorname varchar(600) NULL, > > authoremail varchar(600) NULL, > > updated varchar(300) NOT NULL, > > entryid varchar(2000) NOT NULL, > > lastmodifiedby varchar(600) NULL, > > lastmodifiedbyemail varchar(600) NULL, > > "size" varchar(300) NULL, > > contenttype varchar(250) NULL, > > fileextension varchar(50) NULL, > > docfoldername public.citext NULL, > > folderresourceid public.citext NULL, > > filesize int8 DEFAULT 0 NOT NULL, > > retentionstatus int2 DEFAULT 0 NOT NULL, > > docfileref int8 NULL, > > usid int4 NULL, > > archivepath varchar(500) NULL, > > createddate timestamp(6) DEFAULT NULL::timestamp without time zone NULL, > > zipfilename varchar(100) NULL, > > oncreatedat timestamp(6) DEFAULT clock_timestamp() NOT NULL, > > onupdateat timestamp(6) DEFAULT clock_timestamp() NOT NULL, > > startsnapshot int4 DEFAULT 0 NOT NULL, > > currentsnapshot int4 DEFAULT 0 NOT NULL, > > dismiss int2 DEFAULT 0 NOT NULL, > > checksum varchar NULL, > > typeoffile int2 GENERATED ALWAYS AS ( > > CASE > > WHEN authoremail::text =3D useremail::text THEN 0::smallint > > ELSE 1::smallint > > END) STORED NOT NULL, > > parquetfilename varchar(100) NULL, > > metadata_vector tsvector GENERATED ALWAYS AS (to_tsvector('english':: > regconfig, (((((COALESCE(title::character varying, ''::text::character > varying)::text || ' '::text) || (COALESCE(docfoldername::text, ''::text) > || ' '::text)) || (COALESCE(authorname, ''::text::character varying)::tex= t > || ' '::text)) || (COALESCE(fileextension, ''::text::character varying):: > text || ' '::text)) || (COALESCE(lastmodifiedby, ''::text::character > varying)::text || ' '::text)) || COALESCE(contenttype, ''::character > varying::text::character varying)::text)) STORED NULL, > > isfileencrypted int4 DEFAULT 0 NULL, > > addons_json jsonb DEFAULT '{}'::jsonb NULL, > > CONSTRAINT googledocs_tbl_clone_pkey PRIMARY KEY (gdid, userid), > > CONSTRAINT fk_googledocs_tbl_clone_users_tbl FOREIGN KEY (userid) > REFERENCES dbo.users_tbl(uid) ON DELETE CASCADE > > ) > > PARTITION BY HASH (userid); > > On Mon, Dec 23, 2024 at 11:53=E2=80=AFPM Divyansh Gupta JNsThMAudy < > ag1567827@gmail.com> wrote: > >> Adrian, Please check this out; >> >> PARTITION BY HASH (userid); CREATE TABLE dbo.googledocs_tbl_clone_part_0 >> PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, >> remainder 0); ... CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITIO= N >> OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83); >> >> On Mon, Dec 23, 2024 at 11:48=E2=80=AFPM Divyansh Gupta JNsThMAudy < >> ag1567827@gmail.com> wrote: >> >>> Adrian, the partition is on userid using hash partition with 84 >>> partitions >>> >>> Ron, there could be more than 20 Million records possible for a single >>> userid in that case if I create index on userid only not on other colum= n >>> the query is taking more than 30 seconds to return the results. >>> >>> On Mon, 23 Dec 2024, 11:40=E2=80=AFpm Ron Johnson, >>> wrote: >>> >>>> If your queries all reference userid, then you only need indices on >>>> gdid and userid. >>>> >>>> On Mon, Dec 23, 2024 at 12:49=E2=80=AFPM Divyansh Gupta JNsThMAudy < >>>> ag1567827@gmail.com> wrote: >>>> >>>>> I have one confusion with this design if I opt to create 50 columns I >>>>> need to create 50 index which will work with userid index in Bitmap o= n the >>>>> other hand if I create a JSONB column I need to create a single index= ? >>>>> >>>>> On Mon, 23 Dec 2024, 11:10=E2=80=AFpm Ron Johnson, >>>>> wrote: >>>>> >>>>>> Given what you just wrote, I'd stick with 50 separate t* columns. >>>>>> Simplifies queries, simplifies updates, and eliminates JSONB convers= ions. >>>>>> >>>>>> On Mon, Dec 23, 2024 at 12:29=E2=80=AFPM Divyansh Gupta JNsThMAudy < >>>>>> ag1567827@gmail.com> wrote: >>>>>> >>>>>>> Values can be updated based on customer actions >>>>>>> >>>>>>> All rows won't have all 50 key value pairs always if I make those >>>>>>> keys into columns the rows might have null value on the other hand = if it is >>>>>>> JSONB then the key value pair will not be there >>>>>>> >>>>>>> Yes in UI customers can search for the key value pairs >>>>>>> >>>>>>> During data population the key value pair will be empty array in >>>>>>> case of JSONB column or NULL in case of table columns, later when c= ustomer >>>>>>> performs some actions that time the key value pairs will populate a= nd >>>>>>> update, based on what action customer performs. >>>>>>> >>>>>>> On Mon, 23 Dec 2024, 10:51=E2=80=AFpm Divyansh Gupta JNsThMAudy, < >>>>>>> ag1567827@gmail.com> wrote: >>>>>>> >>>>>>>> Let's make it more understandable, here is the table schema with 5= 0 >>>>>>>> columns in it >>>>>>>> >>>>>>>> CREATE TABLE dbo.googledocs_tbl ( >>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALU= E >>>>>>>> 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL, >>>>>>>> userid int8 NOT NULL, >>>>>>>> t1 int4 NULL, >>>>>>>> t2 int4 NULL, >>>>>>>> >>>>>>>> [snip] > t46 int4 NULL, >>>>>>>> t47 int4 NULL, >>>>>>>> t48 int4 NULL, >>>>>>>> t49 int4 NULL, >>>>>>>> t50 int4 NULL, >>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid), >>>>>>>> ); >>>>>>>> >>>>>>>> Every time when i query I will query it along with userid >>>>>>>> Ex : where userid =3D 12345678 and t1 in (1,2,3) and t2 in (0,1,2) >>>>>>>> more key filters if customer applies >>>>>>>> >>>>>>>> On the other hand if I create a single jsonb column the schema wil= l >>>>>>>> look like : >>>>>>>> >>>>>>>> CREATE TABLE dbo.googledocs_tbl ( >>>>>>>> gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMENT BY 1 MINVALU= E >>>>>>>> 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL, >>>>>>>> userid int8 NOT NULL, >>>>>>>> addons_json jsonb default '{}'::jsonb >>>>>>>> CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid), >>>>>>>> ); >>>>>>>> >>>>>>>> and the query would be like >>>>>>>> where userid =3D 12345678 and ((addons_json @> {t1:1}) or >>>>>>>> (addons_json @> {t1:2}) or (addons_json @> {t1:3}) >>>>>>>> more key filters if customer applies >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Mon, Dec 23, 2024 at 10:38=E2=80=AFPM David G. Johnston < >>>>>>>> david.g.johnston@gmail.com> wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> On Mon, Dec 23, 2024, 10:01 Divyansh Gupta JNsThMAudy < >>>>>>>>> ag1567827@gmail.com> wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> So here my question is considering one JSONB column is perfect o= r >>>>>>>>>> considering 50 columns will be more optimised. >>>>>>>>>> >>>>>>>>> The relational database engine is designed around the column-base= d >>>>>>>>> approach. Especially if the columns are generally unchanging, co= mbined >>>>>>>>> with using fixed-width data types. >>>>>>>>> >>>>>>>>> David J. >>>>>>>>> >>>>>>>>> --=20 Death to , and butter sauce. Don't boil me, I'm still alive. lobster! --0000000000004ce52c0629f42f7a Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Where are the 50 "t* columns?
<= br>
On Mon, Dec 23, 2024 at 1:26=E2=80=AFPM Divyansh Gupta JNsT= hMAudy <ag1567827@gmail.com&g= t; wrote:
Ron here is the entire table schema FYI, userid is the mandate= =C2=A0column on which filter is always applies:

= CREATE TABLE= dbo.googledocs_tbl (

<= span style=3D"color:rgb(0,184,184)">gdid int8 GENERATED BY DEFAUL= T AS = IDENTITY( INCREMENT BY 1 MINVALUE 1<= /span> MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NOT NULL,

userid int8 NOT NULL,

useremail varchar(600) NOT<= /span> NULL,=

title public.citext NULL,

authorname varchar(600= ) NULL,

authore= mail varchar= (600) NULL,

updated varchar(300) NOT NULL,

entryid varchar(2000) NOT NULL,

lastmodifiedby varchar(600) NULL,

lastmodifiedbyemail varchar(600) NULL,

"size" varchar(300) NULL,

contenttype= varchar(250) NULL,

fileextension varchar(50) NULL,

docfoldername = public.citext NULL,

folderresourceid = public.citext NULL,

filesize int8 DEFAULT 0 NOT NULL,

retentionstatus int2 DEFAULT 0 NOT NULL,

docfileref int8 NUL= L,

= usid int4 NULL,

archivepath= varchar(500) NULL,

createddate timestamp(6) DEFAULT NULL::timestamp w= ithout time<= /span> zone = NULL,

zipfilename varchar(<= span style=3D"color:rgb(192,192,192)">100) NULL,

oncreatedat timestamp(6) DEFAULT clock_timestamp() NOT = NULL,

onupdateat timestamp(6) DEFAULT clock_timestamp() NOT NULL,

startsnapshot int4 DEFAULT 0 NOT NULL,

currentsnapshot int4 DEFAULT 0 NOT NULL,

dismiss int2 DEFAULT 0 NOT NULL<= /span>,

ch= ecksum varch= ar NULL,

typeof= file int2 GENERATED ALWAYS AS (

CASE

WHEN authoremail= ::text =3D <= span style=3D"color:rgb(158,158,158)">useremail::text THEN 0::sma= llint

ELSE 1::smallint=

END) STORED NOT NULL= ,

parquetfilename varchar(100) NULL,

metadata_vector tsvector GENERATED ALWAYS AS (to_tsvector('english'::regconfig, (((((COALESCE(= title::chara= cter varying= , ''::text::character varying)::text || ' '::text) || (COALESCE(docf= oldername::t= ext, ''::text) || ' '::text)) || (COALESCE(authorname, '&= #39;::text::character varying= )::text || <= span style=3D"color:rgb(202,197,128)">' '::text)) || (COALESCE(fileextension, ''::text::ch= aracter vary= ing)::text || ' '::text)) || (COALESCE(lastmodifiedby, ''::text::character varying)::text || ' '::text)) |= | COALESCE(<= span style=3D"color:rgb(158,158,158)">contenttype, ''::character varying::text::character varying)::text)) STORED NUL= L,

isfileencrypted int4 D= EFAULT 0 NULL,

addons_json jsonb DEFAULT '{}'::jsonb NULL,

CONSTRAINT googledocs_tbl_clone_pkey PRIMARY KEY (gdid, userid= ),

CONSTRAINT fk_goo= gledocs_tbl_clone_users_tbl FOREIGN KEY (userid) REFERENCES dbo.users_tbl(uid= ) ON = DELETE CASCADE

)

PARTITION BY = HASH (userid);


On Mon, Dec 23, 2024= at 11:53=E2=80=AFPM Divyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:
Adrian= , Please check this out;

PARTITION BY HASH (userid); = CREATE TABLE dbo.googledocs_tbl_clone_part_0 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 0); ... = CREATE TABLE dbo.googledocs_tbl_clone_part_83 PARTITION OF dbo.googledocs_tbl_clone FOR VALUES WITH (modulus 84, remainder 83);

On Mon, Dec 23, 2024 at 11:48=E2=80=AFPM D= ivyansh Gupta JNsThMAudy <ag1567827@gmail.com> wrote:

Adrian, the partition is on= userid using hash partition with 84 partitions

Ron, there could be more than 20 Million records possible fo= r a single userid in that case if I create index on userid only not on othe= r column the query is taking more than 30 seconds to return the results.

On Mon= , 23 Dec 2024, 11:40=E2=80=AFpm Ron Johnson, <ronljohnsonjr@gmail.com> wrote:
=
If your queries all reference userid, then you only need indices on gd= id and userid.

On Mon, Dec 23, 2024 at 12:49=E2=80=AFPM Divyansh Gupta JNs= ThMAudy <ag1567827@gmail.com> wrote:

I have one confusion with t= his design if I opt to create 50 columns I need to create 50 index which wi= ll work with userid index in Bitmap on the other hand if I create a JSONB c= olumn I need to create a single index ?


On Mon= , 23 Dec 2024, 11:10=E2=80=AFpm Ron Johnson, <ronljohnsonjr@gmail.c= om> wrote:
Given what you just wrote, I'd stick with 50 se= parate t* columns.=C2=A0 Simplifies queries, simplifies updates, and elimin= ates JSONB conversions.

On Mon, Dec 23, 2024 at 12:29=E2=80=AFPM Divyansh Gu= pta JNsThMAudy <ag1567827@gmail.com> wrote:
=

Values can= be updated based on customer actions

All rows won't have all 50 key value pairs always if I m= ake those keys into columns the rows might have null value on the other han= d if it is JSONB then the key value pair will not be there

Yes in UI customers can search for the key value pairs

During data population the key value pair will be empty arra= y in case of JSONB column or NULL in case of table columns, later when cust= omer performs some actions that time the key value pairs will populate and = update, based on what action customer performs.


On Mon= , 23 Dec 2024, 10:51=E2=80=AFpm Divyansh Gupta JNsThMAudy, <= ag1567827@gmail.com> wrote:
Let's make it more understandable, = here is the table schema with 50 columns in it=C2=A0

CREATE TABLE db= o.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCREMEN= T BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE) NO= T NULL,
userid int8 NOT NULL,
t1 int4 NULL,
t2 int4 NULL,
<= br>
[snip]=C2=A0
t46 int4 NULL,
t47 int4 NULL,
t48 int4 NULL,
t49 int4 NUL= L,
t50 int4 NULL,
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid)= ,
);

Every time when i query I will query it along with userid=C2= =A0
Ex : where userid =3D 12345678 and t1 in (1,2,3) and t2 in (0,1,2)more key filters if customer applies=C2=A0

On the other hand if I = create a single jsonb column the schema will look like :

CREATE TABL= E dbo.googledocs_tbl (
gdid int8 GENERATED BY DEFAULT AS IDENTITY( INCR= EMENT BY 1 MINVALUE 1 MAXVALUE 9223372036854775807 START 1 CACHE 1 NO CYCLE= ) NOT NULL,
userid int8 NOT NULL,
addons_json jsonb default '{}= '::jsonb
CONSTRAINT googledocs_tbl_pkey PRIMARY KEY (gdid),
);
and the query would be like=C2=A0
where userid =3D 12345678 and ((= addons_json=C2=A0@> {t1:1}) or=C2=A0 (addons_json=C2=A0@> {t1:2}) = or=C2=A0 (addons_json=C2=A0@> {t1:3})<= br>more key filters if customer applies=C2=A0



On Mon, Dec 23= , 2024 at 10:38=E2=80=AFPM David G. Johnston <david.g.johnston@gmail.com> wrote:


On Mon, Dec 23, = 2024, 10:01 Divyansh Gupta JNsThMAudy <ag1567827@= gmail.com> wrote:


So here my que= stion is considering one JSONB column is perfect or considering 50 columns = will be more optimised.

The r= elational database engine is designed around the column-based approach.=C2= =A0 Especially if the columns are generally unchanging, combined with using= fixed-width data types.

David J.

<= /div>
=


= --
D= eath to <Redacted>, and butter sauce.
Don't boil me, I'm = still alive.
<Redacted> lobster!
=
--0000000000004ce52c0629f42f7a--