Date: Wed, 3 Jun 2026 14:48:19 +0000 (UTC)
From: apurba saha <aksaha37@yahoo.com>
To: 
	"pgsql-general@lists.postgresql.org" <pgsql-general@lists.postgresql.org>
Message-ID: <1960411813.727800.1780498099726@mail.yahoo.com>
References: <1960411813.727800.1780498099726.ref@mail.yahoo.com>
Subject: Asking suggestions on how to vectorize big texts for conceptual
 searching in postgresql 18
MIME-Version: 1.0
Content-Type: multipart/alternative;
	boundary="----=_Part_727799_1264969099.1780498099724"
Content-Length: 8220
Archived-At: 
 <https://www.postgresql.org/message-id/1960411813.727800.1780498099726%40mail.yahoo.com>
Precedence: bulk

------=_Part_727799_1264969099.1780498099724
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Dear all,
Good morning.

I have some large texts in my tables. On average, each row contains about
4.2 KB of data, and there are 9.5 million rows.
I want to perform conceptual searches on technical terms and technical
phrases, and retrieve all texts with the nearest meanings. So I have to
vectorize the data.
What is the best approach, please?

I was trying to split the data into small fragments of 4.2 KB and then
embed each fragment using a small vector size, with the help of pgvector.
Once I have the embedding vectors for the fragments, I can combine them
using some close-relationship model or a simple average.

This way, we generate an embedding for the full text.
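
To make the averaging idea concrete, here is a minimal sketch of what I
have in mind. The function names, the 4200-byte default, and the toy
vectors are only assumptions for illustration; a real embedding model
would supply the fragment vectors, and I have not measured how well plain
averaging preserves meaning.

```python
import math

def split_into_fragments(text, max_bytes=4200):
    """Split text on whitespace into fragments of at most max_bytes UTF-8 bytes.

    A single word longer than max_bytes is kept whole in its own fragment.
    """
    fragments, current, size = [], [], 0
    for word in text.split():
        word_size = len(word.encode("utf-8")) + 1  # +1 for the joining space
        if current and size + word_size > max_bytes:
            fragments.append(" ".join(current))
            current, size = [], 0
        current.append(word)
        size += word_size
    if current:
        fragments.append(" ".join(current))
    return fragments

def average_embedding(vectors):
    """Mean-pool equal-length fragment vectors, then L2-normalize the result."""
    if not vectors:
        raise ValueError("need at least one fragment vector")
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean))
    return [x / norm for x in mean] if norm > 0 else mean

# Toy usage: two made-up 2-dimensional fragment vectors.
doc_vector = average_embedding([[1.0, 0.0], [0.0, 1.0]])
```

The normalization step is there so that cosine distance and inner product
rank the averaged vectors in the same order.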

Or would you recommend another approach to generating an embedding for the
full text?

I also have another question. I have a title, an abstract, and a
description, where the description is about 3 KB, and I would like to
search across all three. Should I merge all the data and generate
embeddings from the merged text, or keep the embeddings separate?
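
For the separate option, I imagine scoring each field on its own and then
combining the scores, roughly as sketched below. The function names, the
weights, and the toy vectors are made up for illustration and are not
something I have tuned or measured.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def combined_score(query_vec, field_vecs, weights):
    """Weighted average of the per-field cosine similarities."""
    total_weight = sum(weights[name] for name in field_vecs)
    weighted = sum(
        weights[name] * cosine_similarity(query_vec, vec)
        for name, vec in field_vecs.items()
    )
    return weighted / total_weight

# Toy usage: hypothetical 2-dimensional embeddings for one row.
row = {"title": [1.0, 0.0], "abstract": [0.0, 1.0], "description": [1.0, 0.0]}
weights = {"title": 0.5, "abstract": 0.3, "description": 0.2}
score = combined_score([1.0, 0.0], row, weights)
```

With merged text there would be only one vector per row and no weights to
choose, but a match in the title could not be ranked above a match in the
description.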

Have a wonderful day.

Thank you,
Apurba K. Saha

------=_Part_727799_1264969099.1780498099724--