From: apurba saha
To: pgsql-general@lists.postgresql.org
Date: Wed, 3 Jun 2026 14:48:19 +0000 (UTC)
Subject: Asking suggestions on how to vectorize big texts for conceptual searching in postgresql 18

Dear all,

Good morning.
I have some large texts in my tables. On average, each row contains about 4.2 KB of data, and there are 9.5 million rows.
I want to perform various conceptual searches on technical terms and technical phrases, and I would like to retrieve all texts with the nearest meanings. So I have to vectorize the data.
What is the best approach, please?

I was trying to split the data into small fragments of 4.2 KB and then generate embeddings with a small vector size, with the help of pgvector.
Once I have the embedding vectors for the fragments, I can combine them using some close-relationship model or by averaging.

This way, we generate an embedding for the full text.
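
A minimal sketch of the fragment-then-average idea described above, in Python. Everything named here is an assumption for illustration: `toy_embed` is a hypothetical stand-in for a real embedding model (it only hashes words into buckets), and the vector size of 8 and the fragment size of 1000 characters are arbitrary values, not recommendations. It shows one possible way to combine fragment vectors (mean pooling followed by L2 normalization), not the only one.

```python
import hashlib
import math

DIM = 8  # assumed small vector size, for illustration only


def toy_embed(text, dim=DIM):
    """Hypothetical stand-in for a real embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def split_into_fragments(text, max_chars):
    """Split on whitespace into fragments of up to max_chars characters.

    A single word longer than max_chars is kept whole as its own fragment.
    """
    fragments, current = [], ""
    for word in text.split():
        candidate = f"{current} {word}".strip()
        if len(candidate) > max_chars and current:
            fragments.append(current)
            current = word
        else:
            current = candidate
    if current:
        fragments.append(current)
    return fragments


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def mean_pool(vectors):
    """Average the fragment vectors component-wise, then L2-normalize."""
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]
    return l2_normalize(mean)


def embed_full_text(text, max_chars=1000):
    """Embed each fragment separately and combine the results by averaging."""
    fragments = split_into_fragments(text, max_chars)
    if not fragments:
        return [0.0] * DIM  # empty input: no fragments to average
    return mean_pool([l2_normalize(toy_embed(f)) for f in fragments])
```

Normalizing each fragment vector before averaging gives every fragment equal weight regardless of its length. Whether that is desirable depends on the data, so treat it as a design choice to test rather than a rule.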

Or would you recommend any other approach to generate an embedding for the full text, please?

I also have another question. I have title, abstract, and description fields, where the description is about 3 KB, and I would like to search across the title, abstract, and description. Should I merge all the data (and generate embeddings from the merged text) or keep the embeddings separate?
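
To make the two options concrete, here is a small Python sketch that scores one row against a query both ways. It is only an illustration under stated assumptions: `toy_embed` is a hypothetical stand-in for a real embedding model, the sample row is invented, and the per-field `WEIGHTS` are arbitrary values that would need tuning on real queries.

```python
import hashlib
import math

FIELDS = ("title", "abstract", "description")

# Assumed per-field weights for the "separate embeddings" option (hypothetical values).
WEIGHTS = {"title": 0.5, "abstract": 0.3, "description": 0.2}


def toy_embed(text, dim=16):
    """Hypothetical stand-in for a real embedding model: hashed bag of words."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.sha256(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Invented sample row, for illustration only.
row = {
    "title": "Vacuum tuning for large tables",
    "abstract": "Notes on autovacuum settings and table bloat",
    "description": "A longer discussion of vacuum cost limits and bloat monitoring",
}

# Option A: merge all fields into one text and store a single vector per row.
merged_vec = toy_embed(" ".join(row[f] for f in FIELDS))

# Option B: store one vector per field and combine the similarities at query time.
field_vecs = {f: toy_embed(row[f]) for f in FIELDS}


def score_merged(query):
    """Similarity of the query to the single merged-text vector."""
    return cosine(toy_embed(query), merged_vec)


def score_separate(query):
    """Weighted sum of the query's similarity to each field vector."""
    q = toy_embed(query)
    return sum(WEIGHTS[f] * cosine(q, field_vecs[f]) for f in FIELDS)
```

The trade-off this sketch points at: merging needs one vector and one index per row, while separate vectors cost more storage but allow a match in the title to count for more than a match deep in the description.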

Have a wonderful day.
Thank you,
Apurba K. Saha
