Asking suggestions on how to vectorize big texts for conceptual searching in postgresql 18

From: apurba saha <[email protected]>
To: [email protected] <[email protected]>
Subject: Asking suggestions on how to vectorize big texts for conceptual searching in postgresql 18
Date: Wed, 3 Jun 2026 14:48:19 +0000 (UTC)
Message-ID: <[email protected]> (raw)
References: <[email protected]>

Dear all,
Good morning. I have some large texts in my tables: on average, each row contains about 4.2 KB of data, and there are 9.5 million rows. I want to run conceptual searches on technical terms and technical phrases and retrieve all texts with the nearest meanings, so I need to vectorize the data. What is the best approach?
I was trying to split the data into small fragments of 4.2 KB and then embed each fragment using a small vector size, with the help of pgvector. Once I have embedding vectors for the fragments, I can combine them using some close-relationship model or by averaging.
This way, we generate an embedding for the full text.
Or would you recommend another approach to generating an embedding for the full text?
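
To make the fragment-and-average idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `toy_embed` is a hypothetical stand-in (a hashed bag-of-words) for whatever real embedding model would be used, the helper names and the `max_chars` and `dim` values are arbitrary, and none of this is pgvector's API.

```python
import hashlib
import math


def chunk_text(text, max_chars=1000):
    """Split text into whitespace-delimited fragments of at most max_chars.

    A single word longer than max_chars is kept whole in its own fragment.
    """
    chunks, current, length = [], [], 0
    for word in text.split():
        # +1 accounts for the space that would join this word to the fragment
        if current and length + 1 + len(word) > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
        length += len(word) + (1 if current else 0)
        current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(text, dim=8):
    """Hypothetical placeholder for a real embedding model (assumption)."""
    vec = [0.0] * dim
    for word in text.lower().split():
        digest = hashlib.md5(word.encode("utf-8")).digest()
        vec[digest[0] % dim] += 1.0
    return vec


def l2_normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def mean_pool(vectors):
    """Element-wise average of equally sized vectors."""
    if not vectors:
        raise ValueError("need at least one vector to average")
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def embed_document(text, max_chars=1000, dim=8):
    """Return (document vector, per-fragment vectors) for non-empty text."""
    chunks = chunk_text(text, max_chars)
    chunk_vecs = [l2_normalize(toy_embed(c, dim)) for c in chunks]
    return l2_normalize(mean_pool(chunk_vecs)), chunk_vecs
```

Keeping the per-fragment vectors alongside the averaged one leaves both options open: either could be stored in a pgvector `vector` column and compared with its distance operators.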
I also have another question. I have title, abstract, and description fields, where the description is about 3 KB, and I would like to search across all three. Should I merge all the data and generate a single embedding, or keep the embeddings separate?
Have a wonderful day.

Thank you,
Apurba K. Saha
