I’m working with a metadata field in Pinecone that contains text, and I need to search for a specific word within that text. Essentially, I am looking to perform a ‘contains’ match. However, I’m not sure how to achieve this in Pinecone.
Can this be done? If yes, how? Any guidance or examples would be greatly appreciated!
It sounds like you want to match specific words inside a metadata text field. Any chance you are also embedding this metadata text in the vector database? As in, is this the text that the vectors in your index represent?
If that’s the case, it may be easier to combine keyword search (also known as lexical or sparse search) with semantic search. That way, a query containing specific words returns keyword matches in addition to semantically similar results.
Thanks Arjun. I think hybrid is the way to go for our use case. Pardon my ignorance, but how would the new sparse indexing fit in, and how does it differ from the hybrid option?
No problem, that’s a great question! This depends on where you are doing your search, but I will explain it in the context of what Pinecone offers now.
Hybrid search indexes dense and sparse vectors in the same index, and relies on searching over these jointly. From our docs:
Sparse-dense workflow
Using sparse-dense vectors involves the following general steps:
With sparse-only search, you use a new model we have developed specifically to maximize performance in situations where you’d need keyword search. It runs on a sparse-only index, which is currently on a waitlist.
So with hybrid, you’d have a single index that combines dense and sparse vectors, whereas with sparse, you’d have an index that specializes in keyword search.
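To make the hybrid case a bit more concrete, here is a minimal sketch of one common way to balance the two signals before a joint query: scale the dense query vector by a weight `alpha` and the sparse one by `1 - alpha`. The function name `weight_hybrid_query`, the toy vectors, and the choice of `alpha` are all assumptions for illustration, and the sketch deliberately stops short of calling any client library.

```python
def weight_hybrid_query(dense, sparse, alpha):
    """Scale a dense and a sparse query vector by a convex weight.

    alpha=1.0 keeps only the dense (semantic) signal,
    alpha=0.0 keeps only the sparse (keyword) signal.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    weighted_dense = [v * alpha for v in dense]
    weighted_sparse = {
        # Indices identify which terms are present; only the values are scaled.
        "indices": list(sparse["indices"]),
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return weighted_dense, weighted_sparse


# Toy inputs (hypothetical): a 3-dimensional dense vector and a sparse
# vector with two non-zero terms.
dense_query = [0.2, 0.4, 0.6]
sparse_query = {"indices": [10, 45], "values": [0.5, 1.0]}

hybrid_dense, hybrid_sparse = weight_hybrid_query(dense_query, sparse_query, alpha=0.5)
```

The weighted pair would then be sent together in a single query against the hybrid index. Pushing `alpha` toward 0 makes exact word matches count for more, which is closer to the 'contains' behaviour asked about.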
To choose, you should evaluate whether you need this kind of keyword-search performance all of the time, or only occasionally.
Most people choose to support both sparse and dense search, and then combine the results, much like we recommend in that cascading retrieval article.
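As a rough illustration of the "combine them" step, here is a small sketch that merges a dense result list and a sparse result list using reciprocal rank fusion (RRF). RRF is a generic fusion technique I am swapping in for the example; it is not necessarily what the cascading retrieval article does. The document IDs, the function name, and the constant `k=60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of IDs into one list, best first.

    Each ID earns 1 / (k + rank) from every list it appears in, so IDs
    ranked well by both searches rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical result IDs, best match first, from two separate searches.
dense_hits = ["doc-3", "doc-1", "doc-7"]   # semantic search results
sparse_hits = ["doc-1", "doc-9", "doc-3"]  # keyword search results

merged = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(merged)
```

Documents that show up in both lists end up ahead of documents found by only one search. A reranking step could then be applied to the merged list.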