Search within Metadata

ForHum · December 9, 2024, 5:23pm

Hello Pineconers,

I’m working with a metadata field in Pinecone that contains text, and I need to search for a specific word within that text. Essentially, I am looking to perform a ‘contains’ match. However, I’m not sure how to achieve this in Pinecone.

Can this be done? If yes, how? Any guidance or examples would be greatly appreciated!

Thank you in advance for your help!

-ForHum

arjun · December 10, 2024, 2:33pm

Hi ForHum,

Great question, and welcome to the forum!

I understand that you are looking for something to help look for specific words inside the metadata. Any chance you are also embedding this metadata text in the vector database? As in, is this the text representation of the vectors in your index?

If that’s the case, it may be easier to implement both keyword search (also known as lexical or sparse search) and semantic search. That way, querying with specific words will return exact keyword matches in addition to semantically similar results.

This is referred to as Hybrid Search, which you can learn more about here.

I also want to point out that we are launching sparse indexes soon, which you may want to learn about as well, especially if you will often run these kinds of keyword searches.

Other than that, we do support operators with metadata filtering at query time, although the closest thing we support to what you are describing is probably the $in operator, which matches whole metadata values against a list rather than substrings within a value.
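One way to approximate a ‘contains’ match with $in is to store the words of the text as a separate list-of-strings metadata field and filter on that field. The sketch below is only an illustration: the field name `keywords`, the helper names, and the simple tokenizer are assumptions, not anything Pinecone prescribes, and the commented-out query call assumes an existing index handle and query embedding.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and return its unique word tokens, sorted."""
    return sorted(set(re.findall(r"[a-z0-9]+", text.lower())))

def build_metadata(text: str) -> dict:
    """Keep the raw text and add a hypothetical 'keywords' list field."""
    return {"text": text, "keywords": tokenize(text)}

def contains_filter(word: str) -> dict:
    """Build a metadata filter that matches records whose 'keywords' include word."""
    return {"keywords": {"$in": [word.lower()]}}

record_metadata = build_metadata("Pinecone supports hybrid search")
query_filter = contains_filter("Hybrid")

# With an existing index and query embedding (both assumed here), the filter
# would be passed at query time, for example:
# index.query(vector=query_embedding, top_k=5, filter=query_filter,
#             include_metadata=True)
```

This only matches whole words after the same normalization is applied to both the stored text and the search term, so it is not a true substring search, and long texts will inflate the size of each record's metadata.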

I hope this helps; please let us know if you have more questions!

Sincerely,
Arjun

ForHum · December 10, 2024, 8:45pm

Thanks Arjun. I think hybrid is the way to go for our use case. Pardon my ignorance, but how would the new sparse indexing fit with, or differ from, the hybrid option?

arjun · December 11, 2024, 7:47pm

Hi again!

No problem, that’s a great question! This depends on where you are doing your search, but I will explain it in the context of what Pinecone offers now.

Hybrid search indexes dense and sparse vectors in the same index, and relies on searching over these jointly. From our docs:

Sparse-dense workflow

Using sparse-dense vectors involves the following general steps:

1. Create dense vectors using a dense embedding model.
2. Create sparse vectors using a sparse embedding model.
3. Create an index with the dotproduct metric.
4. Upsert sparse-dense vectors to your index.
5. Search the index using sparse-dense vectors.
6. Pinecone returns sparse-dense vectors.
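As a rough sketch of the record shapes involved in the steps above: a sparse-dense record carries dense `values` plus `sparse_values` made of parallel `indices` and `values` lists. The `toy_sparse_encode` function below is a hypothetical stand-in for a real sparse embedding model (it just hashes terms and counts them), the dense vector is a placeholder, and `weight_hybrid` shows one common way to balance the two signals with an `alpha` weight. The upsert and query calls are commented out because they assume an existing dotproduct index.

```python
import zlib
from collections import Counter

def toy_sparse_encode(text: str, dim: int = 2**20) -> dict:
    """Hypothetical sparse encoder: hashed term counts, not a real model."""
    counts = Counter(zlib.crc32(tok.encode()) % dim for tok in text.lower().split())
    indices = sorted(counts)
    return {"indices": indices, "values": [float(counts[i]) for i in indices]}

def weight_hybrid(dense: list, sparse: dict, alpha: float):
    """Scale dense by alpha and sparse by (1 - alpha); alpha=1 is dense-only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1.0 - alpha) for v in sparse["values"]],
    }
    return [v * alpha for v in dense], scaled_sparse

dense_vec = [0.1, 0.2, 0.3]  # placeholder for a real dense embedding
sparse_vec = toy_sparse_encode("hybrid search hybrid")

# One sparse-dense record, in the shape used for upserts.
record = {"id": "doc-1", "values": dense_vec, "sparse_values": sparse_vec}
# index.upsert(vectors=[record])

# Weight the query vectors before searching.
q_dense, q_sparse = weight_hybrid(dense_vec, sparse_vec, alpha=0.5)
# index.query(vector=q_dense, sparse_vector=q_sparse, top_k=5)
```

An `alpha` closer to 0 favors keyword matches, and closer to 1 favors semantic similarity; the right value depends on your data and is worth testing.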

With the upcoming sparse-only option, you would use a new model we have developed that is specifically designed to maximize performance in situations where you need keyword search. It runs on a sparse-only index, which is currently available through a waitlist.

So with hybrid, you’d have a single index that combines both dense and sparse vectors, whereas with sparse, you’d have an index that specializes in keyword search.

To choose between them, you should evaluate whether you need this kind of keyword-search performance all of the time or only occasionally.

Most people choose to support sparse and dense search, and then combine them, much like we recommend in that cascading retrieval article.
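To illustrate the "combine them" step when dense and sparse searches run separately, here is a minimal sketch that merges two result lists and removes duplicates by id. The hit dictionaries and the `merge_hits` helper are hypothetical; scores from dense and sparse searches are generally not on the same scale, so this sketch only builds a candidate list and leaves the final ordering to a later reranking step.

```python
def merge_hits(dense_hits: list, sparse_hits: list) -> list:
    """Union two ranked hit lists, keeping the first occurrence of each id."""
    merged, seen = [], set()
    for hit in [*dense_hits, *sparse_hits]:
        if hit["id"] not in seen:
            seen.add(hit["id"])
            merged.append(hit)
    return merged

# Hypothetical results from a dense search and a sparse search.
dense_hits = [{"id": "a", "score": 0.91}, {"id": "b", "score": 0.85}]
sparse_hits = [{"id": "b", "score": 12.4}, {"id": "c", "score": 9.7}]

candidates = merge_hits(dense_hits, sparse_hits)
candidate_ids = [hit["id"] for hit in candidates]
# The candidates would then go to a reranker to produce the final order.
```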