Technical product specifications embeddings

I have an Excel file with thousands of products, each with hundreds of specifications (numeric and alphabetic). Is the best approach to tokenize each product and create embeddings? Are there any recommended best practices?

Hi @git2sunder, and welcome to the Pinecone forums!

Thanks for your question.

Yes, at a high level, if you want to be able to perform semantic search on your products and specifications, you’ll be looking to convert each product and its related specifications to embeddings.
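As a rough illustration of that conversion step, here's a minimal sketch that flattens one spreadsheet row into a single "label: value" string you could then pass to an embedding model. It assumes each row arrives as a dict, for example from pandas `pd.read_excel(...).to_dict(orient="records")`; the column names (`name`, `Voltage (V)`, and so on) are hypothetical.

```python
import math
from typing import Any, Mapping


def product_to_text(product: Mapping[str, Any], name_field: str = "name") -> str:
    """Flatten one product row into a 'label: value; ...' string for embedding.

    Empty cells (None, blank strings, or NaN) are skipped so they add no noise.
    """
    parts = []
    name = product.get(name_field)
    if name:
        parts.append(f"Product: {name}")
    for key, value in product.items():
        if key == name_field or value is None:
            continue
        # pandas.read_excel fills empty cells with NaN floats; skip those too.
        if isinstance(value, float) and math.isnan(value):
            continue
        text = str(value).strip()
        if not text:
            continue
        parts.append(f"{key}: {text}")
    return "; ".join(parts)


# Hypothetical row from the spreadsheet.
row = {"name": "Widget A", "Voltage (V)": 230, "Material": "Aluminium", "Color": ""}
print(product_to_text(row))
# Product: Widget A; Voltage (V): 230; Material: Aluminium
```

Keeping the column label next to each value gives the embedding model context that a bare number like "230" would otherwise lack.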

These embeddings capture the meaning of your product and specification text, so that items with similar meanings end up close together in vector space.
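To make "close together" concrete, here's a minimal sketch of cosine similarity, one of the similarity metrics a Pinecone index can use (the metric is chosen when you create the index). The three-dimensional vectors below are made-up toy values; real embedding models produce hundreds or thousands of dimensions.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction, 0.0 = orthogonal."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical toy "embeddings" for three products.
drill = [0.9, 0.1, 0.0]
driver = [0.8, 0.2, 0.1]
cable = [0.0, 0.1, 0.9]
print(round(cosine_similarity(drill, driver), 3))  # high: related products
print(round(cosine_similarity(drill, cable), 3))   # low: unrelated products
```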

You’ll then upsert these embeddings (vectors) to Pinecone. When doing so, you’ll likely want to attach some metadata, for example associating each specification with its parent product.
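Here's a minimal sketch of what those records could look like if you create one vector per specification, each tagged with its parent product in metadata. The ID scheme (`product_id#spec_name`), the metadata field names, and `fake_embed` are all hypothetical placeholders; swap in your own embedding model and naming.

```python
from typing import Any, Callable, Dict, Iterator, List, Sequence

Record = Dict[str, Any]


def build_spec_records(
    product_id: str,
    specs: Dict[str, Any],
    embed: Callable[[str], List[float]],
) -> List[Record]:
    """One record per specification, tagged with its parent product in metadata."""
    records: List[Record] = []
    for spec_name, value in specs.items():
        text = f"{spec_name}: {value}"
        records.append({
            "id": f"{product_id}#{spec_name}",  # hypothetical ID scheme
            "values": embed(text),
            "metadata": {"product_id": product_id, "spec": spec_name, "text": text},
        })
    return records


def batched(records: Sequence[Record], size: int = 100) -> Iterator[List[Record]]:
    """Yield fixed-size batches so a large catalog isn't sent in one request."""
    for start in range(0, len(records), size):
        yield list(records[start:start + size])


def fake_embed(text: str) -> List[float]:
    """Placeholder only: replace with a call to a real embedding model."""
    return [float(len(text)), float(sum(map(ord, text)) % 997)]


records = build_spec_records("P-001", {"Voltage (V)": 230, "Material": "Aluminium"}, fake_embed)
print([r["id"] for r in records])
# ['P-001#Voltage (V)', 'P-001#Material']

# With the Pinecone Python client (v3+), the upload would look roughly like:
#   from pinecone import Pinecone
#   index = Pinecone(api_key="YOUR_API_KEY").Index("products")
#   for batch in batched(records):
#       index.upsert(vectors=batch)
```

Whether you embed per specification (as here) or per product is a design choice: per-spec vectors make matches more precise, while per-product vectors keep the full context together.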

That way, when your application retrieves relevant vectors from Pinecone, you can examine the metadata in your retrieved items to know which product the specification is related to.
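For example, if your query returns several matching specifications, you could group them under their parent product. This sketch assumes the matches have been converted to plain dicts with `id`, `score`, and `metadata` keys, and that each vector's metadata carries a `product_id` field (a hypothetical name, matching whatever you stored at upsert time).

```python
from collections import defaultdict
from typing import Any, Dict, List

Match = Dict[str, Any]


def group_matches_by_product(matches: List[Match]) -> Dict[str, List[Match]]:
    """Group retrieved matches under their parent product_id, best score first."""
    grouped: Dict[str, List[Match]] = defaultdict(list)
    for match in matches:
        metadata = match.get("metadata") or {}
        grouped[metadata.get("product_id", "unknown")].append(match)
    # Order products by their best-scoring specification.
    ordered = sorted(
        grouped.items(),
        key=lambda item: max(m["score"] for m in item[1]),
        reverse=True,
    )
    return {pid: sorted(ms, key=lambda m: m["score"], reverse=True) for pid, ms in ordered}


# Hypothetical query results.
matches = [
    {"id": "P-1#Material", "score": 0.7, "metadata": {"product_id": "P-1"}},
    {"id": "P-2#Voltage (V)", "score": 0.9, "metadata": {"product_id": "P-2"}},
    {"id": "P-1#Color", "score": 0.8, "metadata": {"product_id": "P-1"}},
]
for product_id, hits in group_matches_by_product(matches).items():
    print(product_id, [h["id"] for h in hits])
```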

Here’s a guide to using metadata for filtering once you have created your index.
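As a small illustration, here's a sketch of a helper that builds a metadata filter using Pinecone's `$in`, `$eq`, and `$and` operators. The field names `product_id` and `spec` are hypothetical and should match the metadata you actually attached; check the guide for the full operator list.

```python
from typing import Any, Dict, Iterable, List, Optional


def build_filter(
    product_ids: Optional[Iterable[str]] = None,
    spec: Optional[str] = None,
) -> Optional[Dict[str, Any]]:
    """Build a metadata filter restricting results to given products and/or one spec."""
    clauses: List[Dict[str, Any]] = []
    if product_ids:
        clauses.append({"product_id": {"$in": list(product_ids)}})
    if spec:
        clauses.append({"spec": {"$eq": spec}})
    if not clauses:
        return None  # no filter: search across all vectors
    if len(clauses) == 1:
        return clauses[0]
    return {"$and": clauses}


flt = build_filter(product_ids=["P-001", "P-002"], spec="Material")
print(flt)

# Passed to a query with the Pinecone Python client, roughly:
#   index.query(vector=query_vector, top_k=5, filter=flt, include_metadata=True)
```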

You may also want to review our Quickstart.

Hope this helps!


Here’s our guide on chunking strategies for LLMs.
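With hundreds of specifications per product, a single string per product may exceed your embedding model's input limit. One simple option (of several covered in that guide) is fixed-size chunking: split the specs into groups and prefix each chunk with the product name so it keeps its context. The sketch below assumes specs are a dict; `specs_per_chunk=20` is an arbitrary starting value to tune, not a recommendation.

```python
from typing import Any, Dict, List


def chunk_specs(product_name: str, specs: Dict[str, Any], specs_per_chunk: int = 20) -> List[str]:
    """Split a product's specs into fixed-size chunks, each prefixed with the product name."""
    if specs_per_chunk < 1:
        raise ValueError("specs_per_chunk must be >= 1")
    items = list(specs.items())
    chunks: List[str] = []
    for start in range(0, len(items), specs_per_chunk):
        group = items[start:start + specs_per_chunk]
        body = "; ".join(f"{key}: {value}" for key, value in group)
        chunks.append(f"Product: {product_name}. {body}")
    return chunks


# Hypothetical product with five specs, chunked two at a time.
specs = {"Voltage (V)": 230, "Material": "Aluminium", "Weight (kg)": 1.2, "Color": "Red", "IP rating": "IP54"}
for chunk in chunk_specs("Widget A", specs, specs_per_chunk=2):
    print(chunk)
```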
