Enquiry about a search application implementation

Hi Pinecone support, I am trying to use Pinecone to implement a common text/semantic search problem, but I am not sure whether it is an ideal application for Pinecone.

Taking an e-commerce website as an example, the search scenarios could be these:

  1. Customers can search for all types of iPhones by entering “iPhone”, and the system returns all available iPhones, such as “iPhone 16”, “iPhone 15”, etc. I expect this to be straightforward to implement.
  2. The system defines product categories (e.g., cell phone, laptop, desktop) behind the scenes. Customers can enter “cell phone”, and the system returns all available cell phones, such as “samsung s24”, “vivo x100”, “iPhone 16”, etc. Customers may also just enter “phone”, in which case the system semantically matches it to “cell phone” and returns the same results as for “cell phone”.
  3. Customers may enter hardware specifications like “memory 512G”. The system returns all products (whether cell phones or laptops) that have any kind of 512G memory (e.g., SSD).

#2 and #3 are my primary questions. How do I use Pinecone to implement search by product category, specifications, and possibly other attributes?

PS: Previously, we implemented similar search functions over in-memory data structures (e.g., arrays, hash tables), using their out-of-the-box methods (e.g., string.contain) to do the search, but the performance and scalability were not ideal. After learning about Pinecone through RAG, I feel this kind of search problem may be solvable with it too. I imagine customers could flexibly enter either a simple term (iPhone 16) or a sentence (list all iPhones) to get the desired results. If they mistype a word, the vector DB would still return the most semantically close items.

Thanks a lot for reading this lengthy question. I hope you can offer some recommendations or related articles.

Regards,
Ricky

Hi Ricky!

Yes, Pinecone can definitely help you build this kind of application. This blog post on semantic search with Pinecone gives you a good overview of the components you need to build it, including a notebook with related sample code: Semantic search with Pinecone | Pinecone

You’ll want to use a combination of a vector embedding (a semantic representation of the product, which sounds like what you’re getting at in part 2) and metadata for structured data filtering (for example, if you want to be able to filter by specific models): Filter with metadata - Pinecone Docs
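As a rough sketch of how the metadata side could look for scenarios 2 and 3, the helper below builds a filter using the operators from the metadata filtering docs (`$eq`, `$and`). The field names `category` and `storage_gb`, and the helper `build_filter` itself, are assumptions made up for illustration; use whatever fields you actually store with each product.

```python
from typing import Any, Dict, List, Optional


def build_filter(category: Optional[str] = None,
                 storage_gb: Optional[int] = None) -> Dict[str, Any]:
    """Build a metadata filter dict from optional structured constraints.

    `category` and `storage_gb` are hypothetical metadata fields; they are
    not defined by Pinecone and must match what you upsert with each record.
    """
    clauses: List[Dict[str, Any]] = []
    if category is not None:
        clauses.append({"category": {"$eq": category}})
    if storage_gb is not None:
        clauses.append({"storage_gb": {"$eq": storage_gb}})

    if not clauses:
        return {}          # no structured constraint: pure semantic search
    if len(clauses) == 1:
        return clauses[0]  # a single condition needs no $and wrapper
    return {"$and": clauses}


# Scenario 3: "memory 512G" across all categories.
print(build_filter(storage_gb=512))

# Scenario 2 and 3 combined: only cell phones with 512 GB.
print(build_filter(category="cell phone", storage_gb=512))
```

The resulting dict would be passed as the `filter` argument of the query, alongside the embedding of the customer's text, along the lines of `index.query(vector=query_embedding, top_k=10, filter=build_filter(storage_gb=512), include_metadata=True)`. How you extract "512" or "cell phone" from the raw query text is a separate step that this sketch does not cover.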

If you don’t have an embedding model in mind already, you can use one of Pinecone’s hosted ones to create the embeddings: Generate embeddings - Pinecone Docs

Start there and let us know if you’ve got follow-up questions!

Hi Bear, thank you very much for your response. And yes, I have tried some of the approaches described on the Pinecone website for the PoC.
One of my conclusions is that if our application needs to find products at a nearly 100% accuracy rate, hybrid search (e.g., semantic search + BM25) seems unavoidable.
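To illustrate what I mean by combining the two, here is a minimal sketch that merges a semantic ranking and a keyword (BM25) ranking with reciprocal rank fusion. This is just one common way to fuse result lists and not necessarily how Pinecone does hybrid search; the product IDs and the constant `k = 60` are assumptions for the example.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked ID lists into one list.

    Each item scores 1 / (k + rank) per list it appears in (rank starts at 1),
    and the scores are summed across lists. Ties are broken by ID so the
    output is deterministic.
    """
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results for the query "iPhone" from the two retrievers.
semantic = ["iphone-16", "iphone-15", "galaxy-s24"]      # vector search
keyword = ["iphone-16", "iphone-16-case", "iphone-15"]   # BM25

fused = reciprocal_rank_fusion([semantic, keyword])
print(fused)
```

Items that both retrievers agree on rise to the top, while an item found by only one of them (such as a semantically close but non-matching product) is pushed down.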

Filtering sounds like another way to achieve our features. I will try it out.

Regards,
Ricky