Understanding the Pinecone vector search response

Hi all, I am a Laravel developer. We use a MySQL database to store our records.

We use MySQL full-text search to retrieve records relevant to a user's search query; the search runs against the tags stored with each record.

To get more relevant results for user search queries, we moved to semantic search with Pinecone.

Everything works well.

However, the scores for two of the records look suspicious.

Record-1:

tags: diwali,special,exhibition,poster,posters,festival,festivals,divali,dewali,deepavali,deepawali,red,elegant
created_at: 2023-10-24 11:26:03

Record-2:

tags: diwali,festival,lights,festivals,deepawali,deepavali,greetings,wish,wishes,card,greeting,design,celebration,dipawali,दीपावली,dewali,deewali,dipavali,दिवाळी,divali,dipawli,wishfully,festivel,featival,dipabali,fastivel,depavali,lighted
created_at: 2022-07-25 09:52:49

When I search for "Diwali", these two records are returned with the following scores:

  1. Record-2 with 0.698541462.
  2. Record-1 with 0.614984274.

As far as I know, the score is based on how close the query vector is to each record's vector.
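For reference, Pinecone lets you choose the index metric when the index is created (`cosine`, `dotproduct`, or `euclidean`). The question does not say which metric this index uses, so the sketch below assumes cosine similarity and compares it with a plain dot product. The 3-dimensional vectors are made up for illustration; real embeddings have hundreds or thousands of dimensions.

```python
import math

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the vector lengths."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(dot(v, v))
    return [x / n for x in v]

# Hypothetical toy vectors; real ones would come from an embedding model.
query = [0.9, 0.1, 0.3]
record_a = [0.8, 0.2, 0.4]   # points in nearly the same direction as the query
record_b = [0.1, 0.9, 0.2]   # points in a different direction

for name, rec in [("record_a", record_a), ("record_b", record_b)]:
    print(name, round(cosine(query, rec), 4))

# For unit-length vectors, cosine similarity and dot product give the same number.
qn, an = normalize(query), normalize(record_a)
assert abs(cosine(query, record_a) - dot(qn, an)) < 1e-12
```

Whichever metric is used, the score depends only on the two vectors being compared, so the query vector and the stored record vectors are what is needed to explain a particular number.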

Both records have Diwali tags, but Record-2 has more Diwali-related tags than Record-1.
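As a toy check, having more tags does not by itself raise a cosine score. The sketch below treats each record's tags as a simple term-count (bag-of-words) vector, which is an assumption for illustration and not the dense embedding Pinecone actually stores, and scores the query "diwali" against both tag lists from the question. Both records contain `diwali` exactly once, so Record-2's extra tags only dilute its vector: Record-1 scores 1/√13 ≈ 0.277 and Record-2 scores 1/√28 ≈ 0.189, the opposite of the order Pinecone returned. One hypothesis is that the embedding model places spelling variants such as `deepavali` or `दीपावली` close to "Diwali", so Record-2's vector leans more towards the Diwali concept, but only the actual vectors can confirm that.

```python
import math
from collections import Counter

# Tag lists copied from the two records in the question.
RECORD_1 = "diwali,special,exhibition,poster,posters,festival,festivals,divali,dewali,deepavali,deepawali,red,elegant"
RECORD_2 = ("diwali,festival,lights,festivals,deepawali,deepavali,greetings,wish,wishes,card,"
            "greeting,design,celebration,dipawali,दीपावली,dewali,deewali,dipavali,दिवाळी,divali,"
            "dipawli,wishfully,festivel,featival,dipabali,fastivel,depavali,lighted")

def term_vector(text):
    """Bag-of-words vector: counts each lowercase, comma-separated term."""
    return Counter(t.strip().lower() for t in text.split(",") if t.strip())

def cosine_sparse(a, b):
    """Cosine similarity between two term-count vectors stored as Counters."""
    dot = sum(count * b[term] for term, count in a.items())  # missing terms count as 0
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b)

query = term_vector("diwali")
score_1 = cosine_sparse(query, term_vector(RECORD_1))
score_2 = cosine_sparse(query, term_vector(RECORD_2))
print(f"Record-1: {score_1:.4f}  Record-2: {score_2:.4f}")
```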

My question is: is the tag count the reason for this score difference, or are other factors responsible?


To help here, I suspect we will need a few more details.

Are you using hybrid search, or is it pure semantic search?
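This matters because in a sparse-dense (hybrid) setup the final score also includes a keyword-style sparse component, so exact term matches can shift the ranking. A minimal sketch of the alpha-weighted convex combination commonly used for hybrid queries: the dense part of the query is scaled by `alpha` and the sparse part by `1 - alpha`, so with a dot-product metric the score becomes `alpha * dense_score + (1 - alpha) * sparse_score`. The function names and vectors below are hypothetical and only illustrate the arithmetic.

```python
def dense_dot(q, d):
    """Dot product of two dense vectors."""
    return sum(x * y for x, y in zip(q, d))

def sparse_dot(q, d):
    """Dot product of two sparse vectors given as {index: value} dicts."""
    return sum(v * d.get(i, 0.0) for i, v in q.items())

def weight_query(dense, sparse, alpha):
    """Scale the dense part by alpha and the sparse part by (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return [v * alpha for v in dense], {i: v * (1 - alpha) for i, v in sparse.items()}

def hybrid_score(q_dense, q_sparse, d_dense, d_sparse):
    """Score of one document: dense dot product plus sparse dot product."""
    return dense_dot(q_dense, d_dense) + sparse_dot(q_sparse, d_sparse)

# Hypothetical query and document; sparse index 7 stands in for the term "diwali".
q_dense, q_sparse = [0.6, 0.8], {7: 1.0}
doc_dense, doc_sparse = [0.5, 0.5], {7: 2.0, 3: 1.0}

for alpha in (1.0, 0.5, 0.0):
    wd, ws = weight_query(q_dense, q_sparse, alpha)
    print(alpha, round(hybrid_score(wd, ws, doc_dense, doc_sparse), 4))
```

At `alpha = 1.0` only the dense (semantic) score counts; as `alpha` drops, a document whose sparse vector weights the query term heavily (for example, because the term appears often) gains score, which is one way tag counts could influence results in a hybrid setup.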

Can you share the search vector and the result vectors?

How are you storing the tags in Pinecone? Are they stored as metadata, or are you generating the embeddings from them?
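The answer changes how the scores should be read. In a vector index the similarity score is computed from the stored vectors; metadata can restrict which records are eligible through a filter, but it does not change a record's score. The sketch below is a hypothetical in-memory stand-in, not Pinecone's client API, that makes this explicit: two records with identical vectors but different tag metadata get the same score, and the metadata only decides whether a record passes the filter.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Record:
    id: str
    values: list
    metadata: dict = field(default_factory=dict)

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class ToyIndex:
    """Hypothetical brute-force index: scores use vectors only, metadata only filters."""

    def __init__(self):
        self.records = {}

    def upsert(self, record):
        # Insert or replace the record with the same id.
        self.records[record.id] = record

    def query(self, vector, top_k=10, tag=None):
        # Keep only records whose metadata "tags" list contains `tag` (if given).
        candidates = [r for r in self.records.values()
                      if tag is None or tag in r.metadata.get("tags", [])]
        scored = [(r.id, cosine(vector, r.values)) for r in candidates]
        return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

index = ToyIndex()
# Same vector, different amounts of tag metadata: the scores are identical.
index.upsert(Record("few-tags", [0.2, 0.9], {"tags": ["diwali"]}))
index.upsert(Record("many-tags", [0.2, 0.9], {"tags": ["diwali", "deepavali", "divali", "dewali"]}))
index.upsert(Record("no-diwali", [0.9, 0.1], {"tags": ["christmas"]}))

print(index.query([0.3, 0.8]))
print(index.query([0.3, 0.8], tag="diwali"))
```

If the tags only live in metadata, they cannot explain the score difference; if the embeddings were generated from the tag text, then the tags (and how the text was built from them) feed directly into the vectors being compared.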