Handling multiple fields in a document

adrian1 · July 3, 2023, 10:48am

Hi, we’re implementing hybrid semantic search with Pinecone. Our documents have several fields that it makes sense to keep separate.

To expand a little, the documents contain the employment histories of individuals.

Simplified version of some documents:
Document 1: John Smith was the first software engineer at Pinecone, reporting to the CEO
Document 2: Jane Jones, Chief Executive Officer of Pinecone
Document 3: 2015-2017 - Bob James, CEO, Pinecone

If we take the search query: “CEO and other leaders of companies like Pinecone”

To maximize relevance, we process the search query to extract the relevant entities (e.g. job title, company name).

We’re also indexing those entities from each doc in separate namespaces, so that we only search job titles in the job title “field” (and so that other text that merely mentions the job title CEO isn’t searched). We want to keep the semantic similarity (so Chief Executive matches CEO), so we aren’t using filters for this.

All of this is fine, but ultimately we want the best-ranked documents where the job title is semantically similar to CEO AND the company is semantically similar to Pinecone.
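
To make the goal concrete, here is a rough sketch of the kind of combined ranking we’re after, done client-side. It assumes each namespace query has already returned (document id, similarity score) pairs. The function name `combine_field_scores`, the sample scores, and the choice of multiplying scores to approximate AND are all illustrative assumptions, not anything Pinecone provides.

```python
from typing import Dict, List, Tuple


def combine_field_scores(
    per_field: Dict[str, List[Tuple[str, float]]]
) -> List[Tuple[str, float]]:
    """Rank documents found in every field's results by the product of their scores.

    Assumes scores are non-negative (e.g. cosine similarity clipped to [0, 1]),
    so a low score in any one field pulls the combined score down.
    """
    field_maps = [dict(hits) for hits in per_field.values()]
    if not field_maps:
        return []

    # AND semantics: keep only documents returned for every field.
    common = set(field_maps[0])
    for scores in field_maps[1:]:
        common &= set(scores)

    combined = []
    for doc_id in common:
        score = 1.0
        for scores in field_maps:
            score *= scores[doc_id]
        combined.append((doc_id, score))

    # Highest combined score first; ties broken by document id.
    combined.sort(key=lambda pair: (-pair[1], pair[0]))
    return combined


# Made-up scores standing in for one query per namespace.
results = {
    "job_title": [("doc2", 0.9), ("doc3", 0.95), ("doc1", 0.4)],
    "company": [("doc1", 0.9), ("doc2", 0.9), ("doc3", 0.8)],
}
ranked = combine_field_scores(results)
```

The obvious weakness is that a document has to appear in the top results of every namespace query to be ranked at all, which is part of why doing this outside the index feels wrong.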

There doesn’t seem to be a way of getting a combined ranking score across namespaces. There also doesn’t appear to be a way to index documents in anything other than flat text format.

What am I missing? How are people out there solving the problem of semantically searching across different “fields” using Pinecone? This is trivial with something like Elasticsearch, but there you don’t get the semantic similarity part!