Hey folks -
Been using Pinecone for the better part of two months, experimenting with RAG schemes for interrogation of different data sets.
Recently, I’ve been running up against the issue that questions whose format doesn’t align with the format of the content itself end up matching vectors that don’t directly relate to the query. For reference, I’m working with a set of emails.
I’ve worked around this by tagging the emails and then preprocessing queries so they search only the subsets of embeddings that match pre-defined topics. This has worked decently, but it is difficult to chain when an inquiry contains multiple qualifiers about the content being interrogated. For example, a query that is complicated to translate into tag subsets is something like “Were there any meetings that took place between January 7th and January 20th of 2023 that were cancelled due to travel restrictions?”
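To make that example concrete, here is a rough sketch of how I picture the structured qualifiers being split off into a metadata filter, leaving only the "travel restrictions" part for the embedding search. The field names (`sent_at`, `topics`, `status`) and the helpers `to_epoch` and `build_filter` are hypothetical and made up for illustration; the `$and`/`$gte`/`$lt`/`$in`/`$eq` operators are, as far as I understand, the ones Pinecone's metadata filtering accepts, and dates are stored as Unix seconds on the assumption that filters compare numbers rather than date strings.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def to_epoch(date_str: str) -> int:
    """Convert a YYYY-MM-DD date (taken as midnight UTC) to Unix seconds."""
    parsed = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


def build_filter(start: str, end: str, topics: list[str], status: str) -> dict:
    """Build a metadata filter dict for the structured parts of a question.

    All field names here are hypothetical; they depend on how the emails
    were tagged at upsert time.
    """
    return {
        "$and": [
            {"sent_at": {"$gte": to_epoch(start)}},
            # End date is inclusive, so compare against the start of the next day.
            {"sent_at": {"$lt": to_epoch(end) + SECONDS_PER_DAY}},
            {"topics": {"$in": topics}},
            {"status": {"$eq": status}},
        ]
    }


meeting_filter = build_filter("2023-01-07", "2023-01-20", ["meeting"], "cancelled")

# Only the fuzzy remainder of the question would be embedded for similarity search.
semantic_part = "cancelled due to travel restrictions"
```

The idea would be to pass `meeting_filter` as the `filter` argument of the index query alongside the embedding of `semantic_part`. The hard part, and the reason I'm asking, is producing that decomposition reliably from a free-form question.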
This makes me wonder about the capacity to add a kind of ‘relational’ scheme to the content in the DB.
My current strategy involves building a more thorough metadata ‘tree’ of the concepts included in each embedded email, but I still wonder: is it possible to add some kind of logical connective tissue to the embedding space, so that a query with multiple qualifiers/quantifiers can be directed through an index in a similar manner to how Neo4j creates pointer relationships between nodes?
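For anyone curious what I mean by the ‘tree’, here is a minimal sketch, assuming metadata values have to stay flat (strings, numbers, or lists of strings) rather than nested objects. Each email's concept tree is flattened into a list of path strings that includes every ancestor, so a filter on a parent concept also matches its children. The `flatten_concepts` helper and the concept names are hypothetical.

```python
def flatten_concepts(tree: dict, prefix: str = "") -> list[str]:
    """Flatten a nested concept dict into sorted path strings.

    Every ancestor path is included, so filtering on "meeting" also
    matches an email tagged "meeting/cancelled/travel_restrictions".
    """
    paths = set()
    for name, children in tree.items():
        path = f"{prefix}/{name}" if prefix else name
        paths.add(path)
        paths.update(flatten_concepts(children, path))
    return sorted(paths)


# Hypothetical concept tree for a single email.
email_concepts = {
    "meeting": {"cancelled": {"travel_restrictions": {}}},
    "logistics": {},
}

concept_paths = flatten_concepts(email_concepts)
print(concept_paths)
# ['logistics', 'meeting', 'meeting/cancelled', 'meeting/cancelled/travel_restrictions']
```

This gets me hierarchy within one email, but not relationships between emails (e.g. "this message cancels that meeting invite"), which is the graph-like part I don't know how to express.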
Any insight would help! I am a computer science and cognitive science senior at the University of Michigan and love learning about this stuff, so everything is helpful/interesting!
Cheers,
-Noah