Query by group of sentences

Hello there!

I’m developing a mobile app in which each user has multiple keywords or sentences describing their preferences or interests.
I’m trying to use Pinecone to store and query these user preferences.

I need to do matchmaking for a user voice chat, so I need to query a user’s group of keywords against the other users’ terms (also grouped by user) and select the user whose whole group of keywords matches best, not just a single keyword.

Is there some way to do this?

Thank you in advance :slight_smile:

Hi @axgalache,

When the user stores their preferences, how are you storing that data? As just the bare words, or are you transforming them into vectors via a model first?

Cory

I’m transforming each word/sentence into a vector, so I have both the words and the vectors.
I guess I’d need the vectors to compare against the ones stored in the DB. I just need to know whether I can send multiple vectors in one query.

Hi @axgalache,

When you say “multiple vectors”, what do you mean exactly? Pinecone queries accept a single vector at a time, so you can query ‘[0.1,0.2,0.3]’ or ‘[0.4,0.5,0.6]’, but not both in the same query.

If you need results for both of these, there are a couple of options. You could run a separate query for each vector and filter the results against each other in your app; or you could average the vectors dimension by dimension and query with the resulting vector. Using the two three-dimensional vectors above, you would query with ‘[0.25,0.35,0.45]’. This works well for many applications, since the results are still close to the original vectors but weighted toward the space between them.
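A minimal sketch of the averaging option in plain Python, using the two example vectors above. The helper names `mean_vector` and `l2_normalize` are hypothetical and not part of any Pinecone client. The optional `normalize_inputs` flag is an assumption worth considering if your embedding model does not already return unit-length vectors, so that one long vector cannot dominate the mean.

```python
import math
from typing import List, Sequence


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit Euclidean length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0 else [x / norm for x in v]


def mean_vector(vectors: Sequence[Sequence[float]], normalize_inputs: bool = False) -> List[float]:
    """Average equal-length vectors dimension by dimension.

    With normalize_inputs=True each vector is scaled to unit length first,
    so a vector with a larger magnitude does not dominate the average.
    """
    if not vectors:
        raise ValueError("need at least one vector")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all vectors must have the same dimension")
    rows = [l2_normalize(v) if normalize_inputs else list(v) for v in vectors]
    # Sum each dimension across the vectors, then divide by how many there are.
    return [sum(row[i] for row in rows) / len(rows) for i in range(dim)]


query = mean_vector([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
print([round(x, 2) for x in query])  # [0.25, 0.35, 0.45]
```

The resulting list would then be sent as the single query vector, so the whole group costs one round trip instead of one per keyword.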

What are you building that needs multiple vectors per query?

Cory

Thank you for the feedback :slight_smile:
I’m planning to store each vector with metadata containing user info (e.g. the user’s ID).

I need to compare a user’s group of tags or interests with all the other vectors and get the user with the most similar tags. For example, when querying [“cars”, “races”, “f1”], the first result should be a hypothetical user with tags [“nascar”, “drag race”, “motors”, “drifting”].
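If a finer-grained group-to-group score than a single averaged vector is ever needed, one option (a swapped-in technique, not a Pinecone feature) is to compute it in the app: for each query tag, take its best cosine similarity against a candidate user’s tags, then average those best matches. This sketch assumes the candidates’ tag vectors are already in memory; the 2-D vectors and user IDs are hypothetical stand-ins for real embeddings.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def group_score(query_tags: List[Sequence[float]], user_tags: List[Sequence[float]]) -> float:
    """Average, over the query tags, of each tag's best similarity to any of the user's tags."""
    return sum(max(cosine(q, u) for u in user_tags) for q in query_tags) / len(query_tags)


def best_user(query_tags: List[Sequence[float]], users: Dict[str, List[Sequence[float]]]) -> str:
    """Return the ID of the user whose tag group scores highest against the query group."""
    return max(users, key=lambda uid: group_score(query_tags, users[uid]))


# Hypothetical 2-D "embeddings": axis 0 ~ motorsport, axis 1 ~ cooking.
query = [[1.0, 0.1], [0.9, 0.2], [1.0, 0.0]]  # e.g. "cars", "races", "f1"
users = {
    "racer": [[0.95, 0.1], [1.0, 0.05], [0.8, 0.3]],  # e.g. "nascar", "drag race", "drifting"
    "chef": [[0.1, 1.0], [0.2, 0.9]],  # e.g. "baking", "recipes"
}
print(best_user(query, users))  # racer
```

Because this compares every query tag with every candidate tag, it is best kept to a short list of candidates rather than the whole user base.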

Given your response, I think a good approach could be to average all the vectors and run a single query.
I’ve tried performing one query per vector, but the loading times with that approach are quite bad.
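When running one query per tag, the per-tag results still have to be merged by user. A minimal sketch, assuming each match has already been reduced to a hypothetical `(user_id, score)` pair, with the user ID taken from the metadata stored on each vector. It keeps each user’s best score per query and sums those, so a user matching several tags ranks above one matching a single tag.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical shape: (user_id, similarity score) for each match of one query.
Match = Tuple[str, float]


def merge_by_user(results_per_query: List[List[Match]]) -> List[Tuple[str, float]]:
    """Rank users by the sum of their best score for each query vector."""
    totals: Dict[str, float] = defaultdict(float)
    for matches in results_per_query:
        best: Dict[str, float] = {}
        # A user can own several tag vectors, so keep only their best hit for this query.
        for user_id, score in matches:
            best[user_id] = max(score, best.get(user_id, float("-inf")))
        for user_id, score in best.items():
            totals[user_id] += score
    # Highest combined score first.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


results = [
    [("u1", 0.91), ("u2", 0.88), ("u1", 0.80)],  # matches for "cars"
    [("u2", 0.85), ("u3", 0.95)],  # matches for "races"
    [("u2", 0.90), ("u1", 0.70)],  # matches for "f1"
]
print([user_id for user_id, _ in merge_by_user(results)])  # ['u2', 'u1', 'u3']
```

This only addresses ranking; the latency of issuing one query per tag remains.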

I’ll go with the vectors average for now :slight_smile:
Thank you!

I would be cautious about how you’re planning to use metadata. As a rule, metadata should be used specifically to filter down the set of vectors, not to perform general searches: for example, filtering users by geographic location or by the car brand they own.

If you want to match users based on similarity, that data should be stored in the vector. When generating the vector data for your users, you would include an “interests” section in the raw data and list all the racing types they follow. This then becomes part of their vector, so when you run similarity searches between users, it is analyzed along with all the other details about them to find other users who share their interests.
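A minimal sketch of building a user’s raw profile text with an explicit “interests” section before it is embedded. The field names (`bio`, `interests`) and the `build_profile_text` helper are hypothetical; the resulting string would be passed to whichever embedding model you already use to produce the stored vector.

```python
from typing import Any, Dict, List


def build_profile_text(user: Dict[str, Any]) -> str:
    """Combine a user's bio and interest tags into one string to embed.

    Hypothetical fields: "bio" (str) and "interests" (list of str).
    """
    bio = str(user.get("bio", "")).strip()
    interests: List[str] = list(user.get("interests", []))
    lines = []
    if bio:
        lines.append(f"About: {bio}")
    if interests:
        # Listing interests explicitly puts them into the text the model embeds.
        lines.append("Interests: " + ", ".join(interests))
    return "\n".join(lines)


profile = {"bio": "Weekend track-day driver.", "interests": ["nascar", "drag race", "drifting"]}
print(build_profile_text(profile))
# About: Weekend track-day driver.
# Interests: nascar, drag race, drifting
```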

Keep in mind that Pinecone is not a key-value store and is not optimized for storing or sorting that kind of data. Metadata is there to help filter down your searches and eliminate vectors from the comparison, not to find similar ones.
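To illustrate that division of labour, here is a small in-memory emulation in which metadata only decides which records are eligible and vector similarity alone decides their order. It is not Pinecone’s implementation; the records and field names are hypothetical, and the filter handles only `$eq` and `$in`, two of the operators Pinecone’s own metadata filters use.

```python
import math
from typing import Any, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def matches_filter(metadata: Dict[str, Any], flt: Dict[str, Dict[str, Any]]) -> bool:
    """True if metadata satisfies every condition; each is {"$eq": x} and/or {"$in": [...]}."""
    for field, cond in flt.items():
        value = metadata.get(field)
        if "$eq" in cond and value != cond["$eq"]:
            return False
        if "$in" in cond and value not in cond["$in"]:
            return False
    return True


def filtered_search(records: List[Dict[str, Any]], query: Sequence[float],
                    flt: Dict[str, Dict[str, Any]], top_k: int = 3) -> List[str]:
    """Drop records that fail the metadata filter, then rank the rest by similarity."""
    candidates = [r for r in records if matches_filter(r["metadata"], flt)]
    candidates.sort(key=lambda r: cosine(r["values"], query), reverse=True)
    return [r["id"] for r in candidates[:top_k]]


records = [
    {"id": "ana", "values": [0.9, 0.1], "metadata": {"country": "ES"}},
    {"id": "ben", "values": [0.95, 0.05], "metadata": {"country": "US"}},
    {"id": "carla", "values": [0.2, 0.9], "metadata": {"country": "ES"}},
]
print(filtered_search(records, [1.0, 0.0], {"country": {"$eq": "ES"}}))  # ['ana', 'carla']
```

Here “ben” is the closest vector overall but never appears, because the metadata filter removes him before any similarity is computed.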
