Is it possible to run a query without providing an embedding?

This might be a silly question, but I’m trying to create a duplicate image identifier.

I could manually make a loop that goes and runs a query for each image, finding possible duplicates of each. But with thousands of images, this will add up quickly.

Is there a way to ask “find me the most closely-related clusters of vectors”?

Hi @katangafor, and welcome to the Pinecone forums!

Thanks for your question. It’s definitely not silly!

If I’m understanding you correctly, you’d like to build an application that can consider an image and tell the user if it’s a duplicate of an image that the system has already seen.

Do I have that correct?

If so, you could convert your image into an embedding vector and use it to query your Pinecone index, which will return the vectors that are the nearest neighbors of your query image.

Your application code could then determine whether the returned vectors are identical to the query or close enough to count as a duplicate for your purposes.
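As a rough illustration of that last step, here is a minimal sketch in plain Python. It assumes you have the query vector and a returned neighbor's values as lists of floats, and that cosine similarity is the measure you care about. The function names and the 0.98 threshold are hypothetical, so tune the threshold on your own images.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify_match(query_vec, candidate_vec, threshold=0.98):
    """Label a neighbor as identical, near-duplicate, or distinct.

    `threshold` is an assumed cutoff, not a value recommended by Pinecone.
    """
    if list(query_vec) == list(candidate_vec):
        return "identical"
    if cosine_similarity(query_vec, candidate_vec) >= threshold:
        return "near-duplicate"
    return "distinct"
```

If your query results already include a similarity score, you could skip the recomputation and compare that score to the threshold directly.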

Please also have a look at some of our Image search example Jupyter Notebooks that include facial similarity searches and image retrieval use cases.

You should hopefully be able to lift and shift some relevant code from these.

Hope that helps!

Best,
Zack

You may also be interested to know that Pinecone supports querying by ID; see “Query data” in the Pinecone Docs.

There are some key limitations to be aware of with this functionality.
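To make the query-by-ID idea concrete, here is a minimal sketch of how a duplicate scan over already-indexed images might look. It still issues one query per image, but it avoids re-embedding anything. The `query_by_id` callable is a hypothetical wrapper you would write around your index's query call; it is assumed to return `(id, score)` pairs where a higher score means more similar. The threshold and `top_k` values are assumptions too.

```python
def find_duplicate_pairs(ids, query_by_id, threshold=0.98, top_k=5):
    """Collect unordered pairs of IDs whose similarity score meets the threshold.

    ids          -- iterable of vector IDs already stored in the index
    query_by_id  -- hypothetical callable (vector_id, top_k) -> [(match_id, score), ...]
    threshold    -- assumed similarity cutoff for calling two images duplicates
    """
    pairs = set()
    for image_id in ids:
        for match_id, score in query_by_id(image_id, top_k):
            if match_id == image_id:
                continue  # a vector is always its own nearest neighbor
            if score >= threshold:
                # Sort so (a, b) and (b, a) collapse into one pair.
                pairs.add(tuple(sorted((image_id, match_id))))
    return sorted(pairs)


# Usage with an in-memory stand-in for the index:
fake_results = {
    "a": [("a", 1.0), ("b", 0.99), ("c", 0.20)],
    "b": [("b", 1.0), ("a", 0.99)],
    "c": [("c", 1.0), ("a", 0.20)],
}
duplicates = find_duplicate_pairs(
    ["a", "b", "c"], lambda vector_id, k: fake_results[vector_id][:k]
)
```

Keeping the index call behind a callable means the loop can be tested without a live index, and you can swap in the real query once you have checked the limitations mentioned above.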

Hope that helps!

Best,
Zack