Introduction to K-Means Clustering

discobot · March 14, 2022, 3:11pm

With data volumes growing at exponential rates, we need scalable methods to process them and extract insights. The world of data entered the Zettabyte era several years ago. What’s a Zettabyte? Well, it is enough storage for 30 billion 4K movies, or 60 billion video games, or 7.5 trillion MP3 songs.

Today, the total amount of data created, captured, copied, and consumed globally is on the order of 100 Zettabytes, and it keeps growing.

This is a companion discussion topic for the original entry at https://www.pinecone.io/learn/k-means-clustering/

mihow · February 23, 2023, 7:42pm

Is it possible to use any of these clustering methods with Pinecone or a vector database in general?

For example, suppose I have a vector database of images, and each image has a category attribute. I need to display 10 example images from a single category, and the examples need to be very different from each other (representative of the types of variation within that category).
For example, “show me 10 types of roses”.

How could I group all of the images in the “roses” category into 10 clusters and then return a representative example image from each cluster?

Thank you!

ptah23 · June 21, 2023, 9:13am

This article is heavy on theory and has no practical code. Also, clustering is not supported by Pinecone. You will need to pull all your vectors into memory and use scikit-learn to build the clusters. 0 stars
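
For anyone landing here with the "10 types of roses" question, here is a rough sketch of the in-memory approach, not an official Pinecone recipe. It assumes the vectors for one category have already been exported into a NumPy array (how you fetch them from your database is left out on purpose), and it uses random data as a stand-in for real image embeddings. The names `representative_ids`, `rose_ids`, and `rose_vectors` are made up for illustration. It runs scikit-learn's `KMeans` and, for each cluster, picks the member closest to the centroid as the representative.

```python
import numpy as np
from sklearn.cluster import KMeans


def representative_ids(ids, vectors, n_clusters=10, seed=0):
    """Cluster the vectors with k-means and return one id per cluster:
    the cluster member whose vector is closest to the cluster centroid.

    Hypothetical helper for illustration; not part of any library.
    """
    vectors = np.asarray(vectors, dtype=np.float64)
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    kmeans.fit(vectors)

    reps = []
    for k, center in enumerate(kmeans.cluster_centers_):
        # Indices of the points assigned to cluster k
        members = np.flatnonzero(kmeans.labels_ == k)
        if members.size == 0:
            continue  # defensive: skip a cluster with no members
        # Euclidean distance from each member to the centroid
        dists = np.linalg.norm(vectors[members] - center, axis=1)
        reps.append(ids[members[np.argmin(dists)]])
    return reps


# Stand-in data (assumption): 500 random 64-dimensional "embeddings"
# playing the role of the vectors in the "roses" category.
rng = np.random.default_rng(0)
rose_vectors = rng.normal(size=(500, 64))
rose_ids = [f"rose-{i}" for i in range(len(rose_vectors))]

examples = representative_ids(rose_ids, rose_vectors, n_clusters=10)
print(examples)
```

Picking the member nearest each centroid (rather than the centroid itself) matters because the centroid is an average and usually does not correspond to any stored image. Note that `KMeans` works with Euclidean distance; if your index uses cosine similarity, L2-normalizing the vectors before clustering is a common workaround.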