(These questions were submitted in a recent workshop.)
David S. asked: What does Pinecone bring compared to emerging vector databases like pgvector?
Josh G. asked: Why use a specialized vector database vs. an RDBMS solution like pg_vector? What pros/cons do you see for each?
For those not familiar, pgvector is an extension for PostgreSQL that brings vector search into the database. As of version 0.5.0, it includes the option to use an HNSW vector index.
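To make that concrete, here is a minimal sketch of what enabling pgvector with an HNSW index can look like. The statements follow pgvector's documented SQL syntax but are held as Python strings so they can be passed to whatever Postgres client you use. The table name `items`, the column `embedding`, the dimension of 3, and the helper `to_vector_literal` are illustrative assumptions, not part of the original answer.

```python
# Sketch only: pgvector statements for a table with an HNSW index.
# `items`, `embedding`, and the dimension 3 are hypothetical names/values.

def to_vector_literal(values):
    """Format a sequence of numbers as a pgvector text literal such as '[1.0,2.0,3.0]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"

SETUP_STATEMENTS = [
    # Load the extension into the current database.
    "CREATE EXTENSION IF NOT EXISTS vector;",
    # A table with a fixed-dimension vector column.
    "CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));",
    # HNSW index over cosine distance (HNSW is available from pgvector 0.5.0).
    "CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops);",
]

# `<=>` is pgvector's cosine-distance operator. The `%s` placeholder assumes a
# psycopg-style client that substitutes parameters on execution.
NEAREST_NEIGHBORS_QUERY = (
    "SELECT id FROM items ORDER BY embedding <=> %s::vector LIMIT 5;"
)

# Parameter value to bind to the placeholder above.
query_param = to_vector_literal([1, 2, 3])
```

With a client such as psycopg, each entry in `SETUP_STATEMENTS` would be executed once, and `NEAREST_NEIGHBORS_QUERY` would then be executed with `query_param` bound to the placeholder.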
There are two main differences between Pinecone and Postgres with pgvector:
1. Purpose-built vs. bolted-on
Pinecone is a purpose-built vector database: everything from its architecture to its vector indexing algorithms to its functionality was designed and built specifically for working with vector embeddings. As a result, it provides better performance and cost efficiency at scale, which is hard or impossible to match with a bolted-on solution.
This has been true for every new data type and query pattern of the past 50 years (e.g., NoSQL, time series), and it’s true for vector embeddings and vector search. You can read more about where bolted-on HNSW solutions fall short for production applications, and why at Pinecone we decided to build everything from scratch.
2. OSS vs. cloud-native
Postgres and pgvector are open source, so you have to deploy, scale, and maintain the system on your own. Pinecone is only available as a managed service. This comes down to the same question as any build-vs-buy decision: Do you want (or need) to manage your own infrastructure, or do you want to use a managed service and not have to worry about it?
There is also the option of using pgvector inside a hosted Postgres service. In this case, refer to #1 above: The seeming convenience of having vector search in the same place as the raw data comes with a significant cost and operational burden (e.g., frequent index rebuilds, with downtime that needs to be accounted for).
If you have any hopes of scaling your vector-search-powered application (like RAG), we strongly recommend starting with a purpose-built, cloud-native solution like Pinecone from the very beginning. That way, you’re not left needing to switch systems later, when you’re under pressure to get things into production or to scale up quickly to handle growing demand.