Vector DB vs. Elasticsearch/OpenSearch?

How do vector databases differ from Elasticsearch/OpenSearch?

With the introduction of vector search in ES and OpenSearch, those can be used as vector databases as well. There are differences between them and Pinecone (e.g., filtering and real-time index updates), mainly because Pinecone was designed as a vector database from the ground up.

Hi James, would you be able to discuss a bit more about Pinecone indexing and how it reindexes when we have a large volume of data?
Also, how can we scale the infrastructure requirements as the number of documents increases?

Hi, sure. Indexing is effectively instant: if you upsert 1M new vectors into a 1B-vector index, those new vectors will be available to your queries within milliseconds. Reindexing is done periodically as the index changes.
At the moment we can help you switch to larger indexes if needed. In the future we plan to support increasing index size by adding new pods, without the need to create a new index.

Hi @jamesbriggs, with the introduction of a dedicated vector engine for OpenSearch ("Introducing the vector engine for Amazon OpenSearch Serverless, now in preview", AWS Big Data Blog), I would like to understand better what Pinecone offers that OpenSearch does not.
Thanks!

Hi @othrif, it depends on what you want to do. Pinecone is a dedicated vector DB, built from the ground up for vector search. The vector engine for OpenSearch is not, and so has some limitations on performance. If you’re working with large datasets (more than a few million vectors) and need fast response times (<100ms), you will need a dedicated vector DB.

Some quick comparisons that may be helpful:

  • Pinecone can scale to billions of vectors thanks to approximate search algorithms. OpenSearch uses exhaustive search, meaning it calculates the similarity between your query and every single indexed record, so you can reach a few million vectors (if you’re lucky, and patient).
  • Pinecone supports hybrid search, including native support for sparse vectors produced by models like SPLADE. OpenSearch, afaict, does not have the same level of support for hybrid.
  • Pinecone has near-instant index updates (p1 and s1 pods): you can write 1M records to the DB, query 0.05 seconds later, and retrieve those results. OpenSearch cannot do this afaik.
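
To make the exhaustive search in the first point concrete, here is a minimal sketch in plain Python. It is not taken from any product's implementation; the names (`cosine_similarity`, `exhaustive_search`, the `records` dict) are hypothetical and exist only for illustration. It scores the query against every stored vector, which is why the cost grows linearly with the number of records.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as having no similarity
    return dot / (norm_a * norm_b)


def exhaustive_search(query, records, top_k=3):
    """Score the query against EVERY record, then keep the best top_k.

    `records` is a hypothetical in-memory dict of {record_id: vector}.
    Work done is proportional to len(records) for each query.
    """
    scored = [
        (cosine_similarity(query, vector), record_id)
        for record_id, vector in records.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(record_id, score) for score, record_id in scored[:top_k]]


# Toy data, made up for this example.
records = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.9, 0.1, 0.0],
    "doc-c": [0.0, 1.0, 0.0],
    "doc-d": [0.0, 0.0, 1.0],
}

results = exhaustive_search([1.0, 0.0, 0.0], records, top_k=2)
for record_id, score in results:
    print(record_id, round(score, 4))
```

Approximate methods avoid most of these comparisons by only visiting a subset of the index (for example, a neighborhood in a graph or a few clusters), trading a small amount of recall for much lower query cost.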

I’m also aware of a few customers assessing whether to use Pinecone or OpenSearch who found Pinecone to be a fair bit cheaper, but again, that may depend on your use case and requirements.

Thanks, @jamesbriggs, we elected to go with Pinecone Enterprise!

Hi @jamesbriggs, we are also exploring options and I was confused by the first point you present here. As I understand it, OpenSearch does offer approximate search. Am I missing something?

@jamesbriggs Hi, curious if you could comment on OpenSearch ANN capabilities vs. Pinecone. I’ve heard “Pinecone was designed as a vector DB from the ground up”, but it’s still unclear at a technical level what it can do that OpenSearch’s latest vector capabilities can’t.