Vector DB vs. Elasticsearch/OpenSearch?

sophiem · April 28, 2022, 3:01pm

How do vector databases differ from Elasticsearch/OpenSearch?

jamesbriggs · April 29, 2022, 11:18am

With the introduction of vector search in ES and OpenSearch, those can be used as vector databases as well. There are differences between them and Pinecone (e.g., filtering, real-time index updates), mainly because Pinecone was designed as a vector database from the ground up.

rbgo · June 10, 2022, 8:55am

Hi James, would you be able to discuss a bit more about Pinecone indexing and how it reindexes when we have a large volume of data?
Also, how can we scale the infrastructure requirements as the number of documents increases?

jamesbriggs · June 13, 2022, 11:23am

Hi, sure. Indexing is effectively instant: if you upsert 1M new vectors into a 1B-vector index, those new vectors will be available to your queries within milliseconds. Reindexing is done periodically as the index changes.
At the moment we can help you switch to larger indexes if needed. In the future we plan to support increasing index size by adding new pods, without the need to create a new index.

othrif · August 21, 2023, 6:53pm

Hi @jamesbriggs, with the introduction of a dedicated vector engine for OpenSearch ("Introducing the vector engine for Amazon OpenSearch Serverless, now in preview | AWS Big Data Blog"), I would like to understand better what Pinecone offers that OpenSearch does not.
Thanks!

jamesbriggs · September 6, 2023, 9:49pm

Hi @othrif, it depends on what you want to do. Pinecone is a dedicated vector DB, built from the ground up for vector search. The vector engine for OpenSearch is not, and so has some limitations on performance. If you’re working with large datasets (more than a few million vectors) and need fast response times (<100ms), you will need a dedicated vector DB.

Some quick comparisons that may be helpful:

Pinecone can scale to billions of vectors thanks to approximate search algorithms. OpenSearch uses exhaustive search, meaning it calculates the similarity between your query and every single indexed record; with this, you can reach a few million records (if you’re lucky, and patient).
Pinecone supports hybrid search, including native support for sparse vectors produced by models like SPLADE. OpenSearch, afaict, does not have the same level of support for hybrid search.
Pinecone has near-instant index updates (p1 and s1 pods): you can write 1M records to the DB, query 0.05 secs later, and retrieve those results. OpenSearch cannot do this afaik.
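
To make the exhaustive search point above concrete, here is a minimal sketch in plain Python. It is not Pinecone or OpenSearch code: the function names (cosine_similarity, exhaustive_search) and the toy index are made up for illustration. It scores the query against every stored vector, so the work grows linearly with the number of records, which is the cost that approximate search algorithms are meant to avoid.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as having no similarity
    return dot / (norm_a * norm_b)


def exhaustive_search(query, index, top_k=2):
    """Score the query against EVERY record, then keep the top_k best.

    `index` is a hypothetical in-memory dict of {record_id: vector}.
    """
    scored = [(cosine_similarity(query, vec), rec_id)
              for rec_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(rec_id, score) for score, rec_id in scored[:top_k]]


# Toy data, for illustration only.
index = {
    "doc-a": [1.0, 0.0],
    "doc-b": [0.0, 1.0],
    "doc-c": [1.0, 1.0],
}
results = exhaustive_search([1.0, 0.1], index, top_k=2)
print(results)  # "doc-a" ranks first, then "doc-c"
```

With three records this is instant; with hundreds of millions of high-dimensional vectors, every query would have to touch every record.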

I’m also aware of a few customers assessing whether to use Pinecone or OpenSearch who found Pinecone to be a fair bit cheaper, but again, that may depend on your use case and requirements.

othrif · September 6, 2023, 10:06pm

Thanks, @jamesbriggs, we elected to go with Pinecone Enterprise!

borja.lozano · October 10, 2023, 8:48am

Hi @jamesbriggs, we are also exploring options and I was confused by the first point you present here. As I understand it, OpenSearch does offer approximate search. Am I missing something?

austinmw89 · January 7, 2024, 1:10am

@jamesbriggs Hi, curious if you could comment on OpenSearch ANN capabilities vs Pinecone. I’ve heard “Pinecone was designed as a vector DB from the ground up”, but it’s still unclear at a technical level what it can do that OpenSearch’s latest vector capabilities can’t.