MongoDB $regex operator for metadata filter

arell.code · June 3, 2025, 7:50pm

Hi Pinecone Team,

I’d like to request support for the $regex operator in Pinecone’s metadata filtering.

In my current use case, I store a "city" field for each record, but the values are inconsistent. Some entries contain only the city name (e.g., "Berlin"), while others include more detail like "Berlin, Germany" or "San Francisco, California, United States".

This inconsistency makes exact matching unreliable when processing natural language queries like “people in Germany”, because I can’t confidently filter for "city": "Germany" or even use $in unless I’ve preprocessed every possible variation.

Having access to $regex would allow me to match substrings within the city field and handle these variable formats without extensive re-indexing or client-side filtering. For example:

{ "city": { "$regex": "Germany" } }

This would match "Berlin, Germany" and "Munich, Bavaria, Germany", regardless of how much information is included in the metadata string.
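To make the intended semantics concrete, here is a minimal client-side sketch of the matching behavior being requested, using Python's `re.search`. It is not Pinecone functionality: the `matches_regex` helper and the sample records are hypothetical, and the semantics are assumed to mirror an unanchored MongoDB-style `$regex`.

```python
import re

def matches_regex(metadata: dict, field: str, pattern: str) -> bool:
    """Hypothetical helper: True if the string field contains a match for pattern."""
    value = metadata.get(field)
    return isinstance(value, str) and re.search(pattern, value) is not None

# Illustrative records with inconsistent "city" values, as described above.
records = [
    {"city": "Berlin"},
    {"city": "Berlin, Germany"},
    {"city": "Munich, Bavaria, Germany"},
    {"city": "San Francisco, California, United States"},
]

# Emulates { "city": { "$regex": "Germany" } } on the client side.
matched = [r["city"] for r in records if matches_regex(r, "city", "Germany")]
print(matched)
```

Note that a bare "Berlin" value is not matched, since the string never mentions the country; a substring match only helps for records that already carry that detail.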

Would love to know if regex support is on your roadmap. This would unlock a lot of flexibility for use cases involving natural language search over semi-structured metadata.

milen · June 6, 2025, 11:49am

Hello and welcome to the Pinecone community forum @arell.code!

Thank you very much for your feature request. It makes a lot of sense to me. That said, implementing such features is usually more complex than it initially seems and often involves careful investigation of edge cases, the impact on performance, and perhaps even restructuring how the data is stored under the hood. So, at this point, I’ll refrain from making any promises as to whether or when Pinecone would implement the feature. But I want to confirm the product team is aware of your request and is already investigating how it fits into the product roadmap.

What you can do, meanwhile, is preprocess the city field and convert its value to an array. For example, {"city": "San Francisco, California, United States"} would become {"city": ["San Francisco", "California", "United States"]}. Then all of the following filters should match:

{"city": "San Francisco"}
{"city": "California"}
{"city": {"$in": ["San Francisco", "California"]}}
{"$and": [{"city": "San Francisco"}, {"city": "California"}]}

I hope this helps in your particular use case. While it requires some preprocessing work on upsert, it should be significantly faster for querying than a regex match.
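The preprocessing step suggested above could be sketched roughly as follows. This is a minimal sketch, assuming the parts of the `city` string are always separated by commas; the `split_city` and `preprocess_metadata` helper names are hypothetical, and the actual upsert call is left out.

```python
def split_city(value: str) -> list[str]:
    """Split a comma-separated location string into trimmed, non-empty parts."""
    return [part.strip() for part in value.split(",") if part.strip()]

def preprocess_metadata(metadata: dict) -> dict:
    """Return a copy of the metadata with a string "city" converted to a list."""
    updated = dict(metadata)
    city = updated.get("city")
    if isinstance(city, str):
        updated["city"] = split_city(city)
    return updated

# Hypothetical record metadata before it is sent with an upsert.
record = preprocess_metadata(
    {"city": "San Francisco, California, United States"}
)
print(record)
```

A value that holds only the city name, such as "Berlin", becomes a one-element list, so every record ends up with the same field shape.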