Filtering on date metadata?

How can I filter based on date in metadata? can’t use integers based on year, month, day because it’s not valid. (e.g. 2024-01-01 should be greater than 2023-12-31, but it will be false based on month and day separately. can’t use string representation because $gte expects number). tried datetime iso format, still same error.
inserted this:
{‘id’: ‘test’,
‘metadata’: {‘expiry_date’: datetime.datetime(2023, 5, 4, 0, 0)},
‘values’: […]
}

but when querying:
ApiException: (400) Reason: Bad Request HTTP response headers: HTTPHeaderDict({‘content-type’: ‘application/json’, ‘date’: ‘Sat, 25 Mar 2023 16:09:43 GMT’, ‘x-envoy-upstream-service-time’: ‘0’, ‘content-length’: ‘119’, ‘server’: ‘envoy’}) HTTP response body: {“code”:3,“message”:“the $gte operator must be followed by a number, got "2022-01-01T00:00:00" instead”,“details”:}

Try storing your date as the UNIX epoch of midnight and zero seconds for the day of. That’s the most efficient way to handle date comparisons.

1 Like

But wouldn’t that void the advice on using low-cardinality datatypes for metadata filters?

High cardinality consumes more memory: Pinecone indexes metadata to allow for filtering. If the metadata contains many unique values — such as a unique identifier for each vector — the index will consume significantly more memory. Consider using selective metadata indexing to avoid indexing high-cardinality metadata that is not needed for filtering.

It would seem that one of the prime chat use cases is “memory recency” … so where you only want to select recent items from memory - or at least to weight those items higher.

1 Like

you are right but i think it’s a tradeoff. if you are omitting the hour, minute and second parts, it would be maximum 365 a year. (not so high cardinality). here’s the code i used:

expiry_date = datetime.now() + timedelta(days=30)
expiry_date = expiry_date.replace(hour=0, minute=0, second=0, microsecond=0)
expiry_date = int(time.mktime(expiry_date.timetuple()))  # unix epochs
    index.upsert(
        [
            (
                _id,
                embedding.embedding,
                {"expiry_date": expiry_date},
            ),
        ]
    )
1 Like

But then don’t you then run into your original problem where you can’t filter by date using pinecone metadata filters??

update:
oooh, i see now. i needed to do a little refresher on how the unix epoch’s worked. got it! thx.

@bigrig, it’s a balancing act. Yes, having high cardinality can impact performance. But Unix epochs give you the most flexibility when using value comparisons (gt, gte, lt, lte).

I’ve found if you set the resolution for the timestamps to be larger than by the second you can get the benefits of easy comparison while avoiding problems with cardinality. It depends on your use case but setting the resolution to the minute, quarter hour, hour, or even day could be the right solution.

Cory

1 Like

How does the syntax for the query look like when we have a metadata of type datetime.date?

Hi @angelommf21

currently metadata of type datetime.date is not supported, you can get around that with a few tricks mentioned above.

To check which datatypes are supported check the documentation: Filtering with metadata

Hope this helps

Thank you for the prompt answer. I had actually checked on that page before. The thing here is that even though I upsert dates as strings and even put the str() around those dates when I fetch them from the db they appear as datetime.date. On the attached prints, look specifically at the parts I circled in red (FYI: ‘data’==‘date’, it’s Portuguese).

Oh… nice! That is actually quite cool if you want to sort the results based on date, but I think this will only work after these vectors are fetched, as you can then cast these vales in your backend code. These values are considered as strings in the metadata, and I am guessing the object is just stringyfied to look like it does. But this isn’t that bad for some implementations. Interesting…

1 Like

You are right, although it says that date is a datatime.date type in my output, if I search by the initial string date it in fact retrieves the vector I want (‘id’ : ‘4’) in the first place.

However, this only happens because I used ‘$eq’, which can be used for strings. If I were to filter by, say ‘year < 2020’, there wouldn’t be a way to do it, right? Since ‘$lt’, ‘$gt’, $lte’ and ‘$gte’ only work for numbers and not for strings. How could one do that kind of data filtering/sorting in a query?

query_by_date

$eq ‘09/13/2018’ returned the correct vector. Now that is strange and awesome! No idea how this worked, but other operators will not work. I suggest using the Unix epochs trick mentioned above :slight_smile: or if you play around with your version and see if any other operators will work.

1 Like