Filtering on date metadata?

How can I filter based on date in metadata? can’t use integers based on year, month, day because it’s not valid. (e.g. 2024-01-01 should be greater than 2023-12-31, but it will be false based on month and day separately. can’t use string representation because $gte expects number). tried datetime iso format, still same error.
inserted this:
{‘id’: ‘test’,
‘metadata’: {‘expiry_date’: datetime.datetime(2023, 5, 4, 0, 0)},
‘values’: […]
}

but when querying:
ApiException: (400) Reason: Bad Request HTTP response headers: HTTPHeaderDict({‘content-type’: ‘application/json’, ‘date’: ‘Sat, 25 Mar 2023 16:09:43 GMT’, ‘x-envoy-upstream-service-time’: ‘0’, ‘content-length’: ‘119’, ‘server’: ‘envoy’}) HTTP response body: {“code”:3,“message”:“the $gte operator must be followed by a number, got "2022-01-01T00:00:00" instead”,“details”:}

Try storing your date as the UNIX epoch of midnight and zero seconds for the day of. That’s the most efficient way to handle date comparisons.

1 Like

But wouldn’t that void the advice on using low-cardinality datatypes for metadata filters?

High cardinality consumes more memory: Pinecone indexes metadata to allow for filtering. If the metadata contains many unique values — such as a unique identifier for each vector — the index will consume significantly more memory. Consider using selective metadata indexing to avoid indexing high-cardinality metadata that is not needed for filtering.

It would seem that one of the prime chat use cases is “memory recency” … so where you only want to select recent items from memory - or at least to weight those items higher.

1 Like

you are right but i think it’s a tradeoff. if you are omitting the hour, minute and second parts, it would be maximum 365 a year. (not so high cardinality). here’s the code i used:

expiry_date = datetime.now() + timedelta(days=30)
expiry_date = expiry_date.replace(hour=0, minute=0, second=0, microsecond=0)
expiry_date = int(time.mktime(expiry_date.timetuple()))  # unix epochs
    index.upsert(
        [
            (
                _id,
                embedding.embedding,
                {"expiry_date": expiry_date},
            ),
        ]
    )
1 Like

But then don’t you then run into your original problem where you can’t filter by date using pinecone metadata filters??

update:
oooh, i see now. i needed to do a little refresher on how the unix epoch’s worked. got it! thx.

@bigrig, it’s a balancing act. Yes, having high cardinality can impact performance. But Unix epochs give you the most flexibility when using value comparisons (gt, gte, lt, lte).

I’ve found if you set the resolution for the timestamps to be larger than by the second you can get the benefits of easy comparison while avoiding problems with cardinality. It depends on your use case but setting the resolution to the minute, quarter hour, hour, or even day could be the right solution.

Cory

1 Like