Index.fetch() with serverless and id prefixes causes complexity

personaifydev · May 26, 2024, 9:17am

I currently store vector objects with uuid4 ids. (These map to an external db with the uuid4 as the primary key.) Each object also has metadata that links back to the parent document_id (also a uuid4). With pods, I could call delete with a metadata filter specifying which document_ids I want to delete, and it would delete all objects associated with those documents. For example, in Python:

{'document_id': {'$in': list(doc_ids)}}

All is well.

Now with serverless, deleting by metadata filter is not supported. Instead, I need to use id prefixes as mandated by the migration guide.

So I have to migrate and change all object ids to have this structure (the hash character # is the delimiter):
id={document_id}#{object_id}

Now I can delete by using the documented list() with id prefixes:

for ids in index.list(prefix='doc1#', namespace='ns1'):
  print(ids) # ['doc1#chunk1', 'doc1#chunk2', 'doc1#chunk3']
  index.delete(ids=ids, namespace='ns1')

But this causes a problem with index.fetch(ids=ids). Before, if I had a list of object uuid4s, I could call index.fetch() and retrieve them without knowing any other information.

Now with serverless, I need to know the parent document_id to construct the full id before I can call index.fetch(). And it’s worse if you have more hierarchy levels. It requires reconstructing the entire id to fetch.
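To make the extra step concrete, here is a minimal sketch of the reconstruction needed before every fetch. The helper names and the parent_of mapping (object id to document id, e.g. loaded from the external db) are hypothetical, and the ids are made up:

```python
DELIMITER = "#"  # same delimiter as in the id scheme above


def make_vector_id(document_id: str, object_id: str) -> str:
    """Build the composite id '{document_id}#{object_id}'."""
    return f"{document_id}{DELIMITER}{object_id}"


def full_ids_for(object_ids, parent_of):
    """Turn bare object ids into composite ids.

    parent_of is a hypothetical dict {object_id: document_id};
    an object id with no known parent raises KeyError.
    """
    return [make_vector_id(parent_of[oid], oid) for oid in object_ids]


# Made-up ids standing in for uuid4 values
parent_of = {"chunk1": "doc1", "chunk2": "doc1", "chunk9": "doc2"}
ids = full_ids_for(["chunk1", "chunk9"], parent_of)
print(ids)  # ['doc1#chunk1', 'doc2#chunk9']

# Only now can the fetch be issued, e.g.:
# index.fetch(ids=ids, namespace='ns1')
```

Every additional hierarchy level adds another lookup like parent_of before the id can be built.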

It seems better to leave the id as a unique object id, without trying to encode hierarchy, version, content type, or other information into the id field. Why not leave those in the metadata field and allow us to delete by metadata filter again?

Another example: If we added a document_type, now I need prior knowledge of the type before I can retrieve the list of objects. Before, I could just retrieve a list of ids and determine the document_type from the response.

I suppose I could store the “full id” in the db, or just recreate it dynamically with all the necessary hierarchical information, but that seems quite complex. Is there a better way to do this?
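For reference, the "store the full id in the db" option might look roughly like the sketch below. It uses an in-memory SQLite database as a stand-in for the real external db; the table layout and the function names are assumptions made for illustration only:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the real external db
conn.execute(
    "CREATE TABLE objects ("
    " object_id TEXT PRIMARY KEY,"   # the uuid4 the app already knows
    " document_id TEXT NOT NULL,"
    " vector_id TEXT NOT NULL)"      # the stored 'full id' used in the index
)


def register(object_id, document_id):
    """Record an object and return its composite vector id."""
    vector_id = f"{document_id}#{object_id}"
    conn.execute(
        "INSERT INTO objects VALUES (?, ?, ?)",
        (object_id, document_id, vector_id),
    )
    return vector_id


def vector_ids_for(object_ids):
    """Look up stored full ids for bare object ids, keeping input order."""
    object_ids = list(object_ids)
    if not object_ids:
        return []
    placeholders = ",".join("?" for _ in object_ids)
    rows = conn.execute(
        "SELECT object_id, vector_id FROM objects "
        f"WHERE object_id IN ({placeholders})",
        object_ids,
    ).fetchall()
    by_object = dict(rows)
    return [by_object[oid] for oid in object_ids]  # KeyError if unknown


register("chunk1", "doc1")
register("chunk2", "doc2")
print(vector_ids_for(["chunk2", "chunk1"]))  # ['doc2#chunk2', 'doc1#chunk1']
# The result could then be passed to index.fetch(ids=..., namespace='ns1')
```

This keeps fetch-by-object-id working, but at the cost of an extra db round trip and a second id to keep in sync.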

Thanks

ZacharyProser · May 29, 2024, 2:37pm

Hi @personaifydev,

Thank you for the feedback.

I agree that it’s a lot cleaner to be able to store hierarchical information in metadata and issue deletions via metadata filters.

I’ll file your feedback so that the correct team sees it, and if I come across a cleaner solution for this in the meantime, I’ll update this thread with it.

Best,
Zack

personaifydev · May 31, 2024, 10:45am

Thank you so much @ZacharyProser, appreciate your response and action.