Parent & Child Vector

smit · May 18, 2023, 11:00am

Hi,

I’m developing something and would like to explore the usage of parent & child vectors, but I can’t find much information or many examples on this. Does anybody have some info?

Also, for each vector, can I store two sets of metadata texts? For instance, (a) an extract of text and (b) a summary of the entire document?

Best regards,
Martin

Sean · May 18, 2023, 5:48pm

I’m still coming up to speed on the performance aspects of Pinecone, but two possible approaches would be to use namespaces on your index (namespace="parent", namespace="child"):

  xc = index.query(vector=query, top_k=4, include_metadata=True, namespace='parent')

or possibly to use a metadata filter to reduce the children if that is possible.
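For the metadata filter route, a minimal sketch might look like the following. It only assembles the query arguments so it runs without a Pinecone connection. The metadata fields `level` and `parent_id` and the helper `build_child_query` are hypothetical names, and they assume each child record was upserted with those fields.

```python
from typing import Any, Dict, List


def build_child_query(vector: List[float], parent_id: str, top_k: int = 4) -> Dict[str, Any]:
    """Assemble keyword arguments for a query limited to one parent's children."""
    return {
        "vector": vector,
        "top_k": top_k,
        "include_metadata": True,
        # Hypothetical metadata fields: "level" and "parent_id" are not
        # Pinecone built-ins; they must have been stored at upsert time.
        "filter": {
            "level": {"$eq": "child"},
            "parent_id": {"$eq": parent_id},
        },
    }


kwargs = build_child_query([0.1, 0.2, 0.3], parent_id="doc1")
# With a live index, these could be passed along as: index.query(**kwargs)
```

Compared with namespaces, this keeps parents and children in one namespace and narrows the search at query time.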

06.papilla_vices · November 25, 2024, 1:48pm

Hi @smit, did you end up figuring this out? I was curious as well. Thanks!

bear · November 25, 2024, 3:25pm

@06.papilla_vices can you let us know a bit more about what you’re trying to do so we can recommend a strategy?

With the example above, a useful thing to do might be to use ID prefixing: Manage RAG documents - Pinecone Docs

An ID scheme like that will allow you to relate multiple vectors to one another, e.g. ones that belong to the same chunk.
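As a rough illustration of such an ID scheme, the sketch below builds and parses prefixed IDs locally. The `#` separator, the `chunk` label, and the helper names are assumptions for this example, not a required format.

```python
from typing import List, Tuple

SEPARATOR = "#"  # assumed delimiter between the document ID and the chunk label


def make_chunk_id(doc_id: str, chunk_index: int) -> str:
    """Build an ID such as 'doc1#chunk2' for a chunk of a document."""
    return f"{doc_id}{SEPARATOR}chunk{chunk_index}"


def split_chunk_id(record_id: str) -> Tuple[str, int]:
    """Recover (doc_id, chunk_index) from an ID made by make_chunk_id."""
    doc_id, _, chunk_part = record_id.rpartition(SEPARATOR)
    return doc_id, int(chunk_part[len("chunk"):])


def siblings(record_ids: List[str], doc_id: str) -> List[str]:
    """Select the IDs that share a document prefix (done locally here)."""
    prefix = f"{doc_id}{SEPARATOR}"
    return [rid for rid in record_ids if rid.startswith(prefix)]


ids = [make_chunk_id("doc1", i) for i in range(3)] + [make_chunk_id("doc2", 0)]
doc1_ids = siblings(ids, "doc1")
```

Including the separator in the prefix keeps `doc1` from also matching IDs that belong to `doc10`.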

While you can store both the plain text of the file and a summary of the document as metadata (up to 40 KB per record), for sensitive data we do recommend that you either encrypt it or consider instead storing a reference to the parent document and the chunk text.

06.papilla_vices · November 25, 2024, 3:44pm

Hi @bear, thanks for your response. I am trying to emulate sentence window retrieval and have seen two options: storing the text in metadata, or the ID prefixing approach you mentioned. I have not tried either yet, as I was looking for examples of others implementing it.

Maybe you can help me with the following:
1) Can you extract the text from metadata? If so, maybe you can share some documentation? I am not storing any sensitive data for my use case. I was not sure if Pinecone metadata was only used as metadata or if it could be extracted.
2) Can you comment on the speed differences between metadata text retrieval and ID prefixing? I had seen a prior post say that extracting via ID prefixing was so-so with respect to speed, given that Pinecone was not intended to be used that way.

bear · November 25, 2024, 5:10pm

Re: extracting the text from metadata, the metadata will be returned as part of your query result when you set include_metadata=True, like this: Query data - Pinecone Docs

so you would be able to pull it from the payload, yes.
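A minimal sketch of pulling text out of the matches, using a plain dict shaped like a query response so it runs offline. The metadata field name `text` and the helper `texts_from_matches` are assumptions; use whatever key you stored at upsert time.

```python
from typing import Any, Dict, List


def texts_from_matches(response: Dict[str, Any], field: str = "text") -> List[str]:
    """Collect one metadata field from every match that has it."""
    texts = []
    for match in response.get("matches", []):
        metadata = match.get("metadata") or {}
        if field in metadata:
            texts.append(metadata[field])
    return texts


# Stand-in for a query result; the IDs, scores, and texts are made up.
sample_response = {
    "matches": [
        {"id": "doc1#chunk0", "score": 0.91, "metadata": {"text": "First chunk."}},
        {"id": "doc1#chunk1", "score": 0.87, "metadata": {"text": "Second chunk."}},
        {"id": "doc1#chunk2", "score": 0.80},  # no metadata stored
    ]
}

print(texts_from_matches(sample_response))  # ['First chunk.', 'Second chunk.']
```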

You’re right that this isn’t what ID prefixing is for. Thanks for telling us more about what you’re trying to do! I would store two fields in metadata: the embedding chunk and the broadened chunk for feeding to the LLM. That will be more efficient than trying to reconstruct the expanded chunk based on how records relate to one another.

06.papilla_vices · November 25, 2024, 7:17pm

Thanks for your response! Would the embedding chunk also be replicated in the metadata? Why would this be necessary if the broadened chunk is implicitly linked to the embedding chunk already?

bear · November 25, 2024, 8:56pm

Fair point: you could just keep the broadened chunk!
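To make the sentence window idea concrete, here is a rough sketch that prepares one record per sentence and keeps only the broadened chunk in metadata. The function name, the `window` field, the `text_to_embed` key, and the ID format are all assumptions for illustration; the embedding step and the upsert are left out.

```python
from typing import Any, Dict, List


def build_window_records(
    doc_id: str, sentences: List[str], window_size: int = 1
) -> List[Dict[str, Any]]:
    """Pair each sentence with a window of its neighbours.

    The sentence itself is what would be embedded; the window is what
    would be stored in metadata and later handed to the LLM.
    """
    records = []
    for i, sentence in enumerate(sentences):
        start = max(0, i - window_size)
        end = min(len(sentences), i + window_size + 1)
        records.append(
            {
                "id": f"{doc_id}#sent{i}",  # hypothetical ID format
                "text_to_embed": sentence,  # embed this, do not store it
                "metadata": {"window": " ".join(sentences[start:end])},
            }
        )
    return records


sentences = ["Cats purr.", "Dogs bark.", "Birds sing.", "Fish swim."]
records = build_window_records("doc1", sentences, window_size=1)
```

At query time, the `window` value of each match would be passed to the LLM in place of the single sentence that was embedded.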