Help understanding multi-level ID prefixes

mike1 · April 4, 2024, 10:16pm

Hello! I’ve been loving Pinecone. I have a question about using multi-level ID prefixes.

Here is how my application is structured:

- organization1
  - space1
    - source1
    - source2
    - source3
  - space2
    - source1
    - source2
- organization2
...

An organization can have many spaces. A space can have many sources. Each source is broken up and embedded.

I’m creating endpoints for deleting organizations and spaces and want a quick way to delete the vectors.

Currently my IDs look like this: source_{sourceId}#

This works great for deleting sources.

However, can I use multi-level prefixes to add an organization ID and a space ID?

So my Pinecone ID would look like: organization_{orgId}#space_{spaceId}#source_{sourceId}#

Two main questions:

1. Does the order of those prefixes matter? org → source, or source → org?
2. With the new inclusive ID structure, can I now delete all records by organizationId, spaceId, or sourceId by using the below format?

const nextPage = await index.listPaginated({
	prefix: `source_${sourceId}#`, // can I put any prefix here?
	// AND does this work? prefix: `organization_${orgId}#`
	// AND this works too? prefix: `space_${spaceId}#`
	paginationToken: page.pagination.next,
});

OR is it incremental starting from the beginning?

const nextPage = await index.listPaginated({
	prefix: `organization_${orgId}#`,
	// AND need more for this? prefix: `organization_${orgId}#space_${spaceId}#`
	// AND for most specific would need all three? prefix: `organization_${orgId}#space_${spaceId}#source_${sourceId}#`
	paginationToken: page.pagination.next,
});

Some insight would be amazing. Thank you!

mike1 · April 5, 2024, 8:47pm

Following up on this for posterity and to help others with the same question. I ran some tests and found some answers.

You CANNOT list by an arbitrary segment within the ID. The prefix has to match from the very start of the ID.

Take an ID of organization_1#space_1#source_1#abc

You cannot query by only knowing the space or source. This will not work:

const response = await index.listPaginated({
	prefix: "source_1#",
});

This will not work either:

const response = await index.listPaginated({
	prefix: "space_1#source_1#",
});

The only way to perform these queries is by starting at the beginning and narrowing specificity within the ID. These will work:

const response = await index.listPaginated({
	prefix: "organization_1#",
});

const response = await index.listPaginated({
	prefix: "organization_1#space_1#",
});

const response = await index.listPaginated({
	prefix: "organization_1#space_1#source_1#",
});
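
These results are what a plain leading-substring match would give. As a rough local illustration (not Pinecone's implementation), JavaScript's startsWith can stand in for the server-side match. The matchPrefix helper and the sample IDs are made up for this sketch. Under that assumption it also shows why the trailing # matters: without it, organization_1 would also match organization_10.

```javascript
// Illustration only: String.prototype.startsWith stands in for the
// server-side prefix match. This is an assumption, not Pinecone's code.
const ids = [
  "organization_1#space_1#source_1#abc",
  "organization_1#space_2#source_1#def",
  "organization_2#space_1#source_1#ghi",
  "organization_10#space_1#source_1#jkl",
];

// Hypothetical helper: keep only the IDs that begin with `prefix`.
function matchPrefix(allIds, prefix) {
  return allIds.filter((id) => id.startsWith(prefix));
}

// A segment from the middle of the ID never matches.
console.log(matchPrefix(ids, "source_1#").length); // 0
console.log(matchPrefix(ids, "space_1#source_1#").length); // 0

// Prefixes anchored at the start match, and narrow as they grow.
console.log(matchPrefix(ids, "organization_1#").length); // 2
console.log(matchPrefix(ids, "organization_1#space_1#").length); // 1

// Dropping the trailing "#" also picks up organization_10.
console.log(matchPrefix(ids, "organization_1").length); // 3
```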

With this in mind, it also answers our other question about structuring multi-level IDs.

We should always start with the widest level (the top-level parent) so that we’re able to query without knowing the children. This is a good structure:
parent#child#grandchild#...

Without that structure, we wouldn’t be able to query all vectors within an organization.
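
One way to keep that ordering consistent is to build every ID and prefix through a single helper. The sketch below is only an illustration: buildPrefix, buildRecordId, and the LEVELS array are hypothetical names, and the labels simply follow the organization/space/source layout from this thread. It refuses to build a prefix that skips a parent level, since such a prefix would not be anchored at the start of the ID.

```javascript
// Hypothetical helpers for illustration; names and level labels follow the
// organization/space/source layout used in this thread.
const LEVELS = ["organization", "space", "source"];

// Build a list prefix from the widest level down, e.g.
// { organization: 1, space: 1 } -> "organization_1#space_1#".
function buildPrefix(ids) {
  let prefix = "";
  let gapAt = null; // first level that was left out
  for (const level of LEVELS) {
    const value = ids[level];
    if (value === undefined || value === null) {
      gapAt = gapAt ?? level;
      continue;
    }
    if (gapAt !== null) {
      // A child without its parent cannot form a usable prefix.
      throw new Error(`"${level}" was given without "${gapAt}"`);
    }
    prefix += `${level}_${value}#`;
  }
  if (prefix === "") throw new Error("at least the organization is required");
  return prefix;
}

// Full record ID: all three levels plus a per-chunk suffix.
function buildRecordId(ids, chunkId) {
  for (const level of LEVELS) {
    if (ids[level] === undefined || ids[level] === null) {
      throw new Error(`missing "${level}"`);
    }
  }
  return buildPrefix(ids) + chunkId;
}

console.log(buildPrefix({ organization: 1, space: 1 }));
// organization_1#space_1#
console.log(buildRecordId({ organization: 1, space: 1, source: 1 }, "abc"));
// organization_1#space_1#source_1#abc
```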

zeke · April 9, 2024, 6:32pm

@mike1, thank you for sharing your analysis with the community! Your tests all check out with the expected behavior.

mike1 · April 9, 2024, 6:49pm

Of course! Always happy to help future readers. One additional thing I learned and wanted to point out:

If you use a namespace to create the record, you’ll need to use that namespace to fetch the record, even with ID prefix matching.
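
Tying this back to the original goal of deleting by organization or space, here is a minimal sketch of a namespace-aware delete-by-prefix helper. It assumes `index` is a Pinecone Index object from the Node.js SDK, or anything exposing the same namespace, listPaginated, and deleteMany methods. deleteByPrefix and BATCH_SIZE are hypothetical names, and the batch size is an assumed value that should be checked against the current per-request limit. It collects all matching IDs first and deletes afterwards, so pagination is not running against a list that is shrinking underneath it.

```javascript
// Sketch only: `index` is assumed to be a Pinecone Index object (or anything
// exposing the same namespace / listPaginated / deleteMany methods).
// BATCH_SIZE is an assumed value; check the current per-request delete limit.
const BATCH_SIZE = 1000;

async function deleteByPrefix(index, namespace, prefix) {
  // An empty prefix would match every record in the namespace.
  if (!prefix) throw new Error("prefix must be a non-empty string");

  // Scope every call to the namespace the records were upserted into.
  const ns = index.namespace(namespace);

  // Collect all matching IDs first, following pagination tokens.
  const ids = [];
  let paginationToken;
  do {
    const page = await ns.listPaginated({ prefix, paginationToken });
    for (const vector of page.vectors ?? []) ids.push(vector.id);
    paginationToken = page.pagination?.next;
  } while (paginationToken);

  // Delete in batches and report how many IDs were sent for deletion.
  for (let i = 0; i < ids.length; i += BATCH_SIZE) {
    await ns.deleteMany(ids.slice(i, i + BATCH_SIZE));
  }
  return ids.length;
}
```

With IDs shaped like the ones in this thread, a call such as deleteByPrefix(index, "my-namespace", "organization_1#space_1#") would target one space; "my-namespace" is a placeholder for whatever namespace was used at upsert time.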

By the way Zeke, this might be useful information to add to the existing docs. Seems like an important concept but doesn’t get much love in the docs currently.

zeke · April 10, 2024, 5:49pm

Thanks for following up and for the feedback, @mike1!

In List vector IDs, we state, “The list operation lists the IDs of vectors in a single namespace of a serverless index.”

In Using namespaces, we state, “When you don’t specify a namespace name for an operation, Pinecone uses the default namespace name of "" (the empty string).”

An alternative explanation in Operations across all namespaces offers, “All vector operations apply to a single namespace, with one exception: The DescribeIndexStatistics operation.”

Is there another area where you feel it would be helpful to document that data plane operations are limited to a single namespace?