Thank you! Fortunately, I arrived at that implementation from Pinecone.Net’s GitHub repo. However, I’m still struggling to get a clear picture of how I should group different indexes together. After chatting with Pinecone’s bot, which I assume is informed by all their documentation (which is awesome, by the way), I concluded this can be done with either clusters, namespaces, metadata, or a ‘mapping file’. I chose to implement it using metadata because Pinecone’s API doesn’t have any methods that use namespaces, unlike the index creation method in the original REST implementation (and clusters and mapping files seem like hallucinations, but correct me if I’m wrong). This is how I’m traversing the object recursively, creating an embedding and an index for each property found through reflection:
async Task TraversePropertiesAndCreateIndex(object obj, string indexClusterId, PineconeClient? pinecone)
{
    if (obj == null)
        return;

    Type type = obj.GetType();
    PropertyInfo[] properties = type.GetProperties();
    foreach (PropertyInfo property in properties)
    {
        if (property.Name == "Password") // ignore passwords
            continue;

        object? value = property.GetValue(obj);
        if (value == null) // nothing to embed; EmbedValue would fail on null
            continue;

        var responseEmbedding = await EmbedValue(value);
        string typeGuid = type.GUID.ToString();
        // Note: Pinecone restricts index names (lowercase alphanumerics and '-', limited length),
        // so this ID may need sanitizing before it is used as an index name.
        string propUniqueId = $"{indexClusterId}_{typeGuid}_{property.Name}";
        await CreateIndex(responseEmbedding, pinecone, propUniqueId, indexClusterId);

        // Namespace can be null for some types, hence the null-conditional check.
        if (property.PropertyType.Namespace?.StartsWith("Domain.Models") == true)
        {
            // Await the recursive call so nested work completes and exceptions surface here.
            await TraversePropertiesAndCreateIndex(value, indexClusterId, pinecone);
        }

        Console.WriteLine($"{property.Name}: {value}");
    }
}
async Task<EmbeddingsResponse> EmbedValue(object? value)
{
    // Fail with a clear message instead of a NullReferenceException.
    string text = value?.ToString() ?? throw new ArgumentNullException(nameof(value));

    using var openAI = new OpenAIClient(_externalApisOptions.OpenAIKey);
    var embeddingsRequest = new EmbeddingsRequest(text, Model.Embedding_Ada_002);
    return await openAI.EmbeddingsEndpoint.CreateEmbeddingAsync(embeddingsRequest);
}
async Task CreateIndex(EmbeddingsResponse responseEmbedding, PineconeClient pinecone, string propUniqueId, string indexClusterId)
{
    var embeddings = responseEmbedding.Data.Select(x => x.Embedding).ToList();
    var indexDetails = new IndexDetails()
    {
        Name = propUniqueId,
        Dimension = 1536, // text-embedding-ada-002 returns 1536-dimensional embeddings
        Metric = Metric.Cosine,
    };
    await pinecone.CreateIndex(indexDetails);
    var wasCreated = (await pinecone.ListIndexes()).Contains(propUniqueId);
    if (wasCreated)
    {
        float[] embeddingsFloatArray = ConvertToFloatArray(embeddings);
        var index = await pinecone.GetIndex(propUniqueId);
        var vectors = new[]
        {
            new Vector
            {
                Id = propUniqueId,
                Values = embeddingsFloatArray,
                Metadata = new MetadataMap
                {
                    ["group"] = indexClusterId, // groups all vectors that belong to the same object
                }
            }
        };
        await index.Upsert(vectors);
    }
}
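To make the traversal step easier to test on its own, here is a minimal sketch that only collects (path, text) pairs via reflection, without calling OpenAI or Pinecone. `PropertyWalker`, `Collect`, and `IsLeaf` are hypothetical names of mine, not part of any library. Two assumptions differ from my code above: it decides whether to recurse by checking for simple "leaf" types instead of the `Domain.Models` namespace, and it skips collections entirely. It also tracks visited objects so that circular references don't recurse forever (this needs .NET 5 or later for `ReferenceEqualityComparer`).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

public static class PropertyWalker
{
    // Hypothetical helper: returns (path, text) pairs for every readable,
    // non-null leaf property, skipping any property named "Password".
    public static List<(string Path, string Text)> Collect(object? root, string prefix = "")
    {
        var results = new List<(string Path, string Text)>();
        var visited = new HashSet<object>(ReferenceEqualityComparer.Instance);
        Walk(root, prefix, results, visited);
        return results;
    }

    private static void Walk(object? obj, string prefix,
        List<(string Path, string Text)> results, HashSet<object> visited)
    {
        // Stop on nulls and on objects already seen (guards against cycles).
        if (obj is null || !visited.Add(obj))
            return;

        foreach (PropertyInfo property in obj.GetType().GetProperties())
        {
            // Skip secrets, write-only properties, and indexers (which need arguments).
            if (property.Name == "Password" || !property.CanRead
                || property.GetIndexParameters().Length > 0)
                continue;

            object? value = property.GetValue(obj);
            if (value is null)
                continue;

            string path = prefix.Length == 0 ? property.Name : $"{prefix}.{property.Name}";

            if (IsLeaf(property.PropertyType))
                results.Add((path, value.ToString() ?? string.Empty));
            else if (value is not IEnumerable) // collections are out of scope for this sketch
                Walk(value, path, results, visited);
        }
    }

    // Treat primitives, enums, strings, and a few common value types as leaves.
    private static bool IsLeaf(Type type)
    {
        Type t = Nullable.GetUnderlyingType(type) ?? type;
        return t.IsPrimitive || t.IsEnum || t == typeof(string) || t == typeof(decimal)
            || t == typeof(DateTime) || t == typeof(Guid);
    }
}
```

Each returned pair could then be embedded and upserted in a separate step, with the path serving as part of the vector Id.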
In TraversePropertiesAndCreateIndex and CreateIndex above, indexClusterId is meant to be the object’s pointer to all of its embedded vectors, because, to be honest, I’m pretty clueless as to how to include namespaces. The Name of the index and the Id of the vector are the same: I made it a unique value and had no idea what else I should use the vector Id for… I’m crossing my fingers at this point! This is how I will send the query:
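// Return type assumes index.Query yields ScoredVector[] (as it appears to in Pinecone.Net), not a single Vector.
async Task<ScoredVector[]> QueryIndex(EmbeddingsResponse responseEmbedding, PineconeClient pinecone, string indexClusterId)
{
    var embeddings = responseEmbedding.Data.Select(x => x.Embedding).ToList();
    var indexName = $"query_{Guid.NewGuid()}";
    var indexDetails = new IndexDetails()
    {
        Name = indexName,
        Dimension = 1536,
        Metric = Metric.Cosine,
    };
    await pinecone.CreateIndex(indexDetails);
    var wasCreated = (await pinecone.ListIndexes()).Contains(indexName);
    if (wasCreated)
    {
        float[] embeddingsFloatArray = ConvertToFloatArray(embeddings);
        var index = await pinecone.GetIndex(indexName);
        var vectors = new[]
        {
            new Vector
            {
                Id = indexName,
                Values = embeddingsFloatArray,
                Metadata = new MetadataMap
                {
                    ["group"] = indexClusterId,
                }
            }
        };
        // Upsert reports how many vectors were written, not an ID, so query by the vector's own Id.
        await index.Upsert(vectors);
        // Caveat: a freshly created index only holds the vector upserted above,
        // so this query can only match that vector.
        var responses = await index.Query(indexName, topK: 10, includeValues: true, includeMetadata: true);
        // cleanup: this removes the vector with this ID; the temporary index itself
        // is not deleted by this call, so the check below will still find it
        await index.Delete(new[] { indexName });
        var doesQueryIndexStillExist = (await pinecone.ListIndexes()).Contains(indexName);
        return responses;
    }
    // Every code path must return or throw, otherwise this does not compile.
    throw new InvalidOperationException($"Index '{indexName}' was not created.");
}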
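To pin down what I actually expect from the query, here is a small in-memory sketch of "top-K by cosine similarity, restricted to one group". It is a stand-in for illustration only and does not use Pinecone at all: `StoredVector`, `GroupedSearch`, `Cosine`, and `Query` are hypothetical names, and the `Group` field plays the role of the `"group"` metadata value set from indexClusterId.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical in-memory record: one embedding plus the group it belongs to.
public sealed record StoredVector(string Id, float[] Values, string Group);

public static class GroupedSearch
{
    // Cosine similarity of two equal-length vectors; 0 when either vector has zero norm.
    public static double Cosine(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return normA == 0 || normB == 0 ? 0 : dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Keeps only vectors in the requested group, then returns the topK best matches.
    public static List<(string Id, double Score)> Query(
        IEnumerable<StoredVector> store, float[] query, string group, int topK)
    {
        return store
            .Where(v => v.Group == group)
            .Select(v => (v.Id, Score: Cosine(v.Values, query)))
            .OrderByDescending(r => r.Score)
            .Take(topK)
            .ToList();
    }
}
```

The point of the sketch is that all vectors live in one shared store and the group is applied as a filter at query time, rather than each property or each query getting its own index. If I understand correctly, a metadata filter (or a namespace) on a single Pinecone index would serve the same purpose, but I haven't verified that against the client yet.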
I’m still testing this and will post any further updates, but I’d appreciate any pointers on how to correctly implement RAG to store and query complex objects that are unique to each user, or any tips for improvement. Thanks!