I created an index and ingested 225 vectors. I do it in batches of 100 (per the documentation's recommendation). I know that the upsert calls are successful: they return the number of vectors ingested. At the end of ingestion I call index.DescribeStats() and it returns the number of vectors in the index: 100.
I waited 5-10 minutes but the result is the same: 100 instead of 225.
At the same time, when I log in to the Pinecone site and go to my index, it lists only 50 vectors.
Very confusing… Can anybody here explain all these discrepancies to me?
BTW, my metadata includes the text of the paragraph that the vector represents, so it is not a lot of data…
EDIT:
So I went and ingested all 225 vectors at once. This is the number that ‘upsert’ returned.
But again, the Pinecone site displays only 50… I need an explanation, please.
Can you describe how you are upserting the vectors? Over the API, LangChain, or some other means? I had a similar problem that was caused by using the same vector ID for multiple vectors. Upserting a vector with the same ID as an existing one will overwrite it. How are you creating the IDs (if you are creating them at all)?
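To make the overwrite behaviour concrete, here is a minimal sketch that uses a plain Dictionary as an in-memory stand-in for an index. It is not Pinecone's API, and the class and method names (UpsertIdDemo, CountAfterUpserts) are hypothetical. It only shows what happens to the final count when an ID counter restarts with every batch versus when it keeps running across batches; whether that is what happened in the case above is not confirmed.

```csharp
using System;
using System.Collections.Generic;

public static class UpsertIdDemo
{
    // In-memory stand-in for an index: writing to an existing key replaces its value,
    // which mirrors "upsert with an existing ID overwrites".
    public static int CountAfterUpserts(int total, int batchSize, bool restartIdsPerBatch)
    {
        var index = new Dictionary<string, int>();
        for (int offset = 0; offset < total; offset += batchSize)
        {
            int count = Math.Min(batchSize, total - offset);
            for (int i = 0; i < count; i++)
            {
                // A per-batch counter (i) repeats across batches; offset + i does not.
                string id = (restartIdsPerBatch ? i : offset + i).ToString();
                index[id] = offset + i; // indexer assignment inserts or overwrites
            }
        }
        return index.Count;
    }

    public static void Main()
    {
        // 225 items in batches of 100, IDs restarting at 0 in every batch.
        Console.WriteLine(CountAfterUpserts(225, 100, restartIdsPerBatch: true));  // 100
        // Same data, IDs continuing across batches.
        Console.WriteLine(CountAfterUpserts(225, 100, restartIdsPerBatch: false)); // 225
    }
}
```

Every simulated upsert "succeeds" in both runs, so the per-call return values alone would not reveal the collision; only the final count does.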
My ID for a vector is basically a vector counter, so it is unique.
I have a C# library I wrote. Here is the ingestion function below.
Some of the 50 IDs displayed on the Pinecone site have values like 112, 132, etc. That is, more than 50 vectors got there, so why does it display only a random 50?
EDIT: Eureka!
This morning I ran my app again. On startup it connects to the index and requests the index stats, and my app now shows 220 vectors in the index. So, my conclusions are:
Pinecone needs to do something with its site: at the very least, display the number of vectors the index has.
The delay before the API returns the correct, full number of vectors is too long (how long exactly is unknown to me at this point). From the user/customer point of view, that is not acceptable.
public async Task<uint> IngestVectors(IndexClient<RestTransport> idx, List<ISegment> segList, string indexNamespace = null)
{
    using (Log.VerboseCall())
    {
        if (idx == null)
            throw new ArgumentNullException(nameof(idx));
        if (idx.Index.Status.IsReady == false)
            throw new Exception("Index is not in Ready state");
        if (segList == null || segList.Count == 0)
            throw new ArgumentException("Vector list is null or empty");
        try
        {
            mIndex = idx;
            List<Vector> vectors = new List<Vector>();
            for (int i = 0; i < segList.Count; i++)
            {
                // Segments without text are skipped, so fewer vectors than segList.Count may be sent.
                if (segList[i] == null || String.IsNullOrWhiteSpace(segList[i].SegmentText))
                    continue;
                Vector vec = new Vector()
                {
                    // The ID is the position within segList, so it is unique only within one call.
                    Id = i.ToString(),
                    Values = segList[i].TextVector,
                    Metadata = new MetadataMap()
                };
                vec.Metadata.Add("segmentId", segList[i].segmentId);
                vec.Metadata.Add("SegmentText", segList[i].SegmentText);
                // IsNullOrWhiteSpace also covers a null Source, which Source.Trim() would throw on.
                if (!String.IsNullOrWhiteSpace(segList[i].Source))
                    vec.Metadata.Add("Source", segList[i].Source);
                if (segList[i].SrcType >= 0)
                    vec.Metadata.Add("SrcType", segList[i].SrcType.ToString());
                vec.Metadata.Add("TimeIngested", segList[i].TimeIngested);
                vec.Metadata.Add("TimeSrcModified", segList[i].TimeSrcModified);
                vectors.Add(vec);
            }
            uint itemsNum = await mIndex.Upsert(vectors, indexNamespace);
            Log.VerboseFormat("Successfully ingested {0} vectors into Index: {1}", itemsNum, mIndex.Index.Details.Name);
            return itemsNum;
        }
        catch (Exception ex)
        {
            // The exception is only logged; the caller sees a return value of 0.
            Log.Verbose(ex);
        }
        return 0;
    }
}
It does seem very strange that it would take a whole day just to refresh this data… Usually I can keep track of my upserts “live” as they happen. Glad you got it working, as your code looks good.
Well, it may take less than a day. I have no idea when it became available between 8 PM and 10 AM…
I am glad it is not my code, but the problem still exists: how my users will work with this is still a big open question, that is, a problem… I wish somebody from the Pinecone team could address this, although somewhere in the documents it says that vectors do not immediately become consumable… IMO, it is a huge show stopper.
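If the stats really do lag behind the upserts, one way to keep users from seeing a stale count is to poll after ingestion until the expected number appears or a timeout passes. Below is a minimal sketch of that idea. The names IndexStatsPoller and WaitForCountAsync are hypothetical, and getCount is a placeholder delegate that would wrap whatever stats call your client exposes (index.DescribeStats() in the post above); the property that actually holds the vector count is client-specific and is not shown here.

```csharp
using System;
using System.Threading.Tasks;

public static class IndexStatsPoller
{
    // Polls getCount until it reports at least `expected` vectors or the timeout elapses.
    // Returns the last count seen, so the caller can tell whether the target was reached.
    public static async Task<uint> WaitForCountAsync(
        Func<Task<uint>> getCount, uint expected, TimeSpan timeout, TimeSpan interval)
    {
        if (getCount == null)
            throw new ArgumentNullException(nameof(getCount));

        DateTime deadline = DateTime.UtcNow + timeout;
        uint last = await getCount();
        while (last < expected && DateTime.UtcNow < deadline)
        {
            // Wait between checks so the stats endpoint is not hammered.
            await Task.Delay(interval);
            last = await getCount();
        }
        return last;
    }
}
```

The expected value should be the number that upsert returned rather than the size of the input list, since the ingestion function skips segments without text. If the returned count is still below the target after the timeout, the app can report that indexing is in progress instead of showing a wrong total.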