I uploaded a dataset of 150 vectors, each with 16 elements (setDimension=16), to an index created with metric="euclidean":
indexName = "c1"
setDimension=16
pinecone.create_index(indexName, dimension=setDimension, metric="euclidean")
I confirmed in the Pinecone Console that the index has 150 vectors, Dimensions=16, and that the metric is indeed set to euclidean.
Now when I do a query asking for 3 matches,
res = index.query(
    vector=testVec,
    top_k=3,
    include_values=True
)
I do get three items back, but with "score" values I do not understand. I expected the score to be simply the Euclidean distance, i.e. the L2 norm of (v1 - v2), between my test vector and each returned vector, but I calculated the norm myself using this code:
import numpy as np

# get Euclidean distance between vectors at index i1 and i2
def getDist(i1, i2):
    # columns 1..cols-1 of each row are used; column 0 is skipped
    tvec1 = emVectors[i1, 1:cols]
    tvec2 = emVectors[i2, 1:cols]
    dist = np.linalg.norm(tvec2 - tvec1)
    return dist
and found that this is not the case. I expected the matches to be listed from best to worst. They are in fact shown from low "score" to high "score", but that order does not agree with the actual Euclidean distances I computed.
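One way to narrow this down is to recompute the distance directly from the values that come back with each match (available because of include_values=True), rather than looking rows up by index in the local array. The following is only a sketch: `compare_scores` is a hypothetical helper, it assumes each match can be read by key as "id", "score" and "values", and the sample matches are made-up placeholders, not real Pinecone output. It prints the squared L2 next to the plain L2 because some vector libraries report squared Euclidean distance to avoid the square root; whether that applies here is an assumption to verify against the documentation.

```python
import numpy as np

def compare_scores(matches, query_vec):
    """Recompute L2 and squared L2 locally for each returned match."""
    q = np.asarray(query_vec, dtype=float)
    rows = []
    for m in matches:
        v = np.asarray(m["values"], dtype=float)
        l2 = float(np.linalg.norm(v - q))
        rows.append({
            "id": m["id"],
            "score": m["score"],    # value reported by the service
            "l2": l2,               # plain Euclidean distance
            "l2_squared": l2 ** 2,  # squared Euclidean distance
        })
    return rows

# Made-up stand-in for res["matches"]; the scores here are illustrative only.
fake_matches = [
    {"id": "a", "score": 0.0, "values": [0.0, 0.0]},
    {"id": "b", "score": 25.0, "values": [3.0, 4.0]},
]

rows = compare_scores(fake_matches, [0.0, 0.0])
for row in rows:
    print(row)
```

If the reported score lines up with one of the two recomputed columns, the remaining discrepancy is more likely in how the local rows are sliced or matched to ids than in the metric itself.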
Also, my test vector is actually a member of the dataset, but it is not among the results returned, even though it is obviously the best match, since the (vTest - vResult) distance would be zero. Am I misunderstanding how this is supposed to work?
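For reference, the result I would expect can be reproduced with a brute-force search over the local data. This is a minimal sketch under stated assumptions: `brute_force_top_k` is a hypothetical helper, and the random array stands in for the real 150 x 16 dataset (it is not my actual data). With exact Euclidean distance, a query vector that is itself a row of the data should come back first with distance zero.

```python
import numpy as np

def brute_force_top_k(data, query_vec, k=3):
    """Return (row_index, distance) pairs for the k rows closest to query_vec."""
    data = np.asarray(data, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    # Euclidean distance from the query to every row at once
    dists = np.linalg.norm(data - q, axis=1)
    order = np.argsort(dists)[:k]
    return [(int(i), float(dists[i])) for i in order]

# Random placeholder data with the same shape as the dataset described above.
rng = np.random.default_rng(0)
vectors = rng.normal(size=(150, 16))

# Use a row of the dataset as the query, as in the question.
test_vec = vectors[42]
top = brute_force_top_k(vectors, test_vec, k=3)
print(top)
```

Comparing the ids from this local ranking against the ids returned by the query shows whether the index holds the same vectors, under the same ids, as the local array.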