I have populated a Pinecone index with 25k vectors, each with metadata and a 1536-dimensional embedding (OpenAI ada-002).
I generate the embeddings and insert them into Pinecone with the following code:
import pandas as pd
from tqdm import tqdm
from openai.embeddings_utils import get_embedding

def send_embeddings_to_pinecone(table: pd.DataFrame, batch_size: int = 100):
    for i in tqdm(range(0, len(table['text']), batch_size), position=0, leave=True):
        # set end position of batch
        i_end = min(i + batch_size, len(table['text']))
        # row positions for this batch
        ids_batch = [n for n in range(i, i_end)]
        # vectors to upsert for this batch
        to_upsert = []
        for j in ids_batch:
            record = table.iloc[j]
            # 1. get embedding
            embedding = get_embedding(record['text'], engine=embedding_model)
            # 2. create metadata
            metadata = {"app label": record['label_app'],
                        # ... (more metadata)
                        }
            vector = {"id": record["index"],
                      "metadata": metadata,
                      "values": embedding}
            to_upsert.append(vector)
        # upsert the batch to Pinecone
        index.upsert(vectors=to_upsert)
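For completeness, the setup around that function looks roughly like this (a sketch, assuming the pre-v3 pinecone-client and the old openai.embeddings_utils helper; the API keys, environment, index name, and DataFrame are placeholders):

import openai
import pinecone

openai.api_key = "<OPENAI_API_KEY>"                  # placeholder
pinecone.init(api_key="<PINECONE_API_KEY>",          # placeholder
              environment="<PINECONE_ENVIRONMENT>")  # placeholder

embedding_model = "text-embedding-ada-002"           # 1536-dimensional embeddings
index = pinecone.Index("my-index")                   # placeholder index name, dimension 1536

send_embeddings_to_pinecone(df)  # df has 'text', 'label_app', and 'index' columns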
However, queries against my index seem to search only the metadata, and nothing from the embedding vectors appears to be stored: in the query output, the ‘values’ arrays are all completely empty. This doesn’t even seem possible to me, since every vector in this index should have a fixed, non-zero dimension (1536).
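For reference, the queries that produce this look roughly like the following (a sketch, not my exact call; the query text and top_k are illustrative, and embedding_model and index are the same objects used above):

query_embedding = get_embedding("some search phrase", engine=embedding_model)  # placeholder query text

result = index.query(
    vector=query_embedding,
    top_k=5,                # illustrative
    include_metadata=True,
)
print(result)  # matches come back, but 'values' is an empty list for every one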
How can I solve this?
Thanks!