Metadata size exceeds the limit - but I have not included any metadata in my index yet

I am getting an ApiException error indicating metadata size is too large, but I have not created or added any metadata to the index.

Running this line of code:

docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)

is generating this error:

ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'date': 'Wed, 10 May 2023 17:02:04 GMT', 'x-envoy-upstream-service-time': '0', 'content-length': '115', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"metadata size is 64839 bytes, which exceeds the limit of 40960 bytes per vector","details":[]}
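For context, docs and embeddings are set up roughly like this (a simplified sketch, not my exact code; the file name and chunk settings are illustrative):

from langchain.document_loaders import TextLoader
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Pinecone

# Illustrative pipeline: load a text file, split it, embed with OpenAI.
raw_docs = TextLoader("source.txt").load()
docs = CharacterTextSplitter(chunk_size=4000, chunk_overlap=0).split_documents(raw_docs)
embeddings = OpenAIEmbeddings()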

+1 same here!

{"code":3,"message":"metadata size is 155002 bytes, which exceeds the limit of 40960 bytes per vector","details":[]}

This one I think I figured out: it wasn't the size of the metadata per se, it was the size of the chunks I was sending per call that triggered the error.

Can you let me know what you changed to fix it?

Yeah, my chunk size. I was reading in a giant text file and breaking it into chunks, and it was working fine. Then I made the chunks way too big and got that error, so I went back to the original chunk size.
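Roughly what the working version looked like (a sketch; the exact sizes here are just ballpark figures, not the values I used):

from langchain.text_splitter import RecursiveCharacterTextSplitter

# Smaller chunks keep the text stored with each vector well under the
# 40960-byte metadata limit; chunk_size/chunk_overlap are illustrative.
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
docs = splitter.split_documents(raw_docs)  # raw_docs = the loaded giant text file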

Out of curiosity, what embedding model are you all using? all-MiniLM-L6-v2?

I am using the OpenAIEmbeddings model.
What I don't understand is that this error seems to be thrown when I create the embeddings, not when I search the database. Of course, there is the distinct possibility that I am going about this all wrong and don't fully understand the process. Any guidance would be greatly appreciated.

If you've been able to embed your docs with OpenAI's embeddings and upsert them into the Pinecone index, I would think you've got the plumbing in good shape for the OpenAI embedding of your query. Can you post your upsert/query code (obviously removing your API keys!)?

index_name = 'fusion'
docsearch = Pinecone.from_documents(docs, embeddings, index_name=index_name)

Oh, you need to chunk your documents; there are size limits at the per-vector level. Try making your document smaller or chunking it. I'm going to assume you're using LangChain rather than calling the Pinecone APIs directly, right? Try a small document and see if you still get the error (as the error message indicates, keep each one under 40 KB). LangChain is likely storing the document text in the metadata field.
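If you want to sanity-check that before upserting again, something like this should flag the offending chunks (a sketch; it assumes docs is the list you pass to from_documents and that LangChain is using its default "text" metadata key):

# Approximate the per-vector metadata LangChain builds: the chunk text
# plus whatever is already in doc.metadata.
LIMIT = 40960  # Pinecone's per-vector metadata limit, per the error message

for i, doc in enumerate(docs):
    approx = {**doc.metadata, "text": doc.page_content}
    size = len(str(approx).encode("utf-8"))  # rough byte count
    if size > LIMIT:
        print(f"chunk {i}: ~{size} bytes exceeds the {LIMIT}-byte limit")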

multi-qa-MiniLM-L6-cos-v1