"code":3,"message":"Dense vectors must contain at least one non-zero value"

pierre.curie.godinot · January 31, 2024, 1:40pm

Hello everyone,

I am trying to preprocess documents and upsert them using Haystack and Pinecone.

from haystack.utils import fetch_archive_from_http, convert_files_to_docs
from haystack.nodes import PreProcessor


# This fetches some sample files to work with
doc_dir = "data/tutorial8"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/preprocessing_tutorial8.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

# Convert every file in the directory into a Haystack Document
all_docs = convert_files_to_docs(dir_path=doc_dir)

preprocessor = PreProcessor(
    clean_empty_lines=True,
    clean_whitespace=True,
    split_by="word",
    split_length=100,
    split_respect_sentence_boundary=True
)

# Returns a list of Document objects; the text is in each document's 'content' field
docs_default = preprocessor.process(all_docs)

# document_store is assumed to be an already initialized PineconeDocumentStore (not shown here)
# write_documents accepts a list of Document objects (or dicts)
document_store.write_documents(docs_default)

Then I got this error:

ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'content-type': 'application/json', 'Content-Length': '155', 'x-pinecone-request-latency-ms': '136', 'date': 'Wed, 31 Jan 2024 13:33:43 GMT', 'x-envoy-upstream-service-time': '32', 'server': 'envoy', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
HTTP response body: {"code":3,"message":"Dense vectors must contain at least one non-zero value. Vector ID 1f6ca8a2bd6c9903813607120d8d48bc contains only zeros.","details":[]}

But when I do this:

from pprint import pprint

pprint(docs_default[0])

it returns:

<Document: {'content': 'BERT: Pre-training of Deep Bidirectional Transformers for\nLanguage Understanding\nJacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova\nGoogle AI Language\n{jacobdevlin,mingweichang,kentonl,kristout}@google.com\nAbstract\nWe introduce a new language representa-\ntion model called BERT, which stands for\nBidirectional Encoder Representations from\nTransformers. Unlike recent language repre-\nsentation models (Peters et al., 2018a; Rad-\nford et al., 2018), BERT is designed to pre-\ntrain deep bidirectional representations from\nunlabeled text by jointly conditioning on both\nleft and right context in all layers. ', 'content_type': 'text', 'score': None, 'meta': {'name': 'bert.pdf', '_split_id': 0}, 'id_hash_keys': ['content'], 'embedding': None, 'id': '1f6ca8a2bd6c9903813607120d8d48bc'}>

So I really don’t understand why this vector contains only zero values.

jesse · January 31, 2024, 4:21pm

It does look like there’s no vector in the data you’re sending to Pinecone. You might want to check how the documents are being preprocessed before they are upserted into Pinecone.
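
One way to run that check before writing anything is sketched below. It is only an illustration: the helper name `ids_without_usable_vector` and the sample data are hypothetical, and it works on plain dicts with `id` and `embedding` keys (the same fields shown in the printed document above, where `embedding` is `None`). For Haystack `Document` objects, passing `doc.to_dict()` for each document should give a compatible shape.

```python
from typing import Iterable, List, Optional, Sequence


def ids_without_usable_vector(docs: Iterable[dict]) -> List[str]:
    """Return the ids of documents whose embedding is missing, empty, or all zeros."""
    bad = []
    for doc in docs:
        emb: Optional[Sequence[float]] = doc.get("embedding")
        if emb is None or len(emb) == 0 or all(v == 0 for v in emb):
            bad.append(doc["id"])
    return bad


# Hypothetical sample data mirroring the fields printed in the question.
sample = [
    {"id": "1f6ca8a2bd6c9903813607120d8d48bc", "embedding": None},
    {"id": "doc-with-vector", "embedding": [0.1, 0.0, -0.3]},
    {"id": "doc-all-zeros", "embedding": [0.0, 0.0, 0.0]},
]

print(ids_without_usable_vector(sample))
```

Any id reported here has no real vector yet, so it would need an embedding before Pinecone accepts it.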

pierre.curie.godinot · February 1, 2024, 11:31pm

Actually, we can generate embeddings for our context passages using the retriever later in the code. All we need to do is pass the retriever to the update_embeddings method of the document store. This generates the embeddings and upserts them to the Pinecone index.
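
A minimal sketch of that step, assuming Haystack 1.x. The wrapper `embed_and_upsert`, the factory `build_components`, the embedding model name, and `embedding_dim=768` are illustrative assumptions, not taken from the original code; only the `update_embeddings` call itself comes from the description above.

```python
def embed_and_upsert(document_store, retriever, batch_size=32):
    """Ask the document store to embed its documents with the given retriever."""
    # update_embeddings computes embeddings via the retriever and writes them to the index.
    document_store.update_embeddings(retriever, batch_size=batch_size)


def build_components(api_key, environment, index_name):
    """Hypothetical wiring for Haystack 1.x (all argument values are assumptions)."""
    # Imports are local so embed_and_upsert can be used or tested without Haystack installed.
    from haystack.document_stores import PineconeDocumentStore
    from haystack.nodes import EmbeddingRetriever

    store = PineconeDocumentStore(
        api_key=api_key,
        environment=environment,
        index=index_name,
        similarity="cosine",
        embedding_dim=768,  # must match the output size of the embedding model
    )
    retriever = EmbeddingRetriever(
        document_store=store,
        embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1",
        model_format="sentence_transformers",
    )
    return store, retriever
```

With real credentials, usage would look roughly like `store, retriever = build_components(...)` followed by `embed_and_upsert(store, retriever)`.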
