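```python
import os

from langchain_community.retrievers import PineconeHybridSearchRetriever
from langchain_openai import OpenAIEmbeddings
from pinecone_text.sparse import BM25Encoder

# Defined elsewhere (not shown): extract_text, CustomSemanticChunker, split_documents,
# create_pinecone_index, embeddings, threshhold, overlap
def create_index_for_docs(pdf_path: str):
    assert os.path.isfile(pdf_path), f"File does not exist: {pdf_path}"
    index_name = "hybridsearch-bm25"
    text = extract_text(pdf_path)
    text_splitter = CustomSemanticChunker(embeddings, breakpoint_threshold_type="standard_deviation")
    docs = text_splitter.create_documents(texts=[text])
    docs = split_documents(docs, threshhold, overlap)
    texts, metadata, ids = zip(*[(doc.page_content, doc.metadata, str(idx)) for idx, doc in enumerate(docs)])
    index = create_pinecone_index(index_name)
    # Fit BM25 on the chunks; it produces the sparse half of each hybrid vector
    sparseEncoder = BM25Encoder()
    sparseEncoder.fit(texts)
    embedding = OpenAIEmbeddings()
    retriever = PineconeHybridSearchRetriever(
        embeddings=embedding, sparse_encoder=sparseEncoder, index=index, alpha=0.75, top_k=7
    )
    # Dense (OpenAI) and sparse (BM25) vectors are computed and upserted here
    retriever.add_texts(texts=list(texts), ids=list(ids), metadatas=list(metadata))
    return retriever
```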
Error:

```
PineconeApiException Traceback (most recent call last)
<ipython-input-29-254f56e85d5e> in <cell line: 1>()
----> 1 retrivers= create_index_for_docs(pdf_path=path)
14 frames
/usr/local/lib/python3.10/dist-packages/pinecone/core/client/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
259 raise ServiceException(http_resp=r)
260
--> 261 raise PineconeApiException(http_resp=r)
262
263 return r
PineconeApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Date': 'Wed, 17 Jul 2024 09:14:06 GMT', 'Content-Type': 'application/json', 'Content-Length': '81', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '125', 'x-pinecone-request-id': '6677666741385021616', 'x-envoy-upstream-service-time': '3', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Sparse vector must contain at least one value","details":[]}
```
Hi @sreeragunandha.t2021 and welcome to the Pinecone community forums!
Thank you for your question.
The error message says that every sparse vector must contain at least one value, so at least one of the sparse vectors in your upsert is most likely empty (its `indices` and `values` lists have no entries) or incorrectly formatted.
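To make the failure mode concrete: a sparse vector is a pair of parallel lists, `indices` and `values`, and a BM25-style encoder only emits an entry for tokens that survive tokenization. A chunk made only of whitespace, punctuation, or stopwords therefore encodes to empty lists, which is the kind of vector this 400 rejects. The sketch below uses a hypothetical `toy_sparse_encode` with a deliberately simplified tokenizer and a tiny stopword list; it is not the actual `pinecone_text` `BM25Encoder`, whose tokenizer and weighting differ.

```python
import re
import zlib
from collections import Counter

# Hypothetical, simplified stopword list (a real BM25 tokenizer uses a much larger one)
STOPWORDS = {"a", "an", "and", "in", "is", "of", "the", "to"}

def toy_sparse_encode(text: str) -> dict:
    """Toy term-frequency sparse encoder: NOT the pinecone_text BM25Encoder."""
    # Lowercase, keep alphabetic tokens only, drop stopwords
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    counts = Counter(tokens)
    # Deterministic 32-bit index per token (CRC32); value is the raw term count
    indices = [zlib.crc32(tok.encode("utf-8")) for tok in counts]
    values = [float(c) for c in counts.values()]
    return {"indices": indices, "values": values}

print(toy_sparse_encode("Hybrid search combines dense and sparse vectors."))
print(toy_sparse_encode("  ...  \n -- "))  # {'indices': [], 'values': []}
print(toy_sparse_encode("and of the"))     # {'indices': [], 'values': []}
```

The last two chunks produce nothing for the sparse side, so upserting them alongside their dense embeddings would trip the "at least one value" check.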
To debug or fix this issue:
- Ensure every sparse vector (one per chunk) is not empty and contains valid values.
- Verify the format and structure of the sparse vector being passed.
- Check the data being added for any anomalies or missing values.
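The format check in the second bullet could look something like this. `sparse_vector_problems` is a hypothetical helper; apart from the non-empty rule (which is what this error enforces) and matching list lengths, its checks are assumptions about what a well-formed sparse vector should look like.

```python
import math
import numbers

UINT32_MAX = 2**32 - 1

def sparse_vector_problems(sv: dict) -> list[str]:
    """Return human-readable problems found in a {'indices': [...], 'values': [...]} dict."""
    problems = []
    indices = sv.get("indices", [])
    values = sv.get("values", [])
    # Pinecone rejects sparse vectors with no values (the 400 in this thread)
    if len(values) == 0:
        problems.append("empty sparse vector")
    # indices and values are parallel lists, so their lengths must match
    if len(indices) != len(values):
        problems.append(f"length mismatch: {len(indices)} indices vs {len(values)} values")
    # Assumption: indices are expected to be unsigned 32-bit integers
    if any(not isinstance(i, numbers.Integral) or not 0 <= i <= UINT32_MAX for i in indices):
        problems.append("index is not an unsigned 32-bit integer")
    # Assumption: NaN/inf are not meaningful weights and are worth flagging
    if any(not math.isfinite(v) for v in values):
        problems.append("non-finite value (NaN or inf)")
    return problems

print(sparse_vector_problems({"indices": [], "values": []}))  # ['empty sparse vector']
print(sparse_vector_problems({"indices": [1, 5], "values": [0.5, 1.0]}))  # []
```

You could run it over the encoding of each chunk from your fitted `BM25Encoder` (for example via its `encode_documents` method) and print any chunk for which it returns a non-empty list.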
I would start by logging to STDOUT the variables you have defined in that function, especially the chunk texts and their sparse encodings, to see which ones are undefined, empty, or malformed.
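For the logging step, one option is to encode each chunk yourself before calling `add_texts`, print the chunks whose sparse vector comes back empty, and skip them. `drop_empty_sparse` and its `encode` parameter are hypothetical; `encode` is any callable that returns an `{'indices': [...], 'values': [...]}` dict for a single string.

```python
from typing import Callable, Sequence

def drop_empty_sparse(
    texts: Sequence[str],
    ids: Sequence[str],
    metadatas: Sequence[dict],
    encode: Callable[[str], dict],
):
    """Print chunks whose sparse encoding is empty and return the remaining ones."""
    kept_texts, kept_ids, kept_metas = [], [], []
    for text, id_, meta in zip(texts, ids, metadatas):
        sv = encode(text)
        if not sv.get("values"):
            # Log the offending chunk so you can see why it produced no tokens
            print(f"Skipping id={id_!r}: empty sparse vector for text {text[:80]!r}")
            continue
        kept_texts.append(text)
        kept_ids.append(id_)
        kept_metas.append(meta)
    print(f"Kept {len(kept_texts)} of {len(texts)} chunks")
    return kept_texts, kept_ids, kept_metas
```

In your function this would sit between `sparseEncoder.fit(texts)` and the `add_texts` call, e.g. `texts, ids, metadata = drop_empty_sparse(texts, ids, metadata, sparseEncoder.encode_documents)`, assuming `encode_documents` returns a single dict when given a single string. Skipping is a design choice: a chunk with no tokens gives BM25 nothing to match anyway, but you may prefer to adjust the chunking so such chunks are never produced.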
Hope that helps!
Best,
Zack