Error when upserting with sparse vectors

def create_index_for_docs(pdf_path: str):
  assert os.path.isfile(pdf_path), f"File does not exist: {pdf_path}"
  index_name = "hybridsearch-bm25"
  text = extract_text(pdf_path)
  text_spliter = CustomSemanticChunker(embeddings, breakpoint_threshold_type="standard_deviation")
  docs = text_spliter.create_documents(texts=[text])
  docs = split_documents(docs, threshhold, overlap)
  texts, metadata, ids = zip(*[(i.page_content, i.metadata, str(idx)) for idx, i in enumerate(docs)])
  index = create_pinecone_index(index_name)
  sparseEncoder = BM25Encoder()
  sparseEncoder.fit(texts)
  embedding = OpenAIEmbeddings()
  retrivers = PineconeHybridSearchRetriever(
    embeddings=embedding,
    sparse_encoder=sparseEncoder,
    index=index,
    alpha=0.75,
    top_k=7,
  )
  retrivers.add_texts(texts=list(texts), ids=ids, metadatas=metadata)
  return retrivers
error: 
PineconeApiException                      Traceback (most recent call last)
<ipython-input-29-254f56e85d5e> in <cell line: 1>()
----> 1 retrivers= create_index_for_docs(pdf_path=path)

14 frames
/usr/local/lib/python3.10/dist-packages/pinecone/core/client/rest.py in request(self, method, url, query_params, headers, body, post_params, _preload_content, _request_timeout)
    259                 raise ServiceException(http_resp=r)
    260 
--> 261             raise PineconeApiException(http_resp=r)
    262 
    263         return r

PineconeApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Date': 'Wed, 17 Jul 2024 09:14:06 GMT', 'Content-Type': 'application/json', 'Content-Length': '81', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '125', 'x-pinecone-request-id': '6677666741385021616', 'x-envoy-upstream-service-time': '3', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Sparse vector must contain at least one value","details":[]}

Hi @sreeragunandha.t2021 and welcome to the Pinecone community forums!

Thank you for your question.

The error message says that a sparse vector must contain at least one value, which means that at least one sparse vector in the upsert request is empty or incorrectly formatted, so Pinecone rejects the request with a 400.

To debug or fix this issue:

  1. Ensure that no sparse vector is empty: each one needs at least one index and one value.
  2. Verify the format and structure of the sparse vectors being passed.
  3. Check the text chunks being added for anomalies such as empty or whitespace-only strings.

I would start by printing the variables defined in that function, especially the chunks in texts and the sparse vectors the encoder produces for them, to see which one is empty or malformed.
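
One way to act on this is to encode each chunk before the upsert and set aside any chunk whose sparse encoding comes back empty. The sketch below is a minimal example under stated assumptions: `drop_empty_sparse` is a hypothetical helper (it is not part of Pinecone or LangChain), and it assumes the encoder returns a dict with "indices" and "values" keys when given a single string, which is how `BM25Encoder.encode_documents` from pinecone-text is understood to behave.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed shape of one sparse vector: {"indices": [...], "values": [...]}
SparseVector = Dict[str, list]


def drop_empty_sparse(
    texts: Sequence[str],
    metadatas: Sequence[dict],
    ids: Sequence[str],
    encode: Callable[[str], SparseVector],
) -> Tuple[List[str], List[dict], List[str], List[str]]:
    """Hypothetical helper: keep only records whose sparse encoding is non-empty.

    Returns (kept_texts, kept_metadatas, kept_ids, skipped_ids).
    """
    kept_texts: List[str] = []
    kept_metadatas: List[dict] = []
    kept_ids: List[str] = []
    skipped_ids: List[str] = []

    for text, metadata, id_ in zip(texts, metadatas, ids):
        sparse = encode(text)
        # An empty "values" list is what triggers
        # "Sparse vector must contain at least one value".
        if sparse.get("values"):
            kept_texts.append(text)
            kept_metadatas.append(metadata)
            kept_ids.append(id_)
        else:
            skipped_ids.append(id_)

    return kept_texts, kept_metadatas, kept_ids, skipped_ids
```

In the function from the question, this could be called as `texts, metadata, ids, skipped = drop_empty_sparse(texts, metadata, ids, sparseEncoder.encode_documents)` right before `retrivers.add_texts(...)`. Printing `skipped` then shows which chunks produced no tokens; chunks that are empty or consist only of punctuation or stop words are likely candidates.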

Hope that helps!

Best,
Zack