I need help with this. I’ve deleted the index just to make sure. I have a PDF that is over ~5000 pages long, and I’ve split the document with Haystack’s DocumentSplitter: indexing.add_component("splitter", DocumentSplitter(split_by="page", split_length=1))
I’ve also tried split_by="sentence" with split_length=5.
I’ve also tried using a 20-page PDF with little to no text.
Hi @notme_MA, and welcome to the Pinecone forums! Thanks for your question.
Could you please share all of your relevant code so that we can better assist you in debugging the issue?
It’s difficult to tell what might be going wrong from your description.
General tips and advice based on what I can surmise might be happening:
- Can you follow one of Haystack’s examples or notebooks that splits a PDF, starting with a smaller sample PDF to sanity-check that you’re getting reasonable output?
- If you then modify that test code to use your PDF, what happens? Can you share the output or error messages you’re seeing as well?
- Check whether Haystack has any hard limits on PDF size.
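For the sanity check in the first tip, a rough sketch like the one below might help. It does not use Haystack at all; it only mimics page splitting in plain Python, on the assumption that pages in your extracted text are separated by form-feed characters (`\f`), which as far as I know is what split_by="page" keys on. `split_pages`, `summarize`, and the sample text are made-up names for illustration, not part of any library.

```python
PAGE_BREAK = "\f"  # form feed; ASSUMED page separator in the extracted text


def split_pages(text, split_length=1):
    """Group form-feed-separated pages into chunks of `split_length` pages."""
    pages = text.split(PAGE_BREAK)
    return [
        PAGE_BREAK.join(pages[i:i + split_length])
        for i in range(0, len(pages), split_length)
    ]


def summarize(chunks):
    """Return simple counts that make odd splitter output easy to spot."""
    lengths = [len(chunk.strip()) for chunk in chunks]
    return {
        "chunks": len(chunks),                       # how many chunks came out
        "empty": sum(1 for n in lengths if n == 0),  # pages with no text at all
        "longest": max(lengths, default=0),          # size of the largest chunk
    }


# Hypothetical three-page document whose middle page is blank.
sample = "Page one text.\f\fPage three text, a bit longer."
report = summarize(split_pages(sample, split_length=1))
print(report)
```

If the real splitter output for your 20-page PDF shows mostly empty chunks, or one very large chunk, that would point at text extraction rather than at the index.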
Hope this helps and looking forward to your reply.
Best,
Zack
I am facing a similar issue.
pinecone.core.client.exceptions.ApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'content-type': 'text/plain; charset=utf-8', 'x-pinecone-api-version': '2024-04', 'access-control-allow-origin': '*', 'vary': 'origin,access-control-request-method,access-control-request-headers', 'access-control-expose-headers': '*', 'X-Cloud-Trace-Context': '4e755e566fdd488e46535f8debbaee48', 'Date': 'Sat, 20 Jul 2024 16:49:11 GMT', 'Server': 'Google Frontend', 'Content-Length': '136', 'Via': '1.1 google', 'Alt-Svc': 'h3=":443"; ma=2592000,h3-29=":443"; ma=2592000'})
HTTP response body: Request failed. You’ve reach the max pod-based indexes allowed in project Chatbot (0). To add more pod-based indexes, upgrade your plan.
I have deleted all the indexes in the project, but I am still getting the same error.
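One possible reading of the response body: the limit is reported as (0), which suggests the plan allows no pod-based indexes at all, so deleting existing indexes would not change the result. If that reading is right, requesting a serverless index instead may get past the 400. The sketch below is only an illustration under several assumptions: the `pinecone` Python client (v3 or later), a plan that permits serverless indexes in the chosen cloud and region, and made-up values for the index name, dimension, and helper function names.

```python
def serverless_index_request(name, dimension, metric="cosine",
                             cloud="aws", region="us-east-1"):
    """Build keyword arguments for a serverless (not pod-based) index."""
    return {
        "name": name,
        "dimension": dimension,  # must match your embedding model's output size
        "metric": metric,
        # ASSUMPTION: cloud/region must be ones your plan actually supports.
        "spec": {"serverless": {"cloud": cloud, "region": region}},
    }


def create_serverless_index(api_key, **request):
    """Send the request; needs the `pinecone` package and a valid API key."""
    # Imported lazily so the rest of the sketch loads without the package.
    from pinecone import Pinecone
    Pinecone(api_key=api_key).create_index(**request)


# Hypothetical index name and dimension, for illustration only.
request = serverless_index_request("chatbot-index", dimension=768)
print(request["spec"])
```

Calling `create_serverless_index(api_key, **request)` would then submit it. If the index is being created indirectly through another library, it may be worth checking which kind of spec that library passes on your behalf.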