Getting PineconeApiException: (400) Reason: Bad Request HTTP response headers: HTTPHeaderDict?

mishraatharva · August 22, 2024, 8:38am

I am using this link to learn langhain and pinecone. Just copy pasted the code as bellow:

MY CODE:

import getpass

import os

import time

from pinecone import Pinecone

os.environ['HF_TOKEN'] = os.getenv('HF_TOKEN')

pinecone_api_key = os.environ.get("PINECONE_API_KEY")

pc = Pinecone(api_key=pinecone_api_key)

index_name = "langchain-pinecone-learning"  # change if desired

existing_indexes = [index_info["name"] for index_info in pc.list_indexes()]

if index_name not in existing_indexes:
    pc.create_index(
        name=index_name,
        dimension=3072,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )
    while not pc.describe_index(index_name).status["ready"]:
        time.sleep(1)

index = pc.Index(index_name)
index

embeddings=(
    OllamaEmbeddings(model="gemma:2b")  ##by default it ues llama2
)

from langchain_pinecone import PineconeVectorStore

vector_store = PineconeVectorStore(index=index, embedding=embeddings)

vector_store

from uuid import uuid4

from langchain_core.documents import Document

document_1 = Document(
    page_content="I had chocalate chip pancakes and scrambled eggs for breakfast this morning.",
    metadata={"source": "tweet"},
)

document_2 = Document(
    page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
    metadata={"source": "news"},
)

document_3 = Document(
    page_content="Building an exciting new project with LangChain - come check it out!",
    metadata={"source": "tweet"},
)

document_4 = Document(
    page_content="Robbers broke into the city bank and stole $1 million in cash.",
    metadata={"source": "news"},
)

document_5 = Document(
    page_content="Wow! That was an amazing movie. I can't wait to see it again.",
    metadata={"source": "tweet"},
)

document_6 = Document(
    page_content="Is the new iPhone worth the price? Read this review to find out.",
    metadata={"source": "website"},
)

document_7 = Document(
    page_content="The top 10 soccer players in the world right now.",
    metadata={"source": "website"},
)

document_8 = Document(
    page_content="LangGraph is the best framework for building stateful, agentic applications!",
    metadata={"source": "tweet"},
)

document_9 = Document(
    page_content="The stock market is down 500 points today due to fears of a recession.",
    metadata={"source": "news"},
)

document_10 = Document(
    page_content="I have a bad feeling I am going to get deleted :(",
    metadata={"source": "tweet"},
)

documents = [
    document_1,
    document_2,
    document_3,
    document_4,
    document_5,
    document_6,
    document_7,
    document_8,
    document_9,
    document_10,
]

uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)

# vector_store.from_documents(documents,embeddings)

Now I am here I am 2 errors:

First:

if using vector_store.from_documents(documents,embeddings)

I am getting following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[23], line 73
     68 uuids = [str(uuid4()) for _ in range(len(documents))]
     70 # vector_store.add_documents(documents=documents, ids=uuids)
---> 73 vector_store.from_documents(documents,embeddings)

File u:\GENERATIVE_AI\genv\lib\site-packages\langchain_core\vectorstores\base.py:833, in VectorStore.from_documents(cls, documents, embedding, **kwargs)
    831 texts = [d.page_content for d in documents]
    832 metadatas = [d.metadata for d in documents]
--> 833 return cls.from_texts(texts, embedding, metadatas=metadatas, **kwargs)

File u:\GENERATIVE_AI\genv\lib\site-packages\langchain_pinecone\vectorstores.py:453, in PineconeVectorStore.from_texts(cls, texts, embedding, metadatas, ids, batch_size, text_key, namespace, index_name, upsert_kwargs, pool_threads, embeddings_chunk_size, async_req, id_prefix, **kwargs)
    407 @classmethod
    408 def from_texts(
    409     cls,
   (...)
    424     **kwargs: Any,
    425 ) -> PineconeVectorStore:
    426     """Construct Pinecone wrapper from raw documents.
    427 
    428     This is a user friendly interface that:
   (...)
    451             )
    452     """
...
    403         f"Did you mean one of the following indexes: {', '.join(index_names)}"
    404     )
    405 return index

ValueError: Index 'None' not found in your Pinecone project. Did you mean one of the following indexes: langchain-pinecone-learning
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

Second:

if using vector_store.add_documents(documents=documents, ids=uuids) i am getting following error:

---------------------------------------------------------------------------
PineconeApiException                      Traceback (most recent call last)
Cell In[24], line 70
     55 documents = [
     56     document_1,
     57     document_2,
   (...)
     65     document_10,
     66 ]
     68 uuids = [str(uuid4()) for _ in range(len(documents))]
---> 70 vector_store.add_documents(documents=documents, ids=uuids)
     73 # vector_store.from_documents(documents,embeddings)

File u:\GENERATIVE_AI\genv\lib\site-packages\langchain_core\vectorstores\base.py:282, in VectorStore.add_documents(self, documents, **kwargs)
    280     texts = [doc.page_content for doc in documents]
    281     metadatas = [doc.metadata for doc in documents]
--> 282     return self.add_texts(texts, metadatas, **kwargs)
    283 raise NotImplementedError(
    284     f"`add_documents` and `add_texts` has not been implemented "
    285     f"for {self.__class__.__name__} "
    286 )

File u:\GENERATIVE_AI\genv\lib\site-packages\langchain_pinecone\vectorstores.py:175, in PineconeVectorStore.add_texts(self, texts, metadatas, ids, namespace, batch_size, embedding_chunk_size, async_req, id_prefix, **kwargs)
    164 if async_req:
    165     # Runs the pinecone upsert asynchronously.
...
PineconeApiException: (400)
Reason: Bad Request
HTTP response headers: HTTPHeaderDict({'Date': 'Thu, 22 Aug 2024 08:37:54 GMT', 'Content-Type': 'application/json', 'Content-Length': '104', 'Connection': 'keep-alive', 'x-pinecone-request-latency-ms': '1094', 'x-pinecone-request-id': '4162547469429931929', 'x-envoy-upstream-service-time': '36', 'server': 'envoy'})
HTTP response body: {"code":3,"message":"Vector dimension 2048 does not match the dimension of the index 4072","details":[]}
Output is truncated. View as a scrollable element or open in a text editor. Adjust cell output settings...

Thanks in advance.
I hope to hear from you soon.

ZacharyProser · August 26, 2024, 2:46pm

Hi @mishraatharva,

Thanks for your question.

I’d strongly recommend starting with our official LangChain guide, especially if you want to learn.

It looks like your first error is occurring because your index name is not defined.

Your second error is a dimensionality mismatch. You need to set your Pinecone index to have the same number of dimensions that your embedding model outputs.

For example, if you were using OpenAI’s text-embedding-3-small, which outputs 1536 dimensions, you would need to set your Pinecone index to 1536 dimensions when creating it.

It appears there’s a mismatch between the embedding model you’re using and the dimensionality set on your index.

Hope that helps!

Best,
Zack