Hi @jenna, thanks for the quick response!
Here are the details you asked for:
Context
I'm using `langchain_pinecone.vectorstores.PineconeVectorStore` with LangChain and Bedrock embeddings. Retrieval works fine most of the time, but it is now returning documents from the wrong namespaces, both locally and in Docker.
I’ve confirmed:
- The correct namespace is being passed (checked with `print()` before and after creating the vector store).
- The index is correct.
- The Docker image was rebuilt without cache using `--no-cache`.
- Pinecone API keys, environment, and access config are set correctly in both the local and Docker environments.
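One way to tell whether the namespace is being lost in the LangChain layer or in Pinecone itself is to query the index directly with an explicit `namespace` and compare the returned ids with what the retriever gives back. This is a minimal sketch: `query_namespace` and `FakeIndex` are hypothetical names, the fake exists only so the snippet runs offline, and the real call assumes the Pinecone client's `Index.query(vector=..., top_k=..., namespace=..., include_metadata=True)` returning matches with `id`, `score`, and `metadata` attributes.

```python
from types import SimpleNamespace
from typing import Any, Callable, List, Tuple


def query_namespace(
    index: Any,
    embed_query: Callable[[str], List[float]],
    query: str,
    namespace: str,
    top_k: int = 5,
) -> List[Tuple[str, float, dict]]:
    """Query a single Pinecone namespace directly, with no LangChain in between.

    `index` is anything with a Pinecone-style query(...) method (e.g. pc.Index(...)),
    and `embed_query` turns text into a vector (e.g. embedding.embed_query).
    """
    response = index.query(
        vector=embed_query(query),
        top_k=top_k,
        namespace=namespace,  # passed explicitly on every call
        include_metadata=True,
    )
    return [(m.id, m.score, dict(m.metadata or {})) for m in response.matches]


class FakeIndex:
    """Offline stand-in that records and echoes the namespace it was asked for (demo only)."""

    def __init__(self) -> None:
        self.namespaces_queried: List[str] = []

    def query(self, *, vector, top_k, namespace, include_metadata):
        self.namespaces_queried.append(namespace)
        match = SimpleNamespace(id=f"{namespace}#0", score=0.87, metadata={"text": "stub"})
        return SimpleNamespace(matches=[match][:top_k])


fake = FakeIndex()
results = query_namespace(fake, lambda text: [0.0] * 1024, "what is pbm", "HMP Data")
print(results)  # [('HMP Data#0', 0.87, {'text': 'stub'})]
print(fake.namespaces_queried)  # ['HMP Data']

# Against the real services (not run here), something like:
# results = query_namespace(pc.Index(INDEX_NAME), embedding.embed_query, query, NAMESPACE)
```

If the raw query only returns ids from the expected namespace while the retriever does not, that points at the integration layer rather than at Pinecone.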
Here is the code I'm using:
```python
import os

import boto3
from langchain_aws import BedrockEmbeddings, ChatBedrock
from langchain_pinecone.vectorstores import PineconeVectorStore
from pinecone import Pinecone

PINECONE_API_KEY = "<REDACTED_API_KEY>"
INDEX_NAME = ""  # redacted
NAMESPACE = ""  # redacted

# PineconeVectorStore(index_name=...) picks up the API key from this environment variable
os.environ["PINECONE_API_KEY"] = PINECONE_API_KEY

client = boto3.client(
    service_name="bedrock-runtime",
    aws_access_key_id="",  # redacted
    aws_secret_access_key="",  # redacted
    region_name="us-east-1",
)
embedding = BedrockEmbeddings(client=client, model_id="amazon.titan-embed-text-v2:0")

pc = Pinecone(api_key=PINECONE_API_KEY)
index = pc.Index(INDEX_NAME)


def create_retriever(index_name, namespace):
    print("[create_retriever] Using namespace:", namespace)
    vectorstore = PineconeVectorStore(
        index_name=index_name,
        embedding=embedding,
        namespace=namespace,
    )
    retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
    return retriever, vectorstore


def inspect_index(index_name, namespace):
    index = pc.Index(index_name)
    stats = index.describe_index_stats()
    print("\nIndex stats:")
    print("Available namespaces:", list(stats["namespaces"].keys()))
    if namespace in stats["namespaces"]:
        print(f"Vector count in '{namespace}':", stats["namespaces"][namespace]["vector_count"])
    else:
        print(f"Namespace '{namespace}' not found.")


def test_retriever_flow():
    retriever, vectorstore = create_retriever(INDEX_NAME, NAMESPACE)
    print("Namespace in retriever:", vectorstore._namespace)  # private attribute, for debugging only
    inspect_index(INDEX_NAME, NAMESPACE)

    query = "Why is this product a good choice for users?"
    print("\nFetching documents for query:", query)
    docs = retriever.invoke(query)
    print("\n Retrieved documents:", docs)


if __name__ == "__main__":
    test_retriever_flow()
```
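In the script as posted, `INDEX_NAME` and `NAMESPACE` are blank placeholders. If either one is actually blank at runtime in one of the environments, nothing errors out: Pinecone uses the empty string `""` as its default namespace, so the query still runs, just against different data. A fail-fast check such as this hypothetical `require_setting` helper is one way to rule that out. The `PINECONE_NAMESPACE` environment variable in the last comment is an assumed name, not part of the original setup.

```python
from typing import Optional


def require_setting(name: str, value: Optional[str]) -> str:
    """Fail fast if a required setting is missing or blank (hypothetical helper).

    A blank NAMESPACE is easy to miss: Pinecone treats "" as the default
    namespace, so queries succeed but hit a different set of vectors.
    """
    if value is None or not value.strip():
        raise ValueError(f"{name} is missing or blank")
    return value


print(require_setting("NAMESPACE", "HMP Data"))  # HMP Data

try:
    require_setting("NAMESPACE", "")
except ValueError as err:
    print(err)  # NAMESPACE is missing or blank

# e.g. NAMESPACE = require_setting("NAMESPACE", os.environ.get("PINECONE_NAMESPACE"))
```

The helper returns the value unchanged rather than stripping it, since namespace names such as "HMP Data" are matched as exact strings.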
Questions
- Could Pinecone be serving cached or stale results across namespaces?
- Is there any known issue with `PineconeVectorStore` or LangChain's integration that could ignore or override the namespace?
- Could this be caused by an eventual-consistency delay or some internal misrouting?
Issue 2: Index Has Vectors, But Retrieval Returns No Documents
Following up on my previous post about the namespace mismatch: I've also noticed another issue. The retriever now correctly points to the expected namespace ("HMP Data"), and `describe_index_stats()` confirms that it contains 332 vectors.
Note: I noticed this behavior when running the code outside the application, as a standalone Python file.
Problem:
Even though the namespace has vectors, running a query like "what is pbm" returns no documents.
```text
Fetching documents for query: what is pbm
Retrieved documents: []
No documents found.
```
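One thing that can produce an empty list even when the namespace has vectors: `PineconeVectorStore` builds each `Document` from the metadata field named by its `text_key` (default `"text"`), and in recent `langchain_pinecone` releases it appears to skip matches that lack that field, logging a warning instead of raising. If the 332 vectors were upserted by a different tool or under a different metadata key, a raw index query could return matches while the retriever returns nothing. This hypothetical helper checks raw matches, converted to plain dicts, for that key.

```python
from typing import Iterable, List, Mapping, Optional


def ids_missing_text_key(
    matches: Iterable[Mapping], text_key: str = "text"
) -> List[Optional[str]]:
    """Return the ids of raw matches whose metadata has no `text_key` entry."""
    missing = []
    for match in matches:
        metadata = match.get("metadata") or {}  # treat missing/None metadata as empty
        if text_key not in metadata:
            missing.append(match.get("id"))
    return missing


# Raw matches as plain dicts, e.g. built from a direct index.query(...) call:
# [{"id": m.id, "metadata": m.metadata} for m in response.matches]
sample = [
    {"id": "a", "metadata": {"text": "chunk text"}},
    {"id": "b", "metadata": {"content": "stored under a different key"}},
    {"id": "c", "metadata": None},
]
print(ids_missing_text_key(sample))  # ['b', 'c']
print(ids_missing_text_key(sample, text_key="content"))  # ['a', 'c']
```

If real matches get flagged, `PineconeVectorStore(..., text_key=...)` is the constructor parameter to look at; the right key depends on how the data was written.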
Debugging So Far:
- Namespace and index name are correct (confirmed via logs and the Pinecone UI).
- Retrieval is done with LangChain's `PineconeVectorStore.as_retriever(k=5)`.
- No filters are applied.
- Embeddings are created with `amazon.titan-embed-text-v2:0` via `BedrockEmbeddings`.
- `describe_index_stats()` shows 332 vectors under "HMP Data".
- Tried different phrasings and entirely different queries, with no luck.
- Both the embedding dimension and the index dimension are 1024.
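The dimensions are already confirmed to match, but it is a cheap check to run on the exact vector produced for the failing query rather than on the model or index settings. A minimal sketch: `check_query_vector` is a hypothetical name, reading `dimension` from `describe_index_stats()` is an assumption about the response fields, and rejecting an all-zero vector is a design choice (cosine similarity is undefined for it), not a Pinecone rule.

```python
import math
from typing import Sequence


def check_query_vector(vector: Sequence[float], index_dimension: int) -> None:
    """Raise ValueError if a query embedding looks unusable against the index."""
    if len(vector) != index_dimension:
        raise ValueError(f"embedding has {len(vector)} dims, index expects {index_dimension}")
    if not all(math.isfinite(x) for x in vector):
        raise ValueError("embedding contains NaN or infinite values")
    if not any(vector):
        raise ValueError("embedding is all zeros")


check_query_vector([0.1] * 1024, 1024)  # passes silently

# Against the real services (not run here):
# stats = index.describe_index_stats()
# check_query_vector(embedding.embed_query("what is pbm"), stats["dimension"])
```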