Vectors are sent to Pinecone, but seem to arrive empty

Kickball · April 10, 2023, 11:18pm

I’m trying to split PDF documents into chunks (using LangChain), convert them to OpenAI embeddings, and store them in my Pinecone index. I’m having trouble with the storing part. The PDFs are split properly, and when I log the embeddings locally they look correct, but when I bring it all together and send it to Pinecone, my vectors appear empty in the index dashboard. The expected number of vectors arrives and they are sorted into the proper namespaces, but when I query the vector store, the results come back empty. I’ve tried deleting my index and creating a new one, but no luck.
Less importantly, if anyone knows how to change my index’s region, some tips would be greatly appreciated.
Thank you!

Here’s my code:

        const filePath = "path/to/file";

        // Read the PDF file
        const pdfData = fs.readFileSync(filePath);

        // Parse the PDF and extract the text content
        const pdfDocument = await pdfParse(pdfData);
        const text = pdfDocument.text;

        // Split the text into chunks of at most 1000 characters
        const textSplitter = new RecursiveCharacterTextSplitter({
            chunkSize: 1000,
        });
        const chunks = await textSplitter.createDocuments([text]); // turns an array of strings into documents

        console.log('creating vector store...');
        const embeddings = new OpenAIEmbeddings();

        const pinecone = await initPinecone();
        const index = pinecone.Index(process.env.PINECONE_INDEX_NAME);

        // Embed the chunks and upsert them into the Pinecone index
        await PineconeStore.fromDocuments(chunks, embeddings, {
            pineconeIndex: index,
            namespace: 'test-pdf5',
            // textKey: 'pageContent',
        });

But I’ve also tried the example code from the LangChain docs and an example project I found on YouTube, and both have the same issue.
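
Since the embeddings look right when logged locally, one way to make that check explicit before the upsert is to validate the vectors in code. The following is a minimal sketch, not part of the original code: `checkEmbeddings` is a hypothetical helper, and the expected dimension is a parameter because it depends on the embedding model and has to match the dimension the index was created with.

```javascript
// Hypothetical helper: reports vectors that would be useless in a vector store.
// `vectors` is an array of number arrays, one per embedded chunk.
function checkEmbeddings(vectors, expectedDimension) {
  const problems = [];
  vectors.forEach((vector, i) => {
    if (!Array.isArray(vector) || vector.length !== expectedDimension) {
      problems.push({ index: i, reason: "wrong dimension" });
    } else if (vector.some((x) => !Number.isFinite(x))) {
      problems.push({ index: i, reason: "non-finite value" });
    } else if (vector.every((x) => x === 0)) {
      problems.push({ index: i, reason: "all zeros" });
    }
  });
  return problems;
}

// Toy 3-dimensional vectors stand in for real embeddings here.
const sampleVectors = [
  [0.1, -0.2, 0.3], // fine
  [0, 0, 0], // all zeros
  [0.5, 0.5], // wrong dimension
];
const embeddingProblems = checkEmbeddings(sampleVectors, 3);
console.log(embeddingProblems);
```

With the code above, the input could be the result of `await embeddings.embedDocuments(chunks.map((c) => c.pageContent))`. An empty result from the helper would suggest the problem lies in how the vectors are stored or read back rather than in how they are generated.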

Here’s the example from the docs:

        const client = new PineconeClient();
        await client.init({
            apiKey: process.env.PINECONE_API_KEY,
            environment: process.env.PINECONE_ENVIRONMENT,
        });
        const pineconeIndex = client.Index(process.env.PINECONE_INDEX_NAME);

        const docs = [
            new Document({
                metadata: { foo: "bar" },
                pageContent: "pinecone is a vector db",
            }),
            new Document({
                metadata: { foo: "bar" },
                pageContent: "the quick brown fox jumped over the lazy dog",
            }),
            new Document({
                metadata: { baz: "qux" },
                pageContent: "lorem ipsum dolor sit amet",
            }),
            new Document({
                metadata: { baz: "qux" },
                pageContent: "pinecones are the woody fruiting body and of a pine tree",
            }),
        ];

        await PineconeStore.fromDocuments(docs, new OpenAIEmbeddings(), {
            pineconeIndex,
            namespace: "test",
        });

And here’s a link to the YouTube repo

Cory_Pinecone · April 11, 2023, 4:16am

Hi @Kickball,

I checked our backend, and I do see vectors in your index. For privacy reasons, I can’t see their values, but I do see that there are some there. So your vectors are being stored in Pinecone.

If you open the Index Info section in the console (just above Metrics), you should see the total vector counts for each namespace in your index.
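
The same per-namespace check can also be scripted. The sketch below is only an illustration: `vectorCountsByNamespace` is a hypothetical helper, and the shape of the `stats` object (a `namespaces` map with a `vectorCount` for each namespace) is an assumption about what the client's index-stats call returns, so verify it against the client version you have installed.

```javascript
// Hypothetical helper. The `stats` shape (a `namespaces` map holding a
// `vectorCount` per namespace) is an assumption, not a documented contract.
function vectorCountsByNamespace(stats) {
  const counts = {};
  for (const [name, info] of Object.entries(stats.namespaces ?? {})) {
    // The default namespace has an empty name, so give it a readable label.
    counts[name === "" ? "(default)" : name] = info.vectorCount ?? 0;
  }
  return counts;
}

// Illustrative input only: four vectors under "test", none in the default namespace.
const sampleStats = {
  namespaces: {
    "": { vectorCount: 0 },
    test: { vectorCount: 4 },
  },
};
const namespaceCounts = vectorCountsByNamespace(sampleStats);
console.log(namespaceCounts);
```

A count of zero for the namespace you are querying, next to a non-zero count for another one, would show that the vectors exist but live somewhere other than where the query is looking.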

What happens when you query against the index? Not a fetch, but a query with text that’s been run through the same embedding model. You can use the example from the LangChain docs to do so.

Kickball · April 11, 2023, 2:33pm

Thanks for the reply! When I search the DB using metadata filters (like the example in the documentation), it returns an empty array.

This is my search code:

        const vectorStore = await PineconeStore.fromExistingIndex(
            new OpenAIEmbeddings(),
            { pineconeIndex }
        );

        const results = await vectorStore.similaritySearch("pinecone", 1, {
            foo: "bar",
        });
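        // Pass the array as a separate argument so it is printed in full
        // instead of being coerced to a string
        console.log('res:', results);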

Which, if I’m not wrong, should be returning these documents:

            new Document({
                metadata: { foo: "bar" },
                pageContent: "pinecone is a vector db",
            }),
            new Document({
                metadata: { foo: "bar" },
                pageContent: "the quick brown fox jumped over the lazy dog",
            }),

The expected number of vectors (4) is stored under the correct namespace (‘test’), but even a search returns nothing.

Kickball · April 11, 2023, 2:47pm

I also ran the text search with this code:

        const vectorStore = await PineconeStore.fromExistingIndex(
            new OpenAIEmbeddings(),
            { pineconeIndex }
        );

        const model = new OpenAI();
        const chain = VectorDBQAChain.fromLLM(model, vectorStore, {
            k: 1,
            returnSourceDocuments: true,
        });
        const response = await chain.call({ query: "What is pinecone?" });
        console.log(response);

and got this result:

        {
          text: " A pinecone is a type of fruit produced by pine trees. It is an ovulate cone, typically consisting of overlapping scales that contain the tree's seeds.",
          sourceDocuments: []
        }

It seems to be the same issue as with the metadata filter.
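
One thing worth ruling out, as a guess rather than a confirmed diagnosis: the documents were written with `namespace: "test"`, while the `fromExistingIndex` calls above pass only `{ pineconeIndex }`. Pinecone queries are scoped to a single namespace, so a search against the default namespace would not see vectors stored under `test`. The sketch below illustrates that behavior with a hypothetical in-memory class (`FakeNamespacedIndex` is made up for illustration and is not part of the Pinecone or LangChain APIs).

```javascript
// Hypothetical in-memory stand-in for a namespaced vector index. It is NOT the
// Pinecone client; it only models the rule "a query sees one namespace at a time".
class FakeNamespacedIndex {
  constructor() {
    this.namespaces = new Map();
  }

  // Store records under the given namespace ("" plays the default namespace).
  upsert(records, namespace = "") {
    const bucket = this.namespaces.get(namespace) ?? [];
    bucket.push(...records);
    this.namespaces.set(namespace, bucket);
  }

  // Return up to topK records from a single namespace, never from the others.
  query(topK, namespace = "") {
    return (this.namespaces.get(namespace) ?? []).slice(0, topK);
  }
}

const fakeIndex = new FakeNamespacedIndex();

// Written with a namespace, like the fromDocuments call with namespace "test".
fakeIndex.upsert(
  [
    { id: "a", text: "pinecone is a vector db" },
    { id: "b", text: "the quick brown fox jumped over the lazy dog" },
  ],
  "test"
);

// Read back without a namespace, like the fromExistingIndex calls above.
const matchesWithoutNamespace = fakeIndex.query(1);
// Read back with the namespace that was used for the write.
const matchesWithNamespace = fakeIndex.query(1, "test");

console.log(matchesWithoutNamespace.length, matchesWithNamespace.length); // prints: 0 1
```

If this is what is happening, passing the same namespace when reopening the store, for example `{ pineconeIndex, namespace: "test" }` as the second argument to `PineconeStore.fromExistingIndex`, should make the search look where the vectors were written. This assumes the installed LangChain version accepts `namespace` there, as it does for `fromDocuments` in the code above.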

DannyPreye · January 24, 2025, 11:07pm

Hi @Kickball, did you ever find a solution to this? I’m asking because I’m facing the same issue.