Help pushing vectors to Pinecone DB in Next.js

dattasumit2019 · June 28, 2024, 2:41am

I need help pushing vectors into the DB. Something is wrong with the awaited upsert call here, and I am getting an error.

Pastebin link for my file that uploads to Pinecone: import { Index, Pinecone, PineconeRecord, RecordMetadata,} from "@ - Pastebin.com

This is the error I’m getting right now:
Object literal may only specify known properties, and 'vectors' does not exist in type 'PineconeRecord'.

Code if pastebin doesn’t work:
import {
  Index,
  Pinecone,
  PineconeRecord,
  RecordMetadata,
} from "@pinecone-database/pinecone";
import { downloadFromS3 } from "./s3.server";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import {
  Document,
  RecursiveCharacterTextSplitter,
} from "@pinecone-database/doc-splitter";
import { getEmbeddings } from "./embeddings";
import { Vector } from "@pinecone-database/pinecone/dist/pinecone-generated-ts-fetch";
import md5 from "md5";
import { convertToAscii } from "./utils";

let pinecone: Pinecone | null = null;
const api = process.env.PINECONE_API_KEY || "";

export const getPineconeClient = () => {
  if (!pinecone) {
    pinecone = new Pinecone({
      apiKey: api,
    });
  }
  return pinecone;
};

type PDFPage = {
  pageContent: string;
  metadata: {
    loc: { pageNumber: number };
  };
};

export async function loadS3IntoPinecone(fileKey: string) {
  try {
    // 1. Obtain the PDF
    console.log("Downloading PDF from S3...");
    const file_name = await downloadFromS3(fileKey);

    if (!file_name) {
      throw new Error("File not found");
    }

    // 2. Download and read PDF
    console.log("Reading PDF...");
    const loader = new PDFLoader(file_name as string);
    const pages = (await loader.load()) as PDFPage[];

    // 3. Split and segment the PDF
    console.log("Splitting PDF...");
    const documents = await Promise.all(pages.map(prepareDocument));

    // 4. Vectorize and embed individual docs
    console.log("Embedding documents...");
    const vectors = await Promise.all(documents.flat().map(embedDocument));

    // 5. Upload to Pinecone
    const client = await getPineconeClient();
    const pineconeIndex = client.Index("teachtalk");

    console.log("inserting vectors into pinecone");

    const namespace = convertToAscii(fileKey);

    // Push vectors to Pinecone index
    await pineconeIndex.upsert({
      vectors: vectors as PineconeRecord<RecordMetadata>[],
      namespace: namespace,
    });

    console.log("Upload complete");
  } catch (error) {
    console.error("Error in loadS3IntoPinecone", error);
  }
}

async function embedDocument(doc: Document): Promise<PineconeRecord> {
  try {
    const embeddings = await getEmbeddings(doc.pageContent);
    const hash = md5(doc.pageContent);

    return {
      id: hash,
      values: embeddings,
      metadata: {
        text: doc.metadata.text,
        pageNumber: doc.metadata.pageNumber,
      },
    } as PineconeRecord;
  } catch (error) {
    console.error("Error in embedding document", error);
    throw error; // Ensure errors are propagated
  }
}

// Converts to bytes then to a string
export const truncateStringByBytes = (str: string, bytes: number) => {
  const enc = new TextEncoder();
  return new TextDecoder("utf-8").decode(enc.encode(str).slice(0, bytes));
};

async function prepareDocument(page: PDFPage): Promise<Document[]> {
  let { pageContent, metadata } = page;

  // Strip newline characters
  pageContent = pageContent.replace(/\n/g, "");

  // Split the docs
  const splitter = new RecursiveCharacterTextSplitter();
  const docs = await splitter.splitDocuments([
    new Document({
      pageContent,
      metadata: {
        pageNumber: metadata.loc.pageNumber,
        text: truncateStringByBytes(pageContent, 36000),
      },
    }),
  ]);

  return docs;
}
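
A note on truncateStringByBytes from the code above: it cuts the UTF-8 byte sequence rather than the string, so a cut that falls inside a multi-byte character decodes to the replacement character U+FFFD. The following minimal sketch reuses the helper as posted and shows both cases; the sample strings are illustrative only.

```typescript
// Same helper as in the post: encode to UTF-8, keep the first `bytes` bytes,
// then decode the result back to a string.
const truncateStringByBytes = (str: string, bytes: number): string => {
  const enc = new TextEncoder();
  return new TextDecoder("utf-8").decode(enc.encode(str).slice(0, bytes));
};

// ASCII text uses one byte per character, so the cut is clean.
const ascii = truncateStringByBytes("hello world", 5);

// "é" takes two bytes in UTF-8. Keeping only the first byte leaves an
// incomplete sequence, which TextDecoder replaces with U+FFFD.
const split = truncateStringByBytes("é", 1);

console.log(JSON.stringify(ascii), JSON.stringify(split));
```

For metadata text this is usually harmless, but it means the last character of a truncated value may not match the source.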

ZacharyProser · June 28, 2024, 1:57pm

Hi @dattasumit2019, and Welcome to the Pinecone community forums!

Thanks for your question.

I think the issue relates to the typing of your embedDocument function. Try updating it like this:

async function embedDocument(doc: Document): Promise<PineconeRecord<RecordMetadata>> {
  try {
    const embeddings = await getEmbeddings(doc.pageContent);
    const hash = md5(doc.pageContent);

    return {
      id: hash,
      values: embeddings,
      metadata: {
        text: doc.metadata.text,
        pageNumber: doc.metadata.pageNumber,
      },
    };
  } catch (error) {
    console.error("Error in embedding document", error);
    throw error;
  }
}

Now, in your loadS3IntoPinecone function, make sure your upsert call looks like this:

await pineconeIndex.upsert({
  vectors: vectors,
  namespace: namespace,
});

Give that a shot and let me know how it goes. If you’re still encountering an error after making these changes, please let us know:

- The exact error message you’re seeing (if it’s different from the original one).
- The version of the Pinecone SDK you’re using.
- Any other relevant parts of your code that might be interacting with these functions.

Hope this helps!

Best,
Zack
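
To illustrate the typing point in the reply above, here is a minimal, self-contained sketch of a typed embedding function. It is not the SDK's API: VectorRecord and Chunk are local stand-in types, fakeEmbed is a hypothetical placeholder for getEmbeddings, and Node's built-in crypto module stands in for the md5 package.

```typescript
import { createHash } from "node:crypto";

// Local stand-in for the SDK's record type (an assumption, not imported from the SDK).
type VectorRecord = {
  id: string;
  values: number[];
  metadata: { text: string; pageNumber: number };
};

// Minimal shape of the chunk that embedDocument receives in the thread.
type Chunk = {
  pageContent: string;
  metadata: { text: string; pageNumber: number };
};

// Hypothetical placeholder for getEmbeddings: returns a small dummy vector.
async function fakeEmbed(text: string): Promise<number[]> {
  return [text.length, 0, 0];
}

// With an explicit return type and no `as` cast, the compiler checks the
// returned object literal against VectorRecord.
async function embedChunk(chunk: Chunk): Promise<VectorRecord> {
  const values = await fakeEmbed(chunk.pageContent);
  const id = createHash("md5").update(chunk.pageContent).digest("hex");
  return {
    id,
    values,
    metadata: {
      text: chunk.metadata.text,
      pageNumber: chunk.metadata.pageNumber,
    },
  };
}

// Usage: build one record and print the lengths of its id and vector.
embedChunk({ pageContent: "hello", metadata: { text: "hello", pageNumber: 1 } })
  .then((record) => console.log(record.id.length, record.values.length));
```

Dropping the `as` cast is the design choice here: a cast tells the compiler to trust the shape, while a return annotation asks it to verify the shape.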

dattasumit2019 · June 29, 2024, 7:44pm

tysmmm for the help!

OK, so I tried your solution and it gave the same error :(, but that’s OK because I tweaked a few things and ended up with this code:
Embed Document:

async function embedDocument(doc: Document) {
  try {
    console.log(
      `Embedding document with page content: ${doc.pageContent.substring(
        0,
        100
      )}...`
    );
    const embeddings = await getEmbeddings(doc.pageContent);
    const hash = md5(doc.pageContent);

    return {
      id: hash,
      values: embeddings,
      metadata: {
        text: doc.metadata.text,
        pageNumber: doc.metadata.pageNumber,
      },
    } as PineconeRecord;
  } catch (error) {
    console.log("error embedding document", error);
    throw error;
  }
}

Upserting Code:

// 4. upload to pinecone
const client = await getPineconeClient();
const pineconeIndex = await client.index("teachtalk");
const namespace = pineconeIndex.namespace(convertToAscii(fileKey));

console.log("inserting vectors into pinecone");
try {
  await namespace.upsert(vectors);
  console.log("vectors: " + vectors.length);
  return documents[0];
} catch (error) {
  console.log("error inserting vectors into pinecone", error);
  throw error;
}

So far it shows no errors in the Problems tab of VS Code, but when I run the application, the terminal gives a weird error:

Number of pages loaded from PDF: 0
Number of vectors created: 0
inserting vectors into pinecone
error inserting vectors into pinecone PineconeBadRequestError: No vectors provided for upsert request
    at mapHttpStatusError (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/http.js:179:20)
    at eval (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:65:69)
    at step (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:33:23)
    at Object.eval [as next] (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:14:53)
    at fulfilled (webpack-internal:///(rsc)/./node_modules/@pinecone-database/pinecone/dist/errors/handling.js:5:58)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5) {
  cause: undefined
}

(I got this from debugging statements.)
So some part of the code isn’t working. The rest of the code is unchanged.
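
The log above reports zero vectors, and the request is rejected with "No vectors provided for upsert request", so it may help to guard the call until the PDF loading problem is found. A minimal sketch under stated assumptions: VectorRecord and UpsertTarget are local stand-ins rather than SDK types, and fakeTarget is an in-memory placeholder for the namespace object.

```typescript
// Local stand-ins for illustration; not imported from the Pinecone SDK.
type VectorRecord = { id: string; values: number[] };

interface UpsertTarget {
  upsert(records: VectorRecord[]): Promise<void>;
}

// Skip the request when there is nothing to send, and report how many
// records were written.
async function upsertIfAny(
  target: UpsertTarget,
  vectors: VectorRecord[]
): Promise<number> {
  if (vectors.length === 0) {
    console.warn("No vectors to upsert; check that the PDF produced any pages.");
    return 0;
  }
  await target.upsert(vectors);
  return vectors.length;
}

// Usage with an in-memory fake in place of a real namespace.
const received: VectorRecord[] = [];
const fakeTarget: UpsertTarget = {
  async upsert(records) {
    received.push(...records);
  },
};

upsertIfAny(fakeTarget, []).then((n) => console.log("upserted", n));
```

The guard does not fix the empty result. It only turns the bad request into a clearer message that points back at the loading step.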

ZacharyProser · July 1, 2024, 2:48pm

Hi @dattasumit2019 and thanks for your reply!

Could you modify your console.log statement in the catch block of your try/catch to print out what vectors looks like? I suspect there could still be a formatting issue…

console.log(`error inserting vectors: %o into pinecone`, vectors, error);

It’s also curious to me that your other log statements suggest no content was loaded from the target PDF file. Could you also share your filesystem layout (where the docs are) and the code you’re using to load and chunk your documents?

Best,
Zack
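
One quick way to check the file question above is to log whether the path handed to PDFLoader exists and how large it is. A minimal sketch using Node's fs module; describeFile is a hypothetical helper, and the path is a placeholder for whatever downloadFromS3 returns.

```typescript
import { existsSync, statSync } from "node:fs";

// Report whether a path exists and how many bytes it holds.
function describeFile(path: string): { exists: boolean; bytes: number } {
  if (!existsSync(path)) {
    return { exists: false, bytes: 0 };
  }
  return { exists: true, bytes: statSync(path).size };
}

// Usage: "/tmp/example.pdf" is a hypothetical path standing in for the
// value returned by downloadFromS3.
console.log(describeFile("/tmp/example.pdf"));
```

A missing file or a size of zero would suggest the download step is failing, not the loader or the splitter.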