So, some vectors are written into Pinecone and I can see the count on the Pinecone dashboard, but after a while the ingestion terminates for some reason. Help would be appreciated.
You'll need to find an error message and post it here, preferably along with your code. It may not be Pinecone generating the error; it could be something happening in your own code. For example, uploading vectors often means a large number of requests, so your local code may not be handling that concurrency properly and could be running into something like socket exhaustion.
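As a rough sketch of what batching those requests can look like, here's a minimal example using the legacy 0.x PineconeClient (the upsertBatched helper name and the batchSize of 100 are just illustrative choices, not anything Pinecone prescribes):

import { PineconeClient } from "@pinecone-database/pinecone";

// Illustrative helper: send vectors in small fixed-size batches so no single
// request carries thousands of vectors, and requests go out one at a time.
const upsertBatched = async (index, vectors, namespace, batchSize = 100) => {
  for (let i = 0; i < vectors.length; i += batchSize) {
    const batch = vectors.slice(i, i + batchSize);
    // The legacy 0.x client wraps the request body in upsertRequest.
    await index.upsert({ upsertRequest: { vectors: batch, namespace } });
  }
};

Each element of vectors here is an { id, values, metadata } object.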
After upserting 5646 vectors I get this message in my terminal:
┌ POST /api/langchain 500 in 208910ms
│
├──── GET https://controller.asia-…/actions/whoami 200 in 1216ms (cache: MISS)
│
└── 1 level ── POST https://langchain-589e0b…/vectors/upsert 200 in 4364ms (cache: MISS)
And an “Internal Server Error”
My JS code is as follows:
import { ChatOpenAI } from "langchain/chat_models/openai";
import { HumanChatMessage } from "langchain/schema";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { OpenAIEmbeddings } from "langchain/embeddings/openai";
import { PineconeStore } from "langchain/vectorstores/pinecone";
import { PDFLoader } from "langchain/document_loaders/fs/pdf";
import { DirectoryLoader } from "langchain/document_loaders/fs/directory";
import { PineconeClient } from "@pinecone-database/pinecone";
const runLLMChain = async (prompt) => {
  console.log("runLLMChain");
  const encoder = new TextEncoder();
  const stream = new TransformStream();
  const writer = stream.writable.getWriter();
  const model = new ChatOpenAI({
    streaming: true,
    callbacks: [
      {
        async handleLLMNewToken(token) {
          await writer.ready;
          await writer.write(encoder.encode(`${token}`));
        },
        async handleLLMEnd() {
          await writer.ready;
          await writer.close();
        },
      },
    ],
  });
  const client = new PineconeClient();
  await client.init({
    apiKey: process.env.PINECONE_API_KEY,
    environment: process.env.PINECONE_ENVIRONMENT,
  });
  const pineconeIndex = client.Index(process.env.PINECONE_INDEX);
  const vectorStore = await PineconeStore.fromExistingIndex(
    new OpenAIEmbeddings(),
    { pineconeIndex }
  );
  console.log(vectorStore[0]);
  console.log("Now here");
  /* Search the vector DB independently with meta filters */
  // console.log(prompt);
  // const results = await vectorStore.similaritySearch(prompt, 1);
  // console.log(results);
  // const topResult = results[0].vector; // Assuming you want to generate text based on the top result
  // model.call([new HumanChatMessage(topResult)]);
  model.call([new HumanChatMessage(prompt)]);
  return stream.readable;
};
const runPineconeIngest = async (filePath) => {
  console.log("runPineconeIngest");
  try {
    console.log(filePath); // Print file path
    const loader = new PDFLoader(filePath);
    console.log("Step 1");
    const rawDocs = await loader.load();
    console.log("Step 2");
    /* Split text into chunks */
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 900,
      chunkOverlap: 100,
    });
    console.log("Step 3");
    const docs = await textSplitter.splitDocuments(rawDocs);
    console.log(docs[docs.length - 1]);
    console.log("Step 4");
    console.log("Creating vector store…");
    /* create and store the embeddings in the vectorStore */
    const embeddings = new OpenAIEmbeddings();
    console.log("Step 5");
    const pinecone = new PineconeClient();
    await pinecone.init({
      environment: process.env.PINECONE_ENVIRONMENT,
      apiKey: process.env.PINECONE_API_KEY,
    });
    const index = pinecone.Index(process.env.PINECONE_INDEX_NAME);
    console.log("Step 6");
    console.log("PINECONE_NAME_SPACE = " + process.env.PINECONE_NAME_SPACE);
    console.log("PINECONE_INDEX_NAME = " + process.env.PINECONE_INDEX_NAME);
    // embed the PDF documents
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index, // process.env.PINECONE_INDEX_NAME,
      namespace: process.env.PINECONE_NAME_SPACE,
      textKey: "text",
    });
    console.log("Step 7"); // Crash on the above step (I never see "Step 7" logged)
  } catch (error) {
    console.log("Error in runPineconeIngest:", error);
    // throw new Error("Failed to ingest your data");
    return "Error during Ingestion";
  }
  console.log("Step 8");
  return "Ingestion completed";
};
export async function POST(req) {
  const { prompt, filePath } = await req.json();
  console.log(prompt, filePath);
  console.log("Here");
  if (prompt != null && prompt != "") {
    console.log("POST-runLLMChain");
    const stream = runLLMChain(prompt);
    return new Response(await stream);
  } else {
    console.log("POST-runPineconeIngest");
    const result = await runPineconeIngest(filePath);
    console.log("Step 8");
    console.log(result);
    console.log("Step 9");
    // Route handlers must return a Response, not a bare string
    return new Response(JSON.stringify({ result }));
  }
}
I've checked, and all the vectors were actually created, so the 500 Internal Server Error seems to be a red herring. I think this issue is solved.
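In case it helps anyone else, one way to double-check the count from code rather than the dashboard is to ask the index for its stats. A minimal sketch, assuming the same legacy 0.x PineconeClient and environment variables as above (the describeIndexStatsRequest wrapper is the old 0.x request shape):

import { PineconeClient } from "@pinecone-database/pinecone";

const checkVectorCount = async () => {
  const pinecone = new PineconeClient();
  await pinecone.init({
    environment: process.env.PINECONE_ENVIRONMENT,
    apiKey: process.env.PINECONE_API_KEY,
  });
  const index = pinecone.Index(process.env.PINECONE_INDEX_NAME);
  // The legacy 0.x client wraps the request body in describeIndexStatsRequest.
  const stats = await index.describeIndexStats({
    describeIndexStatsRequest: { filter: {} },
  });
  // totalVectorCount should line up with the dashboard (5646 in my case).
  console.log(stats.totalVectorCount, stats.namespaces);
};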