Failed to ingest data

ishaqjan619 · May 28, 2024, 9:55am

I have been trying to ingest my PDF files into a serverless Pinecone index on the free trial, but I keep getting this error:

error [ErrorWithoutStackTrace: PineconeClient: Error calling upsert: ErrorWithoutStackTrace: PineconeClient: Error calling upsertRaw: FetchError: The request failed and the interceptors did not return an alternative response]

This error doesn’t occur on the paid Pinecone pods I am using. It only happens when I switch to the AWS serverless index, and the data is not ingested. My ingest.ts is below:

import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
import { OpenAIEmbeddings } from 'langchain/embeddings/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { pinecone } from '@/utils/pinecone-client';
import { CustomPDFLoader } from '@/utils/customPDFLoader';
import { PINECONE_INDEX_NAME, PINECONE_NAME_SPACE } from '@/config/pinecone';
import { DirectoryLoader } from 'langchain/document_loaders/fs/directory';

/* Name of directory to retrieve your files from */
const filePath = 'docs';

export const run = async () => {
  try {
    /* Load raw docs from all files in the directory */
    const directoryLoader = new DirectoryLoader(filePath, {
      '.pdf': (path) => new CustomPDFLoader(path),
    });

    const rawDocs = await directoryLoader.load();

    // Extract the file name with a regular expression and update the metadata
    const processedDocs = rawDocs.map((doc) => {
      const fileName = doc.metadata.source.match(/[^\\\/]+$/)?.[0] || doc.metadata.source;
      const modifiedMetadata = { ...doc.metadata, source: fileName };
      return { ...doc, metadata: modifiedMetadata };
    });

    /* Split text into chunks */
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });

    const docs = await textSplitter.splitDocuments(processedDocs);
    console.log('split docs', docs);

    console.log('creating vector store...');
    /* Create and store the embeddings in the vectorStore */
    const embeddings = new OpenAIEmbeddings();
    const index = pinecone.Index(PINECONE_INDEX_NAME); // Change to your own index name

    // Embed the PDF documents
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      namespace: PINECONE_NAME_SPACE,
      textKey: 'text',
    });
  } catch (error) {
    console.log('error', error);
    throw new Error('Failed to ingest your data');
  }
};

(async () => {
  await run();
  console.log('ingestion complete');
})();
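
For anyone reading the script above, here is a rough, standalone sketch of its two preprocessing steps, so they can be tried without Pinecone or OpenAI credentials. The function names baseName and splitWithOverlap are made up for illustration. Only the regular expression and the 1000/200 settings come from the post. The splitter here is a simplified fixed-width window and is an assumption, not the actual RecursiveCharacterTextSplitter algorithm, which also tries to break on separators such as newlines.

```typescript
// Hypothetical helpers for illustration; not part of LangChain or Pinecone.

/** Return the last path segment, handling both "/" and "\" separators. */
function baseName(source: string): string {
  // Same regex as in ingest.ts: the trailing run of non-separator characters.
  return source.match(/[^\\\/]+$/)?.[0] || source;
}

/**
 * Simplified fixed-width splitter (assumption: plain character windows).
 * Each chunk holds up to `chunkSize` characters and starts
 * `chunkSize - chunkOverlap` characters after the previous one.
 */
function splitWithOverlap(
  text: string,
  chunkSize = 1000,
  chunkOverlap = 200,
): string[] {
  if (chunkOverlap >= chunkSize) {
    throw new Error('chunkOverlap must be smaller than chunkSize');
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    // Stop once a window has reached the end of the text.
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

console.log(baseName('docs/reports/a.pdf')); // prints "a.pdf"
console.log(baseName('C:\\docs\\a.pdf')); // prints "a.pdf"
console.log(splitWithOverlap('abcdefghij', 4, 2)); // [ 'abcd', 'cdef', 'efgh', 'ghij' ]
```

With the settings from the post, a 2,500-character document would produce three windows under this simplified scheme, and each window becomes one vector in the upsert request.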

ZacharyProser · May 28, 2024, 5:32pm

Hi @ishaqjan619, and welcome to the Pinecone community forums!

Thanks for your question, and sorry you’re running into this issue.

Could you please share the code where you’re creating your serverless Pinecone index?

This will help us further debug what might be going wrong.

Best,
Zack
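
For readers following along, the index-creation code being asked about would normally carry a serverless spec. The fragment below only sketches the shape of such a request as I understand Pinecone's create-index API. Every value is a placeholder assumption rather than something from this thread: the index name is invented, the region is a guess, and the dimension of 1536 assumes the embeddings come from OpenAI's text-embedding-ada-002 model. The same fields are entered by hand when an index is created in the Pinecone console.

```typescript
// Placeholder values only; adjust to match the real index.
const createIndexRequest = {
  name: 'my-serverless-index', // hypothetical index name
  dimension: 1536, // must equal the embedding vector length (assumed ada-002)
  metric: 'cosine',
  spec: {
    serverless: {
      cloud: 'aws',
      region: 'us-east-1', // assumed region
    },
  },
};

console.log(JSON.stringify(createIndexRequest, null, 2));
```

If the dimension configured on the index differs from the length of the vectors being upserted, the upsert is rejected, so it is worth comparing these settings against the embedding model in use.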

ishaqjan619 · May 28, 2024, 6:31pm

I do not use any code to create the Pinecone serverless index. I create it manually in Pinecone, specifying all the details there.

However, I am using the code below to ingest the data in Node.js: