Issues with the Pinecone-Langchain Tutorial

CodingNoob · May 11, 2024, 3:00pm

Hello everyone,

I recently encountered an issue with the Pinecone Langchain Tutorial https://docs.pinecone.io/integrations/langchain#1-set-up-your-environment. First of all, I’d like to thank zack.p for the great recommendation.

The setup of my environment went quite well, but when I tried to set up the knowledge base, an issue occurred.

For better understanding, here is my current code:

import os

# Load the environment variables
pinecone_api_key = os.environ.get('PINECONE_API_KEY')
openai_api_key = os.environ.get('OPENAI_API_KEY')

import pinecone_datasets  
dataset = pinecone_datasets.load_dataset('wikipedia-simple-text-embedding-ada-002-100K')  
len(dataset)  

# Response:
# 100000

When I tried to execute my code, the following error message occurred:

gcsfs.retry.HttpError: Invalid bucket name: ‘pinecone-datasets-dev\wikipedia-simple-text-embedding-ada-002-100K’, 400

I’ve got all the necessary dependencies described in this tutorial installed, so I don’t know why this is happening. I also tried different predefined datasets, but none of them worked.

Your help will be appreciated!

Thanks a lot in advance and have a nice day,

Coding Noob

ZacharyProser · May 13, 2024, 12:52pm

Hi @CodingNoob, thanks for your question!

I just attempted to reproduce this issue in a fresh Notebook and was unable to.

I’m curious if the issue is still occurring? Are you able to isolate the code that may be throwing the error and see if it’s perhaps unrelated to the code where you’re loading the dataset?

The fact that your dataset length comes back as 100,000 makes me think the loading actually succeeded, but that error may be coming from elsewhere in your code.
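One way to isolate the failing step, as a rough sketch rather than anything from the tutorial: wrap each step in a small helper that catches the exception and reports where it was raised. The names `run_step` and `load_wikipedia_dataset` are made up for illustration; only `pinecone_datasets.load_dataset` and the dataset name come from the code above.

```python
import traceback


def run_step(name, fn):
    """Run one step on its own and return (result, error_report).

    error_report is None on success; otherwise it is a dict describing
    the exception and the deepest traceback frame, i.e. where it was raised.
    """
    try:
        return fn(), None
    except Exception as exc:
        frames = traceback.extract_tb(exc.__traceback__)
        origin = frames[-1]  # last frame = the code that actually raised
        report = {
            "step": name,
            "error_type": type(exc).__name__,
            "message": str(exc),
            "raised_in": f"{origin.filename}:{origin.lineno} ({origin.name})",
        }
        return None, report


def load_wikipedia_dataset():
    # Hypothetical wrapper around the call from the original post.
    # Importing here attributes an import failure to this step as well.
    import pinecone_datasets
    return pinecone_datasets.load_dataset(
        "wikipedia-simple-text-embedding-ada-002-100K"
    )


# Example usage (needs pinecone_datasets installed and network access):
# dataset, error = run_step("load_dataset", load_wikipedia_dataset)
# print("rows:", len(dataset)) if error is None else print(error)
```

If the `raised_in` entry points into `gcsfs` or `pinecone_datasets`, the error comes from the loading step itself. If it points at a file of your own, the problem is likely elsewhere in your code.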

Hope this helps!

Best,
Zack