Unable to list Pinecone Public Datasets

I run this code in a notebook:

import pinecone_datasets

pinecone_datasets.list_datasets()

I get this error:

{
	"name": "TypeError",
	"message": "'GCSFile' object is not subscriptable",
	"stack": "---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
File ~/miniforge3/lib/python3.10/site-packages/pinecone_datasets/catalog.py:87, in Catalog.load(**kwargs)
     86 try:
---> 87     this_dataset = DatasetMetadata(**this_dataset_json)
     88     collected_datasets.append(this_dataset)

File ~/miniforge3/lib/python3.10/site-packages/pydantic/main.py:164, in BaseModel.__init__(__pydantic_self__, **data)
    163 __tracebackhide__ = True
--> 164 __pydantic_self__.__pydantic_validator__.validate_python(data, self_instance=__pydantic_self__)

ValidationError: 4 validation errors for DatasetMetadata
license
  Field required [type=missing, input_value={'name': 'ANN_DEEP1B_d96_...one, 'tokenizer': None}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing
description
  Field required [type=missing, input_value={'name': 'ANN_DEEP1B_d96_...one, 'tokenizer': None}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing
tags
  Field required [type=missing, input_value={'name': 'ANN_DEEP1B_d96_...one, 'tokenizer': None}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing
args
  Field required [type=missing, input_value={'name': 'ANN_DEEP1B_d96_...one, 'tokenizer': None}}, input_type=dict]
    For further information visit https://errors.pydantic.dev/2.5/v/missing

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
Cell In[7], line 3
      1 import pinecone_datasets
----> 3 pinecone_datasets.list_datasets()

File ~/miniforge3/lib/python3.10/site-packages/pinecone_datasets/public.py:30, in list_datasets(as_df, **kwargs)
      9 """
     10 List all datasets in the catalog, optionally as a pandas DataFrame.
     11 Catalog is set using the `DATASETS_CATALOG_BASEPATH` environment variable.
   (...)
     27 
     28 """
     29 global catalog
---> 30 catalog = Catalog.load(**kwargs)
     31 return catalog.list_datasets(as_df=as_df)

File ~/miniforge3/lib/python3.10/site-packages/pinecone_datasets/catalog.py:91, in Catalog.load(**kwargs)
     88             collected_datasets.append(this_dataset)
     89         except ValidationError:
     90             warnings.warn(
---> 91                 f"metadata file for dataset: {f['name']} is not valid, skipping"
     92             )
     93 except FileNotFoundError:
     94     pass

TypeError: 'GCSFile' object is not subscriptable"
}

Please let me know how I can fix this.
These are my library versions:
fsspec-2023.12.2
pinecone-client-3.2.2
pinecone-datasets-0.7.0
pyarrow-11.0.0
pydantic-1.10.15
urllib3-2.0.7
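
In case it helps, this is roughly how I collected those numbers (a sketch using only the standard library; the names passed to version() are the pip distribution names, not the import names):

from importlib.metadata import version, PackageNotFoundError

# Print the installed version of each distribution relevant to this issue.
for pkg in ["fsspec", "pinecone-client", "pinecone-datasets",
            "pyarrow", "pydantic", "urllib3"]:
    try:
        print(f"{pkg}-{version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")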

Hi @dhruv.anand and welcome to the Pinecone forums!

Thank you for your question.

I’m unable to reproduce this in a fresh Notebook (using Google Colab).

Could you please show me how you’re installing the pinecone_datasets library and which version you’re pulling down?

Which platform, if any, are you using to run your Notebook?

It looks like the list_datasets command worked for me with pinecone-datasets 0.7.0.
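
For reference, here’s roughly the cell I ran to confirm (a sketch; the version check uses the standard library, and list_datasets is the same call you used):

from importlib.metadata import version

# Confirm which build of the library is installed, then list the public datasets.
print(version("pinecone-datasets"))   # shows 0.7.0 in my Colab runtime

import pinecone_datasets
print(pinecone_datasets.list_datasets())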

Hi @zack.p, it turns out the error was caused by mismatched versions of some of the dependencies.

This is the set of versions that worked for me:
aiobotocore==2.12.3
aioitertools==0.11.0
botocore==1.34.69
fsspec==2023.12.2
gcsfs==2023.12.2.post1
google-api-core==2.19.0
google-cloud-core==2.4.1
google-cloud-storage==2.16.0
google-crc32c==1.5.0
google-resumable-media==2.7.0
googleapis-common-protos==1.63.0
pinecone-client==3.2.2
pinecone-datasets==0.7.0
proto-plus==1.23.0
pyarrow==11.0.0
pydantic==1.10.15
s3fs==2023.12.2
wrapt==1.16.0
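
If anyone else hits this, pinning those versions up front should reproduce the environment that worked for me. A rough sketch in plain Python (in Jupyter or Colab, a %pip install cell with the same pins does the same thing); restart the kernel after installing:

import subprocess, sys

# Pinned versions that worked together for me; adjust only if you know
# a newer combination is compatible.
pins = [
    "aiobotocore==2.12.3", "aioitertools==0.11.0", "botocore==1.34.69",
    "fsspec==2023.12.2", "gcsfs==2023.12.2.post1",
    "google-api-core==2.19.0", "google-cloud-core==2.4.1",
    "google-cloud-storage==2.16.0", "google-crc32c==1.5.0",
    "google-resumable-media==2.7.0", "googleapis-common-protos==1.63.0",
    "pinecone-client==3.2.2", "pinecone-datasets==0.7.0",
    "proto-plus==1.23.0", "pyarrow==11.0.0", "pydantic==1.10.15",
    "s3fs==2023.12.2", "wrapt==1.16.0",
]

# Install into the interpreter that runs this notebook/script.
subprocess.check_call([sys.executable, "-m", "pip", "install", *pins])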

Thanks for giving it a shot!


@dhruv.anand,

Awesome - glad to hear you got it working!

No problem :grinning:

All the best,
Zack