403 when installing spark/pinecone connector on databricks

I’m getting a 403 permission denied error when trying to install the spark/pinecone connector on my databricks cluster, according to these docs: Databricks - Pinecone Docs.

I’m confused because the docs say to use an s3 path to install Pinecone, but won’t any s3 path I provide resolve against the AWS account linked to my Databricks workspace, presumably not Pinecone’s s3?

Full error:

Library installation attempted on the driver node of cluster 1115-203734-kttv6dqs and failed. Please refer to the following error message or contact Databricks support. Error code: FAULT_OTHER, error message: java.util.concurrent.ExecutionException: java.nio.file.AccessDeniedException: s3a://pinecone-jars/1.1.0/spark-pinecone-uberjar.jar: getFileStatus on s3a://pinecone-jars/1.1.0/spark-pinecone-uberjar.jar: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://pinecone-jars.s3.us-east-1.amazonaws.com 1.1.0/spark-pinecone-uberjar.jar {} Hadoop 3.3.6, aws-sdk-java/1.12.390 Linux/5.15.0-1075-aws OpenJDK_64-Bit_Server_VM/25.412-b08 java/1.8.0_412 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy com.amazonaws.services.s3.model.GetObjectMetadataRequest; Request ID: 80FZQTEMSK8GYNJT, Extended Request ID: /DARMKvNQvlFHCNVGV+2hhgZS386tBMueWuhqaSBCti85/drVmumdMxV2vyfLnLZsnb105C9rVU=, Cloud Provider: AWS, Instance ID: i-01c99aa68497ca0d8 credentials-provider: com.amazonaws.auth.BasicSessionCredentials credential-header: AWS4-HMAC-SHA256 Credential=REDACTED_ACCESS_KEY(32a0c8de)/20250301/us-east-1/s3/aws4_request signature-present: true (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: 80FZQTEMSK8GYNJT; S3 Extended Request ID: /DARMKvNQvlFHCNVGV+2hhgZS386tBMueWuhqaSBCti85/drVmumdMxV2vyfLnLZsnb105C9rVU=; Proxy: null), S3 Extended Request ID: /DARMKvNQvlFHCNVGV+2hhgZS386tBMueWuhqaSBCti85/drVmumdMxV2vyfLnLZsnb105C9rVU=:403 Forbidden

Hi @nick.resnick,

This is odd; that bucket should be world-readable. I tested accessing it, both anonymously and with my personal AWS credentials, and had no issues either way. So the bucket is definitely publicly available.

It’s possible this is due to how Databricks handles S3 connections. It can sign requests with its own access credentials even when none are needed, as in this case, and S3 then evaluates the request against those credentials rather than treating it as anonymous, which can produce a 403 even on a public bucket. There are two potential workarounds.

First, use a different URI for the bucket.

https://pinecone-jars.s3.us-east-1.amazonaws.com/1.1.0/spark-pinecone-uberjar.jar

That way, you’re going straight to the source via HTTPS and not using Databricks or AWS libraries.
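
If the library installer in your workspace won’t accept a raw HTTPS URL, a notebook cell can fetch the jar over HTTPS and land it on DBFS instead. A sketch, assuming the standard /dbfs FUSE mount is available on your cluster (the FileStore path is just an example):

%sh
# fetch the jar over plain HTTPS and write it to DBFS via the FUSE mount
mkdir -p /dbfs/FileStore/jars
wget -q https://pinecone-jars.s3.us-east-1.amazonaws.com/1.1.0/spark-pinecone-uberjar.jar -O /dbfs/FileStore/jars/spark-pinecone-uberjar.jar

You can then install the library on the cluster from dbfs:/FileStore/jars/spark-pinecone-uberjar.jar.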

The other is to download the file locally and then copy it to your DBFS so it’s accessible within your runtime. This assumes you already have the AWS CLI installed (since the bucket is public, you can add --no-sign-request if you don’t have local AWS credentials configured).

aws s3 cp s3://pinecone-jars/1.1.0/spark-pinecone-uberjar.jar /tmp/
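
From there, something like this with the Databricks CLI should get it into DBFS (the dbfs:/FileStore/jars target below is just an illustration; any DBFS path works):

# copy the downloaded jar from the local machine into DBFS
databricks fs cp /tmp/spark-pinecone-uberjar.jar dbfs:/FileStore/jars/spark-pinecone-uberjar.jar

You can then install the jar on the cluster from that dbfs:/ path via the Libraries tab.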

Using the direct HTTPS connection is the simplest option and how I would proceed.
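
One more thing you could try, though I haven’t tested it for library installs, is keeping the s3a:// path from the docs but telling the S3A connector to use anonymous credentials for just that bucket, via the cluster’s Spark config:

spark.hadoop.fs.s3a.bucket.pinecone-jars.aws.credentials.provider org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider

That per-bucket setting stops requests to pinecone-jars from being signed while leaving access to your own buckets unchanged.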

Let us know if this works for you or if you run into any other issues.

Cory