gRPC StatusRuntimeException: UNAVAILABLE: io exception using Spark

When I try to upsert to the Pinecone index (pod-based) using the following method (taken from the Databricks - Pinecone docs),

(
    df.write
    .option("pinecone.apiKey", api_key)
    .option("pinecone.indexName", index_name)
    .option("pinecone.projectName", project_name)
    .option("pinecone.environment", environment)
    .format("io.pinecone.spark.pinecone.Pinecone")
    .mode("append")
    .save()
)

I get the following error:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 4584.0 failed 4 times, most recent failure: Lost task 5.3 in stage 4584.0 (TID 1437233) (executor 683): io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
---------------------------------------------------------------------------
Py4JJavaError                             Traceback (most recent call last)
File <command-2093907735488114>, line 24
     14 for attempt in range(max_retries):
     15     try:
     16         (
     17             df.write
     18             .option("pinecone.apiKey", api_key)
     19             .option("pinecone.indexName", index_name)
     20             .option("pinecone.projectName", project_name)
     21             .option("pinecone.environment", environment)
     22             .format("io.pinecone.spark.pinecone.Pinecone")
     23             .mode("append")
---> 24             .save()
     25         )
     26         break  # Exit the loop if the write operation is successful
     27     except Exception as e:

File /databricks/spark/python/pyspark/instrumentation_utils.py:47, in _wrap_function.<locals>.wrapper(*args, **kwargs)
     45 start = time.perf_counter()
     46 try:
---> 47     res = func(*args, **kwargs)
     48     logger.log_success(
     49         module_name, class_name, function_name, time.perf_counter() - start, signature
     50     )
     51     return res

File /databricks/spark/python/pyspark/sql/readwriter.py:1679, in DataFrameWriter.save(self, path, format, mode, partitionBy, **options)
   1677     self.format(format)
   1678 if path is None:
-> 1679     self._jwrite.save()
   1680 else:
   1681     self._jwrite.save(path)

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/java_gateway.py:1355, in JavaMember.__call__(self, *args)
   1349 command = proto.CALL_COMMAND_NAME +\
   1350     self.command_header +\
   1351     args_command +\
   1352     proto.END_COMMAND_PART
   1354 answer = self.gateway_client.send_command(command)
-> 1355 return_value = get_return_value(
   1356     answer, self.gateway_client, self.target_id, self.name)
   1358 for temp_arg in temp_args:
   1359     if hasattr(temp_arg, "_detach"):

File /databricks/spark/python/pyspark/errors/exceptions/captured.py:188, in capture_sql_exception.<locals>.deco(*a, **kw)
    186 def deco(*a: Any, **kw: Any) -> Any:
    187     try:
--> 188         return f(*a, **kw)
    189     except Py4JJavaError as e:
    190         converted = convert_exception(e.java_exception)

File /databricks/spark/python/lib/py4j-0.10.9.7-src.zip/py4j/protocol.py:326, in get_return_value(answer, gateway_client, target_id, name)
    324 value = OUTPUT_CONVERTER[type](answer[2:], gateway_client)
    325 if answer[1] == REFERENCE_TYPE:
--> 326     raise Py4JJavaError(
    327         "An error occurred while calling {0}{1}{2}.\n".
    328         format(target_id, ".", name), value)
    329 else:
    330     raise Py4JError(
    331         "An error occurred while calling {0}{1}{2}. Trace:\n{3}\n".
    332         format(target_id, ".", name, value))

Py4JJavaError: An error occurred while calling o478.save.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in stage 4584.0 failed 4 times, most recent failure: Lost task 5.3 in stage 4584.0 (TID 1437233) ([host IP] executor 683): io.grpc.StatusRuntimeException: UNAVAILABLE: io exception
Channel Pipeline: [SslHandler#0, ProtocolNegotiators$ClientTlsHandler#0, WriteBufferingAndExceptionHandler#0, DefaultChannelPipeline$TailContext#0]
	at io.grpc.stub.ClientCalls.toStatusRuntimeException(ClientCalls.java:262)
	at io.grpc.stub.ClientCalls.getUnchecked(ClientCalls.java:243)
	at io.grpc.stub.ClientCalls.blockingUnaryCall(ClientCalls.java:156)
	at ...
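For reference, the retry loop in my notebook follows this pattern (a simplified sketch; the helper name and backoff delays are my own, not anything from the connector):

```python
import time

def retry_write(write_fn, max_retries=4, base_delay=1.0):
    """Call write_fn(), retrying on failure with exponential backoff."""
    for attempt in range(max_retries):
        try:
            return write_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Used with the Spark write shown above, e.g.:
# retry_write(lambda: df.write
#             .option("pinecone.apiKey", api_key)
#             .option("pinecone.indexName", index_name)
#             .format("io.pinecone.spark.pinecone.Pinecone")
#             .mode("append")
#             .save())
```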

Hi @shyamSiyer and welcome to the Pinecone community forums!

Thank you for your question.

Could you please share all your relevant code, being careful not to include any secrets such as your Pinecone API key?

In the meantime, I’m reaching out to a team member who worked on the Databricks integration.

Best,
Zack

Hi @shyamSiyer,

I heard back from one of the engineers closest to the connector: passing projectName and environment is no longer valid.

Could you please share:

  1. The version of the Spark connector you’re using
  2. The link to the documentation that you’re following currently?

Best,
Zack

But the write errors out when I omit projectName and environment; the connector asks for the project name and environment.
Databricks Runtime: 14.2.x-gpu-ml-scala2.12
Spark Version: 3.5.0
The link I am following is the one already mentioned (Databricks - Pinecone Docs); the only difference is that I am using the all-mpnet-base-v2 model.

Hi @shyamSiyer,

I lead the Spark connector work, and I’m curious which connector version you’re using, i.e. which Pinecone Spark connector assembly jar you imported via the S3 bucket into your Databricks environment.

Also, I don’t see projectName and environment in the Databricks - Pinecone docs, so would you be kind enough to provide the exact link to the doc you’re following? The docs show the following:

(
    embeddings_df.write
    .option("pinecone.apiKey", api_key)
    .option("pinecone.indexName", index_name)
    .format("io.pinecone.spark.pinecone.Pinecone")
    .mode("append")
    .save()
)

Lastly, would you please provide the whole codebase, not just the df.write command?

Thanks :slight_smile:

Hi Rohan
I am using the spark-pinecone-uberjar from DBFS (dbfs:/FileStore/lib/spark-pinecone-uberjar.jar). For context, I am working in a Databricks notebook, and these installations were already done on the provided compute machine. But when I try to install a new jar file on the compute machine using the S3 path shown on the site, I get a permission error. Here is the code (Google Colab).
Details on the jar file:
Manifest-Version: 1.0
Implementation-Title: spark-test
Implementation-Version: 0.1.0-SNAPSHOT
Specification-Vendor: spark-test
Specification-Title: spark-test
Implementation-Vendor-Id: spark-test
Specification-Version: 0.1.0-SNAPSHOT
Main-Class: org.example.pinecone.MainJob
Implementation-Vendor: spark-test

Hi @shyamSiyer,
I’m assuming your compute was set up by manually uploading the assembly/uber jar to DBFS in the DBR environment? If so, you can upload a new one: projectName and environment were only accepted before the v1.0.0 release, so please use the latest assembly jar.

If you’re trying to import the jar from the S3 bucket, please follow the instructions in the README.

Also note that the DBR environment does require importing the assembly jar, because of how dependency management works in DBR.
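As a quick sanity check that the jar on DBFS is actually the connector uberjar (and which version), you can read its manifest from a notebook through the /dbfs fuse mount; the path below is an assumption:

```python
import zipfile

def jar_manifest(jar_path):
    """Return META-INF/MANIFEST.MF from a jar (a jar is just a zip archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return jar.read("META-INF/MANIFEST.MF").decode("utf-8")

# e.g. in a notebook (path is an assumption):
# print(jar_manifest("/dbfs/FileStore/lib/spark-pinecone-uberjar.jar"))
```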

Thanks!

Thank you for your reply. I tried to install s3://pinecone-jars/1.1.0/spark-pinecone-uberjar.jar. This is the error I am facing:

DRIVER_LIBRARY_INSTALLATION_FAILURE. Error Message: java.util.concurrent.ExecutionException: java.nio.file.AccessDeniedException: s3a://pinecone-jars/1.1.0/spark-pinecone-uberjar.jar: getFileStatus on s3a://pinecone-jars/1.1.0/spark-pinecone-uberjar.jar: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden; request: HEAD https://pinecone-jars.s3.us-east-1.amazonaws.com 1.1.0/spark-pinecone-uberjar.jar {} Hadoop 3.3.6, aws-sdk-java/1.12.610 Linux/5.15.0-1063-aws OpenJDK_64-Bit_Server_VM/25.392-b08 java/1.8.0_392 scala/2.12.15 kotlin/1.6.0 vendor/Azul_Systems,_Inc. cfg/retry-mode/legacy

Hi,

I have attached a screenshot of installing the uber jar via S3; I am able to upsert data successfully, without any read-access error. Would you please let me know how you tried installing the library, and whether installing it in the way described below still gives you issues?

Thanks!
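In case the 403 persists from your workspace, one thing worth trying (an untested assumption on my part: it presumes the pinecone-jars bucket allows anonymous reads, and it uses Hadoop S3A's standard per-bucket configuration rather than anything connector-specific) is forcing anonymous credentials for just that bucket in the cluster's Spark config, before installing the library:

```
# Hypothetical cluster-level Spark config; otherwise s3a applies the
# workspace's own AWS credentials, which the bucket policy can deny.
spark.hadoop.fs.s3a.bucket.pinecone-jars.aws.credentials.provider org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider
```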