ChatGPT GPT embedding + querying schema

Hello folks,

I’m trying to use Pinecone to supply additional context to queries made in ChatGPT, via OpenAI’s “GPTs” tool.

I have a Pinecone index using llama-text-embed-v2 (with integrated embedding) to store contextual information.

My understanding is that the YAML schema used in a GPT action requires a REST API call to get the embedding for the query, which would then (ideally) be used for a semantic search. But none of the obvious URLs seem to work (they return either 404s or blank responses): I’ve tried my index host, https://api.pinecone.io/embed, and https://inference.pinecone.io/v1/embeddings (a suggestion made by ChatGPT :smiley: ).

I’m coming to the conclusion that (at this time) there isn’t such a URL to call, but I thought I’d check in with support to see if I’m missing something obvious.

Thank you in advance for your expertise and patience.

You’re correct that there isn’t a direct REST API endpoint for Pinecone’s integrated inference embedding models that you can call from a GPT action schema.

With Pinecone’s integrated inference (like your llama-text-embed-v2 index), the embedding happens automatically when you use the search functionality through the Python SDK. The embedding model is tied to your serverless index and processes text during search operations rather than through a separate embedding endpoint.

For your GPT action, you’ll need to create a middle layer (like a simple API endpoint) that:

  1. Receives the query text from your GPT action
  2. Uses the Pinecone Python SDK to perform the search (which handles embedding automatically)
  3. Returns the results back to your GPT

Here’s how the search works with integrated inference in Python:

results = index.search(
    namespace=namespace,
    query={
        "top_k": 10,
        "inputs": {
            "text": search_query
        }
    }
)

The integrated inference feature automatically embeds your query text using the llama-text-embed-v2 model you specified when creating the index, then performs the semantic search.
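For context, here is a minimal sketch of how that index handle might be created with the Python SDK (the API key and index name below are placeholders, not values from this thread):

from pinecone import Pinecone

# Placeholder credentials and index name; substitute your own
pc = Pinecone(api_key="YOUR-API-KEY")
index = pc.Index("my-index")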

If you need a REST API approach, you’d need to create a simple web service that wraps this Python SDK functionality and exposes it as an endpoint your GPT action can call.
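As a rough illustration (not an official pattern), a minimal Flask wrapper might look like the sketch below; the route name, port, index name, and API key are all placeholders:

from flask import Flask, jsonify, request
from pinecone import Pinecone

app = Flask(__name__)
pc = Pinecone(api_key="YOUR-API-KEY")  # placeholder key
index = pc.Index("my-index")           # placeholder index name

@app.route("/search", methods=["POST"])
def search():
    body = request.get_json()
    # Integrated inference embeds the query text server-side during the search
    results = index.search(
        namespace=body.get("namespace", "__default__"),
        query={
            "top_k": 10,
            "inputs": {"text": body["query"]},
        },
    )
    # Response shape may vary by SDK version; to_dict() is assumed here
    return jsonify(results.to_dict())

if __name__ == "__main__":
    app.run(port=8000)

Your GPT action’s OpenAPI schema would then point at this /search endpoint instead of at Pinecone directly.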

Hey @connor,

If you’re getting a 404 on the standalone embedding endpoint (https://api.pinecone.io/embed), I’m guessing it’s because you need to specify the version header X-Pinecone-API-Version. Our SDKs include that header automatically, but if you’re configuring the HTTP request in a different client, you need to set it in the request config yourself. Here’s an example curl:

curl -XPOST https://api.pinecone.io/embed \
  -H "Api-Key: YOUR-API-KEY" \
  -H "X-Pinecone-API-Version: 2025-04" \
  -d '{
  "model": "llama-text-embed-v2",
  "parameters": {
    "input_type": "passage"
  },
  "inputs": [
    {
      "text": "This is my query"
    }
  ]
}'
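And to close the loop for a GPT action that can only make HTTP calls: one approach is to chain the /embed call with a /query call against your index host. Here’s a sketch in Python using the requests library; the index host is a placeholder you’d copy from the Pinecone console, and the embed response shape assumed below is the one documented for dense models:

import requests

API_KEY = "YOUR-API-KEY"
INDEX_HOST = "https://my-index-abc123.svc.example.pinecone.io"  # placeholder; copy yours from the console

headers = {
    "Api-Key": API_KEY,
    "X-Pinecone-API-Version": "2025-04",
    "Content-Type": "application/json",
}

# Step 1: embed the query text via the standalone /embed endpoint
embed_resp = requests.post(
    "https://api.pinecone.io/embed",
    headers=headers,
    json={
        "model": "llama-text-embed-v2",
        "parameters": {"input_type": "query"},  # "query" for search text, "passage" for stored documents
        "inputs": [{"text": "This is my query"}],
    },
)
vector = embed_resp.json()["data"][0]["values"]  # assumes the dense-embedding response shape

# Step 2: query the index with the resulting vector
query_resp = requests.post(
    f"{INDEX_HOST}/query",
    headers=headers,
    json={"vector": vector, "topK": 10, "includeMetadata": True},
)
print(query_resp.json())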