Query returns no results from index - how to troubleshoot?

I am using PHP to get embeddings from OpenAI. I created an index on Pinecone using the UI. The metric is cosine. The whoami, databases and describe API calls all work, so I am connecting to Pinecone and to my index.

This is the code I used to upsert my vectors to Pinecone:

These are the embeddings I’ve been upserting:

    // Generate the embeddings for this document
    $apiEndpoint = 'https://api.openai.com/v1/embeddings';
    $apiParameters = [
        'input' => $contents,
        'model' => 'text-embedding-ada-002',
        'tokenizer' => 'cl100k_base',
        'max_tokens' => 8191,
        'output_dimension' => 1536
    ];
    $client = new \GuzzleHttp\Client();
    $response = $client->post($apiEndpoint, [
        'headers' => ['Authorization' => "Bearer $apiKey"],
        'json' => $apiParameters
    ]);

    if ($response->getStatusCode() == 200) {
        // Extract the embeddings from the response
        $responseData = json_decode($response->getBody(), true);
        $embedding = $responseData['data'][0]['embedding'];

        // Create an object for this document's embeddings and add it to the output file
        $outputObject = [
            'id' => $id,
            'metadata' => [
                'title' => $title,
                'book' => $book,
                'library' => $library
            ],
            'values' => $embedding
        ];
        $outputLine = json_encode($outputObject, JSON_PRETTY_PRINT);

I use essentially the same code to get the embedding for a query ($text). The query I am using matches at least a couple of keywords in at least one of the documents upserted into the index.

    // Define the API parameters
    $apiParameters = [
        'input' => $text,
        'model' => 'text-embedding-ada-002',
        'tokenizer' => 'cl100k_base',
        'max_tokens' => 8191,
        'output_dimension' => 1536
    ];

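    $apiQuery = json_encode($apiParameters);

    // Send the API request using cURL (same endpoint and $apiKey as above)
    $ch = curl_init('https://api.openai.com/v1/embeddings');
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => $apiQuery,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_HTTPHEADER => [
            'Content-Type: application/json',
            "Authorization: Bearer $apiKey"
        ]
    ]);
    $apiResponse = curl_exec($ch);
    curl_close($ch);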
    
    // Decode the API response and extract the embedding and total_tokens values
    $responseArray = json_decode($apiResponse, true);
    $embedding = $responseArray['data'][0]['embedding'];

If I var_dump $embedding, this is what I see:

array(1536) {
  [0]=>
  float(-0.014706593)
  [1]=>
  float(-0.0030323046)
  [2]=>
  float(-0.0022425186)
  [3]=>
  float(-0.013908351)
  [4]=>
  float(-0.021363119)
  [5]=>
  float(0.013265698)
etc...

However, when I run a Pinecone query against my index using the retrieved $embedding:

    // Set request data
    $requestData = array(
        'vector' => $embedding,
        'topK' => $topK,
        'includeValues' => true
    );
    $jsonData = json_encode($requestData);

I get absolutely nothing.

   object(stdClass)#2 (3) {
      ["results"]=>
      array(0) {
      }
      ["matches"]=>
      array(0) {
      }
      ["namespace"]=>
      string(0) ""
    }

At this point, I don’t know whether the problem is with the index, my upserted vectors, or my query vectors, or whether the query is simply not finding any similarities (although I know they are there). Can someone help me troubleshoot this?

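Discovered the problem. I did not see it mentioned in the documentation that if your vectors were upserted into a namespace, you must include that namespace in the query. A query without one comes back with an empty namespace and no matches, which is exactly what I was seeing.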

This documentation states that you are able to query by namespace, but it does not indicate that you must include the namespace if the index has one. Or, perhaps I am missing something somewhere else in the documentation.
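For anyone who lands here with the same empty result, this is a minimal sketch of the query body with the namespace added. The helper name buildQueryBody and the namespace value my-namespace are hypothetical, made up for illustration; vector, topK, includeValues and namespace are the request fields. It only builds the JSON, and sending it works the same way as the query above.

```php
<?php

/**
 * Build the JSON body for a Pinecone query.
 * buildQueryBody is a hypothetical helper, not part of any SDK.
 */
function buildQueryBody(array $embedding, int $topK, ?string $namespace = null): string
{
    $requestData = [
        'vector' => array_values($embedding),
        'topK' => $topK,
        'includeValues' => true,
    ];

    // Vectors upserted into a namespace are only searched when the query
    // names that same namespace. Leaving it out searches the default ("") one.
    if ($namespace !== null && $namespace !== '') {
        $requestData['namespace'] = $namespace;
    }

    return json_encode($requestData, JSON_THROW_ON_ERROR);
}

// Example with a shortened vector and an assumed namespace name.
$jsonData = buildQueryBody([-0.0147, -0.0030, -0.0022], 5, 'my-namespace');
echo $jsonData, PHP_EOL;
```

Passing null (or an empty string) as the third argument leaves the namespace field out, which reproduces the original query that returned nothing.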

In any event, my queries are executing as expected. Issue (this one, at least) resolved.


I mean, the fact that you need to include a namespace prevents any unexpected queries against your dataset.

I don’t see that as a bug or a gap in the documentation. But perhaps it could be added as a note…
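A quick way to check for this situation is to look at which namespaces actually hold vectors, which the describe_index_stats endpoint reports. The sketch below only parses a decoded response. The helper name namespaceCounts and the sample JSON are made up for illustration, and the field names namespaces and vectorCount are my understanding of that response, so verify them against what your index returns.

```php
<?php

/**
 * Return [namespace => vector count] from a decoded describe_index_stats response.
 * namespaceCounts is a hypothetical helper name.
 */
function namespaceCounts(array $stats): array
{
    $counts = [];
    foreach ($stats['namespaces'] ?? [] as $name => $info) {
        $counts[$name] = (int) ($info['vectorCount'] ?? 0);
    }
    return $counts;
}

// Sample response, invented for illustration only.
$sampleJson = '{"namespaces":{"my-namespace":{"vectorCount":120}},'
    . '"dimension":1536,"totalVectorCount":120}';
$stats = json_decode($sampleJson, true);

foreach (namespaceCounts($stats) as $name => $count) {
    // An empty key is the default namespace.
    $label = $name === '' ? '(default)' : $name;
    echo "$label: $count vectors", PHP_EOL;
}
```

If every vector sits under a named namespace and the default one is missing or empty, a query without a namespace has nothing to match.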