I am using PHP to get embeddings from OpenAI. I created an index on Pinecone using the UI; the metric is cosine. The whoami, databases, and describe API calls all work, so I am connecting to Pinecone and my index.
This is the code I used to generate the embeddings and build the records I upsert to Pinecone:
// Generate the embeddings for this document
$apiEndpoint = 'https://api.openai.com/v1/embeddings';
$apiParameters = [
    'input' => $contents,
    'model' => 'text-embedding-ada-002',
    'tokenizer' => 'cl100k_base',
    'max_tokens' => 8191,
    'output_dimension' => 1536
];
$client = new \GuzzleHttp\Client();
$response = $client->post($apiEndpoint, [
    'headers' => ['Authorization' => "Bearer $apiKey"],
    'json' => $apiParameters
]);
if ($response->getStatusCode() == 200) {
    // Extract the embeddings from the response
    $responseData = json_decode($response->getBody(), true);
    $embedding = $responseData['data'][0]['embedding'];
    // Create an object for this document's embeddings and add it to the output file
    $outputObject = [
        'id' => $id,
        'metadata' => [
            'title' => $title,
            'book' => $book,
            'library' => $library
        ],
        'values' => $embedding
    ];
    $outputLine = json_encode($outputObject, JSON_PRETTY_PRINT);
}
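For reference, here is a minimal sketch of how records shaped like $outputObject could be wrapped into a request body for Pinecone's /vectors/upsert endpoint, which expects the records under a top-level "vectors" key. The helper name buildUpsertBody and the dimension check are assumptions for illustration, not the code actually used for the upsert.

```php
<?php
// Hypothetical helper (not from the original post): wraps records shaped like
// $outputObject into the body expected by Pinecone's /vectors/upsert endpoint.
function buildUpsertBody(array $records, int $dimension = 1536, string $namespace = ''): array
{
    $vectors = [];
    foreach ($records as $record) {
        $id = (string) ($record['id'] ?? '');
        $values = $record['values'] ?? null;
        // Reject records whose vector is missing or has the wrong length
        if ($id === '' || !is_array($values) || count($values) !== $dimension) {
            throw new InvalidArgumentException(
                "Record '$id' needs an id and exactly $dimension values"
            );
        }
        $vectors[] = [
            'id' => $id,
            // array_values() guarantees a JSON list rather than a JSON object
            'values' => array_map('floatval', array_values($values)),
            'metadata' => $record['metadata'] ?? new stdClass(),
        ];
    }
    $body = ['vectors' => $vectors];
    if ($namespace !== '') {
        // Vectors upserted into a namespace are only found by queries
        // that name the same namespace
        $body['namespace'] = $namespace;
    }
    return $body;
}

// Toy example with a 3-dimensional vector instead of 1536
$example = buildUpsertBody(
    [['id' => 'doc-1', 'values' => [0.1, 0.2, 0.3], 'metadata' => ['title' => 'Demo']]],
    3
);
echo json_encode($example), PHP_EOL;
```

The upsert response reports an upsertedCount, which is worth comparing with the number of records sent.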
I use essentially the same code to get the embedding for a query ($text). The query matches at least a couple of keywords in at least one of the documents upserted into the index.
// Define the API parameters
$apiParameters = [
    'input' => $text,
    'model' => 'text-embedding-ada-002',
    'tokenizer' => 'cl100k_base',
    'max_tokens' => 8191,
    'output_dimension' => 1536
];
$apiQuery = json_encode($apiParameters);
// Send the API request using cURL
$apiResponse = curl_exec($ch);
// Decode the API response and extract the embedding and total_tokens values
$responseArray = json_decode($apiResponse, true);
$embedding = $responseArray['data'][0]['embedding'];
If I var_dump $embedding, this is what I see:
array(1536) {
[0]=>
float(-0.014706593)
[1]=>
float(-0.0030323046)
[2]=>
float(-0.0022425186)
[3]=>
float(-0.013908351)
[4]=>
float(-0.021363119)
[5]=>
float(0.013265698)
etc...
However, when I run a Pinecone query against my index using the retrieved $embedding:
// Set request data
$requestData = array(
    'vector' => $embedding,
    'topK' => $topK,
    'includeValues' => true
);
$jsonData = json_encode($requestData);
I get no matches at all:
object(stdClass)#2 (3) {
["results"]=>
array(0) {
}
["matches"]=>
array(0) {
}
["namespace"]=>
string(0) ""
}
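For context, a minimal sketch of how the query request above could be sent and checked with cURL. The helper names buildQueryBody and postJson, the placeholder host, and the $pineconeApiKey variable are assumptions for illustration; the Api-Key header and the /query path are the ones the Pinecone REST API uses. Returning the HTTP status alongside the decoded body makes it easier to tell an error response apart from a genuinely empty result.

```php
<?php
// Hypothetical helper: builds the JSON body for Pinecone's /query endpoint.
function buildQueryBody(array $embedding, int $topK = 5, string $namespace = ''): array
{
    $body = [
        // array_values() keeps the vector a JSON list even if keys were altered
        'vector' => array_map('floatval', array_values($embedding)),
        'topK' => $topK,
        'includeValues' => true,
        'includeMetadata' => true,
    ];
    if ($namespace !== '') {
        // Must match the namespace used at upsert time, if one was used
        $body['namespace'] = $namespace;
    }
    return $body;
}

// Hypothetical helper: POSTs JSON with cURL and returns [HTTP status, decoded body].
function postJson(string $url, array $headers, array $body): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => json_encode($body),
        CURLOPT_HTTPHEADER => array_merge(['Content-Type: application/json'], $headers),
        CURLOPT_RETURNTRANSFER => true,
    ]);
    $raw = curl_exec($ch);
    if ($raw === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    return [$status, json_decode($raw, true)];
}

// Example call; the host below is a placeholder, not a real index URL.
// [$status, $result] = postJson(
//     'https://YOUR-INDEX-HOST/query',
//     ['Api-Key: ' . $pineconeApiKey],
//     buildQueryBody($embedding, 5)
// );
// var_dump($status, $result);
```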
At this point, I don't know whether the problem is with the index, my upserted vectors, or my query vectors, or whether the query is simply not finding any similarities (although I know they are there). Can someone help me troubleshoot this?
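One way to narrow this down is a local check that does not involve Pinecone at all: compute the cosine similarity between the query embedding and the embedding of a document that should match. The function below is a minimal sketch of that check; cosineSimilarity is a hypothetical helper, and the toy vectors stand in for the real 1536-dimensional embeddings. If the local score looks reasonable, the embeddings are probably fine. A top-K query normally returns the nearest vectors however low their scores are, so an empty matches array then points more toward the vectors not being in the index, or not in the namespace being queried. The describe_index_stats call, which reports vector counts per namespace, can help confirm that.

```php
<?php
// Hypothetical helper: cosine similarity between two equal-length vectors.
function cosineSimilarity(array $a, array $b): float
{
    $a = array_values($a);
    $b = array_values($b);
    if (count($a) === 0 || count($a) !== count($b)) {
        throw new InvalidArgumentException('Vectors must be non-empty and the same length');
    }
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;
    foreach ($a as $i => $x) {
        $y = $b[$i];
        $dot += $x * $y;
        $normA += $x * $x;
        $normB += $y * $y;
    }
    if ($normA == 0.0 || $normB == 0.0) {
        throw new InvalidArgumentException('Cosine similarity is undefined for a zero vector');
    }
    return $dot / (sqrt($normA) * sqrt($normB));
}

// Toy vectors; in practice pass the query embedding and a stored document embedding
$score = cosineSimilarity([0.1, 0.2, 0.3], [0.2, 0.4, 0.6]);
echo round($score, 6), PHP_EOL;
```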