How do I extract useful data to feed back to OpenAI?

I am working with JavaScript…

I think I must do a query to Pinecone to search a database… after searching the database, I suppose Pinecone comes back with something that OpenAI can use? But how do I extract useful information from the query response to feed it back to OpenAI? So far I can only see vectors in the response, and OpenAI works with actual text.

According to the link, the query results only show vectors… how can this be transformed into useful data to feed back to the AI?



Is there a sample of the complete response body that I can see? That way it would be easier to track all the options…

Furthermore… is there a sample of the body for the upsert? I have seen some sample code, but it may be missing some of the information I need to do a proper query later on.

I can post small sections of the sample code I have if needed… I need to figure out how to connect the dots…

Hey @Elindo586

Commonly, only the embeddings of your content are stored in Pinecone as vectors. The original content (text, for example) is not stored in Pinecone unless you opt to include it as metadata.

Also, there are limitations on the amount of metadata that can be stored, and it can have a cost impact. Another approach is to maintain a mapping outside of Pinecone that lets you look up the original content (pre-embedding) using the ID you generated for each embedding before inserting it into Pinecone.
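A minimal sketch of that external-mapping idea, assuming a simple in-memory `Map` keyed by the same IDs used for the vectors (`docStore` and `resolveMatches` are illustrative names; a real app would use a database or key-value store):

```javascript
// Sketch of the "external mapping" approach: the original text lives
// outside Pinecone, keyed by the same IDs used for the vectors.
// docStore is a stand-in for your real database or key-value store.
const docStore = new Map([
  ["1329", "Q: How do I reset my password? A: Use the 'Forgot password' link."],
  ["1426", "Q: How do I change my email? A: Go to Account Settings."],
]);

// Given the matches array from a Pinecone query response, resolve each
// returned ID back to its original text, skipping any unknown IDs.
function resolveMatches(matches, store) {
  return matches
    .map((m) => store.get(m.id))
    .filter((text) => text !== undefined);
}

const texts = resolveMatches(
  [{ id: "1329", score: 0.89 }, { id: "9999", score: 0.5 }],
  docStore
);
// texts now holds the original text for only the IDs present in the store
```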

When you query Pinecone you are searching for ‘n’ number of ‘similar’ items as vectors. The response from Pinecone includes the IDs of those vectors (that you previously inserted) that most closely match your search. Note that if you store the original text as metadata, you’ll need to set the parameter “includeMetadata” on your query to have the metadata associated with each vector returned.
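For reference, a query request body with “includeMetadata” set might look like this, built as a plain object with no client call; the vector values and namespace are placeholders:

```javascript
// A Pinecone query request body shown as a plain object. Field names
// follow the Pinecone query API; the values here are placeholders.
const queryRequest = {
  vector: [0.1, 0.2, 0.3], // your query embedding (same dimension as the index)
  topK: 3,                 // how many similar vectors to return
  includeMetadata: true,   // return the metadata stored with each vector
  includeValues: false,    // the raw vector values usually aren't needed downstream
  namespace: "lorem-ipsum",
};
```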

To make use of those results (say, sending the most closely matching product descriptions to OpenAI), you’d need to take the returned vector IDs, use them to look up the original text version of the content, and formulate a prompt to send to OpenAI. This requires that you can access (look up by ID) the corresponding original text.
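A rough sketch of that last step, assuming the text came back as metadata under a key called `text` (that key name, the sample content, and the prompt wording are my own, not part of any API):

```javascript
// Sketch of turning matched results back into a prompt for OpenAI.
// The response shape mirrors a Pinecone query response with metadata;
// the metadata key "text" and the prompt wording are assumptions.
const queryResponse = {
  matches: [
    { id: "1329", score: 0.89, metadata: { text: "Our widgets ship in 3 days." } },
    { id: "1426", score: 0.85, metadata: { text: "Returns are accepted for 30 days." } },
  ],
};

// Join the matched text into a context block and wrap it around the question.
function buildPrompt(response, question) {
  const context = response.matches.map((m) => m.metadata.text).join("\n");
  return `Answer the question using only this context:\n${context}\n\nQuestion: ${question}`;
}

const prompt = buildPrompt(queryResponse, "How fast do widgets ship?");
// `prompt` is plain text, ready to send as the content of an OpenAI chat message
```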

This tutorial goes through a simple version of those steps, except it does not take the last step of calling OpenAI.

I realize this example is in Python and not JavaScript, but it’s pretty straightforward and provides an easy-to-follow tutorial using a Python notebook (simple setup) that you can copy and walk through, which I think will answer your question.

I hope that helps.

Say… for a question/answer bot… during the embedding/upsert process to Pinecone I could add metadata that includes the question/answer text… maybe I could tap into the metadata text and feed it back to OpenAI…

I finally came across a sample from the GitHub repo… but it’s confusing when you don’t have a full sample of the body response for the query.

I’m not following what you mean by not having a ‘body response for the query’.

Do you mean that in the response you are only getting the vector IDs back, like:

{'matches': [{'id': '1329', 'score': 0.892151892, 'values': []},
             {'id': '1426', 'score': 0.853859782, 'values': []},
             {'id': '1623', 'score': 0.847651958, 'values': []}],
 'namespace': 'lorem-ipsum'}

If you have a table or data source where you can look up the question/answer text by ID, you would just need to make a subsequent call to that source and pull back the corresponding question and answer fields.

Otherwise, try storing the question and answer as metadata as part of your insert to Pinecone, then set the parameter “includeMetadata” on your query so the metadata associated with each vector is returned, and see if that meets your needs. At the very least it should give you a more informative query response that helps you understand the results of your Pinecone query.
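A sketch of what that metadata approach could look like at insert time. The `id`/`values`/`metadata` field names follow the Pinecone upsert API; the metadata keys (“question”, “answer”) and all the sample values are your own choice:

```javascript
// Sketch of an upsert request body that stores the original question and
// answer text as metadata alongside each vector. The metadata keys
// ("question", "answer") are illustrative, not required names.
const upsertRequest = {
  vectors: [
    {
      id: "1329",
      values: [0.1, 0.2, 0.3], // the embedding for this question/answer pair
      metadata: {
        question: "How do I reset my password?",
        answer: "Use the 'Forgot password' link on the sign-in page.",
      },
    },
  ],
  namespace: "lorem-ipsum",
};
```

With content stored this way, a query that sets includeMetadata to true returns each match with its metadata populated, so you can read the question and answer text straight off each match instead of doing a separate lookup.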

There are tradeoffs to both approaches.

