Compare two long texts (similarity search by embedding)

alexcoch · May 3, 2023, 1:13pm

Hello,
How can I compare two long texts using Pinecone? I am interested in the following questions:

What is new in this article compared to the articles previously marked as read, point by point?
What in this article has already appeared in previously read articles, point by point?
I guess the solution may involve a combination of LangChain, Pinecone, and GPT, but it is not entirely clear to me how to formulate the questions above. I would be glad to hear any ideas.
What I did before:
I split the text of the first article into chunks,
I got embeddings for the chunks with the GPT API,
I saved the embeddings in Pinecone,
I sent queries to the vector DB, then passed each query together with the retrieved results to the GPT API and received human-like responses.
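
For reference, the chunking step above can be sketched roughly like this. It is only a minimal illustration, assuming paragraphs are separated by blank lines; the name `chunk_text` and the `max_chars` limit are made up for the example and are not part of LangChain or Pinecone.

```python
def chunk_text(text: str, max_chars: int = 1000) -> list[str]:
    """Pack whole paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept as its own oversized chunk
    rather than being cut mid-sentence.
    """
    # Assumption: paragraphs are separated by blank lines.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current = ""
    for paragraph in paragraphs:
        candidate = f"{current}\n\n{paragraph}" if current else paragraph
        if len(candidate) <= max_chars or not current:
            # Either the paragraph still fits, or it starts a fresh chunk.
            current = candidate
        else:
            # The chunk is full: close it and start a new one.
            chunks.append(current)
            current = paragraph
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then be embedded and saved separately, with the article ID kept alongside it so that results can be traced back to their source.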

Perhaps it is worth storing the chunks as embeddings in Pinecone and then trying to compare this article with the previous one. Is that possible?
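
To make the idea more concrete, here is a rough sketch of what such a comparison might look like: each chunk of the new article is compared against the stored embeddings of the previously read articles, and its best cosine similarity decides whether it counts as "already seen" or "new". Everything here is an assumption for illustration: the function names, the plain Python lists standing in for the vector DB, and especially the 0.8 threshold, which would need tuning for a real embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def split_new_and_seen(new_chunks, stored_vectors, threshold=0.8):
    """Split chunks of a new article into (new, seen) lists of chunk texts.

    new_chunks: list of (text, embedding) pairs for the new article.
    stored_vectors: embeddings of chunks from previously read articles.
    threshold: hypothetical cutoff; a chunk whose best match reaches it is "seen".
    """
    new, seen = [], []
    for text, vector in new_chunks:
        # Best similarity against everything read so far (0.0 if nothing is stored).
        best = max((cosine(vector, s) for s in stored_vectors), default=0.0)
        (seen if best >= threshold else new).append(text)
    return new, seen


# Toy 2-dimensional vectors stand in for real embeddings.
stored = [[1.0, 0.0], [0.0, 1.0]]
chunks = [("repeated point", [1.0, 0.1]), ("fresh point", [1.0, 1.0])]
new_points, seen_points = split_new_and_seen(chunks, stored)
```

The two resulting lists could then be passed to GPT with a prompt asking it to summarize the new points and the repeated points separately.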

alexcoch · May 3, 2023, 1:40pm

Update:
How can I run a similarity search in the DB if I already have an embedding (not a text query)? This embedding represents one paragraph.
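
As far as I understand, the Pinecone Python client's `index.query` accepts a `vector` argument directly, so a paragraph embedding could be passed as-is without any text query. Below is a small sketch of that call shape. The `InMemoryIndex` class is a hypothetical stand-in written only so the example runs without a Pinecone account; with a real index object the same `query_by_embedding` function should work, provided the embedding has the same dimension as the index.

```python
import math


def query_by_embedding(index, embedding, top_k=3):
    """Return (id, score) pairs for the stored vectors closest to `embedding`."""
    response = index.query(vector=list(embedding), top_k=top_k, include_metadata=True)
    return [(match["id"], match["score"]) for match in response["matches"]]


class InMemoryIndex:
    """Hypothetical stand-in mimicking the query call shape (not the real client)."""

    def __init__(self, vectors):
        self.vectors = vectors  # mapping of id -> embedding

    def query(self, vector, top_k, include_metadata=False):
        # include_metadata is accepted for call compatibility but ignored here.
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
            return dot / norm if norm else 0.0

        scored = [{"id": key, "score": cosine(vector, stored)}
                  for key, stored in self.vectors.items()]
        scored.sort(key=lambda match: match["score"], reverse=True)
        return {"matches": scored[:top_k]}


# Toy data: three stored "paragraph" vectors and one query embedding.
index = InMemoryIndex({"p1": [1.0, 0.0], "p2": [0.0, 1.0], "p3": [1.0, 1.0]})
results = query_by_embedding(index, [1.0, 0.0], top_k=2)
```

Running the query once per paragraph of the new article would give, for every paragraph, the closest paragraphs from the previously read articles together with their similarity scores.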