Hi, I vectorized ticket data (support questions) into Pinecone and use it to find similar tickets when a new ticket is created. However, I'm wondering how I can best vectorize the data. At the moment I take the ticket's original request plus all the actions within the ticket (customer and employee actions) and vectorize the combined text with OpenAI. I then upsert this along with the ticket's metadata so results can be returned to the employee quickly. When a new ticket is created, I take its request and title, vectorize those, and query the DB; from the results I take the ticket numbers to share internally. My tests seem to work fine, but I'm wondering if this is the best approach, and whether putting all of a ticket's data into one vector entry is best practice?
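For context, here's a minimal sketch of the pipeline described above. All names (`build_ticket_text`, `upsert_ticket`, `query_similar`) are hypothetical, the embedding call is replaced with a deterministic hash-based stand-in, and a plain dict stands in for the Pinecone index so the example is self-contained; in production you'd call OpenAI's embeddings API and Pinecone's upsert/query instead.

```python
import hashlib
import math

def embed(text: str, dim: int = 8) -> list[float]:
    """Stand-in for an OpenAI embedding call: a deterministic,
    hash-derived unit vector. Swap for the real API in production."""
    vec = []
    for i in range(dim):
        h = hashlib.sha256(f"{i}:{text}".encode()).digest()
        vec.append(int.from_bytes(h[:4], "big") / 2**32 - 0.5)
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def build_ticket_text(ticket: dict) -> str:
    # Combine the original request and every ticket action into one
    # document, as described in the question.
    parts = [ticket["title"], ticket["request"]]
    parts += [a["text"] for a in ticket.get("actions", [])]
    return "\n".join(parts)

# In-memory stand-in for the Pinecone index: id -> (vector, metadata)
index: dict[str, tuple[list[float], dict]] = {}

def upsert_ticket(ticket: dict) -> None:
    vec = embed(build_ticket_text(ticket))
    index[ticket["id"]] = (vec, {"ticket_number": ticket["id"],
                                 "title": ticket["title"]})

def query_similar(title: str, request: str, top_k: int = 3) -> list[str]:
    # New ticket: embed title + request and rank by cosine similarity
    # (vectors are unit length, so the dot product suffices).
    q = embed(f"{title}\n{request}")
    scored = sorted(
        index.items(),
        key=lambda kv: -sum(a * b for a, b in zip(q, kv[1][0])),
    )
    return [meta["ticket_number"] for _, (_, meta) in scored[:top_k]]
```

The metadata attached at upsert time is what lets you hand the ticket number straight back to the employee without a second lookup.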
I’m thinking this is “too much” content such that the “meaning” of ticket data won’t be effectively preserved. This would likely not be a problem with only a small amount of ticket data, but as the number of tickets increases I think you’ll find the quality of your semantic search results will degrade.
One strategy could be to create "summaries" of the ticket data and vectorize those instead. I've written about it here: https://www.ninetack.io/post/improving-rag-quality-by-summarization
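A rough sketch of the summarize-then-embed idea, with the summarization step stubbed out: `summarize_ticket` is a hypothetical helper, and here it just concatenates the request with the final (resolution) action so the example runs on its own. In practice you'd prompt an LLM to produce a short problem/resolution summary, then embed that summary instead of the full transcript so each vector stays semantically focused.

```python
def summarize_ticket(ticket: dict) -> str:
    """Stand-in for an LLM summarization call. A real implementation would
    prompt a chat model for a concise problem/resolution summary; this
    extractive version keeps the example self-contained."""
    summary = [f"Problem: {ticket['request']}"]
    if ticket.get("actions"):
        # Assume the last action describes the resolution.
        summary.append(f"Resolution: {ticket['actions'][-1]['text']}")
    return " ".join(summary)
```

You'd then embed `summarize_ticket(ticket)` rather than the raw request-plus-actions text, which keeps long back-and-forth threads from diluting the vector's meaning as your ticket volume grows.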