Sentence Similarity for Code Generation-Related Tasks

cookiefinder · March 24, 2022, 3:47pm

I am working on code generation using OpenAI Codex.
Specifically, I’m making a personal project that lets users enter a specific “task” that they want to do related to code generation.
Currently I have around 10 tasks, e.g. explaining a piece of code, converting some code to a one-liner, or writing unit tests for a function.
My goal is to recognize the user’s intent. Normally, the approach would be to use a multi-class classification model, but due to time and data constraints, I want to use sentence similarity instead.
Basically, my idea is:
1. For each task, generate a list of the 3-4 most common prompts that users might enter for that task.
2. For any user query, perform a sentence similarity search against those prompts.
3. For the task with the top similarity score:
   - If score > threshold, recognize this task as the user’s actual intent.
   - Else, the user’s intent is not among the current set of tasks.
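The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: the task names, the example prompts, the `recognize_intent` helper, and the 0.5 threshold are all hypothetical placeholders, and the toy bag-of-words `embed` function stands in for a real sentence embedding model so that the example runs with the standard library alone. In practice you would swap `embed` for a proper sentence encoder and tune the threshold on real queries.

```python
import math
import re
from collections import Counter

# Hypothetical tasks and example prompts (assumptions, not from a real dataset).
TASK_PROMPTS = {
    "explain_code": [
        "explain this code",
        "what does this function do",
        "describe what this snippet does",
    ],
    "one_liner": [
        "convert this code to a one liner",
        "rewrite this in a single line",
        "make this a one line expression",
    ],
    "unit_tests": [
        "write unit tests for this function",
        "generate test cases for this code",
        "add tests for this function",
    ],
}


def embed(text):
    """Toy stand-in for a sentence encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recognize_intent(query, task_prompts=TASK_PROMPTS, threshold=0.5):
    """Return (task, score); task is None when no score exceeds the threshold."""
    query_vec = embed(query)
    best_task, best_score = None, 0.0
    for task, prompts in task_prompts.items():
        # Score a task by its single most similar example prompt.
        score = max(cosine(query_vec, embed(p)) for p in prompts)
        if score > best_score:
            best_task, best_score = task, score
    if best_score > threshold:
        return best_task, best_score
    return None, best_score


if __name__ == "__main__":
    for q in ["please write unit tests for my function", "what is the weather today"]:
        print(q, "->", recognize_intent(q))
```

Scoring each task by its best-matching prompt (rather than the average over its prompts) is one design choice among several; it keeps a single well-phrased example from being diluted by the others.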
My questions are:

1. Is this approach any good?
2. Should I use a Passage Ranking or Semantic Textual Similarity model? Or something else?
3. Is there any relevant dataset on which I can finetune the model to work on prompts specific to code generation?

dingusagar · March 24, 2022, 4:09pm

What you are trying to do is use sentence similarity models for an intent classification task. Yes, this is very much possible. I tried a couple of SBERT models for intent classification in a chatbot use case, and they seemed to work really well.

cookiefinder · March 24, 2022, 4:25pm

What models did you use?

sophiem · April 15, 2022, 4:54pm

This is great! If you have any demo code or apps that you want to share here or for us to put on our Showcase page, please let me know (either share them here or with sophie@pinecone.io).