“Streamlining RAG Apps with Canopy”
A technical deep dive into Pinecone's open-source RAG framework, Canopy. With Canopy, you can build and launch production GenAI apps quickly and easily. By taking care of chunking, vectorization, LLM orchestration, and prompt engineering, Canopy abstracts away the heavy lifting of building RAG pipelines, leaving you with the energy to focus on what's important: building your end product. And since it's completely open source, you can extend and customize Canopy to meet any use case. We've saved lots of time at the end for questions.
Presenter: Audrey Lorberfeld
Panelists: James Briggs, Bear Douglas, Jocelyn
Time/location: January 9, 2024 via Zoom
In today's webinar we're seeing some input from the audience - here are some highlights. The webinar is live now, and we're updating this thread as it proceeds.
-
Before we start, can I say thanks to James [Briggs] for all his fantastic YouTube videos.
Great to hear! Look for more videos from the DevRel team -
Is the session being recorded?
Yes! It will be sent via email to attendees post-event. -
Is Canopy already compatible with AWS Lambda in production?
Technically it should be, yes. With Lambda you might need to make some modifications and will be using the Canopy core library - but it can definitely be used -
I may have missed it, but can you add your own metadata fields as well?
Yes, you can add your own fields within the document metadata at upsert time -
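For reference, here is a minimal sketch of attaching your own metadata at upsert time with the Canopy library, following the README-style usage; the index name and metadata fields are placeholders, and the index is assumed to already exist.

```python
# Minimal sketch: upsert a document with custom metadata via the Canopy library.
# Assumes PINECONE_API_KEY / OPENAI_API_KEY are set and the index already exists.
from canopy.tokenizer import Tokenizer
from canopy.knowledge_base import KnowledgeBase
from canopy.models.data_models import Document

Tokenizer.initialize()  # Canopy requires a one-time tokenizer setup

kb = KnowledgeBase(index_name="my-canopy-index")  # placeholder index name
kb.connect()

docs = [
    Document(
        id="doc-1",
        text="Canopy is an open-source RAG framework from Pinecone.",
        source="https://github.com/pinecone-io/canopy",
        metadata={"team": "devrel", "doc_type": "readme"},  # your own fields
    )
]
kb.upsert(docs)
```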
When do you expect Canopy to become multimodal?
6 to 9 months. It's really complex… the first piece of business is getting drag-and-drop for documents, and then probably PDF. But audio and video will be more complicated. We are doing some work with that open-source library, and we hope multimodal support will be catalyzed by that collaboration -
I tried to use LangChain for this but the latency is really high. Can you speak to this?
The latency with Canopy itself is near zero. Under the hood it relies on OpenAI right now as a default, so if OpenAI is bottlenecked or throttled by public traffic… we do know that GPT-3.5 Turbo is much faster than GPT-4, but it depends on the API. On top of the API call, the work Canopy does under the hood should add extremely minimal latency -
I don't understand whether Canopy chooses the best embedding model by default or uses the data's embedding model
There is always a default embedding model for the data. But let's say you wanted to use a Cohere model: you can change that in a configuration file - just point it to the model name and you can use a different model. -
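As an illustration of the configuration approach described above, here is a hedged fragment of a Canopy config file that swaps the record encoder. The nesting, key names, encoder class, and model name are assumptions based on the sample configs in the Canopy repo and may differ between versions, so check the templates that ship with your install.

```yaml
# Illustrative fragment only: point the knowledge base at a different embedding
# model (key/class names assumed from Canopy's sample configs; verify locally).
knowledge_base:
  record_encoder:
    type: CohereRecordEncoder          # e.g. swap out the default OpenAI encoder
    params:
      model_name: embed-english-v3.0   # model name for the new provider
```

You would then point the Canopy server at this file when starting it (for example via a config flag or environment variable, depending on your Canopy version).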
Is there a way to snoop on the prompts being sent to the server (say, something like a filter before it starts processing the prompt)?
I believe the team was working on adding something like this, so you can see the prompts that are being used/sent everywhere -
Is it possible to debug the context that is sent?
Yes, definitely. -
Is there a way to block specific prompts if we want to build better control around it?
There is not yet a way to block specific prompts within Canopy -
Does Canopy support uploading PDF documents?
Canopy doesn't handle PDFs yet; it is a WIP -
Does Canopy help extract figures and tables from PDFs?
No. Right now it only works with text files. PDFs are technically images, even though they have text information on them. Hopefully, with the collaboration we're doing with things like unstructured, and also internal efforts, we're going to be able to extract figures and tables from PDF files and have semi-structured text and data extracted from PDFs, which would be great. -
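If you want to experiment with PDF extraction today, one hedged option is to pre-process files with the unstructured library mentioned above and upsert only the recovered text yourself. The file name below is a placeholder; the element attributes follow unstructured's documented API.

```python
# Sketch: pull text-like elements (and tables) out of a PDF with `unstructured`
# before upserting the plain text into Canopy. Assumes `pip install "unstructured[pdf]"`.
from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(filename="report.pdf")  # placeholder file

texts = [el.text for el in elements if el.category in ("NarrativeText", "Title")]
tables = [el.text for el in elements if el.category == "Table"]

print(f"Extracted {len(texts)} text blocks and {len(tables)} tables")
```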
Is Canopy certified to work with Python 3.12+?
The versions in requirements cover Python 3.9 upwards (tests run up to 3.11) -
Can you use LangChain for loading PDF text and then use the text with Canopy?
Yes, you can -
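A rough sketch of that pattern, assuming the langchain-community PyPDFLoader and converting each page into a Canopy Document (the file path and metadata are placeholders):

```python
# Sketch: load PDF text with LangChain, then hand it to Canopy as plain-text
# documents. Assumes `langchain-community` and `pypdf` are installed.
from langchain_community.document_loaders import PyPDFLoader
from canopy.models.data_models import Document

pages = PyPDFLoader("whitepaper.pdf").load()  # one LangChain document per page

canopy_docs = [
    Document(
        id=f"whitepaper-page-{i}",
        text=page.page_content,
        source="whitepaper.pdf",
        metadata={"page": str(i)},
    )
    for i, page in enumerate(pages)
]
# kb.upsert(canopy_docs)  # assuming `kb` is a connected Canopy KnowledgeBase
```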
Does Canopy support any kind of reranking?
This is on the docket. It would be awesome to be able to see the ranking of the context you retrieve and then also have reranking. Short answer is that it's coming, and I am definitely on board with that as well. -
How does it compare to LangChain RAG, which supports similar query rewrites with local models?
Canopy is focused on RAG; LangChain is broader. For out-of-the-box RAG I think Canopy is easier to get started with. Hopefully we will get integrated with LangChain in the future, for Canopy RAG + LangChain agents (and everything else that LangChain does) -
What about running on-prem?
Right now we have no on-prem. Pinecone is HIPAA and GDPR compliant, but the call to OpenAI's ChatGPT would have to go through the API. Soon we will be able to do everything on-prem -
Are there any plans to give Canopy a rerank endpoint?
Not yet [as of January 5, 2024], but I think reranking has been the most popular topic among all these questions. I'll definitely give that back to the team and note that endpoints, not only to do custom ranking but also just to hit a reranker that is hooked up to Cohere, would be nice. -
But isn’t the Canopy part on-prem if you use the libraries?
Yes, the Canopy part would be on-prem; the Pinecone part would not -
The things that Canopy abstracts away for us (thank you!) are the things that typically need to be tuned to get from POC to deployment. Is the intent then to use Canopy, see if it works as-is (yippee!), and if not, then go back and tune what needs to be tuned?
Yes. Generally I've seen great out-of-the-box performance; the team worked a lot on the pipeline, chunking, prompting, etc. It tends to work very nicely out of the box -
Is Canopy already compatible with AWS Lambda?
It's possible to get it working in a container, but you can theoretically use the Canopy core library and write your own pipeline to send over to Lambda -
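For context, this is roughly what using the Canopy core library directly inside a Lambda-style handler could look like. The class names follow the Canopy README, but the handler wiring and event shape are illustrative assumptions, not a packaged integration.

```python
# Rough sketch of the Canopy core library used in your own pipeline (e.g. an AWS
# Lambda handler) instead of the `canopy start` server. Index name is a placeholder.
from canopy.tokenizer import Tokenizer
from canopy.knowledge_base import KnowledgeBase
from canopy.context_engine import ContextEngine
from canopy.chat_engine import ChatEngine
from canopy.models.data_models import UserMessage

Tokenizer.initialize()
kb = KnowledgeBase(index_name="my-canopy-index")
kb.connect()
chat_engine = ChatEngine(ContextEngine(kb))  # built once, reused across invocations


def handler(event, context):
    # Assumes the incoming event carries a plain "question" string (hypothetical shape).
    response = chat_engine.chat(
        messages=[UserMessage(content=event["question"])],
        stream=False,
    )
    return {"answer": response.choices[0].message.content}
```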
Any plans to increase the default chunk size?
To change it, just enter the size you prefer. Canopy has a bunch of methods under the hood that ensure that when you send context to the LLM, the chunks you retrieve plus the size of the query will not exceed the context window. If your chunk size is too big, you'll be notified with an error; this avoids OpenAI errors -
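As a hedged illustration, the chunker section of a Canopy config file is where the chunk size lives. The key and parameter names below are assumptions based on the sample configs shipped with the repo and may differ between versions, so verify against your local templates.

```yaml
# Illustrative fragment only: bump the chunk size (names assumed from sample configs).
knowledge_base:
  chunker:
    type: TokenChunker
    params:
      max_chunk_size: 512   # default is 256
      overlap: 30
```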
How do you enable streaming by default?
Just set it to true and run canopy start -
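Since the canopy start server exposes an OpenAI-compatible chat endpoint, one way to consume streaming responses is with the standard OpenAI Python client pointed at the local server. The host, port, API key, and model name below are assumptions for a default local setup; adjust them to your configuration.

```python
# Sketch: stream tokens from a locally running `canopy start` server through its
# OpenAI-compatible API (default localhost:8000 assumed; adjust if you changed it).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="canopy")  # key is a placeholder

stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # the server forwards to whatever LLM it is configured with
    messages=[{"role": "user", "content": "What is Canopy?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```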
Any plans to support RAG fusion?
Currently not in the works, but anything could happen in the future -
Can you convert a CSV to a mock table and do NL-SQL?
It's best to convert the CSV to markdown or HTML format, and then you can search through the tables, although you will likely want to do some custom table summarization etc. beforehand -
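A quick sketch of the CSV-to-markdown step with pandas (the file name is a placeholder, and DataFrame.to_markdown() needs the tabulate package installed):

```python
# Sketch: convert a CSV into a markdown table before upserting it as plain text.
import pandas as pd

df = pd.read_csv("sales.csv")                 # placeholder CSV
markdown_table = df.to_markdown(index=False)  # requires `tabulate`

# Optionally prepend a short hand-written or LLM-generated summary of the table
# before upserting, as suggested above.
print(markdown_table)
```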
Can you elaborate on when to use RAG vs. non-RAG workflows, and how Canopy can help analyze the results?
If you need to answer general-knowledge questions that don't need up-to-date info, non-RAG queries are usually better; for the rest I would recommend RAG. You can test RAG vs. no-RAG performance in the Canopy CLI -
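For example, the Canopy CLI's chat command has a --no-rag flag (per the Canopy README; check your installed version) that lets you compare answers with and without retrieval:

```
# Assumes the Canopy server is already running (`canopy start`) in another terminal
canopy chat            # chat with RAG over your index
canopy chat --no-rag   # also show the answer without retrieval, for comparison
```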
Does it figure out the optimal chunk size?
Right now it doesn't do any type of grid search based on business heuristics you might feed it. There's a maximum chunk size of 256 tokens; our team has found that is the size that works best for most use cases