"canopy chat" points to another LLM not defined in config file

Working with the canopy CLI on Linux, via the command line.

I created a Canopy index using a config file; the config file itself was created via the command line, as instructed by Pinecone. In the config file, I explicitly set “gpt-3.5-turbo” for both the tokenizer and the LLM. Furthermore, I restricted (limited) the project in the OpenAI API so that it is only allowed to use “gpt-3.5-turbo”.
“canopy new --config …”, “canopy upsert --config …” and “canopy start --config …” worked well, or, to be more precise, at least they ran.
However, “canopy chat --no-rag” (and “canopy chat”) seems to point to a different OpenAI LLM, “gpt-3.5-turbo-0125”. So when I pass a query, I receive an exception containing the following text (snippet):

… project xxxx has no permission to use gpt-3.5-turbo-0125 …

“canopy chat” has no argument for pointing to a config file. My questions are:

  • Am I doing something wrong?
  • Is there a way to force “canopy chat” to use a config file? (A possible workaround I'm only assuming exists is sketched right after these questions.)
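
For reference, my working assumption is that the running Canopy server itself does honor the config it was started with, and that it exposes an OpenAI-compatible chat completions route under /v1 (as I understand the Canopy README; the exact route, and whether a “model” field is required, are assumptions on my part, not something I've verified). A rough sketch of calling the server directly, bypassing the canopy chat client:

# Hypothetical sketch: query the running Canopy server directly over its
# OpenAI-compatible API instead of going through `canopy chat`.
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What do my documents say about X?"}]}'

My problem is specifically with the canopy chat client, which seems to reach for “gpt-3.5-turbo-0125” on its own.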

Thank you for your support.

Hi @alejandro, and welcome to the Pinecone forums!

Thank you for your question and sorry to hear you’re running into this.

If I’m understanding your question correctly, most canopy commands work when passing the --config flag, but the chat command does not appear to read or honor the config file?

A couple of clarifying questions so we can better help:

  1. Are you following the Canopy docs in the repository? canopy/docs at main · pinecone-io/canopy · GitHub

  2. Are you able to share your configuration file (after removing any secrets like your API keys)?

In the meantime, I’m going to pass your question back to the Canopy team for visibility. You may also want to consider filing your issue against the GitHub repo so the team will see it faster.

Best,
Zack

Hello @ZacharyProser, and thank you for your reply.

  1. Yes indeed, I've followed only the docs. I cloned pinecone-io/canopy and took it from there step by step. “canopy --version” → 0.9.0

For a bit more context, I ran the following commands (a rough shell sketch of the whole session follows this list):

  • Set up the API key in Pinecone.
  • Ran canopy create-config path_to_dir (the config is shared in point 2 below).
  • Set up a key in OpenAI, limited to using “gpt-3.5-turbo” and “text-embedding-3-small”.
  • Edited the config file to use “gpt-3.5-turbo” and “text-embedding-3-small”.
  • Set the required environment variables.
  • Ran canopy new --config path_to_config_file; it created the “canopy-test” index.
  • Added “canopy-test” to the environment variables.
  • Ran canopy upsert path_to_jsonl_file; it indexed all the JSON lines.
  • Ran canopy start --config path_to_config_file; the server started and I could access http://127.0.0.1:8000/docs.
  • Opened another terminal window and ran canopy chat --no-rag --config path_to_config, but received an exception: there is no --config flag for canopy chat.
  • Then I ran canopy chat --no-rag and was prompted to enter a query.
  • Entered a query; I received an exception and canopy chat exited. The exception message included:

"project xxxx has no permission to use “project xxxx has no permission to use gpt-3.5-turbo-0125”

Now the interesting part: the “context” part of the query result was printed, but the “no context” part was the one that could not be executed and returned the exception.

Following the exception, I allowed “gpt-3.5-turbo-0125” for the project in the OpenAI API.
Then I went back, ran canopy chat --no-rag again, executed the query, and everything worked.

  2. Here is my config file:
# ===========================================================
#            Configuration file for Canopy Server
# ===========================================================

# ---------------------------------------------------------------------------------
# LLM prompts
# Defined here for convenience, then referenced in the chat engine configuration
# Note: line breaks in the prompts are important, and may affect the LLM's behavior
# ---------------------------------------------------------------------------------
system_prompt: &system_prompt |
  Use the following pieces of context to answer the user question at the next messages. This context retrieved from a knowledge database and you should use only the facts from the context to answer. Always remember to include the source to the documents you used from their 'source' field in the format 'Source: $SOURCE_HERE'.
  If you don't know the answer, just say that you don't know, don't try to make up an answer, use the context.
  Don't address the context directly, but use it to answer the user question like it's your own knowledge.
query_builder_prompt: &query_builder_prompt |
  Your task is to formulate search queries for a search engine, to assist in responding to the user's question.
  You should break down complex questions into sub-queries if needed.


tokenizer:
  # -------------------------------------------------------------------------------------------
  # Tokenizer configuration
  # A Tokenizer singleton instance must be initialized before initializing any other components
  # -------------------------------------------------------------------------------------------
  type: OpenAITokenizer                 # Options: [OpenAITokenizer]
  params:
    model_name: gpt-3.5-turbo


chat_engine:
  # -------------------------------------------------------------------------------------------------------------
  # Chat engine configuration
  # The chat engine is the main component of the server, generating responses to the `/chat.completion` API call.
  # The configuration is recursive, so that each component contains a subsection for each of its sub components.
  # -------------------------------------------------------------------------------------------------------------
  params:
    max_prompt_tokens: 4096             # The maximum number of tokens to use for input prompt to the LLM
    max_generated_tokens: 600          # Leaving `null` will use the default of the underlying LLM
    max_context_tokens: null            # Leaving `null` will use 70% of `max_prompt_tokens`
    system_prompt: *system_prompt       # The chat engine's system prompt for calling the LLM
    allow_model_params_override: false  # Whether to allow overriding the LLM's parameters in an API call

  history_pruner:             # How to prune messages if chat history is too long. Options: [RecentHistoryPruner, RaisingHistoryPruner]
    type: RecentHistoryPruner
    params:
      min_history_messages: 1             # Minimal number of messages to keep in history

  llm: &llm
    # -------------------------------------------------------------------------------------------------------------
    # LLM configuration
    # Configuration of the LLM (Large Language Model)
    # -------------------------------------------------------------------------------------------------------------
    type: OpenAILLM                     # Options: [OpenAILLM, AzureOpenAILLM]
    params:
      model_name: gpt-3.5-turbo         # The name of the model to use.
      # You can add any additional parameters which are supported by the LLM's `ChatCompletion()` API. The values
      # set here will be used in every LLM API call, but may be overridden if `allow_model_params_override` is true.
#      temperature: 0.7
#      top_p: 0.9

  query_builder:
    # -------------------------------------------------------------------------------------------------------------
    # LLM configuration
    # Configuration of the LLM (Large Language Model)
    # -------------------------------------------------------------------------------------------------------------
    type: FunctionCallingQueryGenerator # Options: [FunctionCallingQueryGenerator, LastMessageQueryGenerator, InstructionQueryGenerator]
    params:
      prompt: *query_builder_prompt     # The query builder's system prompt for calling the LLM
      function_description:             # A function description passed to the LLM's `function_calling` API
        Query search engine for relevant information

    llm:  # The LLM that the query builder will use to generate queries. Leave `*llm` to use the chat engine's LLM
      <<: *llm

  context_engine:
    # -------------------------------------------------------------------------------------------------------------
    # ContextEngine configuration
    # The context engine is responsible for generating textual context for the `/query` API calls.
    # -------------------------------------------------------------------------------------------------------------
    params:
      global_metadata_filter: null      # An optional metadata filter to apply to all queries

    context_builder:
      # -------------------------------------------------------------------------
      # Configuration for the ContextBuilder subcomponent of the context engine.
      # The context builder is responsible for formulating a textual context given query results.
      # -------------------------------------------------------------------------
      type: StuffingContextBuilder      # Options: [StuffingContextBuilder]

    knowledge_base:
      # -----------------------------------------------------------------------------------------------------------
      # KnowledgeBase configuration
      # The KnowledgeBase is responsible for storing and indexing the user's documents
      # -----------------------------------------------------------------------------------------------------------
      params:
        default_top_k: 5                # The default number of document chunks to retrieve for each query
      chunker:
        # --------------------------------------------------------------------------
        # Configuration for the Chunker subcomponent of the knowledge base.
        # The chunker is responsible for splitting documents' text into smaller chunks.
        # --------------------------------------------------------------------------
        type: MarkdownChunker           # Options: [MarkdownChunker, RecursiveCharacterChunker]
        params:
          chunk_size: 256               # The maximum number of tokens in each chunk
          chunk_overlap: 0              # The number of tokens to overlap between chunks
          keep_separator: true          # Whether to keep the separator in the chunks

      record_encoder:
        # --------------------------------------------------------------------------
        # Configuration for the RecordEncoder subcomponent of the knowledge base.
        # The record encoder is responsible for encoding document chunks to a vector representation
        # --------------------------------------------------------------------------
        type: OpenAIRecordEncoder       # Options: [OpenAIRecordEncoder, AzureOpenAIRecordEncoder]
        params:
          model_name:                   # The name of the model to use for encoding
            text-embedding-3-small
          batch_size: 400               # The number of document chunks to encode in each call to the encoding model
create_index_params:
  # -------------------------------------------------------------------------------------------
  # Initialization parameters to be passed to create a canopy index. These parameters will
  # be used when running "canopy new".
  # -------------------------------------------------------------------------------------------
  metric: cosine
  spec:
    serverless:
      cloud: aws
      region: us-east-1
    # For pod indexes you can pass the spec with the key "pod" instead of "serverless"
    # See the example below:
#    pod:
#      environment: eu-west1-gcp
#      pod_type: p1.x1
#      # Here you can specify here replicas, shards, pods, metadata_config if needed.

Hope this helps.