

TL;DR
- AutoGen has expanded integrations with a variety of cloud-based model providers beyond OpenAI.
- Leverage models and platforms from Gemini, Anthropic, Mistral AI, Together.AI, and Groq for your AutoGen agents.
- Utilise models specifically for chat, language, image, and coding.
- LLM provider diversification can provide cost and resilience benefits.
Each provider has its own client class, configured in the familiar OpenAI style with `api_type` and `model` fields. We’ll demonstrate how to use them below.
The community is continuing to enhance and build new client classes as cloud-based inference providers arrive. So, watch this space, and feel free to discuss or develop another one.
Benefits of choice
The need to use only the best models to overcome workflow-breaking LLM inconsistency has diminished considerably over the last 12 months. These new classes provide access to the very largest trillion-parameter models from OpenAI, Google, and Anthropic, which continue to provide the most consistent and competent agent experiences. However, it’s worth trying smaller models from the likes of Meta, Mistral AI, Microsoft, Qwen, and many others. Perhaps they are capable enough for a task or sub-task, or even better suited (such as a coding model)!

Using smaller models has cost benefits, and it also lets you test models that you could run locally, helping you determine whether you can remove cloud inference costs altogether or even run an AutoGen workflow offline. On the topic of cost, these client classes also include provider-specific token cost calculations so you can monitor the cost impact of your workflows. With costs per million tokens as low as 10 cents (and some are even free!), the savings can be noticeable.
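As a quick illustration of cost monitoring, here is a minimal sketch using AutoGen’s `gather_usage_summary` helper to aggregate token usage and calculated costs across agents after a chat has run (the agent names are placeholders for your own agents):

```python
import autogen

# After running a chat, aggregate the provider-reported token usage and
# cost calculations across your agents. user_proxy and assistant are
# placeholders for agents you have already created and run.
usage = autogen.gather_usage_summary([user_proxy, assistant])
print(usage["usage_including_cached_inference"])
```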
Mix and match

How does Google’s Gemini 1.5 Pro model stack up against Anthropic’s Opus or Meta’s Llama 3? Now you have the ability to quickly change your agent configs and find out. If you want to run all three in one workflow, AutoGen’s ability to associate specific configurations with each agent means you can select the best LLM for each agent.
Capabilities

The common requirements of text generation and function/tool calling are supported by these client classes. Multimodal support, such as for image/audio/video, is an area of active development. The Google Gemini client class can be used to create a multimodal agent.
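As a sketch of what that can look like, assuming Gemini is registered under the `google` api_type and using the contrib `MultimodalConversableAgent` (the model name and key are illustrative; adjust to your installed version):

```python
from autogen.agentchat.contrib.multimodal_conversable_agent import (
    MultimodalConversableAgent,
)

# Assumed: a Gemini entry in your config list with api_type "google"
gemini_config = [
    {"api_type": "google", "model": "gemini-1.5-pro", "api_key": "your-google-key"}
]

# An agent that can reason over images as well as text
image_agent = MultimodalConversableAgent(
    name="image_agent",
    llm_config={"config_list": gemini_config},
    max_consecutive_auto_reply=1,
)
```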
Tips

Here are some tips when working with these client classes:

- Most to least capable - start with larger models and get your workflow working, then iteratively try smaller models.
- Right model - choose one that’s suited to your task, whether it’s coding, function calling, knowledge, or creative writing.
- Agent names - these cloud providers do not use the `name` field on a message, so be sure to use your agent’s name in their `system_message` and `description` fields, as well as instructing the LLM to ‘act as’ them. This is particularly important for “auto” speaker selection in group chats, as we need to guide the LLM to choose the next agent based on a name, so tweak `select_speaker_message_template`, `select_speaker_prompt_template`, and `select_speaker_auto_multiple_template` with more guidance (see the sketch after this list).
- Context length - as your conversation gets longer, models need to support larger context lengths. Be mindful of what the model supports, and consider using Transform Messages to manage context size.
- Provider parameters - providers have parameters you can set such as temperature, maximum tokens, top-k, top-p, and safety. See each client class in AutoGen’s API Reference or documentation for details.
- Prompts - prompt engineering is critical in guiding smaller LLMs to do what you need. ConversableAgent, GroupChat, UserProxyAgent, and AssistantAgent all have customizable prompt attributes that you can tailor. Here are some prompting tips from Anthropic (plus their prompt library), Mistral AI, Together.AI, and Meta.
- Help! - reach out on the AutoGen Discord or log an issue if you need help with or can help improve these client classes.
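Following the agent-names tip above, here is a minimal sketch of nudging “auto” speaker selection with custom templates. The agents and wording are illustrative; `{roles}` and `{agentlist}` are the placeholders GroupChat substitutes at runtime:

```python
import autogen

# planner, coder, and reviewer are assumed to be agents you have already
# created, each with a descriptive name and description field.
group_chat = autogen.GroupChat(
    agents=[planner, coder, reviewer],
    messages=[],
    max_round=12,
    speaker_selection_method="auto",
    # Reinforce agent names so smaller models reliably pick the next speaker
    select_speaker_message_template=(
        "You are in a role play game. The following roles are available:\n"
        "{roles}\n"
        "Read the conversation, then select the next role from {agentlist} to play."
    ),
    select_speaker_prompt_template=(
        "Read the above conversation and select the next role from {agentlist}. "
        "Respond with ONLY the name of the role and nothing else."
    ),
)

# config_list is assumed to be a configuration list you loaded earlier
manager = autogen.GroupChatManager(
    groupchat=group_chat, llm_config={"config_list": config_list}
)
```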
Quickstart
Installation
Install the appropriate client based on the model you wish to use (for example, `pip install ag2[mistral]` for Mistral AI, or the matching extra for your provider).

Configuration Setup
Add your model configurations to the `OAI_CONFIG_LIST`. Ensure you specify the `api_type` to initialize the respective client (Anthropic, Mistral, or Together).
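For example, a configuration list might look like the following (the model names and keys are illustrative placeholders):

```python
# Example OAI_CONFIG_LIST contents - one entry per provider you want to use
[
    {
        "api_type": "anthropic",
        "model": "claude-3-5-sonnet-20240620",
        "api_key": "your-anthropic-api-key",
    },
    {
        "api_type": "mistral",
        "model": "mistral-large-latest",
        "api_key": "your-mistral-api-key",
    },
    {
        "api_type": "together",
        "model": "meta-llama/Llama-3-70b-chat-hf",
        "api_key": "your-together-api-key",
    },
]
```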
Usage
The [LLMConfig.from_json](https://docs.ag2.ai/latest/docs/api-reference/autogen/llm_config/LLMConfig) method loads a list of configurations from an environment variable or a JSON file.
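A minimal sketch, assuming the configurations above are stored in an `OAI_CONFIG_LIST` file and filtering down to a single provider:

```python
from autogen import LLMConfig

# Load all configurations from the file, then keep only the Anthropic entry
llm_config = LLMConfig.from_json(path="OAI_CONFIG_LIST").where(api_type="anthropic")
```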
Construct Agents
Construct a simple conversation between a User proxy and an Assistant agent.
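A minimal sketch (the agent names and settings are illustrative):

```python
from autogen import AssistantAgent, UserProxyAgent

# The assistant uses the llm_config loaded above
assistant = AssistantAgent(name="assistant", llm_config=llm_config)

# The user proxy relays our message; no code execution in this example
user_proxy = UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    code_execution_config=False,
)
```

Start chat

With the agents in place, initiate the conversation (the message is an illustrative example):

```python
user_proxy.initiate_chat(
    assistant,
    message="Write a short poem about the autumn leaves.",
    max_turns=2,
)
```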
Function Calls
Now, let’s look at how Anthropic’s Sonnet 3.5 is able to suggest multiple function calls in a single response. This example is a simple travel agent setup with an agent for function calling and a user proxy agent for executing the functions. One thing you’ll note here is that Anthropic’s models are more verbose than OpenAI’s and will typically provide chain-of-thought or general verbiage when replying. Therefore, we give `functionbot` more explicit instructions not to reply with more than necessary. Even so, it can’t always help itself!
Let’s start with setting up our configuration and agents. We define our functions and register them with `user_proxy` for execution and with `functionbot` for the LLM to consider using them.
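A sketch of that setup, assuming a weather function and a currency-exchange function for the travel-agent scenario (the function bodies and exchange rate are placeholders):

```python
from typing import Annotated

import autogen

# Filter the config list down to Anthropic for this example
config_list = autogen.config_list_from_json(
    "OAI_CONFIG_LIST", filter_dict={"api_type": ["anthropic"]}
)

functionbot = autogen.AssistantAgent(
    name="functionbot",
    system_message=(
        "For currency exchange and weather forecasting tasks, only use the "
        "functions you have been provided with. Do not reply with more than "
        "is necessary."
    ),
    llm_config={"config_list": config_list},
)

user_proxy = autogen.UserProxyAgent(
    name="user_proxy",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=10,
    code_execution_config=False,
)

# Register each function with user_proxy for execution and with
# functionbot for the LLM to consider using them.
@user_proxy.register_for_execution()
@functionbot.register_for_llm(description="Weather forecast for a city.")
def weather_forecast(city: Annotated[str, "City name"]) -> str:
    return f"The weather forecast for {city} is sunny."  # placeholder

@user_proxy.register_for_execution()
@functionbot.register_for_llm(description="Currency exchange calculator.")
def currency_calculator(
    amount: Annotated[float, "Amount in the base currency"],
    base: Annotated[str, "Base currency code"],
    quote: Annotated[str, "Quote currency code"],
) -> str:
    return f"{amount} {base} is approximately {amount * 1.1:.2f} {quote}"  # placeholder rate
```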
Finally, we start the conversation with our travel query and summarize the result using `summary_method`. Using `summary_prompt`, we guide Sonnet to give us an email output.
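A sketch of kicking off the chat, assuming the agents above (the message and prompt wording are illustrative):

```python
res = user_proxy.initiate_chat(
    functionbot,
    message=(
        "My client wants to visit New York and knows the local currency is USD. "
        "What will the weather be like, and how many USD will they get for 500 EUR?"
    ),
    summary_method="reflection_with_llm",
    summary_args={
        "summary_prompt": (
            "Summarize the conversation as an email to the customer, including "
            "the weather forecast and the exchanged currency amount."
        )
    },
)

print(res.summary)
```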