Models
Learn how to choose the right model for your use-case
ElevenAgents provides a unified interface to connect your agent to multiple models and providers, offering flexibility, reliability, and cost optimization.
Currently, the following models are natively supported and can be configured via the agent settings:
Pricing is typically denoted in USD per 1 million tokens unless specified otherwise. A token is a fundamental unit of text data for LLMs, roughly equivalent to 4 characters on average.
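As a rough illustration of the arithmetic above, the sketch below estimates a token count from the 4-characters-per-token rule of thumb and converts it to a cost at a per-million-token price. The function names and the price used are hypothetical and chosen for illustration only; real tokenizers and real provider prices will differ.

```python
import math

CHARS_PER_TOKEN = 4  # rough average quoted above; actual tokenizers vary by model


def estimate_tokens(text: str) -> int:
    """Approximate the token count of `text` using the ~4 characters/token rule."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Cost of `tokens` at a price quoted in USD per 1 million tokens."""
    return tokens / 1_000_000 * price_per_million_usd


prompt = "You are a helpful support agent for an online bookstore."
tokens = estimate_tokens(prompt)
# 0.50 USD per 1M tokens is a made-up illustrative price, not a real rate.
print(tokens, cost_usd(tokens, price_per_million_usd=0.50))
```

For billing-grade numbers, use the token counts reported by the provider rather than a character-based estimate.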
Using your own custom LLM is supported by specifying the endpoint we should make requests to and providing credentials through our secure secret storage. Learn more about custom LLM integration.
With EU data residency enabled, a small number of older Gemini and Claude LLMs are unavailable in ElevenLabs Agents in order to maintain compliance. Custom LLMs and OpenAI LLMs remain fully available. For more information, see GDPR and data residency.
Selecting the most suitable LLM for your application involves considering several factors:
The maximum system prompt size is 2MB, which includes your agent’s instructions, knowledge base content, and other system-level context.
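A pre-flight size check can catch oversized prompts before they are submitted. The sketch below is a minimal example under two assumptions that the text above does not confirm: that "2MB" means 2 × 1024 × 1024 bytes, and that size is measured on the UTF-8 encoding. The helper names are hypothetical.

```python
# Assumption: "2MB" is read as 2 MiB of UTF-8 encoded text.
MAX_SYSTEM_PROMPT_BYTES = 2 * 1024 * 1024


def system_prompt_size(instructions: str, knowledge_base: list[str], extra_context: str = "") -> int:
    """Total UTF-8 size in bytes of everything that counts toward the system prompt."""
    parts = [instructions, *knowledge_base, extra_context]
    return sum(len(part.encode("utf-8")) for part in parts)


def fits_limit(instructions: str, knowledge_base: list[str], extra_context: str = "") -> bool:
    """True if the combined system-level content is within the assumed limit."""
    return system_prompt_size(instructions, knowledge_base, extra_context) <= MAX_SYSTEM_PROMPT_BYTES


docs = ["Return policy: 30 days.", "Shipping: 3 to 5 business days."]
print(fits_limit("You are a support agent.", docs))
```

Measuring bytes rather than characters matters for non-ASCII content, where a single character can occupy several bytes.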
Temperature controls the randomness of model responses. Lower values produce more consistent, focused outputs while higher values increase creativity and variation.
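To see why lower values give more focused output, the sketch below shows the way temperature is commonly applied during sampling: the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. This illustrates the general technique and is not a description of any specific provider's implementation; the logits are made-up values.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling them by 1/temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens
p_low = softmax_with_temperature(logits, temperature=0.5)
p_high = softmax_with_temperature(logits, temperature=2.0)
print(p_low)   # probability mass concentrates on the top candidate
print(p_high)  # probability mass spreads across candidates
```

At low temperature the top candidate dominates, so repeated runs tend to pick the same token; at high temperature the alternatives become more likely, which increases variation.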
Configure backup LLMs to ensure conversation continuity when the primary LLM fails or becomes unavailable.
Configuration options:
Disabling backup LLMs means conversations will end abruptly if your primary LLM fails or becomes unavailable. This is strongly discouraged for production use.
Learn more about LLM cascading.
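The fallback idea can be sketched as trying each configured model in order and returning the first successful reply. This is a simplified illustration, not the platform's cascading implementation; `generate_with_fallback`, `AllModelsFailed`, and the stub models are hypothetical names.

```python
from typing import Callable, Sequence


class AllModelsFailed(RuntimeError):
    """Raised when the primary model and every backup have failed."""


def generate_with_fallback(prompt: str, models: Sequence[Callable[[str], str]]) -> str:
    """Try each model in order; return the first reply that succeeds."""
    errors: list[Exception] = []
    for call_model in models:
        try:
            return call_model(prompt)
        except Exception as exc:  # a real system would catch narrower error types
            errors.append(exc)
    raise AllModelsFailed(f"{len(errors)} model(s) failed: {errors}")


# Stub models standing in for real LLM calls.
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("primary unavailable")


def steady_backup(prompt: str) -> str:
    return f"backup reply to: {prompt}"


reply = generate_with_fallback("Hello", [flaky_primary, steady_backup])
print(reply)
```

With an empty backup list, a primary failure surfaces as `AllModelsFailed`, which corresponds to the conversation ending abruptly when backups are disabled.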
Control how many internal reasoning tokens the model can use before responding. More tokens can improve answer quality but increase response time.
Options:
Some models support configurable reasoning effort levels (None, Low, Medium, High).
For conversational use-cases:
Keep reasoning effort set to None to avoid the agent thinking too long, which can disrupt natural conversation flow.
For workflow steps:
Higher reasoning effort suits workflow steps that require complex thought or decision-making and where response time is less critical.
input_cache_read: This refers to the cost associated with retrieving previously processed input data from a cache. Utilizing cached data can lead to cost savings if identical inputs are processed multiple times.
input_cache_write: This is the cost associated with storing input data into a cache. Some LLM providers may charge for this operation.
For the most accurate and current information on model capabilities, pricing, and terms of service, always consult the official documentation from the respective LLM providers (OpenAI, Google, Anthropic).
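To make the cache pricing terms concrete, here is a minimal cost sketch. It assumes each input token is billed in exactly one bucket (uncached input, cache read, or cache write) and that all prices are quoted in USD per 1 million tokens. Billing rules differ between providers, and the function name and prices below are hypothetical.

```python
def input_cost_usd(
    uncached_tokens: int,
    cache_read_tokens: int,
    cache_write_tokens: int,
    input_price: float,
    cache_read_price: float,
    cache_write_price: float,
) -> float:
    """Total input cost, with every price given in USD per 1 million tokens."""
    per_token = 1 / 1_000_000
    return (
        uncached_tokens * per_token * input_price
        + cache_read_tokens * per_token * cache_read_price
        + cache_write_tokens * per_token * cache_write_price
    )


# Illustrative prices only: cached reads are billed at a quarter of the input price.
without_cache = input_cost_usd(2_000_000, 0, 0, 2.0, 0.5, 0.0)
with_cache = input_cost_usd(1_000_000, 1_000_000, 0, 2.0, 0.5, 0.0)
print(without_cache, with_cache)
```

The comparison shows the saving when half of the input is served from the cache under these assumed prices.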
Certain LLMs available on our platform may be suitable for use in environments requiring HIPAA compliance. See the HIPAA compliance docs for more details.