Large Language Models (LLMs)
Understand the available LLMs for your conversational AI agents, their capabilities, and pricing.
Overview
Our conversational AI platform supports a variety of cutting-edge Large Language Models (LLMs) to power your voice agents. Choosing the right LLM depends on your specific needs, balancing factors like performance, context window size, features, and cost. This document provides details on the supported models and their associated pricing.
The selection of an LLM is a critical step in configuring your conversational AI agent, directly impacting its conversational abilities, knowledge depth, and operational cost.
Supported LLMs
We offer models from leading providers such as OpenAI, Google, and Anthropic, as well as the option to integrate your own custom LLM for maximum flexibility.
Pricing is listed in USD per 1 million tokens unless specified otherwise. A token is the basic unit of text an LLM processes; for English text, one token is roughly 4 characters on average.
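As a rough illustration of that heuristic, the sketch below estimates token counts and per-request cost from character counts. The 4-characters-per-token ratio and the example price are assumptions for illustration only; a real tokenizer gives exact counts.

```python
# Rough token and cost estimates from the ~4 characters per token heuristic.
# The ratio and the example price below are illustrative assumptions, not
# exact tokenizer output or actual platform pricing.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate the number of tokens in a piece of English text."""
    return max(1, round(len(text) / chars_per_token))

def estimate_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Convert a token count into USD given a per-1M-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens

prompt = "Hello! How can I help you today?"
tokens = estimate_tokens(prompt)
print(f"~{tokens} tokens, ~${estimate_cost(tokens, 0.50):.6f} "
      f"at $0.50 per 1M input tokens")
```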
Gemini
Google’s Gemini models offer a balance of performance, large context windows, and competitive pricing, and typically deliver the lowest latency of the supported models.
OpenAI
OpenAI models are known for their strong general-purpose capabilities and wide range of options.
Anthropic
Anthropic’s Claude models are designed with a focus on helpfulness, honesty, and harmlessness, often featuring large context windows.
Choosing an LLM
Selecting the most suitable LLM for your application involves considering several factors:
- Task Complexity: More demanding or nuanced tasks generally benefit from more powerful models (e.g., OpenAI’s GPT-4 series, Anthropic’s Claude 3 Sonnet, Google’s Gemini 2.5 models).
- Latency Requirements: For applications requiring real-time or near real-time responses, such as live voice conversations, models optimized for speed are preferable (e.g., Google’s Gemini Flash series, Anthropic’s Claude Haiku, OpenAI’s GPT-4o-mini).
- Context Window Size: If your application needs to process, understand, or recall information from long conversations or extensive documents, select models with larger context windows.
- Cost-Effectiveness: Balance the desired performance and features against your budget. LLM prices can vary significantly, so analyze the pricing structure (input, output, and cache tokens) in relation to your expected usage patterns; a simple monthly-cost comparison is sketched after this list.
- HIPAA Compliance: If your application involves Protected Health Information (PHI), it is crucial to use an LLM that is designated as HIPAA compliant and ensure your entire data handling process meets regulatory standards.
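To make the cost-effectiveness trade-off concrete, here is a minimal sketch that compares estimated monthly spend across a few models under a fixed usage pattern. All model names, prices, and traffic volumes below are hypothetical placeholders, not actual platform pricing.

```python
# Compare estimated monthly LLM spend under an expected usage pattern.
# All prices and volumes here are hypothetical placeholders; substitute the
# current figures from your provider's pricing page.

PRICES_USD_PER_1M = {              # (input, output) price per 1M tokens
    "fast-small-model": (0.10, 0.40),
    "balanced-model": (0.50, 1.50),
    "flagship-model": (2.50, 10.00),
}

MONTHLY_INPUT_TOKENS = 40_000_000   # e.g. system prompt + history each turn
MONTHLY_OUTPUT_TOKENS = 5_000_000   # voice replies are usually short

for model, (in_price, out_price) in PRICES_USD_PER_1M.items():
    cost = (MONTHLY_INPUT_TOKENS / 1_000_000 * in_price
            + MONTHLY_OUTPUT_TOKENS / 1_000_000 * out_price)
    print(f"{model:>16}: ~${cost:,.2f} / month")
```

Note that voice agents typically send far more input (the system prompt plus conversation history on every turn) than they generate, so input pricing often dominates the total.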
HIPAA Compliance
Certain LLMs available on our platform may be suitable for use in environments requiring HIPAA compliance. Please see the HIPAA compliance docs for more details.
Understanding LLM Pricing
- Tokens: LLM usage is typically billed based on the number of tokens processed. As a general guideline for English text, 100 tokens is approximately equivalent to 75 words.
- Input vs. Output Pricing: Providers often differentiate pricing for input tokens (the data you send to the model) and output tokens (the data the model generates in response).
- Cache Pricing: Some providers bill prompt caching separately (a worked cost calculation follows this list):
  - `input_cache_read`: the cost of reading previously processed input back from a cache. Reusing cached input can reduce costs when identical input prefixes are sent repeatedly.
  - `input_cache_write`: the cost of writing input data into a cache. Some providers charge for this operation.
- The prices listed in this document are per 1 million tokens and are based on the information available at the time of writing. These prices are subject to change by the LLM providers.
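Putting these pieces together, the following sketch shows how a single request's cost breaks down across fresh input, cached input, cache writes, and output tokens. The `Pricing` dataclass and all prices are invented for illustration; the exact cache accounting (what is read versus written, and at what rate) varies by provider.

```python
from dataclasses import dataclass

# Hypothetical per-1M-token prices used only to illustrate the arithmetic;
# consult the provider's pricing page for real figures.

@dataclass
class Pricing:
    input: float              # USD per 1M fresh input tokens
    output: float             # USD per 1M generated tokens
    input_cache_read: float   # USD per 1M input tokens served from cache
    input_cache_write: float  # USD per 1M input tokens written to cache

def request_cost(p: Pricing, fresh_in: int, cached_in: int,
                 cache_writes: int, out: int) -> float:
    """Total USD cost of one request, given token counts per category."""
    per = 1_000_000
    return (fresh_in / per * p.input
            + cached_in / per * p.input_cache_read
            + cache_writes / per * p.input_cache_write
            + out / per * p.output)

example = Pricing(input=3.00, output=15.00,
                  input_cache_read=0.30, input_cache_write=3.75)
# 2,000 fresh input tokens, 8,000 cached prompt tokens reused,
# 2,000 tokens newly written to cache, 500 output tokens.
print(f"~${request_cost(example, 2_000, 8_000, 2_000, 500):.4f} per request")
```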
For the most accurate and current information on model capabilities, pricing, and terms of service, always consult the official documentation from the respective LLM providers (OpenAI, Google, Anthropic, xAI).