Text to speech
- Opus format support: Added support for Opus format with 48kHz sample rate across multiple bitrates (32-192 kbps).
- Improved websocket error handling: Updated TTS websocket API to return more accurate error codes (1011 for internal errors instead of 1008) for better error identification and SLA monitoring.
Conversational AI
- Twilio outbound: Added ability to natively run outbound calls.
- Post-call webhook override: Added ability to override post-call webhook settings at the agent level, providing more flexible configurations.
- Large knowledge base document viewing: Enhanced the knowledge base interface to allow viewing the entire content of large RAG documents.
- Added call SID dynamic variable: Added
system__call_sid
as a system dynamic variable to allow referencing the call ID in prompts and tools.
Studio
- Actor Mode: Added Actor Mode in Studio, allowing you to use your own voice recordings to direct the way speech should sound in Studio projects.
- Improved keyboard shortcuts: Updated keyboard shortcuts for viewing settings and editor shortcuts to avoid conflicts and simplified shortcuts for locking paragraphs.
Dubbing
- Dubbing duplication: Made dubbing duplication feature available to all users.
- Manual mode foreground generation: Added ability to generate foreground audio when using manual mode with a file and CSV.
Voices
- Enhanced voice collections: Improved voice collections with visual upgrades, language-based filtering, navigation breadcrumbs, collection images, and mouse dragging for carousel navigation.
- Locale filtering: Added locale parameter to shared voices endpoint for more precise voice filtering.
API
View API changes
Updated Endpoints
Text to Speech
- Updated Text to Speech endpoints:
- Convert text to speech - Added
apply_language_text_normalization
parameter for improved text pronunciation in supported languages (currently Japanese) - Stream text to speech - Added
apply_language_text_normalization
- Convert with timestamps - Added
apply_language_text_normalization
- Stream with timestamps - Added
apply_language_text_normalization
- Convert text to speech - Added
Audio Format
- Added Opus format support to multiple endpoints:
- Text to speech - Added support for Opus format with 48kHz sample rate at multiple bitrates (32, 64, 96, 128, 192 kbps)
- Stream text to speech - Added Opus format options
- Convert with timestamps - Added Opus format options
- Stream with timestamps - Added Opus format options
- Speech to speech - Added Opus format options
- Stream speech to speech - Added Opus format options
- Create voice previews - Added Opus format options
- Sound generation - Added Opus format options
Conversational AI
- Updated Conversational AI endpoints:
- Delete agent - Changed success response code from 200 to 204
- Updated RAG embedding model options - replaced
gte_Qwen2_15B_instruct
withmultilingual_e5_large_instruct
Voices
- Updated Voice endpoints:
- Get shared voices - Added locale parameter for filtering voices by language region
Dubbing
- Updated Dubbing endpoint:
- Dub a video or audio file - Renamed beta feature
use_replacement_voices_from_library
parameter todisable_voice_cloning
for clarity
- Dub a video or audio file - Renamed beta feature