Sending generated audio through Twilio

Learn how to integrate generated speech into phone calls with Twilio.

How-to guide · Assumes you have completed the ElevenAPI quickstart and have a Twilio account.

In this guide, you'll learn how to send an AI-generated message through a phone call using Twilio and ElevenLabs. This lets you play high-quality voice messages directly to your callers.

Create accounts with Twilio and ngrok

We'll be using Twilio and ngrok in this guide, so create an account with each if you don't already have one.

Create the server with Express

Initialize your project

Create a new folder for your project

mkdir elevenlabs-twilio
cd elevenlabs-twilio
npm init -y

Install dependencies

npm install @elevenlabs/elevenlabs-js express express-ws twilio

Install dev dependencies

npm install --save-dev @types/node @types/express @types/express-ws @types/ws dotenv tsx typescript

Create your files

// src/app.ts
import { ElevenLabsClient } from '@elevenlabs/elevenlabs-js';
import 'dotenv/config';
import express, { Response } from 'express';
import ExpressWs from 'express-ws';
import { Readable } from 'stream';
import VoiceResponse from 'twilio/lib/twiml/VoiceResponse';
import { type WebSocket } from 'ws';

const app = ExpressWs(express()).app;
const PORT: number = parseInt(process.env.PORT || '5000', 10);

// The client reads ELEVENLABS_API_KEY from the environment (loaded from .env above).
const elevenlabs = new ElevenLabsClient();
const voiceId = 'aMSt68OGf4xUZAnLpTU8';
// Twilio Media Streams carry 8 kHz μ-law audio, so request that format directly.
const outputFormat = 'ulaw_8000';
const text = 'This is a test. You can now hang up. Thank you.';

function startApp() {
  // Twilio calls this webhook when your number receives a call.
  app.post('/call/incoming', (_, res: Response) => {
    const twiml = new VoiceResponse();

    // Ask Twilio to stream the call to our WebSocket endpoint.
    twiml.connect().stream({
      url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
    });

    res.writeHead(200, { 'Content-Type': 'text/xml' });
    res.end(twiml.toString());
  });

  app.ws('/call/connection', (ws: WebSocket) => {
    ws.on('message', async (data: string) => {
      const message: {
        event: string;
        start?: { streamSid: string; callSid: string };
      } = JSON.parse(data);

      // Generate and send the audio once Twilio reports that the stream has started.
      if (message.event === 'start' && message.start) {
        const streamSid = message.start.streamSid;
        const response = await elevenlabs.textToSpeech.convert(voiceId, {
          modelId: 'eleven_flash_v2_5',
          outputFormat: outputFormat,
          text,
        });

        const readableStream = Readable.from(response);
        const audioArrayBuffer = await streamToArrayBuffer(readableStream);

        // Twilio expects base64-encoded audio in a 'media' message for this stream.
        ws.send(
          JSON.stringify({
            streamSid,
            event: 'media',
            media: {
              payload: Buffer.from(audioArrayBuffer as any).toString('base64'),
            },
          })
        );
      }
    });

    ws.on('error', console.error);
  });

  app.listen(PORT, () => {
    console.log(`Local: http://localhost:${PORT}`);
    console.log(`Remote: https://${process.env.SERVER_DOMAIN}`);
  });
}

// Collects every chunk of a readable stream into one ArrayBuffer.
function streamToArrayBuffer(readableStream: Readable) {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];

    readableStream.on('data', (chunk) => {
      chunks.push(chunk);
    });

    readableStream.on('end', () => {
      // Buffer.concat() may return a slice of Node's shared Buffer pool, so copy out
      // exactly the audio bytes instead of returning the whole underlying .buffer.
      const buffer = Buffer.concat(chunks);
      resolve(buffer.buffer.slice(buffer.byteOffset, buffer.byteOffset + buffer.byteLength));
    });

    readableStream.on('error', reject);
  });
}

startApp();
# .env
SERVER_DOMAIN=
ELEVENLABS_API_KEY=
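The streamToArrayBuffer helper in src/app.ts collects the audio with 'data' and 'end' listeners. A minimal alternative sketch, assuming the value returned by textToSpeech.convert is async-iterable (both Node's Readable and its web ReadableStream are), uses for await and returns a Buffer directly, which can be base64-encoded without going through an ArrayBuffer. collectStream is a hypothetical helper, not part of either SDK.

```typescript
import { Readable } from 'node:stream';

// Hypothetical helper: gathers every chunk of an async-iterable byte stream
// (a Node Readable or a web ReadableStream) into a single Buffer.
async function collectStream(
  stream: AsyncIterable<Uint8Array | Buffer>,
): Promise<Buffer> {
  const chunks: Buffer[] = [];
  for await (const chunk of stream) {
    // Normalize each chunk to a Buffer so Buffer.concat can join them.
    chunks.push(Buffer.isBuffer(chunk) ? chunk : Buffer.from(chunk));
  }
  return Buffer.concat(chunks);
}

// Usage: simulate a streamed response made of mixed Buffer/Uint8Array chunks.
collectStream(
  Readable.from([Buffer.from([0x7f, 0xff]), new Uint8Array([0x00])]),
).then((audio) => {
  console.log(audio.length); // 3
});
```

In the handler, `const audio = await collectStream(response);` followed by `audio.toString('base64')` would then replace the Readable.from and streamToArrayBuffer steps.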

Understanding the code

Handling the incoming call

When you call your number, Twilio makes a POST request to your /call/incoming endpoint. The handler responds with TwiML built by twiml.connect().stream(), which tells Twilio to open a WebSocket connection to the url we provide (our /call/connection endpoint) and stream the call's audio over it.

function startApp() {
  app.post('/call/incoming', (_, res: Response) => {
    const twiml = new VoiceResponse();

    twiml.connect().stream({
      url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
    });

    res.writeHead(200, { 'Content-Type': 'text/xml' });
    res.end(twiml.toString());
  });
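To see what Twilio actually receives from this handler, here is a minimal sketch that builds the same TwiML structure by hand, without the twilio library. buildStreamTwiml is a hypothetical helper for illustration only; the exact string produced by VoiceResponse.toString() may differ in formatting.

```typescript
// Escapes characters that are not allowed inside a double-quoted XML attribute.
function escapeXmlAttr(value: string): string {
  return value
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Hypothetical helper: <Connect><Stream> asks Twilio to open a WebSocket to `url`
// and stream the call over it.
function buildStreamTwiml(serverDomain: string): string {
  const url = `wss://${serverDomain}/call/connection`;
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><Stream url="${escapeXmlAttr(url)}"/></Connect></Response>`
  );
}

console.log(buildStreamTwiml('example.ngrok.app'));
```

With the example domain, the Stream element's url attribute is wss://example.ngrok.app/call/connection, which is why SERVER_DOMAIN must not include a scheme.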

Creating the text to speech

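Here we listen for the messages that Twilio sends to our WebSocket endpoint. When we receive the start event, we read the streamSid that identifies the stream and generate audio with the ElevenLabs TypeScript SDK, requesting ulaw_8000 output because Twilio Media Streams use 8 kHz μ-law audio.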

  app.ws('/call/connection', (ws: WebSocket) => {
    ws.on('message', async (data: string) => {
      const message: {
        event: string;
        start?: { streamSid: string; callSid: string };
      } = JSON.parse(data);

      if (message.event === 'start' && message.start) {
        const streamSid = message.start.streamSid;
        const response = await elevenlabs.textToSpeech.convert(voiceId, {
          modelId: 'eleven_flash_v2_5',
          outputFormat: outputFormat,
          text,
        });
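Twilio sends several kinds of events over the same WebSocket (a connected event first, then start, media events carrying the caller's audio, and stop when the stream ends), which is why the handler only acts on start. The ws library also delivers each message as a Buffer by default rather than a string. A minimal, more defensive parser sketch, assuming you only need the streamSid; getStartStreamSid is a hypothetical helper and the SIDs in the usage are made up.

```typescript
// Hypothetical helper: returns the streamSid from a Twilio 'start' event, or null
// for any other event or for a frame that is not valid JSON.
function getStartStreamSid(data: string | Buffer): string | null {
  let message: unknown;
  try {
    // Buffer#toString() decodes UTF-8, so text frames work either way.
    message = JSON.parse(data.toString());
  } catch {
    return null;
  }
  if (typeof message !== 'object' || message === null) return null;

  const { event, start } = message as {
    event?: unknown;
    start?: { streamSid?: unknown } | null;
  };
  if (event === 'start' && start && typeof start.streamSid === 'string') {
    return start.streamSid;
  }
  return null;
}

// Usage with a made-up start message, sent as a Buffer like ws delivers it.
const frame = Buffer.from(
  JSON.stringify({ event: 'start', start: { streamSid: 'MZ123', callSid: 'CA456' } }),
);
console.log(getStartStreamSid(frame)); // MZ123
```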

Sending the message

Once ElevenLabs returns the audio, we collect the streamed response into a single buffer, base64-encode it, and send it to Twilio over the WebSocket as a media message tagged with the streamSid.

const readableStream = Readable.from(response);
const audioArrayBuffer = await streamToArrayBuffer(readableStream);

ws.send(
  JSON.stringify({
    streamSid,
    event: 'media',
    media: {
      payload: Buffer.from(audioArrayBuffer as any).toString('base64'),
    },
  })
);
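Sending the whole clip in one media message, as above, is fine for a short prompt. If you would rather send smaller messages, one option is to split the μ-law audio into 20 ms frames: at 8,000 samples per second and one byte per μ-law sample, that is 8000 × 0.02 = 160 bytes per frame. The sketch also appends a mark message; per Twilio's Media Streams documentation, Twilio sends a mark event with the same name back once the audio queued before it has finished playing. buildOutboundMessages, the frame size, and the mark name are hypothetical choices, not part of any SDK.

```typescript
// Assumption: 8 kHz μ-law, one byte per sample, 20 ms frames -> 160 bytes.
const SAMPLE_RATE = 8000;
const FRAME_MS = 20;
const FRAME_BYTES = (SAMPLE_RATE * FRAME_MS) / 1000;

interface MediaMessage {
  event: 'media';
  streamSid: string;
  media: { payload: string };
}

interface MarkMessage {
  event: 'mark';
  streamSid: string;
  mark: { name: string };
}

// Hypothetical helper: splits raw μ-law audio into base64 media frames and
// finishes with a mark so playback completion can be detected.
function buildOutboundMessages(
  streamSid: string,
  audio: Buffer,
  markName = 'end-of-message',
): Array<MediaMessage | MarkMessage> {
  const messages: Array<MediaMessage | MarkMessage> = [];
  for (let offset = 0; offset < audio.length; offset += FRAME_BYTES) {
    const frame = audio.subarray(offset, offset + FRAME_BYTES);
    messages.push({ event: 'media', streamSid, media: { payload: frame.toString('base64') } });
  }
  messages.push({ event: 'mark', streamSid, mark: { name: markName } });
  return messages;
}
```

In the handler, the single ws.send(...) call could then become `for (const message of buildOutboundMessages(streamSid, audio)) ws.send(JSON.stringify(message));`, where audio is a Buffer holding the generated speech.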

Point ngrok to your application

Twilio requires a publicly accessible URL. We'll use ngrok to forward the local port of our application and expose it as a public URL.

Run the following command in your terminal:

ngrok http 5000

Copy the ngrok domain (without https://) to use in your environment variables.

Update your environment variables

Update the .env file with your ngrok domain and ElevenLabs API key.

# .env
SERVER_DOMAIN=*******.ngrok.app
ELEVENLABS_API_KEY=*************************
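Because the app interpolates SERVER_DOMAIN into a wss:// URL, a value pasted with its https:// prefix would produce a broken stream URL such as wss://https://.... A minimal sketch of a startup check, assuming you want the server to fail fast on a missing or malformed .env; readConfig and AppConfig are hypothetical names.

```typescript
interface AppConfig {
  serverDomain: string;
  elevenLabsApiKey: string;
  port: number;
}

// Hypothetical startup check for the variables defined in this guide's .env.
function readConfig(env: Record<string, string | undefined>): AppConfig {
  const serverDomain = (env.SERVER_DOMAIN ?? '').trim();
  const elevenLabsApiKey = (env.ELEVENLABS_API_KEY ?? '').trim();
  const port = Number.parseInt(env.PORT ?? '5000', 10);

  if (!serverDomain) {
    throw new Error('SERVER_DOMAIN is not set');
  }
  // The value must be a bare host name: no scheme (https://) and no path.
  if (serverDomain.includes('/')) {
    throw new Error(`SERVER_DOMAIN should be a bare domain, got "${serverDomain}"`);
  }
  if (!elevenLabsApiKey) {
    throw new Error('ELEVENLABS_API_KEY is not set');
  }
  if (!Number.isInteger(port) || port <= 0) {
    throw new Error(`PORT must be a positive integer, got "${env.PORT}"`);
  }
  return { serverDomain, elevenLabsApiKey, port };
}

// Usage, e.g. near the top of src/app.ts after importing 'dotenv/config':
// const config = readConfig(process.env);
```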

Start the application

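The package.json created by npm init -y has no dev script, so first add one to its "scripts" section that runs the server with tsx, for example "dev": "tsx src/app.ts". Then run the following command to start the app: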

npm run dev

Set up Twilio

Follow Twilio's guides to get a new phone number. Once you have your number, go to Phone Numbers -> Manage -> Active numbers, select the number, and open the "Configure" tab.

In the "A call comes in" section, enter the full URL of your application, including the /call/incoming path:

E.g. https://***ngrok.app/call/incoming

Make a phone call

Make a call to your number. You should hear a message using the ElevenLabs voice.

Tips for deploying to production

When running the application in production, set the SERVER_DOMAIN environment variable to your server's domain (again without https://). Be sure to also update the "A call comes in" URL in Twilio to point to your production server.

Conclusion

You should now have a basic understanding of integrating Twilio with ElevenLabs voices. If you have any further questions or suggestions on how to improve this guide, please feel free to select the "Suggest edits" or "Raise issue" button below.