How to send an AI message through a phone call using Twilio and ElevenLabs in Node.js

In this guide, you’ll learn how to play an AI-generated message to a caller using Twilio and ElevenLabs. When someone calls your Twilio number, your server generates speech with ElevenLabs and streams it into the call.

Create accounts with Twilio and ngrok

We’ll be using Twilio and ngrok for this guide, so go ahead and create accounts with them. You’ll also need an ElevenLabs API key for the .env file later on.

Get the code

If you want to get started quickly, you can get the entire code for this guide on GitHub.

Create the server with Express

Initialize your project

Create a new folder for your project

mkdir elevenlabs-twilio
cd elevenlabs-twilio
npm init -y

Install dependencies

npm install elevenlabs express express-ws twilio

Install dev dependencies

npm i --save-dev @types/node @types/express @types/express-ws @types/ws dotenv tsx typescript

Create your files

// src/app.ts

import 'dotenv/config';
import express, { Response } from 'express';
import ExpressWs from 'express-ws';
import VoiceResponse from 'twilio/lib/twiml/VoiceResponse';
import { ElevenLabsClient } from 'elevenlabs';
import { type WebSocket } from 'ws';
import { Readable } from 'stream';

const app = ExpressWs(express()).app;
const PORT: number = parseInt(process.env.PORT || '5000', 10);

// The client reads ELEVENLABS_API_KEY from the environment.
const elevenlabs = new ElevenLabsClient();
const voiceId = '21m00Tcm4TlvDq8ikWAM';
const outputFormat = 'ulaw_8000';
const text = 'This is a test. You can now hang up. Thank you.';

function startApp() {
  // Twilio calls this webhook when a call comes in.
  app.post('/call/incoming', (_, res: Response) => {
    const twiml = new VoiceResponse();

    twiml.connect().stream({
      url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
    });

    res.writeHead(200, { 'Content-Type': 'text/xml' });
    res.end(twiml.toString());
  });

  // Twilio opens this WebSocket to exchange audio for the call.
  app.ws('/call/connection', (ws: WebSocket) => {
    ws.on('message', async (data: string) => {
      const message: {
        event: string;
        start?: { streamSid: string; callSid: string };
      } = JSON.parse(data);

      if (message.event === 'start' && message.start) {
        const streamSid = message.start.streamSid;
        const response = await elevenlabs.textToSpeech.convert(voiceId, {
          model_id: 'eleven_flash_v2_5',
          output_format: outputFormat,
          text,
        });

        const readableStream = Readable.from(response);
        const audioArrayBuffer = await streamToArrayBuffer(readableStream);

        ws.send(
          JSON.stringify({
            streamSid,
            event: 'media',
            media: {
              payload: Buffer.from(audioArrayBuffer as any).toString('base64'),
            },
          }),
        );
      }
    });

    ws.on('error', console.error);
  });

  app.listen(PORT, () => {
    console.log(`Local: http://localhost:${PORT}`);
    console.log(`Remote: https://${process.env.SERVER_DOMAIN}`);
  });
}

// Collect a readable stream into a single ArrayBuffer.
function streamToArrayBuffer(readableStream: Readable) {
  return new Promise((resolve, reject) => {
    const chunks: Buffer[] = [];

    readableStream.on('data', (chunk) => {
      chunks.push(chunk);
    });

    readableStream.on('end', () => {
      const buf = Buffer.concat(chunks);
      // Slice out exactly this buffer's bytes: small Buffers can share a
      // larger pooled ArrayBuffer, so buf.buffer alone may hold extra data.
      resolve(buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength));
    });

    readableStream.on('error', reject);
  });
}

startApp();

# .env
SERVER_DOMAIN=
ELEVENLABS_API_KEY=

Understanding the code

Handling the incoming call

When you call your number, Twilio makes a POST request to your /call/incoming endpoint. The handler responds with TwiML: twiml.connect().stream() tells Twilio to route the call audio through our WebSocket, whose URL points at the /call/connection endpoint.

function startApp() {
  app.post('/call/incoming', (_, res: Response) => {
    const twiml = new VoiceResponse();

    twiml.connect().stream({
      url: `wss://${process.env.SERVER_DOMAIN}/call/connection`,
    });

    res.writeHead(200, { 'Content-Type': 'text/xml' });
    res.end(twiml.toString());
  });
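
To make the response concrete, the sketch below builds by hand a TwiML string of the same shape the handler returns. It is an illustration only: buildStreamTwiml is a hypothetical helper that is not part of the guide's code, example.ngrok.app is a placeholder domain, and the exact output of VoiceResponse.toString() may differ in whitespace or attribute formatting.

```typescript
// Hypothetical helper (not in the guide's code): hand-build TwiML with the
// same structure as twiml.connect().stream({ url }).
// Assumes serverDomain is a plain host name with no characters needing XML escaping.
function buildStreamTwiml(serverDomain: string): string {
  const url = `wss://${serverDomain}/call/connection`;
  return (
    '<?xml version="1.0" encoding="UTF-8"?>' +
    `<Response><Connect><Stream url="${url}"/></Connect></Response>`
  );
}

// example.ngrok.app is a placeholder, not a real domain.
console.log(buildStreamTwiml('example.ngrok.app'));
```

In the real app, keep using VoiceResponse, which takes care of escaping and formatting for you.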

Creating the text to speech

Here we listen for the JSON messages that Twilio sends to our WebSocket endpoint. When we receive a start event, which carries the streamSid for the call, we generate audio using the ElevenLabs TypeScript SDK.

  app.ws('/call/connection', (ws: WebSocket) => {
    ws.on('message', async (data: string) => {
      const message: {
        event: string;
        start?: { streamSid: string; callSid: string };
      } = JSON.parse(data);

      if (message.event === 'start' && message.start) {
        const streamSid = message.start.streamSid;
        const response = await elevenlabs.textToSpeech.convert(voiceId, {
          model_id: 'eleven_flash_v2_5',
          output_format: outputFormat,
          text,
        });
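
The handler above trusts that every message is valid JSON with the expected fields. As a minimal sketch of a more defensive version of the same check, the helpers below (isStartMessage and streamSidFromRaw, both hypothetical names not in the guide's code) return the streamSid only for a well-formed start event and ignore everything else, such as the connected, media, and stop events Twilio also sends.

```typescript
// Only the fields this guide uses are declared; real messages carry more.
interface StartMessage {
  event: 'start';
  start: { streamSid: string; callSid: string };
}

// Hypothetical type guard: true only for a well-formed "start" event.
function isStartMessage(value: unknown): value is StartMessage {
  if (typeof value !== 'object' || value === null) return false;
  const candidate = value as {
    event?: unknown;
    start?: { streamSid?: unknown; callSid?: unknown };
  };
  return (
    candidate.event === 'start' &&
    typeof candidate.start?.streamSid === 'string' &&
    typeof candidate.start?.callSid === 'string'
  );
}

// Hypothetical helper: extract the streamSid from a raw WebSocket message,
// or return null for anything that is not a valid start event.
function streamSidFromRaw(raw: string): string | null {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return null; // not JSON, so ignore it
  }
  return isStartMessage(parsed) ? parsed.start.streamSid : null;
}

// The SIDs below are made-up placeholders.
const sampleStart = JSON.stringify({
  event: 'start',
  start: { streamSid: 'MZ-example', callSid: 'CA-example' },
});
console.log(streamSidFromRaw(sampleStart));
```

Inside the message handler, a null result would simply mean "do nothing for this message".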

Sending the message

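Once ElevenLabs returns the audio, we collect the stream into a single buffer, base64-encode it, and send it to Twilio over the WebSocket as a media message tagged with the streamSid. The ulaw_8000 output format matches the 8 kHz μ-law audio that Twilio Media Streams expect, so no transcoding is needed.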

const readableStream = Readable.from(response);
const audioArrayBuffer = await streamToArrayBuffer(readableStream);

ws.send(
  JSON.stringify({
    streamSid,
    event: 'media',
    media: {
      payload: Buffer.from(audioArrayBuffer as any).toString('base64'),
    },
  }),
);
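
The media message can be exercised without a live call. The sketch below isolates that step in two hypothetical helpers (buildMediaMessage and ulawDurationSeconds, neither of which is in the guide's code) and uses placeholder bytes in place of real ElevenLabs audio. It also shows a handy piece of arithmetic: ulaw_8000 is 8000 one-byte samples per second, so the playback length is the byte count divided by 8000.

```typescript
// Hypothetical helper: wrap raw mu-law audio bytes in a Twilio "media" message.
function buildMediaMessage(streamSid: string, audio: Buffer): string {
  return JSON.stringify({
    streamSid,
    event: 'media',
    media: { payload: audio.toString('base64') },
  });
}

// ulaw_8000 means 8000 samples per second at one byte per sample,
// so the duration in seconds is bytes / 8000.
function ulawDurationSeconds(audio: Buffer): number {
  return audio.length / 8000;
}

// Placeholder bytes standing in for real audio; 'MZ-example' is a made-up SID.
const placeholderAudio = Buffer.alloc(16000, 0xff);
const mediaMessage = buildMediaMessage('MZ-example', placeholderAudio);

console.log(JSON.parse(mediaMessage).event);
console.log(ulawDurationSeconds(placeholderAudio));
```

Knowing the duration can be useful if you want to wait for playback to finish before taking another action on the call.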

Point ngrok to your application

Twilio requires a publicly accessible URL. We’ll use ngrok to forward the local port of our application and expose it as a public URL.

Run the following command in your terminal:

ngrok http 5000

Copy the ngrok domain (without https://) to use in your environment variables.

Update your environment variables

Update the .env file with your ngrok domain and ElevenLabs API key.

# .env
SERVER_DOMAIN=*******.ngrok.app
ELEVENLABS_API_KEY=*************************
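
A common slip is pasting the full ngrok URL, scheme included, into SERVER_DOMAIN, which would produce a broken wss://https://... stream URL. As a minimal sketch of a guard against that, the hypothetical helpers below (normalizeDomain and endpointsFor, not part of the guide's code) strip the scheme and any path, then derive the two URLs the guide uses. The domain example.ngrok.app is a placeholder.

```typescript
// Hypothetical helper: reduce a pasted URL or host to a bare domain,
// e.g. "https://example.ngrok.app/" becomes "example.ngrok.app".
function normalizeDomain(input: string): string {
  return input
    .trim()
    .replace(/^[a-z]+:\/\//i, '') // drop a leading scheme such as https://
    .replace(/\/.*$/, ''); // drop any path or trailing slash
}

// Hypothetical helper: derive the webhook and stream URLs from the domain.
function endpointsFor(domain: string): { webhookUrl: string; streamUrl: string } {
  const host = normalizeDomain(domain);
  if (host === '') {
    throw new Error('SERVER_DOMAIN is empty; set it in .env');
  }
  return {
    webhookUrl: `https://${host}/call/incoming`,
    streamUrl: `wss://${host}/call/connection`,
  };
}

console.log(endpointsFor('https://example.ngrok.app/'));
```

In the app itself you could pass process.env.SERVER_DOMAIN ?? '' to endpointsFor at startup, so a missing value fails immediately instead of when the first call arrives.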

Start the application

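Run the following command to start the app. It relies on a dev script in package.json; if yours has none, an entry such as "dev": "tsx watch src/app.ts" should work: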

npm run dev

Set up Twilio

Follow Twilio’s guides to create a new number. Once you’ve created your number, navigate to the “Configure” tab in Phone Numbers -> Manage -> Active numbers.

In the “A call comes in” section, enter the full URL to your application (make sure to add the /call/incoming path):

E.g. https://*******.ngrok.app/call/incoming

Make a phone call

Make a call to your number. You should hear a message using the ElevenLabs voice.

Tips for deploying to production

When running the application in production, make sure to set the SERVER_DOMAIN environment variable to that of your server. Be sure to also update the URL in Twilio to point to your production server.

Conclusion

You should now have a basic understanding of integrating Twilio with ElevenLabs voices.