This guide will walk you through creating a web application that implements ElevenLabs Conversational AI using the @11labs/client SDK. You’ll build a simple interface that allows users to have voice conversations with an AI agent.

Prerequisites

  • Node.js installed on your system
  • Basic knowledge of JavaScript/HTML/CSS
  • An API key and an Agent ID from the ElevenLabs platform

Project Setup

  1. Create a new directory for your project:
mkdir elevenlabs-convai-demo
cd elevenlabs-convai-demo
  2. Initialize a new Node.js project:
npm init -y
  3. Install the required dependencies:
npm install @11labs/client cors dotenv express
  4. Install development dependencies:
npm install --save-dev webpack webpack-cli webpack-dev-server copy-webpack-plugin concurrently
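
These tools handle the build and development workflow: webpack and webpack-cli bundle the frontend, webpack-dev-server serves it locally with hot reloading, copy-webpack-plugin copies index.html and styles.css into the output folder, and concurrently lets a single command run the backend and the dev server together.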

Project Structure

Create the following directory structure:

elevenlabs-convai-demo/
│── src/
│   ├── app.js
│   ├── index.html
│   └── styles.css
│── backend/
│   └── server.js
│── webpack.config.js
│── package.json
│── .env

Environment Configuration

  1. You’ll find a .env.example file in the root of the GitHub project:
XI_API_KEY=your_elevenlabs_api_key
AGENT_ID=your_agent_id
PORT=3000
  2. Create your .env file:
    • Copy .env.example to .env
    • Replace the placeholder values with your actual credentials
    • Make sure .env is listed in your .gitignore to keep your credentials secure

⚠️ Security Note: Never commit your .env file or share your API credentials. Always keep them private and secure.

  3. Set up webpack.config.js:
const path = require('path');
const CopyPlugin = require('copy-webpack-plugin');

module.exports = {
    entry: './src/app.js',
    output: {
        filename: 'bundle.js',
        path: path.resolve(__dirname, 'dist'),
        publicPath: '/'
    },
    devServer: {
        static: {
            directory: path.join(__dirname, 'dist'),
        },
        port: 8080,
        proxy: {
            '/api': 'http://localhost:3000'
        },
        hot: true
    },
    plugins: [
        new CopyPlugin({
            patterns: [
                { from: 'src/index.html', to: 'index.html' },
                { from: 'src/styles.css', to: 'styles.css' }
            ],
        }),
    ]
};
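
A version note: the object form of the proxy option shown above works with webpack-dev-server v4. If you’re on v5 or later, the option is expected as an array instead, for example proxy: [{ context: ['/api'], target: 'http://localhost:3000' }].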

Frontend Implementation

Let’s build our user interface step by step.

Basic UI Structure

First, we create a simple interface in index.html with these key components:

  • Status indicators for connection and speaking states
  • Start/End conversation buttons

Here’s the basic HTML structure:

<div class="status-container">
    <div id="connectionStatus" class="status">Disconnected</div>
    <div id="speakingStatus" class="speaking-status">Agent Silent</div>
</div>

<div class="controls">
    <button id="startButton" class="button">Start Conversation</button>
    <button id="endButton" class="button" disabled>End Conversation</button>
    
</div>
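
Note that index.html also needs to load the bundled JavaScript and the stylesheet (for example a <script src="bundle.js"></script> tag and a <link rel="stylesheet" href="styles.css"> tag), otherwise the application logic and styling described below won’t be applied.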

Without any styling, the interface looks basic but functional.

Adding Visual Polish

Let’s enhance the UI with CSS to make it more user-friendly and visually appealing. We’re using raw CSS here, but you can use alternatives like Tailwind CSS or shadcn/ui based on your preference.

The key styling touches are visual treatments for the connection and speaking status indicators (driven by the connected and speaking classes toggled from the JavaScript) and for the buttons’ enabled and disabled states.

Core Application Logic

Now let’s build the application logic piece by piece in app.js:

  1. Microphone Permission Handler
async function requestMicrophonePermission() {
    try {
        await navigator.mediaDevices.getUserMedia({ audio: true });
        return true;
    } catch (error) {
        console.error('Microphone permission denied:', error);
        return false;
    }
}

This function requests microphone access from the user. We need this before starting any conversation to ensure we can capture the user’s voice.

  2. Signed URL Retrieval
async function getSignedUrl() {
    try {
        const response = await fetch('/api/signed-url');
        if (!response.ok) throw new Error('Failed to get signed URL');
        const data = await response.json();
        return data.signedUrl;
    } catch (error) {
        console.error('Error getting signed URL:', error);
        throw error;
    }
}

The signed URL contains a short-lived token that can be used to start the conversation without exposing your ElevenLabs API key to end users.

  3. Status Management
function updateStatus(isConnected) {
    const statusElement = document.getElementById('connectionStatus');
    statusElement.textContent = isConnected ? 'Connected' : 'Disconnected';
    statusElement.classList.toggle('connected', isConnected);
}

function updateSpeakingStatus(mode) {
    const statusElement = document.getElementById('speakingStatus');
    const isSpeaking = mode.mode === 'speaking';
    statusElement.textContent = isSpeaking ? 'Agent Speaking' : 'Agent Silent';
    statusElement.classList.toggle('speaking', isSpeaking);
}

These functions handle the UI updates for connection and speaking states. They provide visual feedback to users about the current state of the conversation.

  4. Conversation Initialization
async function startConversation() {
    // First, check for microphone permission
    const hasPermission = await requestMicrophonePermission();
    if (!hasPermission) {
        alert('Microphone permission is required for the conversation.');
        return;
    }

    // Get the signed URL for authentication
    const signedUrl = await getSignedUrl();
    
    // Initialize the conversation with the ElevenLabs SDK
    conversation = await Conversation.startSession({
        signedUrl,
        onConnect: () => {
            // Handle successful connection
            updateStatus(true);
            toggleButtons(true);
        },
        onDisconnect: () => {
            // Handle disconnection
            updateStatus(false);
            toggleButtons(false);
            updateSpeakingStatus({ mode: 'listening' });
        },
        onError: (error) => {
            // Handle errors during conversation
            console.error('Conversation error:', error);
            alert('An error occurred during the conversation.');
        },
        onModeChange: (mode) => {
            // Update UI when agent switches between speaking and listening
            updateSpeakingStatus(mode);
        }
    });
}

This function orchestrates startup: it checks microphone permission, fetches a signed URL from the backend, and opens a session whose callbacks keep the UI in sync with the connection and speaking state.

  5. Conversation Control
async function endConversation() {
    if (conversation) {
        await conversation.endSession();
        conversation = null;
    }
}

A simple function to properly end the conversation session and clean up resources.
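
The snippets above also rely on some glue code that isn’t shown: importing Conversation from @11labs/client, a module-level conversation variable, the toggleButtons helper used in the callbacks, and listeners that wire the buttons to these handlers. Here’s a minimal sketch of what that glue might look like (element IDs taken from the HTML above):

import { Conversation } from '@11labs/client';

let conversation = null;

// Enable or disable the control buttons depending on whether a session is active
function toggleButtons(isActive) {
    document.getElementById('startButton').disabled = isActive;
    document.getElementById('endButton').disabled = !isActive;
}

// Wire the buttons to the handlers defined above
document.getElementById('startButton').addEventListener('click', startConversation);
document.getElementById('endButton').addEventListener('click', endConversation);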

Backend Implementation

The backend server handles secure communication with the ElevenLabs API and serves your frontend application. Let’s build it step by step:

1. Setting Up Dependencies

First, we need to import our required packages:

const express = require('express');
const cors = require('cors');
const dotenv = require('dotenv');
const path = require('path');

dotenv.config();

These imports provide:

  • express: Web framework for handling HTTP requests
  • cors: Middleware for enabling Cross-Origin Resource Sharing
  • dotenv: Loads environment variables from .env file
  • path: Utilities for handling file paths

2. Configuring Express Server

const app = express();
app.use(cors());
app.use(express.json());
app.use(express.static(path.join(__dirname, '../dist')));

This configuration:

  • Creates an Express application
  • Enables CORS for development (allows frontend to communicate with backend)
  • Adds JSON parsing middleware for handling request bodies
  • Serves the frontend files from the ‘dist’ directory

3. Creating the Signed URL Endpoint

app.get('/api/signed-url', async (req, res) => {
    try {
        const response = await fetch(
            `https://api.elevenlabs.io/v1/convai/conversation/get_signed_url?agent_id=${process.env.AGENT_ID}`,
            {
                method: 'GET',
                headers: {
                    'xi-api-key': process.env.XI_API_KEY,
                }
            }
        );

        if (!response.ok) {
            throw new Error('Failed to get signed URL');
        }

        const data = await response.json();
        res.json({ signedUrl: data.signed_url });
    } catch (error) {
        console.error('Error:', error);
        res.status(500).json({ error: 'Failed to get signed URL' });
    }
});

The signed URL is essential because:

  • It provides secure, temporary access to the conversational API
  • Keeps your API key secure on the backend
  • Allows the frontend to connect without exposing sensitive credentials
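
One thing to be aware of: this endpoint uses the global fetch API, which is built into Node.js 18 and later. If you’re running an older Node version, you’ll need a polyfill such as node-fetch.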

4. Adding a Catch-all Route

app.get('*', (req, res) => {
    res.sendFile(path.join(__dirname, '../dist/index.html'));
});

This route:

  • Catches any unmatched routes
  • Serves the main index.html file
  • Enables client-side routing in your single-page application
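
Because Express matches routes in the order they’re registered, define this catch-all after the /api/signed-url endpoint; otherwise it would intercept the API request and return index.html instead.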

5. Starting the Server

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => {
    console.log(`Server running on port ${PORT}`);
});

This final step:

  • Sets the server port (from environment variables or defaults to 3000)
  • Starts the Express server
  • Logs a confirmation message when the server is running

Running the Application

  1. Add the following scripts to your package.json:
{
  "scripts": {
    "start:backend": "node backend/server.js",
    "build": "webpack --mode production",
    "dev": "concurrently \"npm run start:backend\" \"webpack serve --mode development\"",
    "start": "npm run build && npm run start:backend"
  }
}
  2. Start the development server:
npm run dev
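
Here, npm run dev runs the Express backend and the webpack dev server side by side, npm run build produces the production bundle in dist/, and npm start builds the bundle and then serves everything from the backend alone.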

Your application will now be running at http://localhost:8080 (the webpack dev server), with requests to /api proxied to the backend on http://localhost:3000.

Key Features

  1. Microphone Permission Handling: The application requests and verifies microphone access before starting a conversation.

  2. Connection Status Display: Visual indicators show whether the application is connected and if the AI agent is speaking.

  3. Error Handling: Errors are caught on both the frontend and the backend and surfaced through console messages, user alerts, and HTTP error responses.

Important Notes

  • Make sure to keep your API key secure and never expose it in the frontend code
  • The application requires HTTPS in production for microphone access
  • Always handle errors appropriately to provide a good user experience
  • Test microphone permissions and audio playback thoroughly

Troubleshooting

  1. Microphone Access Issues:

    • Ensure your browser has permission to access the microphone
    • Check if your microphone is properly connected and working
    • Try using HTTPS in production environments
  2. Connection Problems:

    • Verify your API key and Agent ID are correct
    • Check your internet connection
    • Look for any CORS-related errors in the console
  3. Audio Issues:

    • Ensure your browser’s audio output is working
    • Verify that no other applications are blocking audio output