JavaScript SDK

Scribe: real-time speech-to-text transcription in JavaScript

For an overview of Scribe and its capabilities, see the Speech to Text overview. For step-by-step usage guides, see Client-side streaming.

Installation

npm install @elevenlabs/client
# or
yarn add @elevenlabs/client
# or
pnpm install @elevenlabs/client

Use the ElevenLabs speech-to-text skill to transcribe audio from your AI coding assistant:

npx skills add elevenlabs/skills --skill speech-to-text

This library can be used in any JavaScript-based project. If you are using React, consider the useScribe hook which provides built-in state management and lifecycle handling.

Usage

Here is a minimal working example that connects to Scribe and logs transcription results:

import { Scribe, RealtimeEvents } from '@elevenlabs/client';

const token = await fetchTokenFromServer();

const connection = Scribe.connect({
  token,
  modelId: 'scribe_v2_realtime',
  microphone: {
    echoCancellation: true,
    noiseSuppression: true,
  },
});

connection.on(RealtimeEvents.PARTIAL_TRANSCRIPT, (data) => {
  console.log('Partial:', data.text);
});

connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, (data) => {
  console.log('Committed:', data.text);
});

// Later, close the connection
connection.close();

Getting a token

Scribe requires a single-use token for authentication. Create an API endpoint on your server:

// Node.js server
app.get('/scribe-token', yourAuthMiddleware, async (req, res) => {
  const response = await fetch('https://api.elevenlabs.io/v1/single-use-token/realtime_scribe', {
    method: 'POST',
    headers: {
      'xi-api-key': process.env.ELEVENLABS_API_KEY,
    },
  });

  const data = await response.json();
  res.json({ token: data.token });
});

Your ElevenLabs API key is sensitive. Never expose it to the client. Always generate the token on the server.

// Client
const fetchToken = async () => {
  const response = await fetch('/scribe-token');
  const { token } = await response.json();
  return token;
};

Connection options

Scribe.connect() accepts either microphone options or manual audio options. Both share a common set of base options.

Base options

| Property | Type | Default | Description |
| --- | --- | --- | --- |
| token | string | | Single-use token for WebSocket authentication. |
| modelId | string | | Model ID (e.g., "scribe_v2_realtime"). |
| baseUri | string | "wss://api.elevenlabs.io" | Custom WebSocket base URI. |
| commitStrategy | CommitStrategy | "manual" | "manual" or "vad". |
| vadSilenceThresholdSecs | number | 1.5 | Seconds of silence before VAD commits (0.3-3.0). |
| vadThreshold | number | 0.4 | VAD sensitivity (0.1-0.9, lower is more sensitive). |
| minSpeechDurationMs | number | 100 | Minimum speech duration in ms (50-2000). |
| minSilenceDurationMs | number | 100 | Minimum silence duration in ms (50-2000). |
| languageCode | string | | ISO-639-1 or ISO-639-3 language code. Leave empty for auto-detection. |
| includeTimestamps | boolean | false | Receive word-level timestamps via the COMMITTED_TRANSCRIPT_WITH_TIMESTAMPS event. |
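Taken together, the base options form a plain object that is passed to Scribe.connect(). The sketch below shows one VAD-tuned combination; the token value is a placeholder and the tuning numbers are illustrative choices within the documented ranges, not recommendations:

```javascript
// Illustrative Scribe.connect() options. The token is a placeholder and
// the VAD values are example tunings within the documented ranges.
const connectOptions = {
  token: 'single-use-token-from-your-server',
  modelId: 'scribe_v2_realtime',
  commitStrategy: 'vad',
  vadSilenceThresholdSecs: 1.0, // within the allowed 0.3-3.0 range
  vadThreshold: 0.3,            // slightly more sensitive than the 0.4 default
  includeTimestamps: true,      // also emit COMMITTED_TRANSCRIPT_WITH_TIMESTAMPS
};

// const connection = Scribe.connect(connectOptions);
```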

Microphone options

Pass a microphone object to stream audio directly from the user’s microphone. The connection handles getUserMedia and audio encoding automatically.

const connection = Scribe.connect({
  token,
  modelId: 'scribe_v2_realtime',
  microphone: {
    deviceId: 'optional-device-id',
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true,
  },
});

| Property | Type | Description |
| --- | --- | --- |
| deviceId | string | Specific microphone device ID. |
| echoCancellation | boolean | Enable echo cancellation. |
| noiseSuppression | boolean | Enable noise suppression. |
| autoGainControl | boolean | Enable automatic gain control. |

Manual audio options

Pass audioFormat and sampleRate to send audio data manually via connection.send().

import { AudioFormat } from '@elevenlabs/client';

const connection = Scribe.connect({
  token,
  modelId: 'scribe_v2_realtime',
  audioFormat: AudioFormat.PCM_16000,
  sampleRate: 16000,
});

| Property | Type | Description |
| --- | --- | --- |
| audioFormat | AudioFormat | Audio encoding format (e.g., AudioFormat.PCM_16000). |
| sampleRate | number | Sample rate in Hz. Must match audioFormat. |

AudioFormat enum

enum AudioFormat {
  PCM_8000 = 'pcm_8000',
  PCM_16000 = 'pcm_16000',
  PCM_22050 = 'pcm_22050',
  PCM_24000 = 'pcm_24000',
  PCM_44100 = 'pcm_44100',
  PCM_48000 = 'pcm_48000',
  ULAW_8000 = 'ulaw_8000',
}
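The numeric suffix of each format value encodes its sample rate, so a small guard can verify that sampleRate matches audioFormat before connecting. These helpers are not part of the SDK; they are a sketch derived from the enum values above:

```javascript
// Hypothetical helper (not part of the SDK): derive the sample rate
// implied by an AudioFormat string, e.g. 'pcm_16000' -> 16000.
function sampleRateForFormat(format) {
  const rate = Number(format.split('_')[1]);
  if (!Number.isFinite(rate)) {
    throw new Error(`Unrecognized audio format: ${format}`);
  }
  return rate;
}

// Guard against a mismatched pair before calling Scribe.connect().
function assertRateMatchesFormat(format, sampleRate) {
  if (sampleRateForFormat(format) !== sampleRate) {
    throw new Error(`sampleRate ${sampleRate} does not match ${format}`);
  }
}
```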

Microphone mode

Stream audio directly from the user’s microphone:

import { Scribe, RealtimeEvents } from '@elevenlabs/client';

async function transcribeFromMicrophone() {
  const token = await fetchToken();

  const connection = Scribe.connect({
    token,
    modelId: 'scribe_v2_realtime',
    microphone: {
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true,
    },
  });

  connection.on(RealtimeEvents.PARTIAL_TRANSCRIPT, (data) => {
    document.getElementById('live').textContent = data.text;
  });

  connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, (data) => {
    const el = document.createElement('p');
    el.textContent = data.text;
    document.getElementById('transcripts').appendChild(el);
    document.getElementById('live').textContent = '';
  });

  document.getElementById('stop').addEventListener('click', () => {
    connection.close();
  });
}

Manual audio mode (file transcription)

Transcribe pre-recorded audio files by sending audio data manually:

import { Scribe, RealtimeEvents, AudioFormat } from '@elevenlabs/client';

async function transcribeFile(file) {
  const token = await fetchToken();

  const connection = Scribe.connect({
    token,
    modelId: 'scribe_v2_realtime',
    audioFormat: AudioFormat.PCM_16000,
    sampleRate: 16000,
  });

  connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, (data) => {
    console.log('Transcript:', data.text);
  });

  // Decode audio file
  const arrayBuffer = await file.arrayBuffer();
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const audioBuffer = await audioContext.decodeAudioData(arrayBuffer);

  // Convert to PCM16
  const channelData = audioBuffer.getChannelData(0);
  const pcmData = new Int16Array(channelData.length);

  for (let i = 0; i < channelData.length; i++) {
    const sample = Math.max(-1, Math.min(1, channelData[i]));
    pcmData[i] = sample < 0 ? sample * 32768 : sample * 32767;
  }

  // Send in chunks
  const chunkSize = 4096;
  for (let offset = 0; offset < pcmData.length; offset += chunkSize) {
    const chunk = pcmData.slice(offset, offset + chunkSize);
    const bytes = new Uint8Array(chunk.buffer);
    const base64 = btoa(String.fromCharCode(...bytes));

    connection.send({ audioBase64: base64 });
    await new Promise((resolve) => setTimeout(resolve, 50));
  }

  // Commit the final transcript; close the connection once the
  // committed transcript has been received
  connection.commit();
}
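The decode-and-send loop above can be factored into a reusable helper that turns Float32 samples into base64-encoded PCM16 chunks. This is plain JavaScript with no SDK dependency, mirroring the inline conversion above:

```javascript
// Convert Float32 samples (range -1..1) into base64-encoded 16-bit PCM
// chunks, mirroring the inline conversion loop above.
function float32ToBase64Chunks(channelData, chunkSize = 4096) {
  // Clamp and scale each sample to signed 16-bit range.
  const pcmData = new Int16Array(channelData.length);
  for (let i = 0; i < channelData.length; i++) {
    const sample = Math.max(-1, Math.min(1, channelData[i]));
    pcmData[i] = sample < 0 ? sample * 32768 : sample * 32767;
  }

  // Split into fixed-size chunks and base64-encode each one.
  const chunks = [];
  for (let offset = 0; offset < pcmData.length; offset += chunkSize) {
    const chunk = pcmData.slice(offset, offset + chunkSize);
    const bytes = new Uint8Array(chunk.buffer, 0, chunk.length * 2);
    chunks.push(btoa(String.fromCharCode(...bytes)));
  }
  return chunks;
}
```

Each returned string can then be passed as audioBase64 to connection.send().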

RealtimeConnection

Scribe.connect() returns a RealtimeConnection instance with the following methods.

on(event, listener)

Register an event listener. See Events for available event types.

connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, (data) => {
  console.log('Committed:', data.text);
});

off(event, listener)

Remove a previously registered event listener.

const handler = (data) => console.log(data.text);
connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, handler);

// Later
connection.off(RealtimeEvents.COMMITTED_TRANSCRIPT, handler);

send(data)

Send audio data to Scribe (manual audio mode only).

connection.send({
  audioBase64: base64AudioChunk,
  commit: false, // Optional: set true to commit immediately after this chunk
  sampleRate: 16000, // Optional: override sample rate
  previousText: 'Previous transcription text', // Optional: context from a previous transcription
});

The previousText field can only be sent in the first audio chunk of a session. Sending it in subsequent chunks results in an error.
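Because previousText is only valid on the first chunk, a thin wrapper can enforce the rule. The helper below is not part of the SDK; it is a sketch that works with anything exposing a send() method:

```javascript
// Hypothetical wrapper (not part of the SDK): attach previousText to the
// first chunk only, per the rule above, and send plain audio afterwards.
function createChunkSender(connection, previousText) {
  let first = true;
  return (audioBase64) => {
    const payload = first && previousText
      ? { audioBase64, previousText }
      : { audioBase64 };
    first = false;
    connection.send(payload);
  };
}
```

Usage: `const sendChunk = createChunkSender(connection, 'earlier text'); sendChunk(chunk1); sendChunk(chunk2);`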

commit()

Manually commit the current transcription. Only needed when using CommitStrategy.MANUAL.

connection.commit();

close()

Close the WebSocket connection and clean up resources (microphone stream, audio context).

connection.close();

Events

Register event listeners using connection.on(event, listener). All events are available as constants on the RealtimeEvents enum.

Transcription events

| Event | Data | Description |
| --- | --- | --- |
| SESSION_STARTED | { session_id: string } | Scribe session started. |
| PARTIAL_TRANSCRIPT | { text: string } | Interim transcription result. |
| COMMITTED_TRANSCRIPT | { text: string } | Finalized transcription result. |
| COMMITTED_TRANSCRIPT_WITH_TIMESTAMPS | { text: string; language_code?: string; words?: WordsItem[] } | Finalized result with word-level timing. |

The WordsItem type contains word-level timing information:

interface WordsItem {
  text?: string; // Word text
  start?: number; // Start time in seconds
  end?: number; // End time in seconds
  type?: 'word' | 'spacing'; // Token type
  speaker_id?: string; // Speaker identifier
}
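As a sketch of consuming the timestamp payload, the helper below reassembles the WordsItem tokens into a plain string and measures the overall speech span. It uses only the fields declared in the interface above; the function itself is illustrative, not part of the SDK:

```javascript
// Illustrative helper (not part of the SDK): rebuild a transcript string
// and its start/end times from an array of WordsItem tokens.
function summarizeWords(words) {
  // 'spacing' tokens carry the whitespace, so a plain join restores the text.
  const text = words.map((w) => w.text ?? '').join('');

  // Only 'word' tokens with timing contribute to the speech span.
  const timed = words.filter(
    (w) => w.type === 'word' && w.start != null && w.end != null
  );
  const start = timed.length ? Math.min(...timed.map((w) => w.start)) : null;
  const end = timed.length ? Math.max(...timed.map((w) => w.end)) : null;

  return { text, start, end };
}
```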

Connection events

| Event | Data | Description |
| --- | --- | --- |
| OPEN | Event | WebSocket connection opened. |
| CLOSE | Event | WebSocket connection closed. |
| ERROR | Error \| Event | Generic error. |

Error events

All error events receive { error: string }.

| Event | Description |
| --- | --- |
| AUTH_ERROR | Authentication error. |
| QUOTA_EXCEEDED | Usage quota exceeded. |
| COMMIT_THROTTLED | Commit request throttled. |
| TRANSCRIBER_ERROR | Transcription engine error. |
| UNACCEPTED_TERMS | Terms of service not accepted. |
| RATE_LIMITED | Rate limited. |
| INPUT_ERROR | Invalid input format. |
| QUEUE_OVERFLOW | Processing queue full. |
| RESOURCE_EXHAUSTED | Server resources at capacity. |
| SESSION_TIME_LIMIT_EXCEEDED | Maximum session time reached. |
| CHUNK_SIZE_EXCEEDED | Audio chunk too large. |
| INSUFFICIENT_AUDIO_ACTIVITY | Not enough audio activity to maintain the connection. |

Commit strategies

Control when transcriptions are committed:

import { Scribe, AudioFormat, CommitStrategy } from '@elevenlabs/client';

// Manual (default): you control when to commit
const manualConnection = Scribe.connect({
  token,
  modelId: 'scribe_v2_realtime',
  audioFormat: AudioFormat.PCM_16000,
  sampleRate: 16000,
  commitStrategy: CommitStrategy.MANUAL,
});

// Send audio, then commit when ready
manualConnection.send({ audioBase64: chunk });
manualConnection.commit();

// Voice Activity Detection: Scribe detects silences and commits automatically
const vadConnection = Scribe.connect({
  token,
  modelId: 'scribe_v2_realtime',
  microphone: { echoCancellation: true },
  commitStrategy: CommitStrategy.VAD,
});

For more details, see Transcripts and commit strategies.

Complete example

Here is a complete example that transcribes microphone audio with the VAD-based commit strategy:

import { Scribe, RealtimeEvents, CommitStrategy } from '@elevenlabs/client';

async function startTranscription() {
  const token = await fetchToken();

  const connection = Scribe.connect({
    token,
    modelId: 'scribe_v2_realtime',
    commitStrategy: CommitStrategy.VAD,
    microphone: {
      echoCancellation: true,
      noiseSuppression: true,
    },
  });

  connection.on(RealtimeEvents.SESSION_STARTED, (data) => {
    console.log('Session started:', data.session_id);
  });

  connection.on(RealtimeEvents.PARTIAL_TRANSCRIPT, (data) => {
    document.getElementById('live').textContent = data.text;
  });

  connection.on(RealtimeEvents.COMMITTED_TRANSCRIPT, (data) => {
    const el = document.createElement('p');
    el.textContent = data.text;
    document.getElementById('transcripts').appendChild(el);
    document.getElementById('live').textContent = '';
  });

  connection.on(RealtimeEvents.ERROR, (error) => {
    console.error('Scribe error:', error);
  });

  // Stop button
  document.getElementById('stop').addEventListener('click', () => {
    connection.close();
  });
}

document.getElementById('start').addEventListener('click', startTranscription);