How Scribe compares to OpenAI’s 4o speech to text models

One month after its launch, Scribe keeps proving it’s the most advanced speech to text model in the industry.

In the month since its launch, our speech to text model Scribe has attracted thousands of companies with its industry-leading accuracy. From media captioning to call centers and medical transcription, Scribe has quickly become the model of choice for developers.

Industry-leading performance

Multiple third-party analyses have confirmed our own accuracy benchmarks, with Scribe outperforming every model tested, including OpenAI’s new 4o transcribe models. For instance, a benchmark from Artificial Analysis shows that, on average, Scribe achieves a lower Word Error Rate (WER) than both 4o transcribe and 4o mini transcribe:

  • 4o transcribe makes 16% more errors than Scribe
  • 4o mini transcribe makes 71% more errors than Scribe
Third party speech to text benchmark from Artificial Analysis shows Scribe is the best model
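The “X% more errors” figures are relative comparisons of Word Error Rate. This minimal Python sketch shows how such a figure is derived. It assumes the figures mean (other model’s WER ÷ Scribe’s WER − 1) × 100, which is the natural reading, though the post does not spell it out. The WER values below are hypothetical, chosen only to illustrate the arithmetic. They are not Artificial Analysis’s published numbers.

```python
def relative_extra_errors(wer_model: float, wer_reference: float) -> float:
    """Percentage by which wer_model exceeds wer_reference.

    Assumption: "X% more errors" means (wer_model / wer_reference - 1) * 100.
    """
    if wer_reference <= 0:
        raise ValueError("reference WER must be positive")
    return (wer_model / wer_reference - 1.0) * 100.0


# Hypothetical WERs expressed as fractions, for illustration only.
scribe_wer = 0.100
other_wer = 0.116
print(f"{relative_extra_errors(other_wer, scribe_wer):.0f}% more errors")
```

Under this definition, a figure above 100%, such as the 105% quoted for Japanese below, means the other model makes more than twice as many errors as Scribe.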

In OpenAI’s own launch benchmark, Scribe also outperforms or matches the 4o and 4o mini transcribe models in 11 of the 15 languages tested. In Japanese and Hindi, for instance, Scribe beats both of OpenAI’s 4o models by significant margins:

  • Japanese: OpenAI’s 4o speech to text model makes 55% more errors than Scribe, and their 4o mini model makes 105% more
  • Hindi: OpenAI’s 4o speech to text model makes 18% more errors than Scribe, and their 4o mini model makes 37% more

We designed Scribe to be as useful as possible for customers, even when some of those choices create inconsistencies in industry benchmarks. For example:

  1. Scribe transcribes numbers as words (“one”, “two”, “three”), which is more useful in transcripts, whereas the FLEURS benchmark uses digits (“1”, “2”, “3”), so each of these numbers is counted as an error
  2. Scribe detects words like “hum”, “ha” and “hey”, which give customers more context, but these words aren’t part of the benchmark references, again creating artificial errors
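A word-level WER computation shows how these two choices affect scoring. This Python sketch is a minimal illustration, not the FLEURS scoring pipeline. The `normalize` helper, its small number-word table and its filler-word list are hypothetical assumptions that mirror the two examples above. WER is computed the standard way: the word-level edit distance divided by the number of reference words.

```python
import re

# Hypothetical normalization tables, for illustration only.
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3"}
FILLERS = {"hum", "ha", "hey"}


def word_error_rate(reference: list[str], hypothesis: list[str]) -> float:
    """(substitutions + deletions + insertions) / len(reference),
    using a word-level Levenshtein distance."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete every reference word
    for j in range(cols):
        dist[0][j] = j  # insert every hypothesis word
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution or match
            )
    return dist[-1][-1] / len(reference)


def tokenize(text: str) -> list[str]:
    """Lowercase and split into word tokens, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def normalize(words: list[str]) -> list[str]:
    """Map spelled-out numbers to digits and drop filler words."""
    return [NUMBER_WORDS.get(w, w) for w in words if w not in FILLERS]


reference = tokenize("I have 1 2 3 apples")
hypothesis = tokenize("Hum, I have one two three apples")

print(word_error_rate(reference, hypothesis))
print(word_error_rate(normalize(reference), normalize(hypothesis)))
```

Without normalization, this example scores one insertion (“hum”) and three substitutions over six reference words, a WER of about 67%, even though the hypothesis captures the speech correctly. After normalization, the score drops to 0.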

This is why it helps to look at the actual transcripts, not just benchmark scores, when judging performance. For instance, in English, OpenAI’s 4o speech to text model performs similarly to Scribe in benchmarks. However, comparing English transcripts side by side really puts a spotlight on Scribe’s advanced capabilities.

Transcript comparison

In this transcript of a UK parliamentary hearing, you can see how Scribe makes no mistakes while capturing accents and different tones of voice and correctly labelling background noise and laughter.

ElevenLabs’s Scribe (Time taken to create the transcript: 4.66s)

Can I ask the honorable gentleman what work is being done to make sure this place is more accessible, particularly for some of our colleagues who have a disability? Hear, hear. (crowd murmuring) I'm sorry, it must be something to do with my Antipodean background. Could he please repeat the question, because I didn't follow it? (crowd laughing) Wow. Oh, wow. Very popular today. Um, I- I was saying that- that a number of parliamentary colleagues who have disabilities do find it quite difficult getting around certain parts of the estate. Given that we're doing this refurbishment work, what can be done to make sure that those with a disability are able to move around more freely and the place is accessible? Mr. Paul. (crowd laughing) I'm really sorry. Please could he do it very slowly in Antipodean English? Thank you. Just give any old answer. I- I think the answer... I think the answer might be helped if you can reply in writing when you read, Mr. Speaker. Right, Chris Elmore. (laughs) Thank you, Mr. Deputy Speaker, I- I'll try it on the first go. (crowd murmuring) Oh, no. You're Welsh. Can I- can I- can I- 'cause I'm Welsh, so God help him.

OpenAI’s 4o (Time taken to create the transcript: 5.01s)

Can I ask the hon. Gentleman what work is being done to ensure that this place is more accessible, particularly for some of our colleagues who have a disability? Sorry, it must be something to do with my Antipodean background. Could he please repeat the question, because I didn't follow it? Well, very popular today. I'm seeing that a number of parliamentary colleagues who have disabilities do find it quite difficult getting around certain parts of the estate. Given that we're doing this refurbishment work, what can be done to ensure that those with a disability are able to move around more freely and the place is accessible? I'm really sorry. Please, could you do it very slowly in Antipodean English? I think the answer might be helped if you can reply in writing when you'll read it. Thank you, Mr Deputy Speaker. I'll try it on the first go. Because I'm Welsh, so God help him.

Accessibility with stuttering transcription

With each advancement in AI, an often-overlooked group stands to benefit immensely: people who stutter. Stuttering, a speech disorder with a strong genetic component that affects about 1% of the population, poses unique challenges for automatic speech recognition (ASR) systems. In a study using test samples where stuttering occurred in nearly one in four words, Scribe achieved a particularly impressive average accuracy of 98.7%. This is further evidence that Scribe leads the industry and can serve the full range of enterprise needs.

Solutions for enterprise

Scribe’s performance is backed by a feature set tailored to the needs of enterprise customers:

  • Precise word-level timestamps unlock tremendous value for creators and for media & entertainment, turning your transcripts into captions, searchable entries and precise translations
  • Smart speaker diarization lets you summarize meetings, sales pitches or customer support calls to extract precise, actionable insights and improve collaboration and training across your team
  • Dynamic audio tagging captures more content and context from your audio, such as laughter or background noise, enabling use cases like sentiment analysis
  • Support for 99 languages lets you reach a global audience with a single integration
  • All of these features are available in our API, letting developers build without compromises
  • A real-time streaming version of Scribe and a low-latency version are also planned for the coming weeks. These will cement Scribe as the most advanced speech to text model ever created, covering all of your business use cases and giving you more choice and flexibility between speed, price and accuracy.
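Here is a rough illustration of how word-level timestamps and speaker labels can be turned into captions. This minimal Python sketch groups words into SRT cues and starts a new cue whenever the speaker changes or a cue reaches a maximum word count. The input format is an assumption for illustration, not a documented description of the Scribe API response: a list of dicts with `text`, `start` and `end` in seconds, plus `speaker_id`. The helper names and `max_words` are also hypothetical. Check the API reference for the actual field names. If the real response includes other entry types, such as audio events, you would filter or label them first.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float
    end: float
    speaker: str
    words: list[str]


def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def build_cues(words: list[dict], max_words: int = 7) -> list[Cue]:
    """Group timestamped words into cues, splitting on speaker change or length."""
    cues: list[Cue] = []
    for w in words:
        current = cues[-1] if cues else None
        new_cue = (
            current is None
            or current.speaker != w["speaker_id"]
            or len(current.words) >= max_words
        )
        if new_cue:
            cues.append(Cue(w["start"], w["end"], w["speaker_id"], [w["text"]]))
        else:
            current.words.append(w["text"])
            current.end = w["end"]  # extend the cue to the last word's end time
    return cues


def to_srt(cues: list[Cue]) -> str:
    """Render cues as numbered SRT blocks, labelled with the speaker."""
    blocks = [
        f"{i}\n{srt_time(c.start)} --> {srt_time(c.end)}\n"
        f"[{c.speaker}] {' '.join(c.words)}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)


# Hypothetical input in the assumed shape; the timings are made up.
words = [
    {"text": "Can", "start": 0.00, "end": 0.18, "speaker_id": "speaker_0"},
    {"text": "I", "start": 0.18, "end": 0.25, "speaker_id": "speaker_0"},
    {"text": "ask?", "start": 0.25, "end": 0.60, "speaker_id": "speaker_0"},
    {"text": "Hear,", "start": 1.10, "end": 1.35, "speaker_id": "speaker_1"},
    {"text": "hear.", "start": 1.35, "end": 1.70, "speaker_id": "speaker_1"},
]
print(to_srt(build_cues(words)))
```

Splitting on speaker changes keeps each caption attributable to one voice. The word-count limit is a simple stand-in for the line-length and duration rules a production captioning pipeline would apply.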

Get started today

Try Scribe today: our web product is free until April 9th. Scribe pricing is very competitive, starting at $0.22/hour for enterprise customers. Contact our sales team and we’ll be happy to set up a demo and show you how Scribe can help your business.
