Apna scales 7.5 million AI interview minutes using ElevenLabs

Building human-realistic mock interviews for millions of job seekers across India


Interview preparation in India has long been broken - generic, disconnected, and inaccessible to most job seekers.

Apna, India’s leading job search and careers platform, set out to change that by making every mock interview feel like a real one - personalized to each role, company, and candidate.

With over 60 million users and 10,000+ companies across 30,000+ roles, Apna’s vision required more than training modules. It demanded conversation - lifelike timing, empathy, and domain depth - at massive scale.

To achieve this, Apna engineered one of the most advanced AI interview ecosystems, powered by ElevenLabs Text to Speech and Blue Machines’ voice orchestration platform. Together, these systems have delivered over 1.5 million AI interviews, totaling 7.5 million voice minutes, with sub-300 ms latency.

Why Apna chose ElevenLabs

For interview simulations to feel natural, voice quality and responsiveness are inseparable. Any audible delay or robotic tone breaks immersion and trust.

Apna selected ElevenLabs for three core reasons:

  • Low-latency streaming performance - responses begin playback within 150–180 ms.
  • Multilingual capability - seamless synthesis across Indian English, Hindi, and code-mixed speech.
  • Emotional nuance - tone modulation that mirrors human empathy and challenge.

These qualities allow Apna to preserve the rhythm of real conversation while maintaining emotional credibility at scale.

Orchestrating real-time human realism at scale 

Delivering a mock interview that feels real requires more than scripted dialogue. It demands precise orchestration across multiple systems – from voice and latency to empathy and context – all operating in sync at machine speed.

Every company interviews differently. A product manager may be expected to reason about metrics; a bank credit officer may be probed on compliance logic; an e-commerce platform lead may be tested on route optimization.

Blue Machines, Apna’s orchestration partner, built a Retrieval-Augmented Generation (RAG) graph for each role × company intersection:

  • 10,000+ companies × 50–100 roles = ~500 million micro-models.
  • Each model anchored to company-specific rubrics, tone, and vocabulary.

Blue Machines integrated ElevenLabs’ streaming TTS directly into the conversational loop. Each turn begins with candidate speech, processed by multilingual ASR and NLU models, followed by workflow logic that evaluates intent, emotional tone, and role-specific context. The system then retrieves relevant domain data, composes the next question, and plays it back through ElevenLabs - all within roughly 300 milliseconds end-to-end.
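The turn loop described above can be sketched as a chain of stages: ASR → NLU → workflow logic → retrieval-grounded composition → TTS. The stage functions below are toy stand-ins (all names are hypothetical, not Apna's or ElevenLabs' APIs), and the final text would be streamed to a TTS endpoint rather than returned.

```python
def transcribe(audio: bytes) -> str:
    # Stand-in for streaming multilingual ASR.
    return audio.decode("utf-8")


def analyse(transcript: str) -> dict:
    # Stand-in for NLU: extract intent plus a crude confidence/tone signal.
    return {
        "intent": "candidate_answer",
        "confident": "?" not in transcript,
    }


def compose_next_question(nlu: dict, grounding: list[str]) -> str:
    # Stand-in for workflow logic + RAG-grounded question composition:
    # adapt the follow-up to the candidate's tone and the role's domain.
    topic = grounding[0] if grounding else "your experience"
    follow_up = (
        "Can you go deeper" if nlu["confident"] else "Let's try a simpler angle"
    )
    return f"{follow_up} on {topic}?"


def run_turn(audio: bytes, grounding: list[str]) -> str:
    # One conversational turn: ASR -> NLU -> composition.
    # In production the result would be streamed through low-latency TTS.
    return compose_next_question(analyse(transcribe(audio)), grounding)


reply = run_turn(b"We grew retention by 12% last quarter.", ["metrics reasoning"])
print(reply)  # -> Can you go deeper on metrics reasoning?
```

The point of the structure is that every stage is streamable: each one can begin work on partial input from the previous stage, which is what makes a sub-300 ms end-to-end budget plausible.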

“Each synthesized response begins playback within ~150–180 ms, thanks to ElevenLabs’ low-latency APIs integrated directly into Apna and Blue Machines’ orchestration layer,” said Abhishek Ranjan, CTO of Apna.

Below roughly 300 ms, the human brain perceives a response as continuous rather than delayed - the threshold where conversational realism begins.

Function | Description | Time (ms)
Edge ingress | Regional gateways + smart routing | 30
ASR + NLU | Streaming multilingual recognition | 90
Workflow logic + persona | Role logic + empathy modulation | 40
Context retrieval + evaluation | Domain data fetch + validation | 40
TTS playback | ElevenLabs voice synthesis start | 100
Total | | ≈300 ms
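As a quick sanity check, the per-stage figures above do sum to the ~300 ms end-to-end target quoted in the article:

```python
# Per-stage latency budget from the table above, in milliseconds.
latency_budget_ms = {
    "edge_ingress": 30,
    "asr_nlu": 90,
    "workflow_persona": 40,
    "retrieval_evaluation": 40,
    "tts_playback_start": 100,
}

total = sum(latency_budget_ms.values())
print(total)  # -> 300
```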

The result is a system that balances technical precision with emotional depth. Thousands of interviews run concurrently across Indian English, Hindi, and code-mixed speech, each maintaining the rhythm, empathy, and credibility of a real human exchange.

Impact at scale

Metric | Result
Mock AI interviews conducted | 1.5 million+
Voice minutes | 7.5 million+
Average latency | <300 ms
Role–company models | 500 million+

Equalizing access to opportunity

A 24-year-old candidate from Pune shared:

“The AI interviewer knew my résumé, switched between Hindi and English, and challenged me like a real HDFC bank panel. I cracked the job on my next attempt.”

For the first time, candidates can practice interviews that feel truly real - tailored to their résumé, company, and dream role.

Apna’s AI Interview Prep shows how voice technology can democratize opportunity - giving millions of job seekers the same level of preparation once reserved for a privileged few.

For many, practicing with a lifelike interviewer builds real confidence before their first human interview.

By combining real-time voice with adaptive context and empathy, Apna has turned preparation into participation - giving everyone, regardless of background or language, an equal chance to succeed.

Unlocking the next frontier of learning

Apna’s AI Interview Prep defines the next generation of AI-driven learning and interviewing.

Realistic, responsive voices powered by the ElevenLabs Text to Speech API let candidates experience personalized feedback, natural timing, and bilingual fluency that text-based practice could never offer.

Through this collaboration, Apna has redefined what scalable learning sounds like - proving that voice-based AI can extend human opportunity, not replace it.

Apna’s success demonstrates how high-fidelity voice can transform education, employability, and access to opportunity at national scale.

If you’re building conversational learning tools, AI interviewers, or any system where realism and empathy matter, discover what's possible with the ElevenLabs Conversational Agents Platform.
