Apna scales 7.5 million AI interview minutes using ElevenLabs

Building human-realistic mock interviews for millions of job seekers across India


Interview preparation in India has long been broken - generic, disconnected, and inaccessible to most job seekers.

Apna, India’s leading job search and careers platform, set out to change that by making every mock interview feel like a real one - personalized to each role, company, and candidate.

With over 60 million users and 10,000+ companies across 30,000+ roles, Apna’s vision required more than training modules. It demanded conversation - lifelike timing, empathy, and domain depth - at massive scale.

To achieve this, Apna engineered one of the most advanced AI interview ecosystems, powered by ElevenLabs Text to Speech and Blue Machines’ voice orchestration platform. Together, these systems have delivered over 1.5 million AI interviews, totaling 7.5 million voice minutes, with sub-300 ms latency.

Why Apna chose ElevenLabs

For interview simulations to feel natural, voice quality and responsiveness are inseparable. Any audible delay or robotic tone breaks immersion and trust.

Apna selected ElevenLabs for three core reasons:

  • Low-latency streaming performance - responses begin playback within 150–180 ms.
  • Multilingual capability - seamless synthesis across Indian English, Hindi, and code-mixed speech.
  • Emotional nuance - tone modulation that mirrors human empathy and challenge.

These qualities allow Apna to preserve the rhythm of real conversation while maintaining emotional credibility at scale.

Orchestrating real-time human realism at scale 

Delivering a mock interview that feels real requires more than scripted dialogue. It demands precise orchestration across multiple systems – from voice and latency to empathy and context – all operating in sync at machine speed.

Every company interviews differently. A product manager may be expected to reason about metrics; a bank credit officer may be probed on compliance logic; an e-commerce platform lead may be tested on route optimization.

Blue Machines, Apna’s orchestration partner, built a Retrieval-Augmented Generation (RAG) graph for each role × company intersection:

  • 10,000+ companies × 50–100 roles = ~500 million micro-models.
  • Each model anchored to company-specific rubrics, tone, and vocabulary.

Blue Machines integrated ElevenLabs’ streaming TTS directly into the conversational loop. Each turn begins with candidate speech, processed by multilingual ASR and NLU models, followed by workflow logic that evaluates intent, emotional tone, and role-specific context. The system then retrieves relevant domain data, composes the next question, and plays it back through ElevenLabs - all within roughly 300 milliseconds end-to-end.
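The turn loop described above can be sketched as a chain of stages: ASR → NLU → workflow logic → retrieval-grounded composition → TTS. The stage functions below are toy stand-ins (all names are hypothetical, not Apna's or ElevenLabs' APIs), and the final text would be streamed to a TTS endpoint rather than returned.

```python
def transcribe(audio: bytes) -> str:
    # Stand-in for streaming multilingual ASR.
    return audio.decode("utf-8")


def analyse(transcript: str) -> dict:
    # Stand-in for NLU: extract intent plus a crude confidence/tone signal.
    return {
        "intent": "candidate_answer",
        "confident": "?" not in transcript,
    }


def compose_next_question(nlu: dict, grounding: list[str]) -> str:
    # Stand-in for workflow logic + RAG-grounded question composition:
    # adapt the follow-up to the candidate's tone and the role's domain.
    topic = grounding[0] if grounding else "your experience"
    follow_up = (
        "Can you go deeper" if nlu["confident"] else "Let's try a simpler angle"
    )
    return f"{follow_up} on {topic}?"


def run_turn(audio: bytes, grounding: list[str]) -> str:
    # One conversational turn: ASR -> NLU -> composition.
    # In production the result would be streamed through low-latency TTS.
    return compose_next_question(analyse(transcribe(audio)), grounding)


reply = run_turn(b"We grew retention by 12% last quarter.", ["metrics reasoning"])
print(reply)  # -> Can you go deeper on metrics reasoning?
```

The point of the structure is that every stage is streamable: each one can begin work on partial input from the previous stage, which is what makes a sub-300 ms end-to-end budget plausible.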

“Each synthesized response begins playback within ~150–180 ms, thanks to ElevenLabs’ low-latency APIs integrated directly into Apna and Blue Machines’ orchestration layer,” said Abhishek Ranjan, CTO of Apna.

Below roughly 300 ms, the human brain perceives a response as continuous rather than delayed - the threshold where conversational realism begins.

Function | Description | Time (ms)
Edge ingress | Regional gateways + smart routing | 30
ASR + NLU | Streaming multilingual recognition | 90
Workflow logic + persona | Role logic + empathy modulation | 40
Context retrieval + evaluation | Domain data fetch + validation | 40
TTS playback | ElevenLabs voice synthesis start | 100
Total | | ≈300 ms
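As a quick sanity check, the per-stage figures above do sum to the ~300 ms end-to-end target quoted in the article:

```python
# Per-stage latency budget from the table above, in milliseconds.
latency_budget_ms = {
    "edge_ingress": 30,
    "asr_nlu": 90,
    "workflow_persona": 40,
    "retrieval_evaluation": 40,
    "tts_playback_start": 100,
}

total = sum(latency_budget_ms.values())
print(total)  # -> 300
```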

The result is a system that balances technical precision with emotional depth. Thousands of interviews run concurrently across Indian English, Hindi, and code-mixed speech, each maintaining the rhythm, empathy, and credibility of a real human exchange.

Impact at scale

Metric | Result
Mock AI interviews conducted | 1.5 million+
Voice minutes | 7.5 million+
Average latency | <300 ms
Role–company models | 500 million+

Equalizing access to opportunity

A 24-year-old candidate from Pune shared:

“The AI interviewer knew my résumé, switched between Hindi and English, and challenged me like a real HDFC bank panel. I cracked the job on my next attempt.”

For the first time, candidates can practice interviews that feel truly real - tailored to their résumé, company, and dream role.

Apna’s AI Interview Prep shows how voice technology can democratize opportunity - giving millions of job seekers the same level of preparation once reserved for a privileged few.

For many, practicing with a lifelike interviewer builds real confidence before their first human interview.

By combining real-time voice with adaptive context and empathy, Apna has turned preparation into participation - giving everyone, regardless of background or language, an equal chance to succeed.

Unlocking the next frontier of learning

Apna’s AI Interview Prep defines the next generation of AI-driven learning and interviewing.

Realistic, responsive voices powered by the ElevenLabs Text to Speech API let candidates experience personalized feedback, natural timing, and bilingual fluency that text-based practice could never offer.

Through this collaboration, Apna has redefined what scalable learning sounds like - proving that voice-based AI can extend human opportunity, not replace it.

Apna’s success demonstrates how high-fidelity voice can transform education, employability, and access to opportunity at national scale.

If you’re building conversational learning tools, AI interviewers, or any system where realism and empathy matter, discover what's possible with the ElevenLabs Conversational Agents Platform.
