KUBI is a conversational barista that works with ElevenLabs' Conversational AI. Here's how.
KUBI is a conversational robot barista and receptionist at Second Space, a next-gen 24/7 co-working space in Kaohsiung, Taiwan. Because the workspace is fully automated, it is essential that KUBI, as the first interaction point with members, adds a uniquely friendly touch. That's why Second Space chose ElevenLabs' Conversational AI to create fun and memorable interactions with members. Let's see KUBI in action.
KUBI employs a sophisticated multi-sensory architecture to simulate human-like interaction. The system hinges on a microservices architecture, where specialized services operate concurrently and communicate via a real-time event stream. These services manage various tasks, including facial and object recognition using real-time AI inference, cup detection and sanity checks via cameras, receipt printing, secure facial recognition for access control, and precise control of milk and bean dispensers.
Some of the services running concurrently include the recognition, dispensing, printing, and access-control workers described above.
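To illustrate the pattern (a minimal sketch, not Second Space's actual code; the event types and service names here are hypothetical), each service can be modeled as an independent worker that consumes and publishes events on a shared bus:

```kotlin
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.launch
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.SharedFlow
import kotlinx.coroutines.flow.filterIsInstance

// Hypothetical event types flowing through the shared real-time stream.
sealed interface BusEvent
data class PersonDetected(val faceId: String) : BusEvent
data class CupDetected(val ok: Boolean) : BusEvent
data class PrintReceipt(val orderId: String) : BusEvent

// A minimal event bus: every service sees every event and may publish new ones.
class EventBus {
    private val stream = MutableSharedFlow<BusEvent>(extraBufferCapacity = 64)
    val events: SharedFlow<BusEvent> = stream
    suspend fun publish(event: BusEvent) = stream.emit(event)
}

// Each microservice is a coroutine that reacts only to the events it cares about.
fun CoroutineScope.cupDetectionService(bus: EventBus) = launch {
    bus.events.filterIsInstance<PersonDetected>().collect {
        // Run the camera sanity check, then report the result back to the bus.
        bus.publish(CupDetected(ok = true))
    }
}

fun CoroutineScope.receiptService(bus: EventBus) = launch {
    bus.events.filterIsInstance<PrintReceipt>().collect { job ->
        println("printing receipt for order ${job.orderId}")
    }
}
```

Because each worker owns its own loop and state, a service can be rewritten in a different stack or scaled out without touching the others.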
Why all these microservices? Easy: each one can be managed independently, scaled on its own, and built with the best tool for its task.
Coordinating all these microservices is a central service, humorously called "BigBoy". It's essentially a giant, non-blocking event processor:
```kotlin
// A scenario declares when it may run (isEligible) and what it does (the recipe).
internal object WeatherIdleScenario : SingleTaskScenario(scenario) {

    init {
        importance = Importance.Medium
        compilationTimeout = Time.ThreeSeconds
        interruptable = false
        executionExpiration = Time.TenSeconds
    }

    // Eligible only when KUBI is idle, hasn't run this recently, someone is
    // actually around, and the weather report is fresh.
    override fun isEligible(event: Event, environment: Environment): Maybe<Boolean> =
        withEnvironment(environment) {
            just {
                event is IdleEvent
                    && !triggeredInLast(40.minutes)
                    && (personPresent() || hasActiveSessions)
                    && environment.weatherService.lastReportWithin(10.minutes)
            }
        }
}

private val scenario = ScenarioRecipe { event, env, session ->

    invokeOneOf(

        phrase {
            sayWith {
                "Rainy day today, isn't it? That's why I have my little umbrella! Look!".asEnglish
            }.withAutoGif().withAutoMotion()
        }.given { Weather.isRaining() },

        phrase {
            sayWith {
                "Friend, it's so cold outside! So sad for you... because you're a human. I don't really mind!".asEnglish
            }.withAutoMotion()

            sayWith {
                "Wait, that sounded a bit rude.".asEnglish
            }.withAutoMotion()
        }.given { Weather.isCold() },
    )
}
```
What are scenarios?
Think of scenarios as non-blocking compilers for robot action events. An action event is usually the most downstream event in a chain, the last step that results in a physical effect such as motion or speech. For instance, a greeting scenario might trigger a spoken phrase together with a matching motion and facial animation.
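In the same DSL style as the weather scenario above (an illustrative sketch, not Second Space's actual greeting code; the PersonDetectedEvent type is hypothetical), that could look like:

```kotlin
private val greeting = ScenarioRecipe { event, env, session ->
    phrase {
        sayWith {
            // One utterance compiles into several synchronized action events:
            // speech audio, an auto-picked motion, and a GIF on KUBI's face.
            "Welcome to Second Space! I'd wave, but I'm holding your coffee.".asEnglish
        }.withAutoGif().withAutoMotion()
    }.given { event is PersonDetectedEvent }
}
```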
Event generation with LLMs: Some action events are generated automatically by an LLM. For example, withAutoMotion picks the best motion from a pre-defined list based on the given context, while withAutoGif uses an LLM to generate the most suitable tag for the given phrase. The tag is then used to fetch a GIF from Giphy, which is later displayed on KUBI's face together with the phrase.
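The Giphy half of that pipeline might look like the sketch below (the llmSuggestTag helper is hypothetical; the Giphy search endpoint with its api_key, q, and limit parameters is the public API):

```kotlin
import java.net.URI
import java.net.URLEncoder
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse

// Hypothetical: ask an LLM for a single search tag that fits the phrase,
// e.g. "Rainy day today, isn't it?" -> "rain umbrella".
fun llmSuggestTag(phrase: String): String = TODO("call your LLM of choice")

// Query Giphy's search API for one GIF matching the tag; returns raw JSON.
// Real code would parse the GIF URL out of data[0].images and hand it to
// the face-rendering service.
fun searchGiphy(phrase: String, apiKey: String): String {
    val q = URLEncoder.encode(llmSuggestTag(phrase), Charsets.UTF_8)
    val request = HttpRequest.newBuilder(
        URI.create("https://api.giphy.com/v1/gifs/search?api_key=$apiKey&q=$q&limit=1")
    ).GET().build()
    return HttpClient.newHttpClient()
        .send(request, HttpResponse.BodyHandlers.ofString())
        .body()
}
```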
Synchronization of action events: These events then flow through a scheduler that keeps speech, facial expressions, and motions in sync, so KUBI's gestures land exactly on the words they accompany.
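One way such a scheduler can work (purely illustrative, not the actual BigBoy implementation): buffer the compiled actions that belong to one utterance and release them only when every channel is ready:

```kotlin
import kotlinx.coroutines.Deferred
import kotlinx.coroutines.coroutineScope
import kotlinx.coroutines.launch

// Hypothetical compiled actions for a single utterance.
data class SpeechAction(val audio: ByteArray)
data class MotionAction(val name: String)
data class FaceAction(val gifUrl: String)

suspend fun playAudio(a: SpeechAction) { /* hand off to the speaker service */ }
suspend fun runMotion(m: MotionAction) { /* hand off to the motor service */ }
suspend fun showFace(f: FaceAction) { /* hand off to the display service */ }

// Wait until speech, motion, and face are all compiled, then fire them
// at the same instant so the gesture lines up with the words.
suspend fun dispatchUtterance(
    speech: Deferred<SpeechAction>,
    motion: Deferred<MotionAction>,
    face: Deferred<FaceAction>,
) {
    val s = speech.await()
    val m = motion.await()
    val f = face.await()
    coroutineScope {
        launch { playAudio(s) }
        launch { runMotion(m) }
        launch { showFace(f) }
    }
}
```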
The cool thing is that scenarios can even listen to action events and trigger new action events dynamically. For example, a scenario can react to something KUBI has just said and chain a follow-up, as in the sketch below.
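Again in the style of the DSL above (the SpeechActionEvent type and its text field are hypothetical):

```kotlin
// A scenario that listens for a completed speech action mentioning coffee
// and dynamically chains a celebratory follow-up action.
private val coffeeHype = ScenarioRecipe { event, env, session ->
    phrase {
        sayWith {
            "Did someone say coffee? That's my favorite word!".asEnglish
        }.withAutoMotion()
    }.given { event is SpeechActionEvent && "coffee" in event.text.lowercase() }
}
```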
BigBoy literally sees and knows everything going on. Pretty cool, huh?
Most of the services are hosted locally and wrapped in a Docker container, where their lifecycle is managed by the Supervisor process control system. Error logs are collected in Sentry and fed into a custom admin app that monitors exceptions, the real-time status of services and sensors, and latency reports. The cool thing is that this Flutter admin app was 90% generated by AI.
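A Supervisor entry for one such service could look roughly like this (the program name, paths, and flags are hypothetical; the configuration keys are standard supervisord):

```ini
[program:cup-detection]
command=/usr/local/bin/cup-detection --bus tcp://localhost:5555
autostart=true
autorestart=true
startretries=3
stdout_logfile=/var/log/kubi/cup-detection.out.log
stderr_logfile=/var/log/kubi/cup-detection.err.log
```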
Second Space had a very specific personality in mind for KUBI: a mixture of Deadpool, Wheatley from Portal 2, and a bit of Pathfinder from Apex Legends. They managed to design the voice in 15 minutes, complete with the emotions and pauses that make it feel even more human.
ElevenLabs powers KUBI's speech capabilities through two core APIs: Conversational AI for live dialogue, and Text to Speech for static speech generation.
Activated when a customer says, "Hey KUBI!", ElevenLabs' Conversational AI responds in 200ms, making the interaction feel truly human.
Because KUBI talks to ElevenLabs' Conversational AI over a WebSocket connection, it can also leverage function calling: the agent can ask KUBI's backend to run a tool, such as starting an order, as sketched below.
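A rough sketch of the client side (using OkHttp; the message shape follows ElevenLabs' client tool call events, but the start_coffee_order tool and its dispatch logic are hypothetical, so check the current Conversational AI docs for the exact schema):

```kotlin
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.WebSocket
import okhttp3.WebSocketListener
import org.json.JSONObject

// Hypothetical tool handler: the agent asks KUBI's backend to start brewing.
fun startCoffeeOrder(params: JSONObject) {
    println("brewing: ${params.optString("drink", "espresso")}")
}

// Dispatch client tool calls arriving over the Conversational AI WebSocket.
val listener = object : WebSocketListener() {
    override fun onMessage(webSocket: WebSocket, text: String) {
        val msg = JSONObject(text)
        if (msg.optString("type") == "client_tool_call") {
            val call = msg.getJSONObject("client_tool_call")
            when (call.getString("tool_name")) {
                "start_coffee_order" -> startCoffeeOrder(call.getJSONObject("parameters"))
            }
        }
    }
}

fun main() {
    val request = Request.Builder()
        .url("wss://api.elevenlabs.io/v1/convai/conversation?agent_id=YOUR_AGENT_ID")
        .build()
    OkHttpClient().newWebSocket(request, listener)
}
```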
Being able to switch between LLM models easily through ElevenLabs' admin panel helps Second Space optimize understanding and accuracy, since they noticed that some models recognize tool intents better than others. They currently use Gemini 2.0 Flash as the core model for Conversational AI and GPT-4o for static speech generation.
Second Space's first GitHub commits referencing ElevenLabs date back to January 2023, even before the multilingual model was released. They recognized ElevenLabs' dedication to quality early on and confidently built out an architecture anticipating future multilingual support. Now, entering markets like Japan and South Korea is as simple as flipping a switch, with no extra dev work required!
Microservices, real-time events, and ElevenLabs' powerful voice technology make KUBI feel truly alive and ready to conquer and delight the world, one coffee and witty interaction at a time.