Streaming and Caching with Supabase
Generate and stream speech through Supabase Edge Functions. Store speech in Supabase Storage and cache responses via built-in CDN.
Introduction
In this tutorial you will learn how to build an edge API to generate, stream, store, and cache speech using Supabase Edge Functions, Supabase Storage, and ElevenLabs.
Prefer to jump straight to the code?
Find the example project on GitHub.
Requirements
- An ElevenLabs account with an API key.
- A Supabase account (you can sign up for a free account via database.new).
- The Supabase CLI installed on your machine.
- The Deno runtime installed on your machine and optionally set up in your favourite IDE.
Setup
Create a Supabase project locally
After installing the Supabase CLI, run the following command to create a new Supabase project locally:
Configure the storage bucket
You can configure the Supabase CLI to automatically generate a storage bucket by adding this configuration to the config.toml file:
Upon running supabase start, this will create a new storage bucket in your local Supabase project. Should you want to push this to your hosted Supabase project, you can run supabase seed buckets --linked.
Configure background tasks for Supabase Edge Functions
To use background tasks in Supabase Edge Functions when developing locally, you need to add the following configuration to the config.toml file:
When running with the per_worker policy, the function won't auto-reload on edits. You will need to restart it manually by running supabase functions serve.
Create a Supabase Edge Function for Speech generation
Create a new Edge Function by running the following command:
If you're using VS Code or Cursor, select y when the CLI prompts "Generate VS Code settings for Deno? [y/N]".
Set up the environment variables
Within the supabase/functions directory, create a new .env file and add the following variables:
Dependencies
The project uses three dependencies:
- The @supabase/supabase-js library to interact with the Supabase database.
- The ElevenLabs JavaScript SDK to interact with the text-to-speech API.
- The open-source object-hash to generate a hash from the request parameters.
Since Supabase Edge Functions use the Deno runtime, you don't need to install the dependencies. Instead, you can import them via the npm: prefix.
Code the Supabase Edge Function
In your newly created supabase/functions/text-to-speech/index.ts file, add the following code:
Code deep dive
There are a few things worth noting about the code. Let's walk through it step by step.
Handle the incoming request
To handle the incoming request, use the Deno.serve handler. The demo doesn't validate the request origin, but you could, for example, check the origin or append a user access token to the request and validate it with Supabase Auth.
From the incoming request, the function extracts the text and voiceId parameters. The voiceId parameter is optional and defaults to the ElevenLabs ID for the "Allison" voice.
Using the object-hash library, the function generates a hash from the request parameters. This hash is used to check for existing audio files in Supabase Storage.
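As a rough illustration of this step, the sketch below reads text and voiceId from the request URL and derives a deterministic cache key. It is not the tutorial's own code: it swaps Node's built-in createHash (SHA-1 over a JSON string) in for object-hash so that it runs without dependencies, and DEFAULT_VOICE_ID, parseParams, and requestHash are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Placeholder only: substitute the ElevenLabs voice ID you want as the default.
const DEFAULT_VOICE_ID = "your-default-voice-id";

interface SpeechParams {
  text: string;
  voiceId: string;
}

// Read `text` and `voiceId` from the query string; `voiceId` is optional.
function parseParams(requestUrl: string): SpeechParams | null {
  const params = new URL(requestUrl).searchParams;
  const text = params.get("text");
  if (!text) return null; // nothing to synthesize
  return { text, voiceId: params.get("voiceId") ?? DEFAULT_VOICE_ID };
}

// Stand-in for object-hash: serialize the parameters in a fixed key order so
// that identical requests always map to the same storage object name.
function requestHash({ text, voiceId }: SpeechParams): string {
  return createHash("sha1")
    .update(JSON.stringify({ text, voiceId }))
    .digest("hex");
}
```

Because the key depends only on the parameters, two requests for the same text and voice resolve to the same object in the bucket, which is what makes the later cache lookup possible.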
Check for existing audio file in Supabase Storage
Supabase Storage comes with a built-in smart CDN, allowing you to easily cache and serve your files.
Here, the function checks for an existing audio file in Supabase Storage. If the file exists, the function returns the file from Supabase Storage.
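A minimal sketch of that lookup, assuming the bucket is read through a short-lived signed URL. BucketLike mirrors only the createSignedUrl call of the supabase-js v2 storage client; findCachedAudio, the .mp3 naming, and the 60-second expiry are illustrative choices, and the fetch function is passed in so the logic can be exercised without a running stack.

```typescript
// Structural subset of what `supabase.storage.from("audio")` exposes in supabase-js v2.
interface BucketLike {
  createSignedUrl(
    path: string,
    expiresInSeconds: number,
  ): Promise<{ data: { signedUrl: string } | null; error: unknown }>;
}

type FetchLike = (url: string) => Promise<Response>;

// Return the stored audio response, or null when the caller should generate it.
async function findCachedAudio(
  bucket: BucketLike,
  requestHash: string,
  fetchFn: FetchLike = fetch,
): Promise<Response | null> {
  const { data } = await bucket.createSignedUrl(`${requestHash}.mp3`, 60);
  if (!data) return null; // no signed URL could be created for this path
  const stored = await fetchFn(data.signedUrl);
  return stored.ok ? stored : null; // treat any non-2xx status as a cache miss
}
```

Returning null on every failure path keeps the handler simple: a miss of any kind falls through to speech generation.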
Generate Speech as a stream and split into two branches
Using the streaming capabilities of the ElevenLabs API, the function generates a stream. The benefit here is that even for larger text, you can start streaming the audio back to your user immediately, and then upload the stream to Supabase Storage in the background.
This allows for the best possible user experience, making even large text blocks feel magically quick. The key step is the stream.tee() call, which splits the ReadableStream into two branches: one for the browser and one for Supabase Storage.
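The branching can be tried in isolation with the standard Web Streams API. In the sketch below, fakeAudioStream is a hypothetical stand-in for the ElevenLabs audio stream, splitStream is an illustrative name, and the audio/mpeg content type is an assumption about the audio format.

```typescript
// Hypothetical stand-in for the ElevenLabs audio stream: emits the given byte chunks.
function fakeAudioStream(chunks: number[][]): ReadableStream<Uint8Array> {
  return new ReadableStream<Uint8Array>({
    start(controller) {
      for (const chunk of chunks) controller.enqueue(new Uint8Array(chunk));
      controller.close();
    },
  });
}

// Split one source stream into a branch for the HTTP response and a branch for storage.
function splitStream(source: ReadableStream<Uint8Array>) {
  // tee() locks `source` and returns two streams that each receive every chunk.
  const [browserStream, storageStream] = source.tee();
  const response = new Response(browserStream, {
    headers: { "Content-Type": "audio/mpeg" },
  });
  return { response, storageStream };
}
```

Note that chunks read by one branch are buffered in memory until the other branch consumes them, so both branches should be drained.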
Upload the audio stream to Supabase Storage in the background
The EdgeRuntime.waitUntil method is used to upload the audio stream to Supabase Storage in the background using the uploadAudioToStorage function. This allows the function to return the streaming response to the browser immediately while the audio is being uploaded to Supabase Storage.
Once the storage object has been created, the next time a user makes a request with the same parameters, the function will return the audio file from the Supabase Storage CDN.
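The background upload might look roughly like the sketch below. It is written against two small interfaces rather than the real globals so it can run anywhere: RuntimeLike stands in for the EdgeRuntime global, UploadBucketLike for the upload call on supabase.storage.from("audio"). The scheduleUpload name, the .mp3 suffix, and the audio/mp3 content type are assumptions.

```typescript
// Minimal shape of the runtime hook; Supabase Edge Functions expose it as `EdgeRuntime`.
interface RuntimeLike {
  waitUntil(promise: Promise<unknown>): void;
}

// Structural subset of the supabase-js storage bucket client used here.
interface UploadBucketLike {
  upload(
    path: string,
    body: ReadableStream<Uint8Array>,
    options: { contentType: string },
  ): Promise<{ data: unknown; error: unknown }>;
}

// Upload the storage branch of the stream under a name derived from the request hash.
async function uploadAudioToStorage(
  bucket: UploadBucketLike,
  stream: ReadableStream<Uint8Array>,
  requestHash: string,
): Promise<boolean> {
  const { error } = await bucket.upload(`${requestHash}.mp3`, stream, {
    contentType: "audio/mp3",
  });
  if (error) console.error("Error uploading audio to storage", error);
  return !error;
}

// Hand the upload to the runtime so the handler can return its response right away.
function scheduleUpload(
  runtime: RuntimeLike,
  bucket: UploadBucketLike,
  storageStream: ReadableStream<Uint8Array>,
  requestHash: string,
): void {
  runtime.waitUntil(uploadAudioToStorage(bucket, storageStream, requestHash));
}
```

In the Edge Function itself, the runtime argument would simply be the EdgeRuntime global. The upload failure is logged rather than thrown because the response has already been sent by the time it can occur.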
Run locally
To run the function locally, run the following commands:
Once the local Supabase stack is up and running, run the following command to start the function and observe the logs:
Try it out
Navigate to http://127.0.0.1:54321/functions/v1/text-to-speech?text=hello%20world to hear the function in action.
Afterwards, navigate to http://127.0.0.1:54323/project/default/storage/buckets/audio to see the audio file in your local Supabase Storage bucket.
Deploy to Supabase
If you haven’t already, create a new Supabase account at database.new and link the local project to your Supabase account:
Once done, run the following command to deploy the function:
Set the function secrets
Now that you have all your secrets set locally, you can run the following command to set the secrets in your Supabase project:
Test the function
The function is designed in a way that it can be used directly as a source for an <audio> element.
You can find an example frontend implementation in the complete code example on GitHub.
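For a quick check without the full frontend, a small helper can build the function URL for an audio source. This is a sketch only: speechUrl is a hypothetical helper, and the base URL shown in the comment is the local functions endpoint used earlier in this tutorial.

```typescript
// Build the text-to-speech URL for a given functions base URL and request parameters.
function speechUrl(functionsBaseUrl: string, text: string, voiceId?: string): string {
  // Ensure a trailing slash so the function name is appended, not substituted.
  const base = functionsBaseUrl.endsWith("/") ? functionsBaseUrl : functionsBaseUrl + "/";
  const url = new URL("text-to-speech", base);
  url.searchParams.set("text", text); // query values are percent-encoded automatically
  if (voiceId) url.searchParams.set("voiceId", voiceId);
  return url.toString();
}

// In a browser, the result can be assigned to an audio element, for example:
//   audioElement.src = speechUrl("http://127.0.0.1:54321/functions/v1", "hello world");
```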