Creating a Project
Projects is an end-to-end workflow for creating very long-form content. You can upload a full book or document, or import a whole webpage via a URL. The AI can then generate a voiceover narration for the entire book, document, or webpage, and you can download the result either as individual MP3 files for each chapter or as a single MP3 file for the whole audiobook.
We will provide a brief walkthrough of this feature, but we recommend that you test it yourself by navigating to the Projects tab in the menu.
Once you enter the new tab, you will encounter a screen where you can create new projects or open existing ones. The number of projects you can have at any given time is determined by your subscription. The higher your subscription tier, the more projects you can have concurrently.
Click “Create new project” and you will be presented with a popup. Here, you can choose to create a new empty project, import an existing EPUB, PDF, or TXT file that will automatically be converted into a project, or import text directly from a website by entering its URL so that the page is converted into a project. You can then use our Audio Native feature to embed any narration project on your website.
For now, let’s create a new empty project. Next, you will be prompted to name your project and choose the default voice. Additionally, you will need to select the model that will be used and decide the quality settings. The voice and its settings can be changed after the project is created.
It is important to note that the model and quality settings will remain locked after the project has been created and cannot be changed without creating a completely new project from scratch.
Under “Download Settings,” you can choose whether the audio should be normalized and compressed to meet the ACX standard: an RMS level between -23 dB and -18 dB with a maximum peak level of -3 dB. The normalization applies both to audio played in the editor and to downloaded audio. You can toggle this setting even after the project has been created via the “Settings” panel.
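To make the ACX target concrete, here is a minimal Python sketch of how you might check a rendered clip against those levels yourself. It assumes the range refers to RMS loudness in dB relative to full scale and that samples are floats between -1.0 and 1.0. The function names are hypothetical and are not part of Projects.

```python
import math

def sine_wave(amplitude, n=1000, period=100):
    """Build a test tone: n samples of a sine wave with the given amplitude."""
    return [amplitude * math.sin(2 * math.pi * k / period) for k in range(n)]

def rms_dbfs(samples):
    """RMS level in dB relative to full scale (assumes floats in [-1.0, 1.0])."""
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")
    return 20 * math.log10(math.sqrt(mean_square))

def peak_dbfs(samples):
    """Highest absolute sample value in dB relative to full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")
    return 20 * math.log10(peak)

def meets_acx_levels(samples):
    """True if RMS is within -23..-18 dB and the peak does not exceed -3 dB."""
    return -23.0 <= rms_dbfs(samples) <= -18.0 and peak_dbfs(samples) <= -3.0

# A tone at amplitude 0.15 sits inside the window; a much louder one does not.
print(meets_acx_levels(sine_wave(0.15)))
print(meets_acx_levels(sine_wave(0.9)))
```

This only measures levels. The normalization and compression that Projects applies when the setting is enabled is a separate step that this sketch does not reproduce.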
Here you can also input any relevant metadata you might want to attach to the downloaded audio files.
The quality settings determine the quality of the rendered output of your projects. This setting decides the bitrate for the MP3 and quality optimization. For most people, standard or high settings will be sufficient. However, for those who require the highest possible quality, we offer Ultra, which might be preferable in certain cases. These different quality settings have different costs associated with them, as they require different computational resources. Ultra is quite computationally intensive, making it the most expensive option. You are more than welcome to experiment with these different quality settings to find the one that best suits your project.
Now you can see your newly created project in the list. Click “Select” to open the project.
Settings and Buttons
Once inside the project, you will be presented with a blank page. However, if you chose to create the project by importing a file or using a URL, you will be presented with that text, as the system automatically fills out the pages for you. If the EPUB is well-structured and correctly formatted, the system will also automatically split each chapter into its own chapter in Projects, making it very easy to navigate.
If you’ve ever used an online text editor, you will find yourself very at home with both the look and the structure of Projects, but we do have a few nifty features that will help you with especially long-form content.
At the top, you have a few buttons. I will go through the buttons in order of appearance, left to right. You can hover over some of these buttons to get more information.
- Continuous Play
- Insert Divider
Most of these are probably pretty self-explanatory, but I will go through some of them that might not be.
Regenerate will regenerate the currently selected paragraph and give you a new performance from your voice. This will remove the previously generated performance, so be careful.
Continuous play means that when you play a paragraph, it will continue and play the next one once the first one finishes. This makes it so that you can listen to your audiobook without having to pre-render all of the paragraphs first.
Insert divider makes it so that you can split a paragraph into multiple sections without dividing it into separate paragraphs. This can be useful if you want to have multiple speakers within a single paragraph, for example. Let’s say you’re creating an audio drama. Then you might want to have the narrator read the narration while the individual voices only read the dialogue within the paragraph.
In the upper right corner, you will see two buttons called “Settings” and “Convert.” However, if you are on a smaller screen, you might not see these buttons; instead, you will see three dots that you can click to access the drop-down list. The “Settings” button is for the settings of the whole project, where you can change the project name and set the default voice. Here, you can also view the model used as well as the quality of the project.
On the right-hand side, next to the text document itself, you will find a window called “Block Settings” where you can adjust the settings for the block you currently have selected. Here, you can change the voice and the voice's settings, either for that specific block or as global settings.
Below that, you will see chapters where you can view all of the chapters you have either created or imported from a file. Keep in mind that the system does its best approximation if the file is not properly structured, and it will split the chapters automatically if they become too long. You can add more chapters by clicking the plus sign.
Now you can start writing. Please ensure that you use proper grammar and paragraph structures, as well as using line breaks where appropriate, as the AI will use these when generating. This goes for both Projects and Speech Synthesis, but it is even more important in Projects for optimal results.
When you have finished writing your text and are happy with it, you can generate a voiceover for it. Click the paragraph (called a block) for which you want to generate audio. The current selection will be highlighted in yellow. Then, to generate audio for that section, just click the yellow play button at the top. This will initiate the generation of audio for the specific section you have highlighted. Once the audio has finished generating, it will play. This process is similar to how audio generation works on the Speech Synthesis page.
Paragraphs that have already been generated are indicated by the black bar on the right-hand side of each paragraph. If you press the yellow play button on the top bar and a paragraph has already been generated, it will just play that paragraph. However, if you press the button next to the play button, the one with the two circling arrows, it will regenerate that paragraph.
If you press the play button and the paragraph is fully generated, you can also download the paragraph by clicking the download button in the lower right corner of the player. This is exactly how it works in Speech Synthesis. However, this button will only appear when something has finished generating. So, if you have continuous play activated, it will not appear because the AI will keep generating the next section. This means it only works for downloading individual paragraphs.
If you want to convert the entire chapter in one go, you can click the convert button in the upper right corner. This will open a window where you can choose to convert either your entire project or individual chapters. You can also download the entire project or individual chapters. Even after converting the whole chapter, you can still go back and regenerate sections of the book that you are not happy with before downloading the entire thing. However, if you make any changes, you will need to press convert once again for the changes to be reflected in the whole book before you can download the updated chapter.
After the conversion of either a whole project or individual chapters has finished, you will be able to see these conversions by clicking “Versions” next to either the project or the individual chapters. You can then download the different versions.
Sometimes you may want to specify the pronunciation of certain words, such as character/brand names, or to specify how acronyms should be read. Pronunciation dictionaries allow this functionality by enabling you to upload a lexicon or dictionary file that specifies pairs of words and how they should be pronounced, either using a phonetic alphabet or word substitutions. Whenever one of these words is encountered in a project, the AI model will pronounce the word using the specified replacement.
To provide a pronunciation dictionary file, open the settings for a project and upload a file in the .PLS format. When a dictionary is added to a project, the system automatically recalculates which pieces of the project need to be re-converted using the new dictionary file and marks these as unconverted.
Currently we only support PLS files that specify replacements using phonemes or aliases.
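The recalculation step is not described in detail, but the idea can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `Block` class, the `mark_stale_blocks` function, and the simple case-sensitive substring test are hypothetical and do not describe how Projects actually decides what to re-convert.

```python
from dataclasses import dataclass

@dataclass
class Block:
    """Hypothetical stand-in for one paragraph (block) in a project."""
    text: str
    converted: bool = True

def mark_stale_blocks(blocks, graphemes):
    """Mark blocks containing any dictionary grapheme as unconverted.

    Uses a plain case-sensitive substring test (an assumption), so "UN"
    would also match inside a longer word such as "UNESCO".
    Returns the number of blocks that were marked.
    """
    stale = 0
    for block in blocks:
        if any(g in block.text for g in graphemes):
            block.converted = False
            stale += 1
    return stale

blocks = [
    Block("The UN met today."),
    Block("Nothing changed here."),
    Block("An apple a day."),
]
# Only the first block contains "UN"; lowercase "apple" does not match "Apple".
print(mark_stale_blocks(blocks, ["Apple", "UN"]))
```

The practical takeaway is the same as in the text: after uploading a dictionary, expect some blocks to show as unconverted, and run the conversion again before downloading.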
- Phonemes. Phonemes are used to specify pronunciation using either the IPA (International Phonetic Alphabet) or CMU Arpabet alphabet. Phoneme rules are currently only supported by the Turbo v2 English model.
- Aliases. Aliases are used to specify pronunciation using other words or phrases. For example, you can specify that “UN” should be read as “United Nations” whenever it is encountered in a project.
Both sets of rules specify a word or phrase they are looking for, referred to as a grapheme in the PLS files, and then their replacement. Please note that searches are case sensitive.
Here is an example PLS file that gives “Apple” the IPA pronunciation “ˈæpl̩” and gives “UN” the alias “United Nations”:
<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Apple</grapheme>
    <phoneme>ˈæpl̩</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>United Nations</alias>
  </lexeme>
</lexicon>
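Before uploading a dictionary, it can help to confirm that the file is well-formed and contains the rules you expect. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` module. The `read_rules` function and the shortened `SAMPLE` lexicon are hypothetical names introduced for illustration, not part of any official tooling.

```python
import xml.etree.ElementTree as ET

# Namespace used by PLS lexicon files, as in the example above.
PLS_NS = {"pls": "http://www.w3.org/2005/01/pronunciation-lexicon"}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
    alphabet="ipa" xml:lang="en-GB">
  <lexeme><grapheme>Apple</grapheme><phoneme>ˈæpl̩</phoneme></lexeme>
  <lexeme><grapheme>UN</grapheme><alias>United Nations</alias></lexeme>
</lexicon>"""

def read_rules(pls_text):
    """Return (grapheme, kind, replacement) tuples in file order.

    Raises xml.etree.ElementTree.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(pls_text.encode("utf-8"))
    rules = []
    for lexeme in root.findall("pls:lexeme", PLS_NS):
        grapheme = lexeme.findtext("pls:grapheme", namespaces=PLS_NS)
        phoneme = lexeme.findtext("pls:phoneme", namespaces=PLS_NS)
        alias = lexeme.findtext("pls:alias", namespaces=PLS_NS)
        if phoneme is not None:
            rules.append((grapheme, "phoneme", phoneme))
        elif alias is not None:
            rules.append((grapheme, "alias", alias))
    return rules

for rule in read_rules(SAMPLE):
    print(rule)
```

This only checks that the XML parses and lists the phoneme and alias rules it finds. It does not validate the file against the PLS schema.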
When checking for a replacement word in a pronunciation dictionary, the dictionary is checked from start to end and only the very first replacement is used.
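The lookup behavior described here, a case-sensitive search where the first matching rule wins, can be sketched in a few lines of Python. The `find_replacement` function and the `RULES` list are hypothetical and only illustrate the ordering rule. They do not show how the model applies a replacement internally.

```python
def find_replacement(word, rules):
    """Return the replacement from the first rule whose grapheme equals word.

    The comparison is case-sensitive, and rules are checked in the order
    they appear. Returns None if no rule matches.
    """
    for grapheme, replacement in rules:
        if grapheme == word:
            return replacement
    return None

# "UN" appears twice on purpose: only the first entry is ever used.
RULES = [
    ("UN", "United Nations"),
    ("UN", "Union"),
    ("Apple", "ˈæpl̩"),
]

print(find_replacement("UN", RULES))
print(find_replacement("un", RULES))
```

In practice this means that if a grapheme appears more than once in your dictionary, only the earliest entry has any effect, and differently capitalized forms such as “un” or “Un” each need their own entry.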