Creating a Project

Projects is an end-to-end workflow for creating long-form content. You can upload a full book or document, or import a whole webpage via its URL. The AI can then generate a voiceover narration for the entire book, document, or webpage. Afterwards, you can download the audio either as individual MP3 files for each chapter or as a single MP3 file for the whole audiobook.

We will provide a brief walkthrough of this feature, but we recommend that you test it yourself by navigating to the Projects tab in the menu.

Once you open the Projects tab, you will see a screen where you can create new projects or open existing ones. The number of projects you can have at any given time is determined by your subscription: the higher your subscription tier, the more projects you can have concurrently.

Click “Add a new project” and you will be presented with a popup. Here, you can choose one of three options: create a new empty project; import an existing EPUB, PDF, TXT or HTML file, which will automatically be converted into a project; or import text directly from a website by entering its URL, which converts the page into a project. You can then use our Audio Native feature to embed any narration project on your website.

For now, let’s create a new empty project. You can name your project and choose the default voice. You will also need to select the model that will be used and decide on the quality setting. The voice and its settings can be changed after the project is created.

Model and quality settings are locked once the project has been created. They cannot be changed without creating a completely new project from scratch.

The quality setting determines the quality of the rendered output of your project. It decides the bitrate for the MP3 or lossless WAV output, as well as the quality optimization. For most people, the Standard or High setting will be sufficient. However, for those who require the highest possible quality, we offer Ultra and Ultra Lossless (an uncompressed WAV file), which might be preferable in certain cases. These quality settings have different costs associated with them, as they require different computational resources. Ultra Lossless is quite computationally intensive, making it the most expensive option. You are more than welcome to experiment with these settings to find the one that best suits your project.

Once you have chosen all the settings, press Create Project and you will be redirected to the editor.

Settings and Buttons

Once inside the project, you will be presented with a blank page. However, if you chose to create the project by importing a file or using a URL, you will be presented with that text instead, as the system automatically fills out the pages for you. If an imported EPUB is well-structured and correctly formatted, the system will also split it automatically so that each chapter of the book becomes its own chapter in Projects, making it very easy to navigate.

If you’ve ever used an online text editor, you will feel at home with both the look and the structure of Projects, but we do have a few nifty features that will help you, especially with long-form content.

At the top, you have a few buttons. You can hover over some of these buttons to get more information.

Most of these are probably pretty self-explanatory, but let’s go through all of them.

Using Projects

Now you can start writing. Please ensure that you use proper grammar and paragraph structure, and add line breaks where appropriate, as the AI will use these when generating. This applies to both Projects and Speech Synthesis, but it is even more important in Projects for optimal results.

When you have finished writing your text and are happy with it, you can generate a voiceover for it. Click the paragraph for which you want to generate audio; the current selection will be highlighted in colour. Then, to generate audio for that section, click the play button at the top. This will initiate the generation of audio for the specific section you have highlighted. Once the audio has finished generating, it will play. This process is similar to how audio generation works on the Speech Synthesis page.

Paragraphs that have already been generated are indicated by the black bar on the left-hand side of each paragraph. If you press the play button on the top bar and a paragraph has already been generated, it will just play that paragraph. However, if you press the regenerate button, with two circling arrows, it will regenerate the paragraph. Once a paragraph has been generated at least once, you can select one or more words to regenerate only the selected text, rather than the whole paragraph. For the best results, we recommend regenerating a complete phrase or sentence at a time.

If you press the play button and the paragraph is fully generated, you can also download the paragraph by clicking the download button in the lower right corner of the player. This works exactly as it does in Speech Synthesis. However, this button only appears once something has finished generating. So, if you have “play until end” activated, it will not appear, because the AI will keep generating one section after the next. This means the button only works for downloading individual paragraphs.

If you want to convert an entire chapter or the whole project in one go, you can click the convert button in the upper right corner. This will open a page where you can choose to convert either your entire project or individual chapters. You can also download the entire project or individual chapters there. Even after converting, you can still go back and regenerate sections of the book that you are not happy with before downloading. However, if you make any changes, you will need to press convert once again for the changes to be reflected in the downloadable audio for the chapter or the whole book.

After the conversion of either a whole project or individual chapters has finished, you will be able to see these conversions by clicking “Versions” next to either the project or the individual chapters. You can then download the different versions.

Once your Project is converted, you have several download options available.

Auto-Regenerate

When the Auto-Regenerate feature is enabled for your project, we will automatically check the output for any mispronunciations or unwanted audio artefacts. If we detect any, we will automatically regenerate the audio up to twice, at no extra cost.

This feature increases the processing time and is not active until you enable it.

You can enable it in one of two places: as an option in the Convert dialogue window, if you’re converting your Project or Chapter in one step, or in the Project settings, where it applies to individual paragraph generation.
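
The behaviour described in this section can be pictured as a bounded retry loop. The following Python sketch is purely illustrative and does not describe how the feature is implemented internally: `generate` and `has_issues` are hypothetical callables standing in for audio generation and for the automatic check for mispronunciations or artefacts. The only detail taken from the text is the limit of two automatic regenerations.

```python
MAX_AUTO_REGENERATIONS = 2  # "up to twice", as stated in the text


def generate_with_auto_regenerate(text, generate, has_issues, enabled=True):
    """Generate audio, retrying while issues are detected.

    `generate` and `has_issues` are hypothetical stand-ins, not real API calls.
    Returns the final audio and the number of automatic regenerations used.
    """
    audio = generate(text)
    regenerations = 0
    while enabled and regenerations < MAX_AUTO_REGENERATIONS and has_issues(audio):
        audio = generate(text)
        regenerations += 1
    return audio, regenerations


if __name__ == "__main__":
    # Toy stand-ins: every take is flagged, so the loop stops at the limit.
    takes = iter(["take 1", "take 2", "take 3", "take 4"])
    audio, used = generate_with_auto_regenerate(
        "Hello world", lambda _text: next(takes), lambda _audio: True
    )
    print(audio, used)
```

Each extra pass repeats the generation step, which is one way to see why enabling the feature increases processing time.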

Pronunciation Dictionaries

Sometimes you may want to specify the pronunciation of certain words, such as character or brand names, or to specify how acronyms should be read. Pronunciation dictionaries make this possible: you upload a lexicon or dictionary file that specifies pairs of words and how they should be pronounced, using either a phonetic alphabet or word substitutions. Whenever one of these words is encountered in a project, the AI model will pronounce the word using the specified replacement.

To provide a pronunciation dictionary file, open the settings for a project and upload a file in the .PLS format. When a dictionary is added to a project, the system automatically recalculates which pieces of the project need to be re-converted using the new dictionary file and marks these as unconverted.

Currently, we only support PLS files that specify replacements using phonemes or aliases.

  • Phonemes. Phonemes are used to specify pronunciation using either the IPA (International Phonetic Alphabet) or CMU Arpabet alphabet. Phoneme rules are currently only supported by the Turbo v2 English model.
  • Aliases. Aliases are used to specify pronunciation using other words or phrases. For example, you can specify that “UN” should be read as “United Nations” whenever it is encountered in a project. You can use aliases with all models.

Both types of rule specify the word or phrase to look for, referred to as a grapheme in PLS files, and its replacement. Please note that searches are case sensitive.

Here is an example PLS file that specifies the pronunciation of “Apple” using the IPA string “ˈæpl̩” and of “UN” using the alias “United Nations”:

<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon
        http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"
      alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>Apple</grapheme>
    <phoneme>ˈæpl̩</phoneme>
  </lexeme>
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>United Nations</alias>
  </lexeme>
</lexicon>
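
A file like the one above can also be produced by a script rather than written by hand. The sketch below uses Python's standard `xml.etree.ElementTree` module. The function name `build_pls` and the `(grapheme, kind, value)` tuple format are assumptions made for illustration. The output omits the `xsi:schemaLocation` attribute shown in the example, and whether an upload is accepted without it has not been verified here.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize PLS elements without a prefix, i.e. as the default namespace.
ET.register_namespace("", PLS_NS)


def build_pls(entries, alphabet="ipa", lang="en-GB"):
    """Return PLS XML as UTF-8 bytes (requires Python 3.8+).

    `entries` is a list of (grapheme, kind, value) tuples, where kind is
    "phoneme" or "alias". This tuple format is an assumption of this sketch.
    """
    lexicon = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": alphabet, f"{{{XML_NS}}}lang": lang},
    )
    for grapheme, kind, value in entries:
        if kind not in ("phoneme", "alias"):
            raise ValueError(f"unsupported rule type: {kind}")
        lexeme = ET.SubElement(lexicon, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = grapheme
        ET.SubElement(lexeme, f"{{{PLS_NS}}}{kind}").text = value
    return ET.tostring(lexicon, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    data = build_pls([
        ("Apple", "phoneme", "ˈæpl̩"),
        ("UN", "alias", "United Nations"),
    ])
    # Write bytes so the file encoding matches the XML declaration.
    with open("dictionary.pls", "wb") as handle:
        handle.write(data)
```

Returning bytes rather than a string keeps the encoding named in the XML declaration consistent with what is written to disk.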

When checking for a replacement word in a pronunciation dictionary, the dictionary is checked from start to end and only the first matching replacement is used.
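
The lookup rule above, combined with the case-sensitive matching mentioned earlier, can be sketched in Python as follows. This illustrates first-match-wins semantics under stated assumptions and is not the product's actual implementation: the helper names `load_rules` and `lookup` are hypothetical, and the sample lexicon deliberately contains two entries for “UN” to show that only the first is used.

```python
import xml.etree.ElementTree as ET

PLS_NS = "{http://www.w3.org/2005/01/pronunciation-lexicon}"

# Sample lexicon with a duplicate grapheme (assumption, for demonstration).
PLS_TEXT = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
      xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
      alphabet="ipa" xml:lang="en-GB">
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>United Nations</alias>
  </lexeme>
  <lexeme>
    <grapheme>UN</grapheme>
    <alias>Union</alias>
  </lexeme>
</lexicon>
"""


def load_rules(pls_text):
    """Return rules as (grapheme, kind, value) tuples in file order."""
    root = ET.fromstring(pls_text.encode("utf-8"))
    rules = []
    for lexeme in root.iter(PLS_NS + "lexeme"):
        grapheme = lexeme.findtext(PLS_NS + "grapheme")
        alias = lexeme.find(PLS_NS + "alias")
        phoneme = lexeme.find(PLS_NS + "phoneme")
        if alias is not None:
            rules.append((grapheme, "alias", alias.text))
        elif phoneme is not None:
            rules.append((grapheme, "phoneme", phoneme.text))
    return rules


def lookup(rules, word):
    """Scan from start to end; return the first exact, case-sensitive match."""
    for grapheme, kind, value in rules:
        if grapheme == word:
            return kind, value
    return None


if __name__ == "__main__":
    rules = load_rules(PLS_TEXT)
    print(lookup(rules, "UN"))  # ('alias', 'United Nations')
    print(lookup(rules, "un"))  # None, because matching is case sensitive
```

In practice this means that if a dictionary contains conflicting entries for the same grapheme, the entry that appears earlier in the file is the one that takes effect.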