Create speech with timing

Generate speech from text with precise character-level timing information for audio-text synchronization.
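This section does not describe the exact shape of the timing data. The sketch below assumes it arrives as three parallel lists: the characters of the input text, each character's start time, and each character's end time, in seconds from the start of the audio. The `CharAlignment` class, its field names, and both helper functions are hypothetical and are not part of any documented response. The sketch shows two common synchronization tasks: grouping character timings into word timings, and finding the character being spoken at a given playback time (for example, to highlight text during playback).

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class CharAlignment:
    # Hypothetical shape (assumption): parallel lists, one entry per character.
    characters: list[str]
    start_times: list[float]  # seconds from the start of the audio
    end_times: list[float]    # seconds from the start of the audio


def word_timings(al: CharAlignment) -> list[tuple[str, float, float]]:
    """Group character timings into (word, start, end) tuples, splitting on whitespace."""
    words: list[tuple[str, float, float]] = []
    buf: list[str] = []
    start = end = 0.0
    for ch, s, e in zip(al.characters, al.start_times, al.end_times):
        if ch.isspace():
            # Whitespace closes the current word, if any.
            if buf:
                words.append(("".join(buf), start, end))
                buf = []
            continue
        if not buf:
            start = s  # first character of a new word
        buf.append(ch)
        end = e  # extend the word to this character's end time
    if buf:
        words.append(("".join(buf), start, end))
    return words


def char_index_at(al: CharAlignment, t: float) -> int:
    """Index of the last character whose start time is <= t, or -1 before the first one.

    Assumes start_times is sorted in ascending order.
    """
    return bisect_right(al.start_times, t) - 1


if __name__ == "__main__":
    # Made-up timings for illustration only.
    al = CharAlignment(
        characters=list("Hi there"),
        start_times=[0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7],
        end_times=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    )
    print(word_timings(al))
    print(al.characters[char_index_at(al, 0.35)])
```

Using binary search over the start times keeps the per-frame lookup cheap when a player polls the current playback position repeatedly.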