Tada: Voice Cloning Suite on Scenario

Last updated: April 9, 2026

Tada arrives on Scenario as a dual-model voice cloning suite built for creators who need real, expressive human voices rather than robotic synthesis. With Tada 3B and Tada 1B, you get two powerful options for turning text into speech that sounds exactly like the voice you want.


Introduction

Tada Text to Speech is a voice cloning technology that replicates any voice from a short audio reference and applies it to any text you provide. Unlike traditional TTS engines that rely on fixed voice libraries, Tada captures the unique timbre, rhythm, pacing, and emotional texture of a real voice and transfers it with striking accuracy, making every output feel personal and authentic.

The suite ships in two variants:

  • Tada 3B: The flagship model, powered by a 3-billion parameter architecture that prioritizes fidelity and expressive depth.

  • Tada 1B: The lightweight counterpart, offering faster generation and lower compute cost while maintaining strong voice consistency and natural-sounding output.

Both models support the same 10 languages and share identical parameter controls, making it easy to switch between them depending on your project's needs. Integrated natively into Scenario, Tada can use any audio asset already in your project as a reference.


The Two Models at a Glance

Feature     | Tada 3B (High-Fidelity)                                 | Tada 1B (Efficient)
Parameters  | 3 billion                                               | 1 billion
Cost        | 3 CU per generation                                     | 2 CU per generation
Best For    | Character dialogue, branded narration, premium content  | Large-scale localization, prototyping, accessibility
Focus       | Subtle nuances, breath patterns, emotional depth        | Speed, volume, and cost-efficiency
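To make the cost trade-off concrete, the per-generation CU prices above can be compared for a batch job. This is a minimal sketch using only the prices in the table; the model identifiers are illustrative, not official Scenario names:

```python
# Per-generation compute-unit prices from the comparison table above.
# The "tada-3b" / "tada-1b" keys are illustrative labels, not API IDs.
CU_PER_GENERATION = {"tada-3b": 3, "tada-1b": 2}

def batch_cost(model: str, num_generations: int) -> int:
    """Total CU cost for a batch of text-to-speech generations."""
    return CU_PER_GENERATION[model] * num_generations

# For 500 localized dialogue lines, the lightweight model saves 500 CU:
print(batch_cost("tada-3b", 500))  # 1500 CU
print(batch_cost("tada-1b", 500))  # 1000 CU
```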


Parameters and Settings

Both models share a full set of granular controls:

  • Reference Audio (Required): Any audio file or Scenario asset ID. A 10 to 30-second sample of clean, expressive speech works best.

  • Prompt (Required): The text to be synthesized (up to 10,000 characters). Punctuation and structure influence pacing.

  • Reference Transcript (Optional): Recommended for non-English references to improve phonetic alignment.

  • Language: Supports English, Arabic, Chinese, German, Spanish, French, Italian, Japanese, Polish, and Portuguese.

  • Inference Steps: Controls acoustic generation quality (Default: 20, Max: 50).

  • Speed Up Factor: Controls speech rate (Range: 0.5 to 2.0).

  • Temperature & Top P: Control variability. Lower values are more predictable; higher values introduce natural variation (Default Temp: 0.6, Top P: 0.9).

  • Acoustic CFG Scale: Guidance scale to push output closer to the prompt's intended delivery (Default: 1.6).

  • Noise Temperature: Controls diffusion noise for natural variation (Default: 0.9).


Multilingual Voice Cloning

One of Tada's most powerful capabilities is its multilingual support. You can provide a reference audio in English and generate output in Japanese, Portuguese, or Arabic. The model preserves the original speaker's characteristics while adapting to the target language's phonetics.

Note: Always provide the Reference Transcript when the reference audio language differs from the target language.
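The rule in the note above can be captured in a small guard. This is a hypothetical helper for your own pre-flight checks, not a function Scenario exposes:

```python
def transcript_required(reference_language: str, target_language: str) -> bool:
    """Per the note above: always supply the Reference Transcript when the
    reference audio's language differs from the target output language."""
    return reference_language.lower() != target_language.lower()

# English reference, Japanese output: supply a transcript.
assert transcript_required("English", "Japanese") is True
# Same-language cloning: the transcript is optional.
assert transcript_required("English", "English") is False
```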


A proven approach inside Scenario is to:

  1. Generate reference voices using ElevenLabs 3 Alpha (which offers distinct preset voices).

  2. Feed those assets directly into Tada 3B or Tada 1B as references.

  3. Synthesize your text in any of the supported languages.

This allows for a consistent, repeatable voice identity across thousands of lines of dialogue without additional recording sessions.
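The three-step workflow above can be sketched as a batch loop. Both helper functions are placeholders standing in for the actual ElevenLabs and Tada generation calls inside Scenario; the asset-ID format is invented for illustration:

```python
# Sketch of the reference-voice workflow described above.
# Both helpers are placeholders for real Scenario generation requests.
def generate_reference_voice(voice_preset: str) -> str:
    """Step 1: create a reference clip with ElevenLabs 3 Alpha (placeholder)."""
    return f"asset_{voice_preset}"  # pretend this returns a Scenario asset ID

def synthesize_line(reference_asset: str, text: str, language: str) -> dict:
    """Steps 2-3: feed the asset into Tada as the voice reference (placeholder)."""
    return {"reference_audio": reference_asset, "prompt": text, "language": language}

reference = generate_reference_voice("narrator_a")
dialogue = ["Welcome back.", "Your quest awaits."]
jobs = [
    synthesize_line(reference, line, lang)
    for lang in ("english", "japanese", "portuguese")
    for line in dialogue
]
# One voice identity reused across 3 languages x 2 lines = 6 generations:
print(len(jobs))  # 6
```

Every job reuses the same reference asset, which is what keeps the voice identity consistent across languages and recording-free.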


Conclusion

Tada 3B and Tada 1B bring genuine voice cloning to Scenario, turning any audio reference into a scalable, multilingual asset. Paired with Lyria 3 for music and ElevenLabs for reference voices, Tada enables a complete synthetic audio production pipeline that requires no studio or external tools.