ElevenLabs Dubbing: The Essentials

Last updated: April 22, 2026

ElevenLabs Dubbing localizes video and audio by translating the speech in a file into another language while preserving the original speaker's voice, emotion, and timing. The service automates transcription, translation, and voice synthesis.

Key capabilities:

  • Input support: raw video files, finished exports, or standalone audio files.

  • Vocal fidelity: maintains nuance, pace, and emotional energy from original performances.

  • Output: dubbed versions with translated audio replacing original speech.


Parameters

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| File | File (audio or video) | Yes | (none) | The audio or video asset to dub. |
| Target Language | String | Yes | en | Output language code (ISO 639-1). See supported languages below. |
| Source Language | String | No | Auto-detect | Original file language. Leave empty for automatic detection. |
| Number of Speakers | Integer | No | 0 | Speaker count. 0 = auto-detect. Max 10. |
| Drop Background Audio | Boolean | No | false | Removes music and ambience from the dubbed output when enabled. |
| Disable Voice Cloning | Boolean | No | false | Uses a generic voice instead of cloning the original speaker. |
| Highest Resolution | Boolean | No | false | Returns the dubbed video in the highest resolution available. |
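As a rough illustration of how the parameters above fit together, the following Python sketch validates the documented options and collects them into string form fields. The function name `build_dubbing_fields` and the field names (`target_lang`, `source_lang`, `num_speakers`, and so on) are assumptions made for this example and should be checked against the official API reference before use.

```python
from typing import Dict, Optional

MAX_SPEAKERS = 10  # upper bound stated in the parameter table


def build_dubbing_fields(
    target_lang: str = "en",
    source_lang: Optional[str] = None,
    num_speakers: int = 0,
    drop_background_audio: bool = False,
    disable_voice_cloning: bool = False,
    highest_resolution: bool = False,
) -> Dict[str, str]:
    """Validate the documented options and return them as string form fields.

    The field names are assumptions for illustration, not confirmed API names.
    """
    if not target_lang:
        raise ValueError("target language is required")
    if not 0 <= num_speakers <= MAX_SPEAKERS:
        raise ValueError(f"num_speakers must be between 0 and {MAX_SPEAKERS}")

    fields = {
        "target_lang": target_lang,
        "num_speakers": str(num_speakers),  # 0 means auto-detect
        "drop_background_audio": str(drop_background_audio).lower(),
        "disable_voice_cloning": str(disable_voice_cloning).lower(),
        "highest_resolution": str(highest_resolution).lower(),
    }
    # Omitting the source language leaves detection to the service.
    if source_lang:
        fields["source_lang"] = source_lang
    return fields
```

For example, `build_dubbing_fields("es", num_speakers=2)` produces fields for a Spanish dub with two speakers and every optional toggle left at its default.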


Supported Languages

The model supports the following 31 languages:

Arabic, Bulgarian, Chinese, Croatian, Czech, Danish, Dutch, English, Filipino, Finnish, French, German, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Norwegian, Polish, Portuguese, Romanian, Russian, Slovak, Spanish, Swedish, Thai, Turkish, Ukrainian, Vietnamese.
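Because Target Language expects a code rather than a name, a small lookup can help. The sketch below maps the listed language names to ISO 639-1 codes. The helper `code_for` is hypothetical. Note that ISO 639-1 defines no code for Filipino: `fil` is its ISO 639-2/639-3 code, and whether the service expects `fil` or `tl` (Tagalog) is an assumption to verify.

```python
# ISO 639-1 codes for the languages listed above.
LANGUAGE_CODES = {
    "Arabic": "ar", "Bulgarian": "bg", "Chinese": "zh", "Croatian": "hr",
    "Czech": "cs", "Danish": "da", "Dutch": "nl", "English": "en",
    "Finnish": "fi", "French": "fr", "German": "de", "Hindi": "hi",
    "Hungarian": "hu", "Indonesian": "id", "Italian": "it", "Japanese": "ja",
    "Korean": "ko", "Malay": "ms", "Norwegian": "no", "Polish": "pl",
    "Portuguese": "pt", "Romanian": "ro", "Russian": "ru", "Slovak": "sk",
    "Spanish": "es", "Swedish": "sv", "Thai": "th", "Turkish": "tr",
    "Ukrainian": "uk", "Vietnamese": "vi",
    # Assumption: Filipino has no ISO 639-1 code; "fil" is ISO 639-2/639-3.
    "Filipino": "fil",
}


def code_for(language_name: str) -> str:
    """Return the language code for a listed language name (case-insensitive)."""
    try:
        return LANGUAGE_CODES[language_name.strip().title()]
    except KeyError:
        raise ValueError(f"unsupported language: {language_name!r}") from None
```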


Features

Voice Cloning

By default, the model clones the original speaker's voice for each dubbed track. For best results, speakers should have at least 30 seconds of clean, clear audio. Set Disable Voice Cloning to true to use a generic synthesized voice instead.
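One way to check the 30-second guideline before uploading is to total each speaker's talk time. The sketch below assumes speaker-labelled segments are already available from some separate diarization step. The `Segment` tuple format and the function name are hypothetical, and summing segment durations only approximates "clean, clear" audio.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

MIN_CLEAN_SECONDS = 30.0  # guideline from the voice cloning section

# Hypothetical segment format: (speaker label, start in seconds, end in seconds)
Segment = Tuple[str, float, float]


def speakers_below_minimum(
    segments: Iterable[Segment], minimum: float = MIN_CLEAN_SECONDS
) -> List[str]:
    """Return speakers whose total segment time is under the minimum, sorted."""
    totals: Dict[str, float] = defaultdict(float)
    for speaker, start, end in segments:
        if end < start:
            raise ValueError(f"segment for {speaker!r} ends before it starts")
        totals[speaker] += end - start
    return sorted(name for name, total in totals.items() if total < minimum)


segments = [("A", 0.0, 20.0), ("B", 20.0, 30.0), ("A", 30.0, 45.0)]
print(speakers_below_minimum(segments))  # ['B']
```

Here speaker A totals 35 seconds and passes, while speaker B has only 10 seconds and may clone less reliably.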

Multi-Speaker Support

The model automatically detects up to 10 distinct speakers. For scenes with overlapping dialogue, set the Number of Speakers manually to improve accuracy.

Background Audio Control

Enable Drop Background Audio to strip music and sound design from the output. This is useful when background audio would interfere with the clarity of the dubbed voice. The control is binary: complex sound design that needs partial preservation requires separate post-production mixing.


Best Practices

  • Prioritize clean source audio for optimal voice cloning quality.

  • Set speaker count manually when dialogue overlaps between multiple speakers.

  • Leave Highest Resolution disabled during testing to reduce processing time.

  • Ensure each speaker has at least 30 seconds of clear audio for effective voice learning.


Known Limitations

  • Optimized for up to 10 distinct speakers.

  • Dubbed audio is time-stretched to fit original visuals. Longer target languages may sound slightly accelerated.

  • The background audio toggle is binary. Complex sound design requires separate post-production mixing.
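The time-stretching limitation comes down to simple arithmetic: if a translated line would naturally run longer than the original line's time slot, it must play back faster to fit. The sketch below only illustrates that ratio. It does not describe how the service actually fits audio, and `required_speedup` is a hypothetical helper.

```python
def required_speedup(natural_sec: float, slot_sec: float) -> float:
    """Playback-rate factor needed to fit natural_sec of speech into slot_sec.

    A value above 1.0 means the dubbed line must be sped up.
    A value below 1.0 means it has room to spare.
    """
    if natural_sec <= 0 or slot_sec <= 0:
        raise ValueError("durations must be positive")
    return natural_sec / slot_sec


# A translated line that naturally takes 6.0 s in a 5.0 s slot
# has to play 1.2 times faster than its natural pace.
print(round(required_speedup(6.0, 5.0), 2))  # 1.2
```

Estimating this factor for long lines ahead of time can flag passages where a more concise translation would sound more natural.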