
1. Overview
ElevenLabs offers two text-to-speech models in Scenario: Multilingual v2 and Turbo 2.5. Multilingual v2 focuses on emotional richness and natural speech quality across 29 languages. Turbo 2.5 prioritizes speed and low latency for real-time applications, supporting 32 languages with significantly faster generation times.
Multilingual v2 delivers lifelike, emotionally expressive speech suitable for audiobooks, film dubbing, and narrative content. Turbo 2.5 sacrifices some emotional nuance for speed and cost-effectiveness, making it ideal for conversational agents, interactive applications, and real-time use cases.
Both models support automatic language detection and work with ElevenLabs' built-in voice library, enabling multilingual content creation with consistent voice characteristics.
Multilingual v2 supports 29 languages with rich emotional expression and natural prosody. Turbo 2.5 supports 32 languages (adds Vietnamese, Hungarian, Norwegian) with 3x faster generation for non-English languages and 25% faster English generation.
2. Model Selection
ElevenLabs Multilingual v2
Select when you need high-quality, emotionally rich speech for content like audiobooks, podcasts, voiceovers, or any application where natural expression matters more than speed.
ElevenLabs Turbo 2.5 (50% cheaper)
Choose for real-time applications, conversational agents, interactive media, or when fast generation is more important than emotional nuance.
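The guidance above can be read as a simple rule of thumb. The following is a minimal Python sketch of that rule, not part of Scenario or ElevenLabs: the helper `choose_model` and its two arguments are hypothetical, and falling back to Turbo 2.5 when neither constraint applies is an assumption based on its lower cost.

```python
def choose_model(needs_low_latency: bool, needs_emotional_range: bool) -> str:
    """Return the model name suggested by the selection guidance.

    Hypothetical helper: it only restates the prose rule of thumb.
    """
    # Real-time use cases favor Turbo 2.5, even at some cost in nuance.
    if needs_low_latency:
        return "ElevenLabs Turbo 2.5"
    # When natural expression matters more than speed, pick Multilingual v2.
    if needs_emotional_range:
        return "ElevenLabs Multilingual v2"
    # Assumption: with no hard constraint, default to the cheaper model.
    return "ElevenLabs Turbo 2.5"
```

For example, `choose_model(False, True)` suggests Multilingual v2 for an audiobook, while a voice assistant with `needs_low_latency=True` gets Turbo 2.5.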
3. Interface Controls / Step-by-Step
3.1 Text Input
Enter your text in the main text field. Both models support automatic language detection and can handle multilingual content within the same generation.
3.2 Voice Selection
Choose from ElevenLabs' voice library including Aria, Roger, Sarah, Laura, Charlie, George, Callum, River, Liam, Charlotte, Alice, Matilda, Will, Jessica, Eric, Chris, Brian, Daniel, Lily, and Bill. Each voice works with both models.
Aria: calm, warm female voice.
Roger: confident, calm male voice.
Sarah: gentle, relaxed female voice.
Laura: bright, energetic, friendly female voice.
Charlie: calm, smooth male voice.
George: warm, gentle, conversational male voice.
Callum: direct, firm, professional male voice.
River: soft, calm, friendly female voice.
Liam: bright, energetic, engaging male voice.
Charlotte: smooth, calm, professional female voice.
Alice: friendly, light, conversational female voice.
Matilda: measured, clear, professional female voice.
Will: firm, calm, professional male voice.
Jessica: gentle, smooth, friendly female voice.
Eric: calm, balanced, professional male voice.
Chris: warm, clear, friendly male voice.
Brian: firm, clear, professional male voice.
Daniel: soft, calm, friendly male voice.
Lily: calm, smooth, professional female voice.
Bill: warm, gentle, friendly male voice.
3.3 Generation Parameters
Stability (0-1, default 0.5)
Controls consistency and predictability of speech. Higher values produce more stable, consistent output. Lower values allow more variation and expressiveness.
Similarity Boost (0-1, default 0.5)
Enhances similarity to the selected voice characteristics. Higher values make the output more closely match the chosen voice profile.
Style Exaggeration (0-1, default 0)
Controls emotional intensity and expressiveness. Higher values increase dramatic emphasis and emotional range. More effective with Multilingual v2.
Speed (0.7-1.2, default 1)
Adjusts speech rate. Values below 1.0 slow down speech; values above 1.0 speed it up. Extreme values may affect quality.
Timestamps Toggle
When enabled, returns timestamps for each word in the generated speech, useful for synchronization applications.
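If you keep these settings in your own scripts or configuration, it can help to check them against the documented ranges before use. Below is a minimal Python sketch under that assumption. The class `GenerationSettings` and its field names are hypothetical and do not mirror any official API; only the ranges and defaults come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the generation parameters."""

    stability: float = 0.5           # 0-1
    similarity_boost: float = 0.5    # 0-1
    style_exaggeration: float = 0.0  # 0-1
    speed: float = 1.0               # 0.7-1.2
    timestamps: bool = False

    def __post_init__(self) -> None:
        # Documented (low, high) range for each numeric parameter.
        limits = {
            "stability": (0.0, 1.0),
            "similarity_boost": (0.0, 1.0),
            "style_exaggeration": (0.0, 1.0),
            "speed": (0.7, 1.2),
        }
        for name, (low, high) in limits.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(f"{name}={value} is outside {low}-{high}")
```

Calling `GenerationSettings()` yields the defaults, while `GenerationSettings(speed=1.5)` raises a `ValueError` because 1.5 is above the 1.2 limit.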
3.4 Advanced Features
Previous Text / Next Text
Allows chaining multiple text segments for longer content generation while maintaining voice consistency.
Language Code (Turbo 2.5 and Flash v2.5 only)
Manually specify the language using ISO 639-1 codes to enforce language-specific pronunciation when automatic detection isn't sufficient.
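To illustrate the chaining idea behind Previous Text / Next Text, here is a minimal Python sketch that pairs each segment of a longer script with its neighbors. The function `chain_segments` and the dictionary keys are hypothetical names chosen for illustration. How you pass the values on (pasting them into the fields or sending them in a request) depends on your own workflow.

```python
from typing import Dict, List, Optional


def chain_segments(segments: List[str]) -> List[Dict[str, Optional[str]]]:
    """Pair each segment with the text that precedes and follows it."""
    chained = []
    for index, text in enumerate(segments):
        chained.append({
            "text": text,
            # The first segment has no predecessor.
            "previous_text": segments[index - 1] if index > 0 else None,
            # The last segment has no successor.
            "next_text": segments[index + 1] if index + 1 < len(segments) else None,
        })
    return chained
```

Each entry then carries the context for one generation, so a three-part chapter produces three entries whose middle one references both neighbors.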
4. Key Differences
4.1 Multilingual v2
Multilingual v2 supports 29 languages and delivers the highest emotional fidelity, with natural prosody that captures subtle speech nuances. It generates at standard speed because it prioritizes quality over latency, making it ideal for audiobooks, dubbing, podcasts, voiceovers, and narrative content where natural expression matters most. Style Exaggeration is highly effective with this model, which suits dramatic content that requires emotional depth and varied vocal expression.
4.2 Turbo 2.5
Turbo 2.5 extends language support to 32 languages by adding Vietnamese, Hungarian, and Norwegian to the Multilingual v2 foundation. While maintaining good overall quality, it trades some emotional nuance for significantly faster generation: 3x faster for non-English languages and 25% faster for English. This makes it well suited to real-time applications, conversational AI, and interactive media where response time is critical. Turbo 2.5 also supports a Language Code setting for enforcing a language manually when automatic detection picks the wrong one.
5. Practical Prompt Examples
Audiobook Narration
"Chapter One: The Discovery. Sarah walked through the ancient library, her footsteps echoing in the silence. She had no idea that this moment would change everything."
Emotional Dialogue
"I can't believe you're leaving! After everything we've been through together, how can you just walk away like this means nothing?"
Educational Content
"Today we'll explore the fascinating world of quantum physics. Don't worry if it seems complex at first – we'll break it down step by step."
Multilingual Content
"Welcome to our international conference. Bienvenue à notre conférence internationale. Bienvenidos a nuestra conferencia internacional."
Conversational AI (Turbo 2.5)
"Hi there! How can I help you today? I'm here to answer your questions and assist with whatever you need."
Interactive Gaming (Turbo 2.5)
"Quest completed! You've earned 500 experience points. Your next objective is marked on your map."
Real-time Notifications (Turbo 2.5)
"You have three new messages. The first message is from John, received five minutes ago."
Quick Announcements (Turbo 2.5)
"Attention passengers, the 3:15 train to downtown will be departing from platform 2 in five minutes."
6. Optimization Settings by Use Case
6.1 Audiobook/Podcast (Multilingual v2)
For audiobook and podcast production, choose expressive voices like Aria or Sarah that can convey narrative emotion effectively. Set Stability between 0.6 and 0.8 for consistent narration throughout longer content, and use a Similarity Boost of 0.7 to keep the voice consistent across chapters or episodes. Apply moderate Style Exaggeration of 0.3-0.5 for natural expression that engages listeners without overwhelming the content. Keep Speed between 0.9 and 1.0 for a comfortable listening pace that supports comprehension.
6.2 Conversational AI (Turbo 2.5)
Conversational AI applications work best with natural-sounding voices like Charlie or Laura that feel approachable and friendly. Use Stability settings of 0.4-0.6 to allow slight variation that makes conversations feel more human and less robotic. Set Similarity Boost to 0.5 for balanced consistency that maintains character while allowing natural speech variation. Keep Style Exaggeration minimal at 0-0.2 to maintain a neutral, professional tone appropriate for most conversational contexts. Speed should be set to 1.0-1.1 to create a responsive feel that matches natural conversation pacing.
6.3 Dramatic Content (Multilingual v2)
Dramatic content requires expressive voices like George or Jessica that can handle emotional range and character depth. Lower Stability to 0.3-0.5 to allow for emotional variation that brings characters to life and supports dramatic storytelling. Use Similarity Boost of 0.6 to maintain character consistency while allowing for emotional expression. Increase Style Exaggeration to 0.6-0.8 for dramatic emphasis that enhances the emotional impact of the content. Reduce Speed to 0.8-0.9 for dramatic pacing that gives weight to important moments and allows emotional beats to resonate.
6.4 Educational Content (Both Models)
Educational content benefits from clear, articulate voices like Brian or Alice that prioritize comprehension and clarity. Set Stability to 0.7 for consistent delivery that helps students focus on the content rather than vocal variations. Use Similarity Boost of 0.6 for reliability that ensures consistent voice characteristics across lessons or modules. Apply moderate Style Exaggeration of 0.2-0.4 to create engaging but clear speech that maintains student interest without distracting from the educational material. Set Speed to 0.9 to optimize for comprehension, giving students time to process complex information while maintaining engagement.
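The recommendations in sections 6.1 to 6.4 can be collected into a small lookup table. The Python sketch below is one way to do that. The names `PRESETS` and `starting_values` are hypothetical, and using the midpoint of each range as a starting value is an assumption made for illustration, not a documented recommendation.

```python
from typing import Dict, Tuple

Range = Tuple[float, float]

# (low, high) ranges copied from sections 6.1-6.4.
# A single recommended value v is stored as (v, v).
PRESETS: Dict[str, Dict[str, Range]] = {
    "audiobook": {
        "stability": (0.6, 0.8), "similarity_boost": (0.7, 0.7),
        "style_exaggeration": (0.3, 0.5), "speed": (0.9, 1.0),
    },
    "conversational": {
        "stability": (0.4, 0.6), "similarity_boost": (0.5, 0.5),
        "style_exaggeration": (0.0, 0.2), "speed": (1.0, 1.1),
    },
    "dramatic": {
        "stability": (0.3, 0.5), "similarity_boost": (0.6, 0.6),
        "style_exaggeration": (0.6, 0.8), "speed": (0.8, 0.9),
    },
    "educational": {
        "stability": (0.7, 0.7), "similarity_boost": (0.6, 0.6),
        "style_exaggeration": (0.2, 0.4), "speed": (0.9, 0.9),
    },
}


def starting_values(use_case: str) -> Dict[str, float]:
    """Return the midpoint of each recommended range (an assumed heuristic)."""
    return {
        name: round((low + high) / 2, 2)
        for name, (low, high) in PRESETS[use_case].items()
    }
```

From there, adjust each value within its range by ear. An unknown use case raises a `KeyError`, which keeps typos from silently falling back to defaults.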
7. Language Support
Multilingual v2 Languages (29)
English, Spanish, French, German, Italian, Portuguese, Russian, Japanese, Korean, Chinese (Mandarin), Arabic, Hindi, Dutch, Polish, Czech, Slovak, Ukrainian, Croatian, Romanian, Bulgarian, Greek, Finnish, Danish, Swedish, Turkish, Indonesian, Filipino, Malay, Tamil.
Turbo 2.5 Additional Languages (32 total)
All Multilingual v2 languages plus Vietnamese, Hungarian, and Norwegian with enhanced support.
Language Detection
Both models automatically detect the input language. For Turbo 2.5, you can specify the language manually with the Language Code dropdown when automatic detection picks the wrong one.
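If you handle language codes in your own tooling, a small check can catch malformed values early. The Python sketch below assumes a hypothetical helper `resolve_language_code` and plain model labels as strings. It only verifies the two-letter shape of an ISO 639-1 code (for example `vi`, `hu`, `no`), not whether the code is registered or supported.

```python
import re
from typing import Optional

# ISO 639-1 codes are two lowercase letters; this checks the shape only.
_ISO_639_1_SHAPE = re.compile(r"[a-z]{2}")

# This section documents the manual override for Turbo 2.5.
_MODELS_WITH_LANGUAGE_CODE = {"Turbo 2.5"}


def resolve_language_code(model: str, code: Optional[str]) -> Optional[str]:
    """Return a normalized code, or None to rely on automatic detection."""
    if code is None:
        return None
    if model not in _MODELS_WITH_LANGUAGE_CODE:
        raise ValueError(f"{model} does not accept a manual language code")
    normalized = code.strip().lower()
    if not _ISO_639_1_SHAPE.fullmatch(normalized):
        raise ValueError(f"{code!r} is not a two-letter ISO 639-1 code")
    return normalized
```

Passing `None` keeps automatic detection for either model, while `resolve_language_code("Turbo 2.5", " VI ")` normalizes the input to `vi`.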
8. Best Practices
Text Formatting
Use proper punctuation for natural pacing. Periods create pauses, commas add brief breaks, and exclamation points increase energy. Break long paragraphs into shorter sentences for better flow.
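To break a long paragraph into shorter units before generation, a simple punctuation-based split is often enough. The following Python sketch assumes a hypothetical helper `split_into_sentences`. It splits naively after `.`, `!`, or `?`, so abbreviations such as "Dr." will also trigger a break.

```python
import re
from typing import List


def split_into_sentences(paragraph: str) -> List[str]:
    """Split text after sentence-ending punctuation, keeping the punctuation."""
    # The lookbehind keeps ., ! or ? attached to the preceding sentence.
    parts = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    # Drop the empty string produced by blank input.
    return [part for part in parts if part]
```

Each returned sentence can then be reviewed, shortened, or generated as its own segment.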
Voice Consistency
Use the same voice and similar parameter settings across related content. Enable Timestamps when you need precise synchronization with other media.
Parameter Tuning
Start with default settings and adjust based on results. Raise Stability for more consistent output, raise Style Exaggeration for dramatic effect, and adjust Speed to set the pacing.
Model Selection
Use Multilingual v2 for final production content where quality matters. Use Turbo 2.5 for prototyping, real-time applications, or when speed is the priority.
9. Creative Applications
Content Creation
Create audiobook narrations, podcast intros, video voiceovers, and educational content. Use Previous Text/Next Text features for longer content while maintaining voice consistency.
Interactive Applications
Build conversational AI systems, voice assistants, interactive games, and real-time communication tools. Turbo 2.5's speed makes it ideal for responsive applications.
Multilingual Projects
Develop content for global audiences using automatic language detection or manual language specification. Both models maintain voice characteristics across different languages.
10. Troubleshooting
Quality Issues
If speech sounds robotic, lower Stability and increase Style Exaggeration. If pronunciation is incorrect, try different punctuation or manual Language Code specification for Turbo 2.5.
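The first tip can be applied in small increments rather than one large jump. The Python sketch below assumes a hypothetical helper `soften_robotic_output` and a step size of 0.1, which is an illustrative choice and not a documented value. Results are clamped to the 0-1 range of both parameters.

```python
from typing import Dict


def soften_robotic_output(settings: Dict[str, float], step: float = 0.1) -> Dict[str, float]:
    """Lower Stability and raise Style Exaggeration by one step each."""
    adjusted = dict(settings)  # copy so the caller's settings stay unchanged
    adjusted["stability"] = round(max(0.0, settings["stability"] - step), 2)
    adjusted["style_exaggeration"] = round(
        min(1.0, settings["style_exaggeration"] + step), 2
    )
    return adjusted
```

Regenerate after each step and stop once the speech sounds natural, since very low Stability increases variation between generations.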
Speed vs Quality Balance
For real-time applications needing better quality, try Turbo 2.5 with higher Stability settings. For high-quality content needing faster generation, use Multilingual v2 with optimized parameters.
Voice Consistency
Use identical parameter settings and the same voice selection across related content. The Similarity Boost setting helps maintain consistent voice characteristics.
Conclusion
ElevenLabs Multilingual v2 and Turbo 2.5 serve different needs in text-to-speech applications. Multilingual v2 excels when emotional expression and speech quality are priorities, making it ideal for audiobooks, dubbing, and narrative content across 29 languages. Turbo 2.5 prioritizes speed and responsiveness, supporting 32 languages with faster generation times perfect for real-time applications and interactive systems.
Choose your model based on whether quality or speed is more important for your specific use case, and adjust the generation parameters to match your content requirements and audience needs.