Meta MusicGen: AI Music with Text & Melody Control

1. Overview of Meta MusicGen

Meta MusicGen is Meta's AI music generation model. It gives creators control over musical composition through both text prompts and melody conditioning, and it can generate coherent musical pieces across a range of genres, styles, and emotional contexts.

MusicGen's unique strength lies in its dual-input capability: it can generate music from text descriptions alone or use reference audio to guide the melodic and harmonic structure of new compositions. This flexibility makes it particularly powerful for creators who want to maintain specific musical elements while exploring new arrangements and productions.

The model captures musical relationships such as genre conventions and instrumental arrangements, which makes it suitable for both professional music production and creative experimentation. Whether you're creating background music, developing musical ideas, or producing complete compositions, MusicGen aims to keep each piece musically coherent from start to finish.

2. Getting Started with Meta MusicGen

2.1 Model Selection

When generating audio in Scenario, select Meta MusicGen from the model dropdown. You'll then need to choose your Model Version:

stereo-melody-large: The most capable version, supporting both text-only generation and melody conditioning with stereo output
stereo-large: Optimized for text-only generation with high-quality stereo output
melody: Specialized for melody conditioning workflows
large: Standard high-quality generation model

For most use cases, stereo-melody-large provides the best balance of quality and functionality.
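If you script your choice of Model Version, the list above reduces to two questions: do you need melody conditioning, and do you need stereo output? The sketch below is only an illustration of that decision. The function name `pick_model_version` is hypothetical and is not part of Scenario or MusicGen; the returned strings are the version names listed above.

```python
def pick_model_version(needs_melody: bool, needs_stereo: bool) -> str:
    """Map two requirements to one of the Model Version names listed above.

    Hypothetical helper for illustration; not a Scenario or MusicGen API.
    """
    if needs_melody and needs_stereo:
        return "stereo-melody-large"
    if needs_stereo:
        return "stereo-large"
    if needs_melody:
        return "melody"
    return "large"


# Example: a stereo remix guided by a reference melody.
print(pick_model_version(needs_melody=True, needs_stereo=True))
```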

2.2 Understanding the Interface

The MusicGen interface in Scenario provides several key sections:

Prompt Field: Where you describe your desired music
Input Audio: Upload reference audio for melody conditioning
Duration Control: Set the length of generated music (5-30 seconds)
Continuation Settings: Extend existing audio with seamless transitions
Advanced Controls: Fine-tune generation parameters

3. Text-Based Music Generation

3.1 Crafting Effective Prompts

Effective MusicGen prompts should be descriptive and specific:

Genre/Style: "jazz fusion," "ambient electronic," "rock ballad"
Instrumentation: "electric guitar and bass," "piano and strings," "synthesizers"
Mood/Energy: "upbeat and energetic," "melancholic," "mysterious"
Production Style: "lo-fi," "polished studio sound," "live recording feel"

Example: "Upbeat jazz fusion with electric guitar lead, walking bass line, and crisp drum kit, recorded in a professional studio setting"

3.2 Advanced Prompting Techniques

For more sophisticated results, include:

Tempo Descriptors: "fast-paced," "moderate tempo," "slow and contemplative"
Musical Structure: "verse-chorus structure," "instrumental solo section"
Harmonic Content: "minor key," "blues progression," "complex jazz harmonies"
Rhythmic Elements: "syncopated rhythm," "straight beat," "swing feel"

Example: "Cinematic orchestral piece in D minor with soaring string melodies, powerful brass sections, and dramatic percussion, building to an epic crescendo"
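If you generate many prompts, it can help to assemble them from the descriptor categories covered in sections 3.1 and 3.2. The following is a minimal sketch of that idea in Python. The function `build_prompt` and its parameter names are hypothetical, chosen for this example only; MusicGen itself simply receives the final text string.

```python
def build_prompt(genre, instrumentation="", mood="", production="", extras=()):
    """Combine descriptor categories into one prompt string.

    Hypothetical helper: only the returned text is what you would paste
    into the Prompt Field.
    """
    # Mood comes first so the prompt reads naturally: "upbeat jazz fusion".
    lead = f"{mood} {genre}".strip()
    if instrumentation:
        lead = f"{lead} with {instrumentation}"
    # Production style and any extra descriptors (tempo, harmony, rhythm)
    # are appended as comma-separated details; empty entries are skipped.
    details = [lead, production, *extras]
    return ", ".join(part.strip() for part in details if part and part.strip())


prompt = build_prompt(
    genre="jazz fusion",
    instrumentation="electric guitar lead and walking bass line",
    mood="upbeat",
    production="polished studio sound",
    extras=["moderate tempo", "swing feel"],
)
print(prompt)
```

Keeping each category in its own argument makes it easy to vary one element, such as the mood, while holding the rest of the description constant.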

3.3 Using Prompt Spark

Scenario's Prompt Spark can enhance your MusicGen prompts by:

Adding genre-specific terminology
Suggesting instrumentation details
Expanding basic concepts into comprehensive descriptions
Providing production and mixing descriptors

4. Melody Conditioning with Input Audio

4.1 Understanding Melody Conditioning

MusicGen's melody conditioning feature allows you to upload reference audio that guides the melodic and harmonic structure of the generated music. This is particularly useful for:

Reimagining existing melodies in different styles
Creating variations of musical themes
Maintaining melodic consistency across different arrangements
Exploring genre transformations of familiar tunes

For best results with melody conditioning, choose reference audio that features clear, recognizable melodies without background noise or distortion. Keep your clips between 15-30 seconds long, focusing on simple instrumental or vocal melodies rather than complex multi-layered arrangements.
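Before uploading, you can check whether a reference clip falls inside the recommended 15-30 second range. The sketch below uses Python's standard `wave` module, so it assumes the clip is an uncompressed WAV file; other formats would need a different reader. The helper names are hypothetical, and the silent demo clip exists only to make the example self-contained.

```python
import io
import wave

# Recommended reference-clip range from this guide, in seconds.
MIN_SECONDS = 15.0
MAX_SECONDS = 30.0


def wav_duration_seconds(source):
    """Return the duration of a WAV file given a path or a binary file object."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(seconds):
    """True if the clip falls inside the 15-30 second range suggested above."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS


def make_silent_wav(seconds, sample_rate=8000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer


# In practice you would pass the path of your own clip instead of demo_clip.
demo_clip = make_silent_wav(20)
demo_seconds = wav_duration_seconds(demo_clip)
print(demo_seconds, is_recommended_length(demo_seconds))
```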

4.2 Using Input Audio

To use melody conditioning:

Select from Library: Choose from previously uploaded or generated audio files
Upload New Audio: Drag and drop or click to upload audio files from your computer, or pick them from the Scenario interface (to the right)
Optimal length: 15-30 seconds for best results
Quality: Higher quality input generally produces better results

Example Workflow: Upload a simple piano melody, then prompt "Transform this melody into an energetic rock arrangement with electric guitars and drums."

5. Duration and Continuation Controls

Set your audio length using the duration slider (5-30 seconds). For best results, use 15-20 seconds per generation. Turn on Continuation to extend existing audio seamlessly. Set your start and end times, then generate the next segment. Repeat this process to build longer compositions while maintaining musical coherence.
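When you plan a longer composition, it helps to work out the segment boundaries before you start generating. The sketch below splits a target length into equal segments that respect the 5-30 second slider range. It assumes each Continuation pass contributes one full segment of new audio; how Scenario actually overlaps the existing audio with the new segment may differ. The function `plan_segments` is a hypothetical planning helper, not a Scenario feature.

```python
import math

# Slider limits from this guide, in seconds.
MIN_SECONDS = 5
MAX_SECONDS = 30


def plan_segments(total_seconds, preferred_seconds=20):
    """Split a target length into equal (start, end) segments.

    Segments are never longer than preferred_seconds unless that would
    push them below the slider minimum.
    """
    if not MIN_SECONDS <= preferred_seconds <= MAX_SECONDS:
        raise ValueError("preferred_seconds must be within the slider range")
    if total_seconds < MIN_SECONDS:
        raise ValueError("total_seconds is shorter than the minimum duration")
    # Use as many segments as the preferred length requires, but never so
    # many that a single segment would drop below the slider minimum.
    count = min(
        math.ceil(total_seconds / preferred_seconds),
        int(total_seconds // MIN_SECONDS),
    )
    length = total_seconds / count
    return [(round(i * length, 2), round((i + 1) * length, 2)) for i in range(count)]


# A 60-second piece built from 20-second generations.
for start, end in plan_segments(60, preferred_seconds=20):
    print(f"generate {start:.0f}s to {end:.0f}s")
```

Equal-length segments avoid ending on a very short final generation, which would give the model little room to resolve the musical phrase.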

6. Advanced Generation Settings

Multi Band Diffusion decodes the generated audio with a diffusion-based decoder, which can improve audio quality and reduce artifacts. Enable it when output fidelity matters most.

Normalization Strategy controls volume levels. Choose "Loudness" for consistent volume, "Peak" to prevent clipping, or "RMS" for balanced levels.

Temperature controls creativity. Lower values (0.5-0.8) give predictable results. Higher values (0.9-1.2) create more variation.
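To see why lower temperatures give more predictable results, consider how temperature is commonly applied when a model samples its next token: the scores are divided by the temperature before being turned into probabilities. The sketch below shows that general technique on three made-up scores. It assumes Scenario's Temperature control maps to this standard sampling temperature, and the toy values are not taken from MusicGen.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Toy scores for three candidate tokens (illustrative values only).
toy_logits = [2.0, 1.0, 0.0]
for temperature in (0.5, 1.0, 1.2):
    probs = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probs])
```

At lower temperatures the most likely candidate takes a larger share of the probability, so repeated generations resemble each other more. At higher temperatures the probabilities even out and less likely choices are sampled more often.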

Guidance determines how closely the model follows your prompt. Higher values stick closer to your instructions.

Use the Seed field to reproduce results or create controlled variations.
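If you track settings across many generations, a small record with basic checks keeps experiments reproducible. The sketch below is one possible way to do that. The class `GenerationSettings`, its field names, and its default values are hypothetical placeholders for this example; they do not reflect Scenario's actual parameter names or defaults. Only the three normalization options come from the description above.

```python
from dataclasses import dataclass
from typing import Optional

# Normalization options described in this section.
NORMALIZATION_STRATEGIES = ("loudness", "peak", "rms")


@dataclass
class GenerationSettings:
    """Hypothetical record of the advanced controls; defaults are placeholders."""

    temperature: float = 1.0
    guidance: float = 3.0
    normalization: str = "loudness"
    multi_band_diffusion: bool = False
    seed: Optional[int] = None  # None means "let the tool pick a seed"

    def validate(self):
        """Return a list of human-readable problems; empty means no issues found."""
        problems = []
        if self.temperature <= 0:
            problems.append("temperature must be positive")
        if self.guidance < 0:
            problems.append("guidance must not be negative")
        if self.normalization not in NORMALIZATION_STRATEGIES:
            problems.append(f"unknown normalization: {self.normalization}")
        return problems


# Reusing the same seed with one changed setting gives a controlled variation.
base = GenerationSettings(temperature=0.7, seed=1234)
variation = GenerationSettings(temperature=1.1, seed=1234)
print(base.validate(), variation.validate())
```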

7. Asset Management and Workflow

Preview your generated music directly in Scenario. Download high-quality files when satisfied. Pin favorites, add tags for organization, and share with collaborators using Scenario's built-in tools.

8. Creative Applications and Use Cases

Use MusicGen for video soundtracks, podcast music, game audio, and commercial projects. Generate reference tracks for musicians, create musical sketches, or explore different genres and styles. The royalty-free output works for any application.

9. Best Practices and Optimization

Be specific in your prompts. Use proper musical terminology and describe the production style you want. For melody conditioning, use clear, high-quality reference audio that's 15-30 seconds long.

Avoid overly complex prompts and contradictory instructions. Enable Multi Band Diffusion for better quality. Plan longer compositions with logical segment breaks.

10. Troubleshooting and Advanced Techniques

If results aren't good enough, simplify your prompts and try different model versions. Adjust temperature and guidance settings. Use cleaner reference audio for melody conditioning.

MusicGen works well for pre-production concepts, backing tracks, and custom soundtracks. Share generated music with team members and use it throughout your creative workflow.
