1. Overview
Beatoven provides high‑quality audio generation through two complementary models: Music Generation and Sound Effects. Both models are built on Beatoven’s maestro foundation model, which is trained exclusively on licensed music and sound‑effect datasets. Unlike many AI audio systems that rely on scraped data, maestro sources its training material from partners such as Rightsify, Soundtrack Loops and Pro Sound Effects, and Beatoven shares revenue with the original rights holders. Because of this licensed approach, every output is royalty‑free. The result is fast, professional‑grade audio generation that can be customised via descriptive text prompts.
2. Models & Capabilities
2.1 Music Generation (maestro)
Beatoven’s music generator produces full instrumental tracks from text descriptions. Key capabilities include:
Licensed dataset & diverse genres: maestro is trained on over three million licensed music and sound‑effect tracks spanning jazz, electronic, ambient, hip‑hop, Latin and cinematic styles. This large corpus gives it the musical vocabulary to cover a wide range of genres.
Professional audio quality: Outputs are delivered at a 44.1 kHz sample rate, the same rate used for audio CDs.
Customisable duration: You can generate complete instrumental tracks up to 2 minutes 30 seconds long. The model can also create shorter clips for intros or social media.
Deep creative control: Prompts can specify instrumentation, tempo, mood and key. Additional parameters such as negative prompts and seed values allow you to refine or randomise results. The model can even output isolated stems (e.g. just drums or just piano) for more flexible mixing.
Commercial use & ethical training: Because maestro is trained on licensed music and Beatoven shares revenue with rights holders, every track you generate is cleared for commercial use.
These features make the music model ideal for podcasters, filmmakers, game developers and advertisers who need original background music on demand. Example prompts might include:
“Uplifting electronic pop with bright synthesizers, a driving bassline and a tempo around 120 BPM.”
“Cinematic orchestral piece with soaring strings and powerful brass, building to an epic climax.”
“Smooth jazz with saxophone and piano, medium tempo, creating a sophisticated lounge atmosphere.”
2.2 Sound Effects (maestro SFX)
The sound‑effects model generates realistic, context‑aware audio effects from text prompts. Its notable features include:
Large SFX dataset: The SFX model is trained on over a million licensed sound effects contributed by Pro Sound Effects, whose catalogue includes recordings used in blockbuster films. This ensures high‑quality and legally safe outputs.
Layered soundscapes up to 35 s: The model can produce sophisticated, multi‑layered soundscapes, such as “heavy rain with distant thunder” or “busy airport ambience”, for durations up to 35 seconds.
Immersive detail: Prompts can describe specific environments, objects and abstract effects. The model captures full environmental context, individual object sounds, natural ambiences and even futuristic noises.
Creative control: You can adjust mood, intensity and emotion, and combine multiple elements (e.g. wind plus footsteps) to tailor the atmosphere. Outputs are generated quickly and are cleared for commercial use.
Common use cases include foley for film and video, game sound effects (explosions, vehicle engines, ambient loops), app notification tones and podcast sound design. Example prompts:
“Roaring thunderstorm with rolling thunder and heavy rain, lasting about 20 seconds.”
“Sci‑fi laser blast with futuristic energy build‑up and echoing decay.”
“Notification chime that is gentle yet attention‑grabbing, suitable for a meditation app.”
3. Key Strengths
Ethical and royalty‑free: maestro’s training uses officially licensed datasets, and Beatoven shares revenue with rights holders. This means users can generate music and effects without worrying about copyright issues or lawsuits.
High fidelity & length: Both models produce professional‑quality audio at a 44.1 kHz sampling rate and support durations of up to 2:30 for music and 35 s for sound effects.
Genre & style diversity: The music model covers jazz, Latin, ambient, cinematic, house, techno and more. The SFX model spans natural ambiences, mechanical noises and abstract effects.
Deep creative control: Text prompts can specify instrumentation, tempo, mood, intensity and environment. Negative prompts and seed parameters enable refinement or variation.
Isolated stems: For music, you can generate isolated instrument layers to use in your own mixes.
Quick generation & integrated workflow: Both models deliver results in seconds and are available directly in Scenario’s Generate Audio interface.
4. Creative Applications
Beatoven’s models unlock a variety of creative workflows:
Music Generation
Podcast & video production: Compose intros/outros, background scores and theme music without hiring a composer.
Film & TV scoring: Generate cinematic cues, ambient textures and mood‑driven pieces for scenes.
Game audio: Quickly create background loops, level music and character themes tailored to genre and emotion.
Marketing & advertising: Develop custom tracks that reinforce campaign moods or brand identity.
Meditation & fitness apps: Produce ambient soundscapes, yoga tracks or high‑energy workout music on demand.
Sound Effects
Foley and post‑production: Design environmental sounds (rain, footsteps, explosions) and abstract effects for films or animations.
Game design & VR: Create immersive sound effects for player actions and environments, such as vehicle engines, weapon sounds and ambient loops.
Mobile & desktop apps: Generate notification tones, UI sounds or brand signatures.
Podcast & social media: Add impactful stingers, transitions and ambience to audio‑visual content.
5. Crafting Effective Prompts
5.1 Music Prompts
To achieve the best results from Beatoven’s music model, follow these guidelines:
Describe the genre and style: Specify the musical style (“ambient electronic,” “jazz fusion,” “cinematic orchestral”) to set the foundation.
Set the mood and emotion: Use adjectives like “uplifting,” “dark,” “romantic,” or “energetic” to guide the emotional tone.
Define instrumentation: List key instruments (e.g. “piano and strings,” “synthesizers with drum machine”) to shape the arrangement.
Indicate tempo & key: If relevant, mention tempo (“slow at 60 BPM,” “fast, around 140 BPM”), time signature (“4/4”) or musical key (“D minor”).
State duration: Request a specific length (e.g. “30‑second loop” or “120‑second track”).
Include use case: Mention the context (“for a podcast intro,” “for an action scene”) to help the model adapt the dynamics.
Use negative prompts and seeds: Exclude unwanted elements (“no vocals,” “avoid heavy drums”) and provide a seed for reproducibility or creative variation.
Example: “Epic orchestral score with soaring strings and powerful brass, minor key, 120 seconds, building to a heroic climax for a fantasy film trailer.”
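The guidelines above amount to filling a few slots (genre, mood, instruments, tempo, key, duration, use case) and joining them into one sentence, with exclusions kept for the separate negative prompt. The sketch below shows one way to do that in Python before pasting the result into Scenario. MusicPrompt and its fields are hypothetical helpers invented for illustration, not part of any Beatoven or Scenario API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MusicPrompt:
    """Hypothetical helper: one field per guideline above (not a Scenario API)."""

    genre: str                                        # e.g. "orchestral score"
    mood: str = ""                                    # e.g. "Epic", "uplifting"
    instruments: list[str] = field(default_factory=list)
    tempo_bpm: int | None = None
    key: str = ""                                     # e.g. "minor key", "D minor"
    duration_s: int | None = None
    use_case: str = ""                                # e.g. "for a podcast intro"
    exclude: list[str] = field(default_factory=list)  # goes to the negative prompt

    def text(self) -> str:
        """Join the filled-in elements into one descriptive sentence."""
        head = f"{self.mood} {self.genre}".strip()
        if self.instruments:
            head += " with " + " and ".join(self.instruments)
        parts = [head]
        if self.tempo_bpm is not None:
            parts.append(f"around {self.tempo_bpm} BPM")
        if self.key:
            parts.append(self.key)
        if self.duration_s is not None:
            parts.append(f"{self.duration_s} seconds")
        if self.use_case:
            parts.append(self.use_case)
        return ", ".join(parts)

    def negative(self) -> str:
        """Comma-separated list for the separate negative prompt field."""
        return ", ".join(self.exclude)


trailer = MusicPrompt(
    genre="orchestral score",
    mood="Epic",
    instruments=["soaring strings", "powerful brass"],
    key="minor key",
    duration_s=120,
    use_case="building to a heroic climax for a fantasy film trailer",
    exclude=["electric guitar", "synthesizers"],
)
print(trailer.text())
print(trailer.negative())
```

Filled in this way, the fields reproduce the example prompt above (without its final period). Keeping the elements separate makes it easy to change one slot, such as the mood or tempo, while holding the rest of the prompt constant between generations.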
5.2 Sound‑Effects Prompts
When crafting prompts for sound effects, aim for sensory clarity:
Identify the source: Describe the primary sound (“thunderstorm,” “old wooden door creaking,” “engine revving”).
Specify environment & context: Mention where and how the sound occurs (“indoors in a small room,” “outdoors on a rainy night,” “in deep space”).
Detail intensity & motion: Indicate volume, distance or movement (“distant rumbling thunder,” “close‑up roar of a sports car passing by”).
Combine layers: For complex scenes, include multiple elements (“waves crashing with seagulls and wind”).
Define duration: Request how long the effect should last (e.g. “10‑second notification chime,” “35‑second ambient loop”).
Set mood or emotion: Use descriptors like “ominous,” “peaceful,” “playful” to influence tone.
Refine iteratively: If the first generation doesn’t fit, adjust the prompt, use negative prompts or change the seed.
Example: “Gentle rain on a tin roof with occasional drips, lasting 25 seconds, creating a calming and intimate atmosphere.”
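The sound‑effect guidelines can be sketched the same way. This minimal Python helper, assuming you compose prompts in code, puts intensity, source and environment into one phrase, appends layers from the same scene and rejects durations beyond the documented 35‑second SFX limit. The function name sfx_prompt, its parameters and the a/an heuristic are hypothetical illustrations, not a Beatoven or Scenario API.

```python
from __future__ import annotations

MAX_SFX_SECONDS = 35  # documented Beatoven SFX duration limit


def sfx_prompt(
    source: str,
    *,
    intensity: str = "",
    environment: str = "",
    layers: tuple[str, ...] | list[str] = (),
    duration_s: float | None = None,
    mood: str = "",
) -> str:
    """Hypothetical helper: build an SFX prompt from the elements listed above."""
    if duration_s is not None and not 0 < duration_s <= MAX_SFX_SECONDS:
        raise ValueError(
            f"SFX duration must be in (0, {MAX_SFX_SECONDS}] seconds, got {duration_s}"
        )
    # Intensity, primary source and environment read as one phrase.
    head = " ".join(part for part in (intensity, source, environment) if part)
    # Extra layers should belong to the same scene, not be unrelated sources.
    if layers:
        head += " with " + " and ".join(layers)
    parts = [head]
    if duration_s is not None:
        parts.append(f"lasting {duration_s:g} seconds")
    if mood:
        # Simple vowel heuristic for the article; it is not perfect English.
        article = "an" if mood[0].lower() in "aeiou" else "a"
        parts.append(f"creating {article} {mood} atmosphere")
    return ", ".join(parts)


print(sfx_prompt("rain", intensity="Gentle", environment="on a tin roof",
                 layers=["occasional drips"], duration_s=25,
                 mood="calming and intimate"))
```

The call above reproduces the rain example from this section. Failing fast on a 40‑second request mirrors the duration slider, so an out‑of‑range value is caught before you submit it.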
6. Controls & Settings in Scenario
When using Beatoven models in Scenario’s Generate Audio interface, you’ll have access to several controls:
Model selection: Choose between Beatoven Music for full tracks and Beatoven SFX for sound effects.
Prompt field: Enter your descriptive text following the guidelines above.
Negative prompt: Optionally exclude unwanted instruments or sounds.
Duration slider: Set the length of the output (up to 150 seconds for music and 35 seconds for sound effects).
Guidance/creativity setting: Adjust how strictly the model follows your prompt. Higher values keep the output closer to your description, while lower values encourage creative variation.
Seed field: Provide a numeric seed for reproducible results or leave blank for random variation.
Output format: Outputs are delivered at 44.1 kHz in common audio formats (e.g. MP3 or WAV) suitable for professional use.
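If you keep track of generations outside Scenario, it can help to record these controls together and check them against the documented limits before submitting. The Python sketch below is a hypothetical mirror of the Generate Audio controls: the GenerationSettings class, its field names and the "music"/"sfx" model labels are assumptions for illustration, not Scenario’s API, and the guidance value is left unchecked because its numeric scale is not documented here.

```python
from __future__ import annotations

from dataclasses import dataclass

SAMPLE_RATE_HZ = 44_100                     # documented output sample rate
MAX_DURATION_S = {"music": 150, "sfx": 35}  # documented duration slider limits


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical mirror of the Generate Audio controls (not a Scenario API)."""

    model: str                     # "music" (Beatoven Music) or "sfx" (Beatoven SFX)
    prompt: str
    duration_s: float
    negative_prompt: str = ""
    guidance: float | None = None  # numeric scale not documented here, so unchecked
    seed: int | None = None        # None = seed field left blank (random variation)

    def validate(self) -> None:
        """Raise ValueError if the settings fall outside the documented limits."""
        if self.model not in MAX_DURATION_S:
            raise ValueError(f"unknown model {self.model!r}; expected 'music' or 'sfx'")
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        limit = MAX_DURATION_S[self.model]
        if not 0 < self.duration_s <= limit:
            raise ValueError(f"{self.model} duration must be in (0, {limit}] s")

    def samples_per_channel(self) -> int:
        """Length of the output in samples at the 44.1 kHz output rate."""
        return round(self.duration_s * SAMPLE_RATE_HZ)


intro = GenerationSettings(
    model="music",
    prompt="Smooth jazz with saxophone and piano, medium tempo",
    duration_s=30,
    negative_prompt="heavy drums",
    seed=1234,
)
intro.validate()
print(intro.samples_per_channel())  # 1323000
```

At 44.1 kHz, a 30‑second track is 30 × 44,100 = 1,323,000 samples per channel, and a full 150‑second music track is 6,615,000. Storing the prompt, negative prompt, duration and seed together with each exported file keeps everything you entered in one place if you want to try reproducing a result later.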
7. Best Practices & Tips
Be specific: Clear, detailed prompts yield better results than vague descriptions.
Focus on one idea: Especially for sound effects, avoid combining unrelated sources in a single prompt; layer only elements that belong to the same scene.
Iterate and refine: Use the seed and negative prompts to fine‑tune your outputs. Generate multiple variations and select the best fit.
Leverage stems: For music, request isolated instrument layers if you need flexibility in mixing.
Consider context: Think about how the audio will interact with your project (video timing, dialogue, other sounds) and adjust duration and intensity accordingly.
Ethical use: Remember that Beatoven’s models are trained on licensed content; respect the terms of use and credit the rights holders where appropriate.
8. Limitations & Considerations
No vocals (yet): maestro currently generates instrumental tracks and does not produce vocal performances. Vocals may be supported in future updates.
Genre biases: While the dataset covers many genres, niche styles or unconventional sound combinations may yield less convincing results.
Prompt sensitivity: Overly complex or contradictory prompts can confuse the model. Start simple and build complexity gradually.
Duration constraints: The maximum length is 2 minutes 30 seconds for music and 35 seconds for sound effects; longer pieces should be assembled from multiple generations.
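Assembling a longer piece usually means generating several segments and joining them with short crossfades. The Python sketch below is one way to plan that under stated assumptions: plan_segments and crossfade are hypothetical helpers, the 5‑second overlap is an arbitrary default, and each generation is assumed to be already decoded into a list of mono float samples (decoding MP3 or WAV is out of scope). The crossfade is a plain linear fade, not a Beatoven or Scenario feature.

```python
from __future__ import annotations

import math

MAX_MUSIC_S = 150  # documented maximum music generation length


def plan_segments(total_s: float, max_s: float = MAX_MUSIC_S,
                  overlap_s: float = 5.0) -> list[float]:
    """Split a target length into equal generation lengths that each fit max_s.

    Neighbouring segments share overlap_s seconds of crossfade, so the lengths
    add up to total_s + overlap_s * (len(result) - 1).
    """
    if total_s <= 0:
        raise ValueError("total_s must be positive")
    if not 0 <= overlap_s < max_s:
        raise ValueError("overlap_s must be in [0, max_s)")
    if total_s <= max_s:
        return [float(total_s)]
    # Fewest segments n with n * max_s - (n - 1) * overlap_s >= total_s.
    n = math.ceil((total_s - overlap_s) / (max_s - overlap_s))
    return [(total_s + overlap_s * (n - 1)) / n] * n


def crossfade(a: list[float], b: list[float], overlap: int) -> list[float]:
    """Join two mono sample lists with a linear crossfade over `overlap` samples."""
    if overlap <= 0:
        return a + b
    if overlap > min(len(a), len(b)):
        raise ValueError("overlap is longer than one of the clips")
    mixed = []
    for i, (x, y) in enumerate(zip(a[-overlap:], b[:overlap])):
        w = (i + 1) / (overlap + 1)  # weight for b rises toward 1, a's falls toward 0
        mixed.append(x * (1 - w) + y * w)
    return a[:-overlap] + mixed + b[overlap:]


print(plan_segments(300))  # three segments of about 103.3 s each
```

A five‑minute piece becomes three generations of roughly 103 s, each under the 150‑second limit, with a 5 s overlap equal to 5 × 44,100 = 220,500 samples per channel at the output rate. Equal lengths keep every segment well inside the limit. Generating each segment from the same prompt, with tempo and key stated explicitly, may make neighbouring segments easier to match, but transitions can still need manual editing.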