Grok Imagine Video: A Guide to AI Motion Creation
Last updated: June 3, 2026 · Covers Grok Imagine Video 1.5 and Grok Imagine Video

Scenario hosts two xAI video models under the Grok Imagine family. Grok Imagine Video 1.5 is the newest image-to-video release: strong face fidelity, cinematic motion, and native audio synced to the clip. Grok Imagine Video is the earlier all-in-one model: text-to-video, image-to-video, and video editing from a source clip.
The short version
Have a still you love? Start with Grok Imagine Video 1.5.
Need text-to-video, aspect-ratio control, or video editing? Use Grok Imagine Video.
Your prompt directs the motion. Describe actions, camera moves, and sound beats, not just adjectives.
Which Model Should I Use?
| Model | Best for | Input modes on Scenario |
|---|---|---|
| Grok Imagine Video 1.5 Quality | Hero image-to-video, character dialogue, face-locked motion, native audio | Image + prompt (image-to-video only) |
| Grok Imagine Video Generation | Text-to-video, flexible aspect ratios, re-imagining an existing clip | Prompt only, image + prompt, or source video + prompt |
Pick 1.5 when the first frame is fixed and quality matters most. Pick the original Grok Imagine Video when you need to generate from text alone, pick an aspect ratio in the UI, or edit footage you already have.
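The rule of thumb above can be written as a small decision function. This is only a sketch of the guidance in this section, not a Scenario or xAI API: the `Job` fields and the `choose_model` name are hypothetical and exist only for this example.

```python
from dataclasses import dataclass

MODEL_1_5 = "Grok Imagine Video 1.5"
MODEL_ORIGINAL = "Grok Imagine Video"


@dataclass
class Job:
    """Describes what you are starting from (hypothetical fields)."""
    has_first_frame: bool = False
    has_source_video: bool = False
    needs_aspect_ratio_control: bool = False


def choose_model(job: Job) -> str:
    """Return the model name this guide recommends for the job."""
    # Editing existing footage is only available on the original model.
    if job.has_source_video:
        return MODEL_ORIGINAL
    # Without a still there is nothing for 1.5 to animate: text-to-video.
    if not job.has_first_frame:
        return MODEL_ORIGINAL
    # 1.5 inherits the ratio from the still and has no ratio setting.
    if job.needs_aspect_ratio_control:
        return MODEL_ORIGINAL
    # A fixed first frame where quality matters most.
    return MODEL_1_5


if __name__ == "__main__":
    print(choose_model(Job(has_first_frame=True)))
    print(choose_model(Job()))
```

The checks are ordered from the hardest constraint (a source clip) to the softest (a preference for ratio control), so the first match decides.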
1. Your Prompt Is the Director
In both models, the prompt is the script. The clearer the action, the cleaner the motion.
Describe movement, not mood alone. "Cinematic and dramatic" gives the model little to animate. "She takes one step back, turns her head 30 degrees left, the coat hem catches the wind" gives it a beat it can follow.
Structure longer clips beat by beat. For clips of 8 to 15 seconds, anchor what happens when. Timestamp markers like [00:00], [00:04], [00:08] help keep multi-step action on track.
Call out sound when you want it (1.5). Grok Imagine Video 1.5 generates native audio in the same pass. Name dialogue lines, ambient layers, and SFX explicitly: "soft room tone, distant traffic, no music" lands better than "ambient sound."
Use built-in prompt tools when stuck. Scenario's prompt enhancer and translation helpers can expand a thin idea into a full direction. You can also upload a still and let the system draft a starting prompt from the image.
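If you assemble prompts in a script, the beat-by-beat structure described above can be generated from a list of (start second, action) pairs. The sketch below is one possible layout, not a required syntax: the `build_prompt` helper and the "Audio:" label are assumptions made for illustration, and only the [MM:SS] marker style comes from this guide.

```python
from typing import List, Optional, Tuple


def format_timestamp(seconds: int) -> str:
    """Render a start time as a [MM:SS] marker, e.g. 4 -> [00:04]."""
    minutes, secs = divmod(seconds, 60)
    return f"[{minutes:02d}:{secs:02d}]"


def build_prompt(beats: List[Tuple[int, str]], audio: Optional[str] = None) -> str:
    """Join timed action beats into one prompt, one beat per line."""
    ordered = sorted(beats, key=lambda beat: beat[0])
    lines = [f"{format_timestamp(start)} {action}" for start, action in ordered]
    if audio:
        # Hypothetical label; the guide only says to name sounds explicitly.
        lines.append(f"Audio: {audio}")
    return "\n".join(lines)


if __name__ == "__main__":
    prompt = build_prompt(
        [
            (0, "She takes one step back. Static wide shot."),
            (4, "She turns her head 30 degrees left. Slow dolly in."),
            (8, "The coat hem catches the wind."),
        ],
        audio="soft room tone, distant traffic, no music",
    )
    print(prompt)
```

Keeping each beat on its own line makes it easy to drop or reorder actions between test runs.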
2. Three Ways to Create
Image to video with Grok Imagine Video 1.5
Upload or select a First Frame image, write a prompt that describes how the scene should move, and run the model. The still becomes frame one; everything after follows your direction. Output runs 1 to 15 seconds at 480p or 720p.
This is the path for product reveals, character moments, stylized animation, and any shot where the composition is already locked in a still. Generate the source still with GPT Image 2 or another strong text-to-image model, then animate here.
Aspect ratio on 1.5: This model has no aspect-ratio setting. Frame the ratio in your source image (16:9, 9:16, 1:1, and so on). The video inherits it.
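Because the video inherits its ratio from the still, it can help to check the source image dimensions before uploading. The following is a minimal sketch using plain arithmetic only. The function names are hypothetical, and the crop box uses the common (left, top, right, bottom) pixel convention, which you would pass to whatever image tool you use.

```python
from math import gcd
from typing import Tuple


def aspect_ratio(width: int, height: int) -> str:
    """Reduce pixel dimensions to a ratio string, e.g. 1280x720 -> 16:9."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def center_crop_box(width: int, height: int,
                    ratio_w: int, ratio_h: int) -> Tuple[int, int, int, int]:
    """Largest centered crop with the target ratio, as (left, top, right, bottom)."""
    if width * ratio_h > height * ratio_w:
        # Image is too wide for the target: keep full height, trim the sides.
        new_w, new_h = height * ratio_w // ratio_h, height
    else:
        # Image is too tall (or already matches): keep full width.
        new_w, new_h = width, width * ratio_h // ratio_w
    left = (width - new_w) // 2
    top = (height - new_h) // 2
    return (left, top, left + new_w, top + new_h)


if __name__ == "__main__":
    print(aspect_ratio(1280, 720))
    print(center_crop_box(1920, 1080, 1, 1))
```

A centered crop is only a starting point. If the subject sits off-center, shift the box so there is still room for the motion you plan to prompt.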
First frame with Grok Imagine Video
Same idea as 1.5: upload a photo as the starting frame and prompt the action. Choose this model over 1.5 when the same project also needs text-to-video or video editing, or when you want explicit aspect-ratio control in the model UI.
Source video editing with Grok Imagine Video
Upload an existing clip and prompt a transformation: new style, new environment, new materials, while keeping the original motion structure. When a source video is attached, the First Frame field is disabled because the clip already defines the structure.
Video editing on this model supports shorter transforms (up to about 8.7 seconds depending on input). For full 15-second image-driven clips, use image-to-video on either model instead.
Text to video with Grok Imagine Video
Leave the image empty, write a full scene prompt, set aspect ratio and duration, and generate from scratch. This mode exists on Grok Imagine Video only, not on 1.5.
3. Settings That Shape the Output
Shared on both models
Duration. Set anywhere from 1 to 15 seconds. Shorter tests (5 to 8 seconds) are the fastest way to validate prompt structure before committing to a long hero clip.
Video Count. Generate 1 to 4 variations in one run. Useful when you want alternate motion directions from the same first frame.
Resolution. 480p for drafts and iteration. 720p for customer-facing or final social cuts. Higher resolution increases cost.
Grok Imagine Video 1.5 only
First Frame (required). The source still. Center the subject and leave room for the motion you describe in the prompt.
Prompt (required). Up to 10,000 characters. Match prompt depth to clip length: a 10-second clip usually needs three or four distinct action beats.
Grok Imagine Video only
Aspect Ratio. Choose Auto to match the source image, or pick 16:9, 9:16, 1:1, and other presets for the target platform.
Optional First Frame or Source Video. Text-only for generation from scratch; attach one or the other when you need anchored motion or editing.
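The limits listed in this section can be checked before a run is submitted. The sketch below encodes only the ranges stated above. The `VideoSettings` class, its field names, and the model identifier strings are hypothetical and do not mirror a real Scenario or xAI request schema. The 10,000-character limit is applied to 1.5 only, because this guide does not state a limit for the original model.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical identifiers used only in this sketch.
MODEL_1_5 = "grok-imagine-video-1.5"
MODEL_ORIGINAL = "grok-imagine-video"


@dataclass
class VideoSettings:
    model: str
    prompt: str
    duration_seconds: float = 5.0
    video_count: int = 1
    resolution: str = "480p"
    first_frame: Optional[str] = None   # path or asset id of the still
    source_video: Optional[str] = None  # path or asset id of the clip
    aspect_ratio: Optional[str] = None  # e.g. "Auto" or "16:9"


def validate(settings: VideoSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings look valid."""
    problems: List[str] = []
    if settings.model not in (MODEL_1_5, MODEL_ORIGINAL):
        problems.append("unknown model")
    # Settings shared by both models.
    if not 1 <= settings.duration_seconds <= 15:
        problems.append("duration must be 1 to 15 seconds")
    if not 1 <= settings.video_count <= 4:
        problems.append("video count must be 1 to 4")
    if settings.resolution not in ("480p", "720p"):
        problems.append("resolution must be 480p or 720p")
    if not settings.prompt.strip():
        problems.append("prompt is required")

    if settings.model == MODEL_1_5:
        # 1.5 is image-to-video only and has no aspect-ratio setting.
        if settings.first_frame is None:
            problems.append("1.5 requires a first frame")
        if settings.source_video is not None:
            problems.append("1.5 does not accept a source video")
        if settings.aspect_ratio is not None:
            problems.append("1.5 has no aspect-ratio setting")
        if len(settings.prompt) > 10_000:
            problems.append("prompt exceeds 10,000 characters")
    elif settings.model == MODEL_ORIGINAL:
        # A source video disables the first-frame field.
        if settings.first_frame is not None and settings.source_video is not None:
            problems.append("attach a first frame or a source video, not both")
    return problems


if __name__ == "__main__":
    draft = VideoSettings(model=MODEL_1_5, prompt="Slow dolly in", first_frame="hero.png")
    print(validate(draft))
```

Returning every problem at once, rather than raising on the first, lets you fix a whole batch of settings in one pass.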
Tips for Better Results
Run a short test first. A 5-second 480p clip confirms motion before you spend on a 15-second 720p hero.
Keep source stills clean. Simple backgrounds and a clear subject help both models track faces and objects through the clip.
Limit beats per clip. Two or three strong actions hold together better than five quick cuts in a single 10-second generation.
Match platform framing upstream. For 1.5, compose the aspect ratio in the source image. For Grok Imagine Video, set Aspect Ratio in the model settings, or choose Auto to inherit it from the still.
Describe camera motion explicitly. "Slow dolly in," "static wide," or "handheld follow from behind" gives clearer results than "dynamic camera."
Iterate with Video Count. Generating four takes from the same frame is often cheaper than running four separate manual rewrites.
Known Limitations
Grok Imagine Video 1.5 is image-to-video only on Scenario. Text-to-video, reference-to-video, video extension, and standalone video edit modes from the wider xAI API are not exposed on this model page today.
1.5 has no aspect-ratio control. Control framing in the source image before you animate.
Water and fine liquid motion can produce artifacts. Prefer reaction shots, sound cues, or implied motion when water is central to the beat.
Content moderation may block some prompts. Reframe violent or sensitive subjects when a run fails moderation.
Render time varies. Most clips finish in one to two minutes, but complex 720p runs can take longer. Poll the job rather than duplicating the request.
Video editing stays on Grok Imagine Video. To restyle existing footage, use the original model's Source Video workflow, not 1.5.
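For scripted workflows, the advice to poll rather than resubmit can look like the loop below. It is a generic sketch, not Scenario's API: `fetch_status` stands in for whatever call returns your job's state, and the status strings "success" and "failed" are hypothetical placeholders for the real terminal states.

```python
import time
from typing import Callable


def wait_for_job(fetch_status: Callable[[], str],
                 interval_seconds: float = 5.0,
                 timeout_seconds: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll one job until it finishes, instead of submitting it again."""
    waited = 0.0
    while True:
        status = fetch_status()
        # Hypothetical terminal states; substitute the real ones.
        if status in ("success", "failed"):
            return status
        if waited >= timeout_seconds:
            raise TimeoutError(f"job still '{status}' after {waited:.0f}s")
        sleep(interval_seconds)
        waited += interval_seconds


if __name__ == "__main__":
    # Simulated job that finishes on the third check.
    states = iter(["queued", "running", "success"])
    print(wait_for_job(lambda: next(states), interval_seconds=0.01))
```

The default timeout is deliberately longer than the typical one to two minutes, since complex 720p runs can take longer.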
Use Cases
Social and short-form: Animate a key still into a 9:16 or 1:1 clip with synced sound on 1.5.
Marketing and product: Turn hero photography into a 5 to 10 second reveal with camera push and ambient audio.
Games and cinematics: Character dialogue beats and intro moments from concept art or key art stills.
Previz and film: Test motion direction from storyboard frames before a full shoot.
Video refresh: Restyle location, season, or art direction on existing footage with Grok Imagine Video's Source Video mode.
Start with the model that matches your input: Grok Imagine Video 1.5 for still-to-motion with native audio, or Grok Imagine Video when you need text, aspect ratios, or clip editing in the same family.