Grok Imagine Video: A Guide to AI Motion Creation

Last updated: June 3, 2026

Covers Grok Imagine Video 1.5 and Grok Imagine Video

Scenario hosts two xAI video models under the Grok Imagine family. Grok Imagine Video 1.5 is the newest image-to-video release: strong face fidelity, cinematic motion, and native audio synced to the clip. Grok Imagine Video is the earlier all-in-one model: text-to-video, image-to-video, and video editing from a source clip.

The short version

Have a still you love? Start with Grok Imagine Video 1.5.
Need text-to-video, aspect-ratio control, or video editing? Use Grok Imagine Video.
Your prompt directs the motion. Describe actions, camera moves, and sound beats, not just adjectives.

Which Model Should I Use?

Model	Best for	Input modes on Scenario
Grok Imagine Video 1.5 Quality	Hero image-to-video, character dialogue, face-locked motion, native audio	Image + prompt (image-to-video only)
Grok Imagine Video Generation	Text-to-video, flexible aspect ratios, re-imagining an existing clip	Prompt only, image + prompt, or source video + prompt

Pick 1.5 when the first frame is fixed and quality matters most. Pick the original Grok Imagine Video when you need to generate from text alone, choose an aspect ratio in the UI, or edit footage you already have.
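The rule of thumb above can be written as a small decision function. This is only an illustrative sketch: the name choose_model and its flags are hypothetical and not part of any Scenario or xAI API, and the returned strings are just the model names used in this guide.

```python
def choose_model(has_first_frame: bool,
                 has_source_video: bool = False,
                 needs_aspect_ratio_control: bool = False) -> str:
    """Return the model this guide recommends for a given set of inputs.

    Hypothetical helper: it encodes the guide's rule of thumb, not an API.
    """
    # Text-to-video, source-video editing, and aspect-ratio control are
    # described in this guide as available on the original model only.
    if has_source_video or needs_aspect_ratio_control or not has_first_frame:
        return "Grok Imagine Video"
    # A fixed first frame with no other requirement: favor 1.5.
    return "Grok Imagine Video 1.5"


print(choose_model(has_first_frame=True))
print(choose_model(has_first_frame=False))
```

A locked still with no other requirement maps to 1.5; anything involving text-only generation, a source clip, or an aspect-ratio setting maps to the original model.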

1. Your Prompt Is the Director

In both models, the prompt is the script. The clearer the action, the cleaner the motion.

Describe movement, not mood alone. "Cinematic and dramatic" gives the model little to animate. "She takes one step back, turns her head 30 degrees left, the coat hem catches the wind" gives it a beat it can follow.

Structure longer clips beat by beat. For clips of 8 to 15 seconds, anchor what happens when. Timestamp markers like [00:00], [00:04], [00:08] help keep multi-step action on track.
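One way to keep timestamp markers consistent is to assemble the prompt from a list of beats. The sketch below is a convenience for drafting prompt text, not a Scenario feature: the helper names are hypothetical, and putting each beat on its own line is an assumption about formatting rather than a documented requirement.

```python
def format_timestamp(seconds: int) -> str:
    """Format a second offset as a [MM:SS] marker."""
    minutes, secs = divmod(seconds, 60)
    return f"[{minutes:02d}:{secs:02d}]"


def build_beat_prompt(beats: list[tuple[int, str]]) -> str:
    """Join (start_second, action) pairs into one timestamped prompt."""
    # Sort by start time so the markers always read in order.
    ordered = sorted(beats, key=lambda beat: beat[0])
    return "\n".join(
        f"{format_timestamp(start)} {action}" for start, action in ordered
    )


prompt = build_beat_prompt([
    (0, "She takes one step back, static wide shot."),
    (4, "She turns her head 30 degrees left, slow dolly in."),
    (8, "The coat hem catches the wind, camera holds."),
])
print(prompt)
```

Each beat pairs one action with one camera instruction, which keeps a clip of roughly 12 seconds to three readable steps.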

Call out sound when you want it (1.5). Grok Imagine Video 1.5 generates native audio in the same pass. Name dialogue lines, ambient layers, and SFX explicitly: "soft room tone, distant traffic, no music" lands better than "ambient sound."
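To make the audio layers explicit every time, the sound direction can be assembled from named parts and appended to the prompt. This is a minimal sketch with a hypothetical helper name; the "Audio:" prefix and the comma-separated layout are assumptions about wording, not a syntax the model is documented to require.

```python
from typing import Optional, Sequence


def audio_direction(dialogue: Optional[str] = None,
                    ambience: Sequence[str] = (),
                    sfx: Sequence[str] = (),
                    music: Optional[str] = None) -> str:
    """Build one explicit audio sentence to append to a video prompt."""
    parts = []
    if dialogue:
        parts.append(f'dialogue line "{dialogue}"')
    parts.extend(ambience)
    parts.extend(sfx)
    # State the absence of music explicitly instead of leaving it open.
    parts.append(music if music else "no music")
    return "Audio: " + ", ".join(parts) + "."


print(audio_direction(ambience=["soft room tone", "distant traffic"]))
```

The call above reproduces the explicit phrasing recommended in the text: soft room tone, distant traffic, no music.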

Use built-in prompt tools when stuck. Scenario's prompt enhancer and translation helpers can expand a thin idea into a full direction. You can also upload a still and let the system draft a starting prompt from the image.

2. Three Ways to Create

Image to video with Grok Imagine Video 1.5

Upload or select a First Frame image, write a prompt that describes how the scene should move, and run the model. The still becomes frame one; everything after follows your direction. Output runs 1 to 15 seconds at 480p or 720p.

This is the path for product reveals, character moments, stylized animation, and any shot where the composition is already locked in a still. Generate the source still with GPT Image 2 or another strong text-to-image model, then animate here.

Aspect ratio on 1.5: This model has no aspect-ratio setting. Frame the ratio in your source image (16:9, 9:16, 1:1, and so on). The video inherits it.
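Because the video inherits the ratio of the still, it can help to check the source image's dimensions before uploading. The following sketch uses plain arithmetic only; both function names are hypothetical, and the crop size is rounded down to whole pixels, so the result can differ from the exact target ratio by a fraction of a pixel.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> str:
    """Reduce pixel dimensions to a W:H ratio string."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def crop_size_for_ratio(width: int, height: int,
                        ratio_w: int, ratio_h: int) -> tuple[int, int]:
    """Return the largest (width, height) crop that fits the target ratio."""
    if width * ratio_h > height * ratio_w:
        # Image is wider than the target: keep full height, trim width.
        return (height * ratio_w // ratio_h, height)
    # Image is taller than (or equal to) the target: keep full width.
    return (width, width * ratio_h // ratio_w)


print(aspect_ratio(1920, 1080))
print(crop_size_for_ratio(1920, 1080, 1, 1))
```

For a 1920 by 1080 still, the first call reports 16:9 and the second gives the 1080 by 1080 crop needed for a square clip.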

First frame with Grok Imagine Video

Same idea as 1.5: upload a photo as the starting frame and prompt the action. Choose this model over 1.5 when the same project also needs text-to-video or video editing, or when you want explicit aspect-ratio control in the model UI.

Source video editing with Grok Imagine Video

Upload an existing clip and prompt a transformation: new style, new environment, new materials, while keeping the original motion structure. When a source video is attached, the First Frame field is disabled because the clip already defines the structure.

Video editing on this model supports shorter transforms (up to about 8.7 seconds depending on input). For full 15-second image-driven clips, use image-to-video on either model instead.

Text to video with Grok Imagine Video

Leave the image empty, write a full scene prompt, set aspect ratio and duration, and generate from scratch. This mode exists on Grok Imagine Video only, not on 1.5.

3. Settings That Shape the Output

Shared on both models

Duration. Set anywhere from 1 to 15 seconds. Shorter tests (5 to 8 seconds) are the fastest way to validate prompt structure before committing to a long hero clip.

Video Count. Generate 1 to 4 variations in one run. Useful when you want alternate motion directions from the same first frame.

Resolution. 480p for drafts and iteration. 720p for customer-facing or final social cuts. Higher resolution increases cost.

Grok Imagine Video 1.5 only

First Frame (required). The source still. Center the subject and leave room for the motion you describe in the prompt.

Prompt (required). Up to 10,000 characters. Match prompt depth to clip length: a 10-second clip usually needs three or four distinct action beats.

Grok Imagine Video only

Aspect Ratio. Choose Auto to match the source image, or pick 16:9, 9:16, 1:1, and other presets for the target platform.

Optional First Frame or Source Video. Leave both empty to generate from text alone; attach one or the other when you need anchored motion or editing.
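The limits in this section can be collected into a pre-flight check that runs before a generation is submitted. This is a sketch under stated assumptions: the VideoSettings class, its field names, and the model labels "1.5" and "original" are all hypothetical and do not mirror a real Scenario or xAI request schema. Only the numeric limits come from this guide.

```python
from dataclasses import dataclass
from typing import List, Optional

MAX_PROMPT_CHARS_1_5 = 10_000      # prompt limit stated for 1.5
RESOLUTIONS = ("480p", "720p")


@dataclass
class VideoSettings:
    model: str                     # "1.5" or "original" (hypothetical labels)
    prompt: str
    duration_s: float = 5.0
    video_count: int = 1
    resolution: str = "480p"
    first_frame: Optional[str] = None
    source_video: Optional[str] = None
    aspect_ratio: Optional[str] = None


def validate(s: VideoSettings) -> List[str]:
    """Return a list of problems; an empty list means the settings look fine."""
    problems = []
    if s.model not in ("1.5", "original"):
        problems.append(f"unknown model label: {s.model!r}")
    if not s.prompt.strip():
        problems.append("prompt is empty")
    if not 1 <= s.duration_s <= 15:
        problems.append("duration must be 1 to 15 seconds")
    if not 1 <= s.video_count <= 4:
        problems.append("video count must be 1 to 4")
    if s.resolution not in RESOLUTIONS:
        problems.append("resolution must be 480p or 720p")
    if s.model == "1.5":
        if s.first_frame is None:
            problems.append("1.5 requires a First Frame image")
        if s.source_video is not None:
            problems.append("1.5 does not take a source video")
        if s.aspect_ratio is not None:
            problems.append("1.5 has no aspect-ratio setting")
        if len(s.prompt) > MAX_PROMPT_CHARS_1_5:
            problems.append("prompt exceeds 10,000 characters")
    elif s.model == "original":
        if s.first_frame is not None and s.source_video is not None:
            problems.append("attach a First Frame or a Source Video, not both")
    return problems


draft = VideoSettings(model="1.5", prompt="Slow dolly in.", first_frame="hero.png")
print(validate(draft))
```

Returning a list of problems instead of raising on the first one lets a caller show every issue at once, which suits a settings form.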

Tips for Better Results

Run a short test first. A 5-second 480p clip confirms motion before you spend on a 15-second 720p hero.
Keep source stills clean. Simple backgrounds and a clear subject help both models track faces and objects through the clip.
Limit beats per clip. Two or three strong actions hold together better than five quick cuts in a single 10-second generation.
Match platform framing upstream. For 1.5, compose the aspect ratio in the source image. For Grok Imagine Video, set Aspect Ratio in the model settings, or choose Auto to match the still.
Describe camera motion explicitly. "Slow dolly in," "static wide," or "handheld follow from behind" gives clearer results than "dynamic camera."
Iterate with Video Count. Four takes from the same frame are often cheaper than four separate manual rewrites.

Known Limitations

Grok Imagine Video 1.5 is image-to-video only on Scenario. Text-to-video, reference-to-video, video extension, and standalone video edit modes from the wider xAI API are not exposed on this model page today.
1.5 has no aspect-ratio control. Control framing in the source image before you animate.
Water and fine liquid motion can produce artifacts. Prefer reaction shots, sound cues, or implied motion when water is central to the beat.
Content moderation may block some prompts. Reframe violent or sensitive subjects when a run fails moderation.
Render time varies. Most clips finish in one to two minutes, but complex 720p runs can take longer. Poll the job rather than duplicating the request.
Video editing stays on Grok Imagine Video. To restyle existing footage, use the original model's Source Video workflow, not 1.5.
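For scripted workflows, the advice to poll rather than resubmit can be sketched as a loop with a growing delay. Everything here is an assumption for illustration: fetch_status stands in for whatever call returns the job state in your integration, and the status strings "success", "failure", and "canceled" are hypothetical placeholders, not documented values.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"success", "failure", "canceled"}  # hypothetical names


def wait_for_job(fetch_status: Callable[[], str],
                 timeout_s: float = 600.0,
                 initial_delay_s: float = 5.0,
                 max_delay_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll one submitted job until it reaches a terminal status."""
    delay = initial_delay_s
    waited = 0.0
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited + delay > timeout_s:
            raise TimeoutError(f"job still {status!r} after {waited:.0f}s")
        sleep(delay)
        waited += delay
        # Back off gradually so long renders are not polled too often.
        delay = min(delay * 1.5, max_delay_s)


# Demonstration with canned statuses and a no-op sleep, so it runs instantly.
canned = iter(["queued", "in-progress", "success"])
print(wait_for_job(lambda: next(canned), sleep=lambda seconds: None))
```

The loop only ever reads the state of the job that was already submitted, so a slow 720p render never results in a second, duplicate request.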

Use Cases

Social and short-form: Animate a key still into a 9:16 or 1:1 clip with synced sound on 1.5.
Marketing and product: Turn hero photography into a 5 to 10 second reveal with camera push and ambient audio.
Games and cinematics: Character dialogue beats and intro moments from concept art or key art stills.
Previz and film: Test motion direction from storyboard frames before a full shoot.
Video refresh: Restyle location, season, or art direction on existing footage with Grok Imagine Video's Source Video mode.

Start with the model that matches your input: Grok Imagine Video 1.5 for still-to-motion with native audio, or Grok Imagine Video when you need text, aspect ratios, or clip editing in the same family.