Pruna P-Video models overview

Last updated: May 25, 2026

Overview of Pruna AI's P-Video models for character-driven video on Scenario.

The short version

P-Video generates video from text, image, or audio — with Draft mode for fast previews.
P-Video Replace swaps a character in existing footage while keeping background, lighting, camera motion, and audio.
P-Video Animate applies motion from a source clip to a still character image.
P-Video Avatar turns a portrait into a lip-synced talking video from a script or custom audio.

Pruna AI is a European model laboratory. It uses pruning, compression, and quantization to optimize models for production speed, output quality, and practical use. The P-Video suite covers the most common character video workflows on Scenario. Run each model on its own or chain them in sequence.

P-Video

P-Video is Pruna's general-purpose video generation model. It supports text-to-video, image-to-video, and audio-conditioned video generation. It is designed for speed and affordability — well-suited to rapid prototyping and social content.

Draft mode generates lower-quality previews at four times the speed of a standard run. Use it to validate a concept before committing to a full 1080p output.

Key settings

Prompt: describes the scene, subject, and motion. Required for all runs.
Image: upload a starting frame to generate video from. When provided, aspect ratio is derived from the image and the aspect ratio field is ignored.
Last Frame Image: upload a target end frame to guide the video toward a specific final composition.
Audio: upload an audio file to condition video generation on the rhythm and content of the audio. When audio is provided, the duration field is ignored.
Duration: 1 to 20 seconds (default 5). Ignored when audio is provided.
Aspect Ratio: 16:9, 9:16, 4:3, 3:4, 3:2, 2:3, or 1:1 (default 16:9). Ignored when an input image is provided.
Resolution: 720p (default) or 1080p.
FPS: 24 (default) or 48.
Draft: enable for a fast, lower-quality preview generated at four times the speed of a standard run. Use it to check a concept before running at full quality.
Prompt Upsampling: on by default. Automatically expands and refines the prompt before generation. Disable to control the exact wording sent to the model.
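
The override rules above (an input image replaces the aspect ratio field, and audio replaces the duration field) can be sketched in Python. This is a minimal illustration, not Scenario's or Pruna's API: the PVideoSettings class, its field names, and the effective_settings function are all hypothetical, and only the allowed values and defaults come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed values taken from the settings list; the names are hypothetical.
ASPECT_RATIOS = {"16:9", "9:16", "4:3", "3:4", "3:2", "2:3", "1:1"}
RESOLUTIONS = {"720p", "1080p"}
FPS_CHOICES = {24, 48}


@dataclass
class PVideoSettings:
    """Hypothetical container mirroring the P-Video settings."""
    prompt: str
    image: Optional[str] = None             # path to a starting frame
    last_frame_image: Optional[str] = None  # path to a target end frame
    audio: Optional[str] = None             # path to an audio file
    duration: int = 5                       # seconds, 1 to 20
    aspect_ratio: str = "16:9"
    resolution: str = "720p"
    fps: int = 24
    draft: bool = False
    prompt_upsampling: bool = True


def effective_settings(s: PVideoSettings) -> dict:
    """Validate the settings and return only the fields that take effect."""
    if not s.prompt.strip():
        raise ValueError("prompt is required for all runs")
    if not 1 <= s.duration <= 20:
        raise ValueError("duration must be between 1 and 20 seconds")
    if s.aspect_ratio not in ASPECT_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {s.aspect_ratio}")
    if s.resolution not in RESOLUTIONS:
        raise ValueError(f"unsupported resolution: {s.resolution}")
    if s.fps not in FPS_CHOICES:
        raise ValueError(f"unsupported fps: {s.fps}")

    out = {
        "prompt": s.prompt,
        "resolution": s.resolution,
        "fps": s.fps,
        "draft": s.draft,
        "prompt_upsampling": s.prompt_upsampling,
    }
    if s.image is not None:
        out["image"] = s.image  # aspect ratio is derived from the image
    else:
        out["aspect_ratio"] = s.aspect_ratio
    if s.last_frame_image is not None:
        out["last_frame_image"] = s.last_frame_image
    if s.audio is not None:
        out["audio"] = s.audio  # the duration field is ignored
    else:
        out["duration"] = s.duration
    return out
```

Calling effective_settings(PVideoSettings(prompt="a cat", image="a.png", audio="b.mp3")) returns a dictionary with neither an aspect_ratio nor a duration key, which matches the two "ignored" rules in the list.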

P-Video Replace

P-Video Replace takes a source video and one to four reference character images. It replaces the on-screen character while preserving background, lighting, camera motion, and audio. Only the character changes.

This is not a face swap. The model reads full-body posture and spatial placement. It composites the reference character into each frame. Provide two to four reference images from different angles when the character moves significantly throughout the video.

For the most stable result, extract the first frame of the source video and use the reference character to generate an image of that character in the same pose and framing as that frame.

Key settings

Video: the clip containing the character to replace.
Reference images: one to four images of the replacement character. Multiple angles improve consistency across the video. Match the pose to the source video's first frame for the most stable result.
Instruction Prompt: optional text that guides how the reference character is placed into the scene. Use this to specify clothing, expression, or spatial details not captured in the reference images.
Resolution: 720p or 1080p (default 1080p).
Save Audio: on by default. Carries the source video's audio into the output.
Target FPS: original, 24, or 48. Defaults to matching the source video's frame rate.
File size limit: Input videos must stay under 10 MB.
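
A pre-flight check for the two hard constraints above (video under 10 MB, one to four reference images) might look like the following sketch. The function name and the byte definition of a megabyte are assumptions; the article does not say whether 10 MB means 10 × 1024 × 1024 or 10 × 1000 × 1000 bytes.

```python
from typing import List, Sequence

# Assumption: 1 MB = 1024 * 1024 bytes. Adjust if the platform counts differently.
MAX_REPLACE_VIDEO_BYTES = 10 * 1024 * 1024


def check_replace_inputs(video_size_bytes: int,
                         reference_images: Sequence[str]) -> List[str]:
    """Return a list of problems; an empty list means the inputs look valid."""
    problems = []
    if video_size_bytes >= MAX_REPLACE_VIDEO_BYTES:
        problems.append("input video must stay under 10 MB")
    if not 1 <= len(reference_images) <= 4:
        problems.append("provide one to four reference images")
    return problems
```

For a real file, the size can be read with os.path.getsize(path) from the standard library and passed in as video_size_bytes.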

P-Video Animate

P-Video Animate transfers motion from a source video to a still character image. The output shows the reference character performing the source performer's actions in the reference image's visual style.

Use it when you have a strong character concept but no animation, or to reuse a recorded performance across multiple characters without re-recording.

Key settings

Video: the clip that supplies the motion.
Image: the still character to animate. A clear, front-facing, full-body image on a clean background gives the most consistent motion transfer.
Instruction Prompt: optional text to guide how the animation is applied to the reference character.
Resolution: 720p or 1080p (default 1080p).
Save Audio: on by default. Carries the source video's audio into the output.
Target FPS: original, 24, or 48.

P-Video Animate accepts larger input files than P-Video Replace. Choose it for high-quality or long source footage.

P-Video Avatar

P-Video Avatar turns a portrait into a talking character video. Upload a photo or illustrated character. Provide a script or custom audio recording. The model generates accurate lip sync.

It supports photographs, illustrated characters, anime-style avatars, and stylized figures. Built-in voices cover 10 languages: English (US), English (UK), Spanish, French, German, Italian, Portuguese (Brazil), Japanese, Korean, and Hindi.

Key settings

Portrait image: a clear, well-lit image with the face fully visible and in a frontal position. Headshots produce more accurate lip sync than full-body shots. Strong side profiles or extreme head angles reduce consistency.
Script: the text the character speaks. The model synthesizes speech from this text using the selected voice.
Voice: choose from 30 built-in voices across the supported languages.
Voice Prompt: optional delivery instructions to adjust tone and pacing of the built-in voice. Ignored when custom audio is provided.
Custom audio: upload a recording instead of a built-in voice. When custom audio is set, the script, voice, and voice prompt fields are all ignored. Remove background music from the audio track before uploading.
Resolution: 720p or 1080p. Iterate at 720p, then switch to 1080p for final output.
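
The precedence between custom audio and the built-in voice fields can be expressed as a small resolver. This is an illustrative sketch only: resolve_avatar_inputs and its parameter names are hypothetical, and the 720p default simply follows the advice to iterate at 720p rather than a documented default.

```python
from typing import Optional


def resolve_avatar_inputs(portrait: str,
                          script: Optional[str] = None,
                          voice: Optional[str] = None,
                          voice_prompt: Optional[str] = None,
                          custom_audio: Optional[str] = None,
                          resolution: str = "720p") -> dict:
    """Return only the Avatar fields that take effect for this run."""
    if resolution not in ("720p", "1080p"):
        raise ValueError(f"unsupported resolution: {resolution}")

    if custom_audio is not None:
        # Custom audio wins: script, voice, and voice prompt are all ignored.
        return {"portrait": portrait,
                "custom_audio": custom_audio,
                "resolution": resolution}

    if not script:
        raise ValueError("provide either a script or custom audio")

    job = {"portrait": portrait, "script": script, "resolution": resolution}
    if voice is not None:
        job["voice"] = voice
    if voice_prompt is not None:
        job["voice_prompt"] = voice_prompt
    return job
```

Passing both a script and custom audio yields a job that contains only the portrait, the audio, and the resolution, which mirrors the "all ignored" rule for custom audio.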

Workflow examples

Rapid video prototyping with P-Video

Write a prompt describing the scene. Enable Draft mode and set resolution to 720p. Generate a preview in seconds. Review the composition and motion. Disable Draft and switch to 1080p for the final output. For audio-driven content, upload a music or speech track. P-Video matches video duration and rhythm to the audio automatically.

Advertising: character swap in a finished scene

Generate a five-second fitness ad with P-Video or Seedance 2 Fast. Extract the first frame. Use GPT Image 2 to turn the trainer into a superhero in the same pose and framing. Run P-Video Replace with that image as the reference. The gym environment and ambient audio stay intact at 1080p.

Game development: animated character from a still concept

Record a short walk-and-gesture reference performance. Upload a painted game character illustration. Run P-Video Animate with the illustration as the image and the recording as the motion source. The illustrated character performs the recorded movement while keeping the original art style.

Entertainment: knight replaced by a dark sorceress

Generate a cinematic clip of a knight walking through a castle hall. Use GPT Image 2 on the first frame to create a dark sorceress in the same pose. Run P-Video Replace with the sorceress image as the reference. The corridor, lighting, and camera motion remain unchanged.

Localization: same character, multiple languages

Design a brand mascot portrait. Write product scripts in English, Spanish, and Portuguese (Brazil). Run P-Video Avatar three times with the same portrait and a different built-in voice per language. Each output is a full lip-synced video of the same character in a different language.
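
As a rough sketch, the three runs in this workflow can be planned as one job per language that shares the same portrait. Everything here is hypothetical: the plan_localized_runs helper, the job keys, and the voice identifiers are placeholders, since the article does not name the built-in voices.

```python
from typing import Dict, List


def plan_localized_runs(portrait: str,
                        scripts: Dict[str, str],
                        voices: Dict[str, str]) -> List[dict]:
    """Build one Avatar job per language, reusing the same portrait."""
    jobs = []
    for language, script in scripts.items():
        if language not in voices:
            raise KeyError(f"no voice chosen for {language}")
        jobs.append({
            "portrait": portrait,
            "language": language,
            "script": script,
            "voice": voices[language],  # placeholder voice identifier
        })
    return jobs


# Placeholder scripts and voice names, for illustration only.
scripts = {
    "English (US)": "Meet our new product.",
    "Spanish": "Conoce nuestro nuevo producto.",
    "Portuguese (Brazil)": "Conheça nosso novo produto.",
}
voices = {
    "English (US)": "voice-en-placeholder",
    "Spanish": "voice-es-placeholder",
    "Portuguese (Brazil)": "voice-pt-br-placeholder",
}
jobs = plan_localized_runs("mascot.png", scripts, voices)
```

Each entry in jobs corresponds to one P-Video Avatar run. Because P-Video Avatar supports one speaker per run, keeping one job per language also keeps one voice per clip.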

Tips and limitations

P-Video: use Draft mode at 720p to validate composition and motion before a full 1080p run. Disable Prompt Upsampling only when precise control over the exact prompt wording is required. Upload a first and last frame together to bracket a specific motion arc.
P-Video Animate: use a full-body, front-facing character image on a clean background. Partial occlusions or extreme angles reduce motion transfer quality.
P-Video Avatar: remove background music from custom audio before uploading. Keep the camera fixed, because pans and zooms during lip sync increase synchronization errors.
P-Video Avatar supports one speaker per run. For dialogue between two characters, generate two separate clips and edit them together. The recommended maximum clip length is 3 minutes. Longer clips may show gradual consistency drift.
P-Video Replace and P-Video Animate: P-Video Replace fails on input videos over 10 MB. Use P-Video Animate for larger source files. P-Video Replace works best with one clear focal character; crowd scenes reduce replacement consistency.
Chain the models for full pipeline control. Generate a base clip with a text-to-video model. Design a replacement character from the first frame. Run P-Video Replace, then add voiceover with P-Video Avatar. Each step is non-destructive — the original source clip stays unchanged.