Third-Party Image Models on Scenario

Last updated: April 23, 2026

Available Third-Party Image Models on Scenario

Scenario offers a wide selection of third-party models for image generation, spanning several leading model families: Flux, Gemini, GPT Image, Imagen, Seedream, Dreamina, Luma Photon, Recraft, Ideogram, Minimax, Retro Diffusion, Qwen, Hunyuan, Wan, Phota, ERNIE Image, Z-Image, and P-Image.

Each model comes with its own strengths, parameters, and distinctive features. In the following sections, we'll walk through the main capabilities of each model family available on Scenario. Experiment with prompts, compare outputs, and explore different styles. This will help you find the best fit for your creative vision.

If you're new to image generation, a good starting point is Imagen 4 (Ultra) for photorealistic and general-purpose generation, or Gemini 3.1 Pro for complex, detail-rich prompts. From there, you can explore other families based on your specific needs: pixel art with Retro Diffusion, vector graphics with Recraft V4, or precise image editing with GPT Image 2.


1. Imagen Family (Google)

Google’s Imagen family is known for its high-fidelity output and smooth stylistic adaptability. Both Imagen 3 and Imagen 4 are available in Fast, Standard, and Ultra versions, offering creators a spectrum of speed and quality options. Imagen 4, the latest generation, supports resolutions up to 1408×768.

These models deliver rich visual detail with special attention to textures, materials, and photorealistic finishes. Fabric drape, skin tone gradients, lighting interactions, and even text placement are handled with notable accuracy. Imagen is particularly adept at balancing style flexibility, performing well across realism, abstraction, editorial, and illustrated looks, while preserving strong compositional awareness and color fidelity.

It is best suited for creative applications that demand nuanced visuals and structured layouts, such as product design mockups, concept art, technical scenes and editorial illustrations.


2. Gemini Family (Google)

Gemini Image models are Google's most capable multimodal AI models, integrated into Scenario for high-fidelity image generation. Known for their exceptional prompt adherence and logical reasoning, these models excel at interpreting complex scene descriptions and maintaining semantic accuracy across a wide range of subjects and styles.

They are particularly strong at handling dense instructions where element placement and specific details matter most.

  • Gemini 3.1 is the current flagship model, offering state-of-the-art performance in visual quality and prompt understanding. It is the recommended choice for complex compositions requiring a high level of intelligence and precision behind the generation.

  • Gemini 2.5 remains a solid option for teams already integrated with the previous generation, providing strong quality and a good balance between performance and creative interpretation.


4. GPT Family (OpenAI)

GPT Image models from OpenAI are versatile tools for generation and editing, built for high prompt accuracy, spatial reasoning, and precise image understanding. They support both text-to-image generation and reference-based editing workflows.

  • GPT Image 2 is the latest and most capable model in the family. It supports up to 10 reference images per generation, inpainting via alpha masks for region-specific edits, and output resolutions up to 3840x3840px. A built-in reasoning pass before each generation improves adherence to complex prompts, and it delivers particularly strong results with text inside images and multi-object compositions. Quality presets (low, medium, high, auto) let you balance speed and cost depending on your workflow.

  • GPT Image 1.5 is an instruction-first model featuring a logic-first architecture for complex layouts and highly legible text rendering. It is 4x faster than GPT Image 1 and excels at localized edits, allowing you to modify specific areas of an image while maintaining consistent lighting, composition, and identity.

  • GPT Image 1 is available in three quality tiers. High Quality delivers the best visual fidelity and is ideal for polished, production-ready assets. Medium offers a balanced option with slightly less detail but faster generation and lower compute cost. Low is optimized for speed and rapid iteration, making it a good fit for drafts and early concept exploration.


5. Ideogram Family

  • Ideogram V3 is a text-focused image generation model that excels at producing visually rich outputs with legible, well-composed typography. It is available in four rendering modes that let you balance speed against output quality: Flash for the fastest results, Turbo for rapid iteration, Balanced for a middle ground, and Quality for the most refined and polished output.

    Ideogram supports multiple aspect ratios from 1:3 to 3:1, giving you flexibility for portrait, landscape, and banner-style compositions. The Magic Prompt feature intelligently enhances your input before generation, helping produce more coherent and visually rich images without requiring verbose or highly technical prompts. This makes the model particularly beginner-friendly and effective for iterative creative workflows.

    For editing, Ideogram includes a mask tool that lets you restrict changes to specific areas of an image, making it easy to refine details or adjust composition without affecting the rest of the output.

  • Ideogram V3 Character is built for single-image character consistency. With one reference photo, ideally a well-lit frontal or three-quarter headshot, you can generate the same character across different scenes, poses, lighting conditions, outfits, and styles while keeping them instantly recognizable. It is particularly strong at photorealistic portraits and lifestyle imagery.

  • Ideogram V3 Generate Transparent creates images with a native alpha channel, making it ideal for logos, icons, stickers, and UI overlays that need to be placed on different backgrounds without additional editing.

  • Ideogram V3 Layerize Text removes text from existing images and returns a clean background, making it easy to replace or update copy in designs without rebuilding the entire asset.
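The aspect-ratio range mentioned above (1:3 to 3:1) can be checked before a request is submitted. The following is a minimal sketch, not part of any Scenario or Ideogram SDK: the function names are hypothetical, and it assumes that any ratio inside the range is acceptable, whereas the model may in practice only accept a fixed list of ratios within it.

```python
from fractions import Fraction

# Assumed bounds, taken from the range quoted in the text: 1:3 (tall) to 3:1 (wide).
MIN_RATIO = Fraction(1, 3)
MAX_RATIO = Fraction(3, 1)


def parse_ratio(text: str) -> Fraction:
    """Parse a 'W:H' string such as '16:9' into an exact fraction."""
    width, height = text.split(":")
    return Fraction(int(width), int(height))


def is_supported_ratio(text: str) -> bool:
    """Return True if the ratio lies inside the assumed 1:3 to 3:1 range."""
    return MIN_RATIO <= parse_ratio(text) <= MAX_RATIO
```

Using exact fractions rather than floats keeps the boundary cases (exactly 1:3 and exactly 3:1) inside the range without rounding surprises.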


6. Seedream Family (ByteDance)

Seedream models are powerful generators known for their artistic versatility and vivid color reproduction. They excel at creating visually striking images that balance realism with stylized aesthetics, often favored for concept art and creative exploration.

The family has expanded with more powerful iterations:

  • Seedream 4.5 is the newest and most advanced tier, pushing the boundaries of image coherence, lighting, and composition. It is designed for users seeking top-tier visual impact.

  • Seedream 4 provides a robust generation engine that serves as a solid standard for high-quality assets.

  • Seedream 3 remains available as a reliable option for specific stylistic preferences established in previous workflows.

  • Dreamina 3.1, also referred to as Seedream 3.1, builds on the strengths of Seedream 3 while introducing an official five-element prompt structure (subject, description, style, context, and narrative) designed to support high-fidelity outputs. It excels at rendering water environments and reflections, and is recognized for its elevated aesthetic quality and richly detailed, versatile artistic styles.

    Dreamina 3.1 is highly flexible and maintains strong adherence to prompts across a wide range of styles. Its structured prompt format encourages comprehensive scene coverage, offering predictable and professional results. This makes it ideal for landscapes, portraits, artistic creations, water scenes, and any project requiring immersive narrative context and emotional depth.
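As an illustration of the five-element prompt structure described for Dreamina 3.1, the sketch below assembles a prompt from its five parts. The class name, the field order, and the comma-joined output are assumptions made for this example. The source only names the five elements and does not specify how they should be combined.

```python
from dataclasses import dataclass


@dataclass
class FiveElementPrompt:
    """Hypothetical container for the five elements named in the text."""
    subject: str
    description: str
    style: str
    context: str
    narrative: str

    def render(self) -> str:
        """Join the non-empty elements into one comma-separated prompt."""
        parts = [self.subject, self.description, self.style,
                 self.context, self.narrative]
        return ", ".join(p.strip() for p in parts if p.strip())


prompt = FiveElementPrompt(
    subject="a heron",
    description="wading at dawn",
    style="watercolor",
    context="misty lake",
    narrative="calm before a storm",
)
print(prompt.render())
```

Keeping the elements as separate fields makes it easy to vary one of them (for example the style) while holding the rest of the scene constant.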


7. Recraft Family

The Recraft family focuses on stylistic diversity, design intelligence, and vector quality, offering tools that cater to both illustrative and technical creative workflows. The current suite is Recraft V4, which brings significant improvements in compositional accuracy, color theory, and anatomical precision over previous versions.

  • Recraft V4 is the standard raster model, optimized for speed and web-ready output at up to 1024x1024. It interprets plain-language descriptions directly, without requiring complex syntax or technical flags, and applies built-in design judgment to produce clean, editorial-quality results from short prompts.

  • Recraft V4 Pro is the high-fidelity raster option, reaching up to 2048x2048. It is engineered for maximum detail and anatomical accuracy, making it the right choice for print, professional display, and character-heavy compositions. Both V4 and V4 Pro support a wide range of aesthetic styles including Glow, Plasticine, and Vector Art, among others.

  • Recraft V4 SVG and Recraft V4 Pro SVG generate native vector graphics in SVG format, producing real paths and layers rather than a rasterized approximation of vector art. This makes them especially valuable for designers who need scalable assets compatible with Figma, Adobe Illustrator, or animation pipelines. The Pro variant handles more complex illustrations and detailed branding work.

For typography, placing the desired text in quotation marks in the prompt gives the model the best accuracy. All four variants support a wide range of aspect ratios from vertical 1:2 to horizontal 2:1.
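The quotation-mark tip for typography can be applied with a small prompt helper. This is a sketch under stated assumptions: the function name and the "with the text" phrasing are illustrative only, and the only behavior taken from the text is that the words to render are wrapped in quotation marks.

```python
def with_quoted_text(prompt: str, text: str) -> str:
    """Append the literal text to render, wrapped in double quotes.

    Double quotes inside the text are swapped for single quotes so the
    quoted span in the final prompt stays unambiguous.
    """
    cleaned = text.replace('"', "'").strip()
    return f'{prompt.strip()} with the text "{cleaned}"'


print(with_quoted_text("A minimalist coffee shop logo", "Bean & Leaf"))
```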


8. Luma Photon

Luma Photon is built with fidelity and reference-driven generation in mind. It supports outputs up to 2048×1152 at 16:9 and up to 1536×1536 at 1:1, giving creators ample flexibility for different formats.

The model can leverage three types of reference inputs: Character, Style, and Composition, enabling a high degree of creative control. Its standard generation mode prioritizes realism and style accuracy, while Luma Photon Flash provides faster, more cost-efficient results with slightly reduced quality, ideal for bulk generation or rapid prototyping.

Luma Photon shines in producing clean results across minimalist 2D design, polished digital styles, and detailed 3D environments. It's equally proficient in generating character-centric art and richly layered backgrounds, making it an excellent fit for worldbuilding and visual storytelling.


9. Minimax Image 01

Minimax Image 01 is a text-to-image and image-to-image model recognized for its prompt accuracy, photorealistic detail, and visually balanced compositions. It supports a wide range of aspect ratios and produces images up to 1024×1024 pixels.

The model is especially strong in realism, delivering lifelike lighting, shadows, and textures for both characters and objects. Users can provide a reference image to generate the same character in different scenes and outfits with consistent appearance.

With the ability to generate up to 9 images per request, Minimax Image 01 offers an efficient, high-quality workflow and accessible pricing, making it an excellent choice for anyone looking for consistent, high-fidelity results from clear, well-defined prompts.
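When more images are needed than a single request can return, the work can be split into batches. The sketch below assumes the limit of 9 images per request stated above. The function name is hypothetical and no actual API call is made, so it only plans how many images each request would ask for.

```python
# Limit taken from the text above: up to 9 images per request.
MAX_IMAGES_PER_REQUEST = 9


def plan_batches(total_images: int, limit: int = MAX_IMAGES_PER_REQUEST) -> list[int]:
    """Split a total image count into per-request counts no larger than limit."""
    if total_images < 0 or limit < 1:
        raise ValueError("total_images must be >= 0 and limit must be >= 1")
    full, remainder = divmod(total_images, limit)
    return [limit] * full + ([remainder] if remainder else [])


print(plan_batches(20))  # [9, 9, 2]
```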


10. Qwen Image

Specialized models known for advanced text rendering and precise spatial manipulation.

  • Qwen Image: A powerful diffusion model that generates high-fidelity visuals (up to 2048px) with a unique mastery of text rendering. It produces exceptional image quality while ensuring industry-leading clarity for typography, signage, and logos in both English and Chinese.

  • Qwen Edit Multi-Angle: Also known as Camera Control, this model adjusts camera perspectives, reframes compositions, and shifts focus without needing extra source images. Ideal for product rendering and cinematic framing while preserving lighting and object integrity.


11. Retro Diffusion Family

Retro Diffusion models are specialized tools designed specifically for pixel art and game asset creation. Unlike general-purpose models, these are fine-tuned to respect pixel grids, limited color palettes, and retro aesthetics, making them indispensable for indie game developers and pixel artists.

There are three distinct variations tailored to specific workflow needs:

  • Retro Diffusion Tile is optimized for creating seamless textures, environmental tiles, and game maps (isometric or top-down), ensuring assets align perfectly on a grid.

  • Retro Diffusion Plus is the all-rounder for high-quality pixel illustrations, character portraits, and full scenes with a refined retro look.

  • Retro Diffusion Animation is specifically trained to generate sprite sheets or sequential frames, aiding in the creation of game character movements and effects.


12. REVE REMIX

A creative fusion model specialized in transforming existing images. It allows you to reinterpret visuals by blending multiple references or applying new artistic styles while keeping the core structure. Best for:

  • Style Transfer: Applying a new art style (e.g., "watercolor") to an existing image.

  • Image Fusion: Merging elements from multiple reference images into one.

  • Experimentation: Exploring different aesthetic variations of a single concept.


13. Hunyuan Image

A powerful transformer-based model developed by Tencent, designed to rival top-tier models like Flux and Midjourney. It features a "native multimodal" architecture that excels at understanding long, complex prompts (1,000+ characters) and includes a "World Knowledge" engine to intelligently fill in missing details in scenes. Best for:

  • Text Rendering: Accurately displaying English and Chinese text within the image (posters, signs, UI).

  • Complex Prompts: Handling dense, multi-subject descriptions with high adherence.

  • Photorealism: Generating highly realistic lighting, textures, and compositions.


14. P-Image (Pruna AI)

Powered by Pruna AI’s compression technology, this model family focuses on efficiency and accessibility. P-Image models are "distilled" to be smaller and faster, significantly reducing the computational cost (credits) and generation time while maintaining impressive visual quality for standard assets. Best for:

  • Cost Efficiency: The most budget-friendly option for high-volume generation.

  • Consistency: Reliable output for style-consistent assets (icons, props, items).

  • Batch Workflows: Ideal when you need to generate hundreds of variations quickly.


15. Z-Image Turbo

High-speed models engineered for rapid iteration using "Turbo" distillation (8 steps or fewer) for near real-time generation.

  • Z-Image Turbo: The fastest option on the platform, perfect for instant brainstorming, rapid prototyping, and drafting initial sketches.

  • Z-Image Turbo ControlNet: Combines high-speed generation with ControlNet conditioning. It allows you to guide the output using reference images (Canny, Depth, Pose) for precise structural alignment and consistent composition while maintaining full prompt control.