Third-Party Image Models on Scenario

Available Third-Party Image Models on Scenario

Scenario currently offers a wide selection of Third-Party Models for Image Generation, spanning several leading model families such as Flux, Gemini, GPT Image, Imagen, Seedream, Dreamina, Luma Photon, Recraft, Ideogram, Minimax, Retro Diffusion, Qwen, and Stable Diffusion.

Each Base Model comes with its own strengths, parameters, and distinctive features. In the following sections, we’ll walk through the main capabilities of each Base Model available on Scenario. Experiment with prompts, compare outputs, and explore different styles—this will help you find the best fit for your creative vision.

If you're new to image generation, you may start with some of the most well-known and recognized models like Imagen 4 (Ultra) or Gemini 3 Pro. Based on this, you may chose to then experiment with other models.

1. Imagen Family (Google)

Google’s Imagen family is known for its high-fidelity output and smooth stylistic adaptability. Both Imagen 3 and Imagen 4 are available in Fast, Standard, and Ultra versions, offering creators a spectrum of speed and quality options. Imagen 4, the latest generation, supports resolutions up to 1408×768.

These models deliver rich visual detail with special attention to textures, materials, and photorealistic finishes. Fabric drape, skin tone gradients, lighting interactions, and even text placement are handled with notable accuracy. Imagen is particularly adept at balancing style flexibility, performing well across realism, abstraction, editorial, and illustrated looks, while preserving strong compositional awareness and color fidelity.

It is best suited for creative applications that demand nuanced visuals and structured layouts, such as product design mockups, concept art, technical scenes and editorial illustrations.

Imagen3:

2. Gemini Family (Google)

Gemini Image models are Google's most capable multimodal AI models, now integrated into Scenario for high-fidelity image generation. Known for their exceptional prompt adherence and logical reasoning, these models excel at interpreting complex scene descriptions and maintaining semantic accuracy.

They are particularly strong at handling dense instructions where element placement and specific details are crucial.

Gemini 3.0 Pro is the latest flagship model, offering state-of-the-art performance in visual quality and prompt understanding. It is the go-to choice for complex compositions requiring a high level of intelligence behind the generation.
Gemini 2.5 remains a powerful option, providing excellent quality and a strong balance between performance and creative interpretation.

3. FLUX Family (Black Forest Labs)

The FLUX models, developed by Black Forest Labs, represent the cutting edge of open-source image synthesis. They are renowned for photorealism, text rendering capabilities, and exceptional trainability.

Due to this open architecture, FLUX is highly adaptable for custom model creation. Currently, most LoRAs on Scenario are trained on FLUX 1 Dev (as well as FLUX Kontext), with training support for the FLUX 2 family coming soon. The new FLUX 2 generation brings significant improvements in detail and prompt compliance.

Scenario offers a comprehensive range of FLUX tiers:

FLUX 2 (Max) features Multi-Reference support (up to 10 images) for perfect character and style consistency. It offers extreme realism for complex materials and supports exact brand colors via hex code matching.
FLUX 2 (Pro) delivers the absolute highest fidelity and coherence, capable of handling the most demanding prompts with superior dynamic range and texture.
FLUX 2 (Flex) offers a versatile balance, maintaining high visual standards while being more adaptable for various creative iterations.
FLUX 2 (Dev) is an open-weight optimized version, preferred by users who want raw control and a specific "developer" aesthetic often used for further fine-tuning.
Legacy versions like FLUX 1.1 (Pro/Ultra) and FLUX 1 (Krea/Dev/Schnell) remain available for users who prefer the specific characteristics of the first generation.

4. GPT Family (OpenAI)

GPT-Image models (GPT-4o) are versatile tools for generation and editing, built for high prompt accuracy and spatial reasoning.

GPT Image 1.5 is the latest and most recommended powerhouse. This "instruction-first" model features a logic-first architecture for complex layouts and highly legible text. It is 4x faster than previous versions and excels at localized edits, allowing you to modify specific parts of an image while maintaining consistent lighting, composition, and identity.
GPT Image 1 operates in High Quality mode and delivers the best visual fidelity for the first generation, ideal for polished assets and detailed creative work.
GPT Image 1 (Medium) offers a balanced option, producing slightly less detail while significantly reducing generation time and compute costs.
GPT Image 1 (Low) is optimized for speed and rapid iteration, making it a great choice for exploring ideas quickly or generating draft content at scale.

5. Ideogram Family

The Ideogram family currently includes four versions: Turbo, Balanced, Quality, and Character. These variants are designed to strike different balances between speed and output fidelity. Turbo is ideal for fast iterations, Balanced offers a middle ground, and Quality emphasizes refined results with greater visual polish.

Ideogram supports a maximum image resolution of 1536×640, making it a solid choice for banner-like layouts or cinematic crops. One of its standout features is mask, which allows users to restrict edits to specific areas of the image—perfect for targeted refinements or compositional adjustments without disrupting the entire image.

In addition, Ideogram offers the Magic Prompt feature that intelligently enhances user input, helping generate more coherent and visually rich images without the need for verbose or overly descriptive prompts. This makes it particularly beginner-friendly and effective for iterative creative workflows.

Ideogram Character is the newest addition to the Ideogram lineup, built for single image consistency. With just one reference photo, ideally a well lit frontal or three quarter headshot, you can generate the same character across diverse scenes, poses, lighting, outfits, and styles while keeping them instantly recognizable. It excels at realism, producing natural and cohesive portraits or lifestyle shots.

6. Seedream Family (ByteDance)

Seedream models are powerful generators known for their artistic versatility and vivid color reproduction. They excel at creating visually striking images that balance realism with stylized aesthetics, often favored for concept art and creative exploration.

The family has expanded with more powerful iterations:

Seedream 4.5 is the newest and most advanced tier, pushing the boundaries of image coherence, lighting, and composition. It is designed for users seeking top-tier visual impact.
Seedream 4 provides a robust generation engine that serves as a solid standard for high-quality assets.
Seedream 3 remains available as a reliable option for specific stylistic preferences established in previous workflows.Dreamina 3.1, also referred to as Seedream 3.1, builds on the strengths of Seedream 3 while introducing an official five element prompt structure covering subject, description, style, context and narrative that ensures optimal high fidelity outputs. It excels at rendering water environments and reflections, earning recognition for its elevated aesthetic quality and richly detailed, versatile artistic styles.

Dreamina 3.1 is highly flexible and maintains strong adherence to prompts across a wide range of styles. Its structured prompt format ensures comprehensive scene coverage, offering predictable and professional results. This makes it ideal for landscapes, portraits, artistic creations, water scenes and any project requiring immersive narrative context and emotional depth.

Seedream 4.0 is the next evolution in ByteDance’s creative model family, enhancing both quality and versatility. It expands resolution and rendering fidelity while further refining multilingual text generation, making it even more dependable for complex prompts in English and Chinese. The model emphasizes cinematic composition and stylized visual richness, while introducing improved control for creators through structured prompt handling and adaptive rendering modes. With stronger handling of textures, realism, and narrative-driven imagery, Seedream 4.0 is positioned as a powerful tool for professional-grade storyboards, concept art, and emotionally immersive visual storytelling.

7. Recraft Family

The Recraft family focuses on stylistic diversity and vector quality, offering tools that cater to both illustrative and technical domains. Its latest model, Recraft V3, can generate images up to 1707×1024 and provides the option to choose within a variety of stylistic modes. Whether you're aiming for hand-drawn warmth, retro pixel art charm, or a handmade 3D aesthetic, Recraft V3 delivers with surprising fidelity.

More impressively, Recraft V3 SVG enables the generation of vector-based images in the SVG format. This is especially valuable for designers needing scalable assets with clean vectors and sharp detail, suitable for branding, or animation pipelines.

8. Luma Photon

Luma Photon is built with fidelity and reference-driven generation in mind. It supports up to 2048×1152 at 16:9 and up to 1536×1536 at 1:1 outputs, giving creators ample flexibility for different formats.

The model can leverage three types of reference inputs: Character, Style, and Composition, enabling a high degree of creative control. Its standard generation mode prioritizes realism and style accuracy, while Luma Photon Flash provides faster, more cost-efficient results with slightly reduced quality, ideal for bulk generation or rapid prototyping.

Luma Photon shines in producing clean results across minimalist 2D design, polished digital styles, and detailed 3D environments. It's equally proficient in generating character-centric art and richly layered backgrounds, making it an excellent fit for worldbuilding and visual storytelling.

9. Minimax Image 01

Minimax Image 01 is a text-to-image and image-to-image model recognized for its prompt accuracy, photorealistic detail, and visually balanced compositions. It supports a wide range of aspect ratios and produces images up to 1024×1024 pixels.

The model is especially strong in realism, delivering lifelike lighting, shadows, and textures for both characters and objects. Users can provide a reference image to generate the same character in different scenes and outfits with consistent appearance.

With the ability to generate up to 9 images per request, Minimax Image 01 offers an efficient, high-quality workflow and accessible pricing, making it an excellent choice for anyone looking for consistent, high-fidelity results from clear, well-defined prompts.

10. Qwen Image

Specialized models known for advanced text rendering and precise spatial manipulation.

Qwen Image: A powerful diffusion model that generates high-fidelity visuals (up to 2048px) with a unique mastery of text rendering. It produces exceptional image quality while ensuring industry-leading clarity for typography, signage, and logos in both English and Chinese.
Qwen Edit Multi-Angle: Also known as Camera Control, this model adjusts camera perspectives, reframes compositions, and shifts focus without needing extra source images. Ideal for product rendering and cinematic framing while preserving lighting and object integrity.

11. Retro Diffusion Family

Retro Diffusion models are specialized tools designed specifically for pixel art and game asset creation. Unlike general-purpose models, these are fine-tuned to respect pixel grids, limited color palettes, and retro aesthetics, making them indispensable for indie game developers and pixel artists.

There are three distinct variations tailored to specific workflow needs:

Retro Diffusion Tile is optimized for creating seamless textures, environmental tiles, and game maps (isometric or top-down), ensuring assets align perfectly on a grid.
Retro Diffusion Plus is the all-rounder for high-quality pixel illustrations, character portraits, and full scenes with a refined retro look.
Retro Diffusion Animation is specifically trained to generate sprite sheets or sequential frames, aiding in the creation of game character movements and effects.

12. REVE Family

Reve Create

A high-fidelity text-to-image model designed to generate original visuals with exceptional attention to detail, color, and style. It is optimized for creating polished, production-ready assets from scratch. Best for:

Concept Art: Generating high-quality original scenes and characters.
Marketing Assets: Creating visually striking images for promotional use.
Ideation: Rapidly visualizing new concepts with high coherence.

Reve Edit

An instruction-based model that allows you to modify existing images using natural language commands. It lets you edit specific regions or objects while preserving the original lighting and composition. Best for:

Text Instructions: Making changes with commands like "remove the shadows" or "add glasses".
Inpainting: Seamlessly adding or removing objects within a scene.
Precision: Editing details without altering the rest of the image.

Reve Remix

A creative fusion model specialized in transforming existing images. It allows you to reinterpret visuals by blending multiple references or applying new artistic styles while keeping the core structure. Best for:

Style Transfer: Applying a new art style (e.g., "watercolor") to an existing image.
Image Fusion: Merging elements from multiple reference images into one.
Experimentation: Exploring different aesthetic variations of a single concept.

13. Hunyuan Image

A powerful transformer-based model developed by Tencent, designed to rival top-tier models like Flux and Midjourney. It features a "native multimodal" architecture that excels at understanding long, complex prompts (1,000+ characters) and includes a "World Knowledge" engine to intelligently fill in missing details in scenes. Best for:

Text Rendering: Accurately displaying English and Chinese text within the image (posters, signs, UI).
Complex Prompts: Handling dense, multi-subject descriptions with high adherence.
Photorealism: Generating highly realistic lighting, textures, and compositions.

14. P-Image (Pruna AI)

Powered by Pruna AI’s compression technology, this model family focuses on efficiency and accessibility. P-Image models are "distilled" to be smaller and faster, significantly reducing the computational cost (credits) and generation time while maintaining impressive visual quality for standard assets. Best for:

Cost Efficiency: The most budget-friendly option for high-volume generation.
Consistency: Reliable output for style-consistent assets (icons, props, items).
Batch Workflows: Ideal when you need to generate hundreds of variations quickly.

15. Z-Image Turbo

High-speed models engineered for rapid iteration using "Turbo" distillation (8 steps or fewer) for near real-time generation.

Z-Image Turbo: The fastest option on the platform, perfect for instant brainstorming, rapid prototyping, and drafting initial sketches.
Z-Image Turbo ControlNet: Combines high-speed generation with ControlNet conditioning. It allows you to guide the output using reference images (Canny, Depth, Pose) for precise structural alignment and consistent composition while maintaining full prompt control.

16. Stable Diffusion Family (Stability AI)

SDXL is a open-source image generation model that debuted in 2023. Despite the emergence of newer proprietary models since its release, SDXL can generate detailed visuals.

As an open-source model, SDXL has been widely adopted and adapted, forming the backbone of many creative workflows in concept art, visual design, and prototyping.

Was this helpful?