Image Generation: Base Models

Scenario typically classifies image generation models into two main categories:

A. Base Models (aka Foundation Models)

Base Models are large, general-purpose AI models developed by companies such as OpenAI, Google, ByteDance, Black Forest Labs, Stability AI, and others. These models are designed for broad versatility and can interpret both short and long prompts across a wide range of visual styles.

They’ve been trained on millions of images and require substantial time, computational resources, and data to develop. As generalist models, they are not tailored to any specific aesthetic or subject, but instead aim for broad applicability.

B. Custom Models (aka Finetunes / LoRA Models)

Custom Models are specialized models fine-tuned on a smaller dataset—typically between 5 and 50 images—focused on a specific subject, style, or visual identity. Training these models takes anywhere from 15 minutes to a few hours.

While they depend on Base Models to function, Custom Models produce outputs that reflect the unique characteristics of their training data. They are ideal for creators who want highly specific and consistent results.

Scenario offers two types of Custom Models:

Platform Models: Pre-trained, “public” models provided by Scenario for all users. These serve as examples and sources of inspiration, and they can be mixed with your own models via “Compose”.
Your Models: Private models trained by users within their own workspace.

This article focuses on the Base Models available in Scenario for image generation. From early models like Stable Diffusion 1.5 to cutting-edge systems such as Imagen 4 or GPT Image 1, these models are evolving rapidly, offering greater control, richer aesthetic diversity, and improved output quality. With several distinct model families, each optimized for different strengths, creators can adapt their workflows to a wide range of use cases, from product design to storytelling, animation, and more.

Available Base Models on Scenario

Scenario currently offers a wide selection of Base Models, spanning several leading model families such as Flux, GPT Image, Stable Diffusion, Imagen, Seedream, Luma Photon, Recraft, Ideogram, Minimax Image 01, and Nvidia Sana.

Each Base Model comes with its own strengths, parameters, and distinctive features. In the following sections, we’ll walk through the main capabilities of each Base Model available on Scenario. Experiment with prompts, compare outputs, and explore different styles—this will help you find the best fit for your creative vision.

Flux Family (Black Forest Labs)

The FLUX family of image generation models offers a wide spectrum of capabilities designed to meet the needs of creators across professional, commercial, and experimental domains. This versatility makes FLUX a compelling choice for users ranging from individual artists to large-scale production teams.

At the top of the series is FLUX1.1 [Pro Ultra], the most advanced and refined model in the lineup. It delivers exceptional image fidelity, rich artistic rendering, and precise compositional control—ideal for high-end creative workflows that demand visual sophistication and consistency. Whether used for cinematic preproduction, editorial content, or premium marketing assets, Pro Ultra is designed to deliver top-tier results.

Just beneath it, FLUX1.1 [Pro] emphasizes speed without compromising quality. It generates images six times faster than earlier versions while supporting ultra-high-resolution outputs up to 2K. This combination of performance and resolution makes it particularly suitable for fast-paced commercial use cases such as advertising, digital design, and entertainment.

The original FLUX.1 [Pro] continues to serve as a strong choice for professionals who require precise prompt interpretation and richly detailed visuals. Its reliability in handling complex creative inputs makes it a valuable tool for enterprise applications where accuracy, control, and consistency are key.

FLUX.1 [Dev] strikes a practical balance between image quality and flexible performance. As the primary model used for fine-tuning, it’s a popular choice for custom workflows, domain-specific adaptation, and iterative testing. It supports both exploration and precision, making it a favorite among technical users and creative technologists.

Rounding out the family is FLUX.1 [Schnell], a lightweight model engineered for speed and accessibility. It’s optimized for rapid generation and real-time responsiveness, making it a perfect match for personal projects, early-stage ideation, and local development environments. While it prioritizes efficiency, it still produces respectable visual quality for fast prototyping or creative sketching.

Imagen Family (Google)

Google’s Imagen family is known for its high-fidelity output and smooth stylistic adaptability. Both Imagen 3 and Imagen 4 are available in Fast, Standard, and Ultra versions, offering creators a spectrum of speed and quality options. Imagen 4, the latest generation, supports resolutions up to 1408×768.

These models deliver rich visual detail with special attention to textures, materials, and photorealistic finishes. Fabric drape, skin tone gradients, lighting interactions, and even text placement are handled with notable accuracy. Imagen is particularly adept at balancing style flexibility, performing well across realism, abstraction, editorial, and illustrated looks, while preserving strong compositional awareness and color fidelity.

Imagen is best suited for creative applications that demand nuanced visuals and structured layouts, such as product design mockups, concept art, technical scenes, and editorial illustrations.

GPT Family (OpenAI)

GPT-Image models, also known as GPT-4o images, are versatile AI tools available on Scenario for both image generation and editing. Built with a strong emphasis on prompt accuracy and spatial reasoning, these models are capable of interpreting complex instructions.

What makes these models particularly powerful is their dual functionality. You can use them to generate entirely new images from scratch or to edit existing visuals using natural language instructions.

There are three performance tiers available, each tailored to a different balance of quality and speed. GPT Image 1 operates in High Quality mode and delivers the best visual fidelity, ideal for polished assets and detailed creative work. GPT Image 1 (Medium) is a more balanced option: it produces slightly less detail but significantly reduces generation time and compute costs, which makes it well suited for day-to-day creative tasks. GPT Image 1 (Low) is optimized for speed and rapid iteration, making it a great choice for exploring ideas quickly or generating draft content at scale.

Ideogram Family

The Ideogram family currently includes three versions: Turbo, Balanced, and Quality. These variants are designed to strike different balances between speed and output fidelity. Turbo is ideal for fast iterations, while Quality emphasizes refined results with greater visual polish.

Ideogram supports a maximum image resolution of 1536×640, making it a solid choice for banner-like layouts or cinematic crops. One of its standout features is masking, which lets users restrict edits to specific areas of an image. This suits targeted refinements or compositional adjustments that should not disturb the rest of the image.

In addition, Ideogram offers a Magic Prompt feature that enhances user input, helping generate more coherent and visually rich images without the need for verbose or overly descriptive prompts. This makes it particularly beginner-friendly and effective for iterative creative workflows.

Recraft Family

The Recraft family focuses on stylistic diversity and vector quality, offering tools that cater to both illustrative and technical domains. Its latest model, Recraft V3, can generate images up to 1707×1024 and lets you choose from a variety of stylistic modes. Whether you're aiming for hand-drawn warmth, retro pixel art charm, or a handmade 3D aesthetic, Recraft V3 delivers with surprising fidelity.

In addition, Recraft V3 SVG generates vector-based images in the SVG format. This is especially valuable for designers who need scalable assets with clean vectors and sharp detail, suitable for branding or animation pipelines.

Seedream 3 (ByteDance)

Seedream 3 is a versatile model built for flexibility in output quality and text-rendering fidelity. Users can generate images anywhere between 512px and 2048px, and three rendering modes (Small, Regular, and Big) let you balance quality against compute usage.

What makes Seedream 3 particularly compelling is its reliable rendering of text in both English and Chinese, which is useful for visual storytelling. Its stylistic strengths lie in minimalistic illustration, digital art, and 3D style, with a notable knack for strong scene composition and film-like framing. This makes it a strong candidate for storyboards and narrative-driven content.

Luma Photon

Luma Photon is built with fidelity and reference-driven generation in mind. It supports outputs up to 2048×1152 at 16:9 and up to 1536×1536 at 1:1, giving creators ample flexibility for different formats.

The model can leverage three types of reference inputs: Character, Style, and Composition, enabling a high degree of creative control. Its standard generation mode prioritizes realism and style accuracy, while Luma Photon Flash provides faster, more cost-efficient results with slightly reduced quality, ideal for bulk generation or rapid prototyping.

Luma Photon shines in producing clean results across minimalist 2D design, polished digital styles, and detailed 3D environments. It's equally proficient in generating character-centric art and richly layered backgrounds, making it an excellent fit for worldbuilding and visual storytelling.

Minimax Image 01

Minimax Image 01 is a text-to-image and image-to-image model recognized for its prompt accuracy, photorealistic detail, and visually balanced compositions. It supports a wide range of aspect ratios and produces images up to 1024×1024 pixels.

The model is especially strong in realism, delivering lifelike lighting, shadows, and textures for both characters and objects. Users can provide a reference image to generate the same character in different scenes and outfits with consistent appearance.

With the ability to generate up to 9 images per request, Minimax Image 01 offers an efficient, high-quality workflow and accessible pricing, making it an excellent choice for anyone looking for consistent, high-fidelity results from clear, well-defined prompts.

Stable Diffusion Family (Stability AI)

SDXL and SD 1.5 are open-source image generation models. SD 1.5 was released in 2022, and SDXL followed in 2023. Despite the emergence of newer proprietary models since its release, SDXL can still generate detailed visuals.

As an open-source model, SDXL has been widely adopted and adapted, forming the backbone of many creative workflows in concept art, visual design, and prototyping.
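
To compare the maximum output sizes quoted throughout this article, it can help to reduce each one to an aspect ratio and a megapixel count. The Python sketch below does only that arithmetic. The figures are copied from the sections above; the helper names (aspect_ratio, megapixels, widest_first) are hypothetical and are not part of any Scenario API.

```python
from math import gcd

# Maximum output sizes as quoted in this article (width, height in pixels).
# The labels are for illustration only and are not API identifiers.
MAX_RESOLUTIONS = {
    "Imagen 4": (1408, 768),
    "Ideogram": (1536, 640),
    "Recraft V3": (1707, 1024),
    "Luma Photon (16:9)": (2048, 1152),
    "Luma Photon (1:1)": (1536, 1536),
    "Minimax Image 01": (1024, 1024),
}


def aspect_ratio(width: int, height: int) -> tuple[int, int]:
    """Reduce width:height to lowest terms, e.g. 2048x1152 -> (16, 9)."""
    divisor = gcd(width, height)
    return width // divisor, height // divisor


def megapixels(width: int, height: int) -> float:
    """Total pixel count in millions, rounded to two decimals."""
    return round(width * height / 1_000_000, 2)


def widest_first(resolutions: dict) -> list:
    """Return model labels sorted from the widest format to the squarest."""
    return sorted(
        resolutions,
        key=lambda name: resolutions[name][0] / resolutions[name][1],
        reverse=True,
    )


if __name__ == "__main__":
    for name in widest_first(MAX_RESOLUTIONS):
        w, h = MAX_RESOLUTIONS[name]
        rw, rh = aspect_ratio(w, h)
        print(f"{name}: {w}x{h}, ratio {rw}:{rh}, {megapixels(w, h)} MP")
```

Sorting by width-to-height ratio puts the banner-like Ideogram maximum (12:5) first and the square formats last, which matches the "banner-like layouts or cinematic crops" use case described for Ideogram.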