
Introduction to 3D Generation


What is Image-to-3D?

Image-to-3D is Scenario's feature that converts any 2D image into a 3D model (mesh and textures). Whether you're working with images generated in Scenario or uploaded from an external source, this tool transforms flat visuals into usable 3D geometry.

The generated models export in industry-standard GLB format and can integrate directly into Unity, Blender, or any 3D software in your pipeline. This makes Image-to-3D particularly valuable for game developers, concept artists, VFX professionals, and 3D prototyping workflows where rapid asset creation is essential.
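
As a quick sanity check before importing into an engine or DCC tool, you can inspect an exported GLB with a general-purpose mesh library. Here is a minimal sketch using the open-source trimesh Python package; the file name is a placeholder:

    # pip install trimesh
    import trimesh

    # Load the exported GLB as a single mesh (placeholder file name).
    mesh = trimesh.load("generated_asset.glb", force="mesh")

    print("vertices:  ", len(mesh.vertices))
    print("faces:     ", len(mesh.faces))
    print("watertight:", mesh.is_watertight)
    print("bounds:    ", mesh.bounds.tolist())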


How Image-to-3D Works

Image-to-3D uses specialized diffusion models trained on large datasets of 3D objects and their corresponding 2D representations. These models reconstruct three-dimensional geometry by analyzing the visual information in your input image, including depth cues, lighting, shadows, and object boundaries.

The process involves two main stages:

  1. Geometry Generation: The model creates the base 3D mesh structure based on the shape and form visible in your image

  2. Texture Generation: The system generates and applies texture maps that match the visual appearance of the original image

Unlike traditional photogrammetry that requires multiple camera angles, Image-to-3D can work with a single image, though some models support multi-view inputs for enhanced accuracy.


Available 3D Models

Scenario provides access to a range of specialized 3D generation models, each optimized for different use cases and quality requirements. As of June 2025, the following models are available:

Hunyuan3D (Fast, 2.0 and 2.1)

Developed by Tencent, Hunyuan3D uses a two-stage generation pipeline: it first creates a bare mesh using Hunyuan3D-DiT (a flow-based diffusion model), then synthesizes high-resolution texture maps. This model is great for generating detailed geometry with vivid textures and supports both single-image and multi-view inputs. Its settings allow for extensive customization through parameters.

Hunyuan3D 2.1, in particular, produces high-quality models with PBR texture maps and is the recommended option for general use.


Trellis

Built on Microsoft's Structured LATent (SLAT) architecture, Trellis combines both structural and texture information in its latent representation. This approach enables more accurate shape reconstruction and better texture coherence across the 3D surface.

Trellis is especially effective when generating models from multiple images and is well suited to stylized, less realistic assets.


Hunyuan Multi-View (MV)

An enhanced version of Hunyuan3D 2.0, Hunyuan Multi-View accepts multiple input images from different angles of the same subject or object. This multi-view approach significantly improves reconstruction accuracy by giving the model a more complete understanding of the object's 3D structure.

Supplying several views of the same object lets this model generate noticeably higher-quality results than a single image alone.


Direct3D-S2

Developed by NJU-3DV, Direct3D-S2 is a scalable 3D generation framework based on sparse volumes that uses Spatial Sparse Attention (SSA) for efficient high-resolution generation. This model can generate detailed 3D models at 1024³ resolution using significantly fewer computational resources than traditional volumetric approaches.


Understanding Generation Parameters

Understanding these settings is essential for getting the best results from 3D generation, as each parameter directly impacts the quality and fidelity of the final model. "Steps" sets the number of denoising iterations and mainly improves overall quality, especially textures. "Guidance" controls how closely the model's shape and texture follow your reference image. "Num. Faces" sets the polygon count of the mesh, determining how much geometric detail it can hold. The sections below explain each parameter and how to tune it for your project.

Input Images

You can provide one to four input images for 3D generation, depending on the selected model. For optimal results, use high-resolution images with no background. Images that resemble 3D renders or have dimensional qualities produce significantly better reconstruction results than flat, illustrative artwork. The model interprets depth cues, lighting, and form more effectively when the source material already suggests three-dimensional structure.


Step Count

Controls the number of denoising iterations the diffusion model performs during generation. Each step refines the 3D output by gradually reducing noise and improving detail quality.

  • Lower values (10-20 steps): Faster generation with basic detail level

  • Medium values (30-50 steps): Balanced quality and speed for most use cases

  • Higher values (50+ steps): Maximum detail quality with longer processing times

Recommendation: Start with 30 steps for most generations and test incrementally. Increase only when fine detail is key, as diminishing returns occur beyond 50 steps.


Face Count

Determines the polygon density of the generated mesh. This directly affects both the geometric detail level and file size of your 3D model.

  • Low (1K-10K faces): Suitable for prototyping, mobile games, VR, or distant background objects

  • Medium (10K-40K faces): Ideal for most game assets and real-time applications

  • High (40K+ faces): Best for hero characters, close-up viewing, or high-fidelity renders

Important: Higher face counts create more detailed geometry but result in larger files and increased rendering costs. Consider your target platform's performance requirements when selecting face count.


Guidance

Controls how strictly the model adheres to the input image versus allowing creative interpretation to fill missing information.

  • Low guidance (1.0-3.0): Model has more freedom to interpret and complete unseen areas, potentially creating more plausible 3D geometry but may deviate from the original image

  • Medium guidance (3.0-7.5): Balanced approach maintaining image fidelity while allowing reasonable 3D interpretation

  • High guidance (7.5-10): Strict adherence to input image, which may preserve details but can introduce artifacts in areas where 3D interpretation is ambiguous

Recommendation: Use medium guidance (5.0-7.5) for most cases. Increase only when preserving specific visual details is critical.
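
If you drive generation programmatically, these three settings typically travel together in a single request. The sketch below is illustrative only: the endpoint, field names, and authentication shown are placeholders, not Scenario's documented API, so check the official API reference before using it.

    # pip install requests
    import requests

    # Placeholder endpoint and credential -- replace with values from the
    # official API documentation.
    API_URL = "https://example.com/v1/generate-3d"
    API_KEY = "YOUR_API_KEY"

    payload = {
        "model": "hunyuan3d-2.1",   # placeholder model identifier
        "image_id": "asset-123",    # placeholder reference to an input image
        "steps": 30,                # balanced starting point
        "guidance": 5.0,            # medium adherence to the input image
        "face_count": 20000,        # medium polygon budget
    }

    response = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=120,
    )
    response.raise_for_status()
    print(response.json())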


Step-by-Step Generation Process

Step 1: Access Generate 3D page

You can launch 3D Generation in different ways:

  • From existing images: Open any image in your Scenario gallery, click the three-dot menu, and select "Convert to 3D"

  • From main menu: Navigate to "3D" in the main "Create" menu to open the interface and upload new images


Step 2: Select Your Generative Model

The interface loads with a default AI model pre-selected. Click the model name in the top-left corner to browse available options. Consider your specific needs:

  • Choose Hunyuan3D 2.1 or Direct3D-S2 for high-quality, detailed outputs

  • Select Trellis or Hunyuan Multi-View if you have multiple angles of your object


Step 3: Configure Input Images

For single-view models, your selected image appears in the input area. For multi-view models like Hunyuan Multi-View, you'll see options to add additional images on the left side of the interface.

When using multi-view:

  • Ensure all images show the same object with the same proportions

  • Include different angles (front, left side, right side, and back)

  • Maintain consistent lighting across images

  • Keep the images in the correct orientation
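
If you prepare these views locally, it helps to normalize them before upload. The sketch below uses the Pillow library to center each view on a transparent square canvas of the same size, so all images share consistent dimensions and framing; the file names are placeholders:

    # pip install Pillow
    from PIL import Image

    views = ["front.png", "left.png", "right.png", "back.png"]  # placeholder files
    canvas_size = 1024

    for path in views:
        img = Image.open(path).convert("RGBA")
        # Fit the longest edge to the canvas size, preserving aspect ratio.
        img.thumbnail((canvas_size, canvas_size), Image.LANCZOS)
        # Center the view on a transparent square canvas so every image
        # ends up with the same dimensions and framing.
        canvas = Image.new("RGBA", (canvas_size, canvas_size), (0, 0, 0, 0))
        offset = ((canvas_size - img.width) // 2, (canvas_size - img.height) // 2)
        canvas.paste(img, offset, img)
        canvas.save(path.replace(".png", "_prepped.png"))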


Step 4: Adjust Generation Settings

Configure the parameters based on your requirements:

  • Step Count: Start with 30 for balanced results, test and adjust as needed

  • Face Count: Choose based on your intended use (20K-60K for most applications)

  • Guidance: Begin with 5 for optimal balance before testing incrementally


Step 5: Generate Your 3D Model

Click "Generate" to begin processing. Generation time varies based on selected model complexity, chosen step count, face count settings and current server load (especially for initializing the model if it’s “cold”)


Step 6: Review and Inspect

Once generation completes, a 3D preview loads directly in the Scenario interface. Use the built-in viewer to inspect the model: orbit, zoom, and pan, or toggle wireframe, contrast, and lighting.


Step 7: Compare and Iterate

Generate multiple versions using different models or settings to compare results. This helps you identify the best approach for your specific image and use case. Pay attention to geometry accuracy and completeness, texture quality, and topology suitability for your intended use.


Step 8: Download and Export

When satisfied with results, download your 3D model in the appropriate format:

  • GLB: Recommended for most applications, includes geometry and textures in a single file

  • OBJ (soon): Traditional format with separate texture files, widely supported across 3D software
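
Until OBJ export ships, you can convert a downloaded GLB locally if your toolchain requires it. A minimal sketch with the trimesh package; depending on the trimesh version, materials and textures may be written to companion files (.mtl, images) or dropped, so expect some manual cleanup:

    # pip install trimesh
    import trimesh

    # Load the downloaded GLB as a single mesh (placeholder file name).
    mesh = trimesh.load("generated_asset.glb", force="mesh")

    # Write an OBJ; material/texture handling may vary by trimesh version.
    mesh.export("generated_asset.obj")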


Best Practices for Optimal Results

Remove Backgrounds

Background elements can confuse the 3D reconstruction process, leading to unwanted geometry or texture artifacts. Clean, isolated subjects produce significantly better results than images with complex backgrounds. Even when your image appears to have a simple background, removing it entirely helps the model focus on the primary object.

Implementation: Use Scenario's built-in background removal tool directly from the 3D generation interface, or prepare your images beforehand using Scenario's Remove Background feature.
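
If you prefer to prepare images outside Scenario, the open-source rembg package offers a comparable local step. A minimal sketch; the file names are placeholders:

    # pip install rembg Pillow
    from PIL import Image
    from rembg import remove

    # Strip the background; the result is an RGBA image with transparency.
    source = Image.open("concept_art.png")      # placeholder file name
    cutout = remove(source)
    cutout.save("concept_art_nobg.png")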


Upscale Input Images

Increasing your input image resolution to 2x or 4x the original size often dramatically improves texture quality in the final 3D model. Higher resolution inputs provide more texture detail for the model to work with during the texture synthesis stage. This is particularly important because 3D models need to maintain visual quality when viewed from multiple angles and distances.

Recommendation: Use Scenario's Enhance tool before converting to 3D, especially for images smaller than 1024x1024 pixels.
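
To batch-check which source images fall below that threshold before converting, here is a small local sketch with Pillow; the folder path is a placeholder:

    # pip install Pillow
    from pathlib import Path
    from PIL import Image

    MIN_SIDE = 1024  # below this, upscaling is recommended before 3D conversion

    for path in Path("inputs").glob("*.png"):    # placeholder folder
        with Image.open(path) as img:
            width, height = img.size
            if min(width, height) < MIN_SIDE:
                print(f"{path.name}: {width}x{height} -- consider upscaling first")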

Optimize Image Characteristics

Certain image qualities consistently produce better 3D reconstruction results:

  • Clean, stylized artwork: Images that already resemble 3D renders or cel-shaded artwork convert more successfully than rough sketches or ultra-detailed realistic photos

  • Clear object boundaries: Well-defined edges help the model distinguish between the object and background

  • Consistent lighting: Avoid images with extreme shadows or multiple light sources that might confuse depth perception

  • Single, prominent subject: Images focusing on one main object work better than complex scenes with multiple elements


Understanding Output Limitations

Topology Considerations

AI-generated 3D models typically require retopology for production use in animation or game development. The generated topology prioritizes visual accuracy over optimal edge flow for deformation.

Current AI generation tools create visually accurate models but don't (yet) produce the clean, quad-based topology that professional animators require. Plan for retopology workflows if your models need rigging or animation.


Texture Mapping

Generated models include UV mapping, but the layout may not follow traditional texturing conventions. For projects requiring custom texture work, you may need to re-UV map the model.


File Size Management

Higher face counts create more detailed models but significantly increase file sizes. Consider your target platform's constraints:

  • Mobile/VR: Keep face counts under 5K for optimal performance

  • Desktop games: 10K-20K faces work well for most assets

  • Rendering/visualization: Higher face counts acceptable for non-real-time use
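
If a downloaded model overshoots your budget, you can check and reduce its face count locally. The sketch below uses trimesh; quadric decimation needs an optional backend (fast-simplification or open3d, depending on the trimesh version), can discard UVs and textures, and is a quick reduction rather than a substitute for proper retopology:

    # pip install trimesh fast-simplification
    import trimesh

    TARGET_FACES = 5000  # example mobile/VR budget

    mesh = trimesh.load("generated_asset.glb", force="mesh")   # placeholder file
    print("original faces:", len(mesh.faces))

    if len(mesh.faces) > TARGET_FACES:
        # Collapse edges until the target face count is reached.
        # Note: UVs/textures may not survive decimation.
        lowpoly = mesh.simplify_quadric_decimation(face_count=TARGET_FACES)
        print("decimated faces:", len(lowpoly.faces))
        lowpoly.export("generated_asset_lowpoly.glb")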


Integration with Scenario Workflows

Custom Model Integration

Image-to-3D works seamlessly with Scenario's custom-trained models. Generate images using your trained style or character models, then convert them to 3D to maintain visual consistency across your asset pipeline.

Workflow example: Train a style model for your game's art direction → Generate character or prop images → Convert to 3D models → Export for use in your 3D software


Specialized Starting Models

Scenario provides several image generation models optimized for 3D conversion:

  • 3D Blocky Elements: Creates images with clear geometric forms ideal for 3D reconstruction

  • Toy Box: Generates toy-like objects with simple, clean shapes

  • Neo3D Realism: Produces realistic objects with good depth cues for 3D conversion

  • And many more, including foundation models such as Flux and GPT Image


Asset Organization

Generated 3D models integrate with Scenario's content management system. Use Collections to organize your 3D assets alongside their source images, and apply Tags for easy retrieval in larger projects.


Quality Expectations

Image-to-3D is great for creating visually convincing 3D models for concept work, prototyping, and assets viewed from limited angles. For hero assets requiring close inspection or animation, consider the generated model as a starting point for further refinement.


Future Developments

Image-to-3D capabilities continue evolving rapidly. Upcoming improvements include enhanced mesh quality, better texture resolution support, and expanded model options. Check Scenario's product updates and Knowledge Base for the latest features and best practices.

