What is Image-to-3D?
Image-to-3D is Scenario's feature that converts any 2D image into a 3D model (mesh and textures). Whether you're working with images generated in Scenario or uploaded from an external source, this tool transforms flat visuals into usable 3D geometry.

The generated models export in industry-standard GLB format and can integrate directly into Unity, Blender, or any 3D software in your pipeline. This makes Image-to-3D particularly valuable for game developers, concept artists, VFX professionals, and 3D prototyping workflows where rapid asset creation is essential.
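For example, a downloaded GLB can be brought into Blender with its bundled glTF importer. A minimal sketch to run inside Blender (the file path is a placeholder):

```python
# Run inside Blender's Python console or as a script in Blender.
# Imports a Scenario-generated GLB using Blender's bundled glTF importer.
import bpy

# Placeholder path: replace with the location of your downloaded model.
bpy.ops.import_scene.gltf(filepath="/path/to/scenario_model.glb")

# The imported objects become the current selection, ready for editing.
for obj in bpy.context.selected_objects:
    print(obj.name, obj.type)
```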

How Image-to-3D Works
Image-to-3D uses specialized diffusion models trained on large datasets of 3D objects and their corresponding 2D representations. These models reconstruct three-dimensional geometry by analyzing the visual information in your input image, including depth cues, lighting, shadows, and object boundaries.
The process involves two main stages:
Geometry Generation: The model creates the base 3D mesh structure based on the shape and form visible in your image
Texture Generation: The system generates and applies texture maps that match the visual appearance of the original image
Unlike traditional photogrammetry that requires multiple camera angles, Image-to-3D can work with a single image, though some models support multi-view inputs for enhanced accuracy.
Available 3D Models
Scenario provides access to a range of specialized 3D generation models, each optimized for different use cases and quality requirements. As of June 2025, the following models are available:
Hunyuan3D (Fast, 2.0 and 2.1)
Developed by Tencent, Hunyuan3D uses a two-stage generation pipeline: it first creates a bare mesh using Hunyuan3D-DiT (a flow-based diffusion model), then synthesizes high-resolution texture maps. This model is great for generating detailed geometry with vivid textures and supports both single-image and multi-view inputs. Its parameters allow for extensive customization.
Hunyuan3D 2.1, in particular, produces high-quality models with PBR texture maps and is the recommended option for general use.

Trellis
Built on Microsoft's Structured LATent (SLAT) architecture, Trellis combines both structural and texture information in its latent representation. This approach enables more accurate shape reconstruction and better texture coherence across the 3D surface.
Trellis is especially effective when generating models from multiple images and is well suited to stylized, less realistic assets.

Hunyuan Multi-View (MV)
An enhanced version of Hunyuan3D 2.0, Hunyuan Multi-View accepts multiple input images showing the same subject or object from different angles. This multi-view approach significantly improves reconstruction accuracy by giving the model a more complete understanding of the object's 3D structure, making it possible to generate notably high-quality models from a set of consistent views.

Direct3D-S2
Developed by NJU-3DV, Direct3D-S2 is a scalable 3D generation framework based on sparse volumes that utilizes Spatial Sparse Attention (SSA) for efficient high-resolution generation. This model can generate detailed 3D models at 1024³ resolution using significantly fewer computational resources than traditional volumetric approaches.

Understanding Generation Parameters
Understanding these settings is essential for optimizing your results when generating 3D models. Each parameter directly impacts the final quality and fidelity of your model:
Steps: Increases overall quality, with the biggest gains in texture detail
Guidance: Controls how closely the model's shape and texture follow your reference image
Num. Faces: Sets the polygon count, making the model's shapes more detailed and accurate
Mastering these parameters allows you to achieve the best possible outcome for your specific project.
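If you script generation through Scenario's API rather than the web UI, these parameters map to fields on the request. The sketch below is illustrative only: the endpoint, field names, and IDs are assumptions, not Scenario's documented API, so check the official API reference for the exact schema:

```python
# Hypothetical sketch of a 3D-generation request. The endpoint and
# field names are assumptions for illustration, not Scenario's
# documented API.
import requests

payload = {
    "image": "asset_abc123",   # hypothetical ID of the source image
    "steps": 30,               # denoising iterations: quality vs. speed
    "guidance": 5.0,           # adherence to the input image
    "numFaces": 20000,         # target polygon count of the mesh
}

response = requests.post(
    "https://api.example.com/v1/generate-3d",  # placeholder URL
    json=payload,
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    timeout=30,
)
print(response.json())
```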

Input Images
You can provide one to four input images for 3D generation, depending on the selected model. For optimal results, use high-resolution images with no background. Images that resemble 3D renders or have dimensional qualities produce significantly better reconstruction results than flat, illustrative artwork. The model interprets depth cues, lighting, and form more effectively when the source material already suggests three-dimensional structure.

Step Count
Controls the number of denoising iterations the diffusion model performs during generation. Each step refines the 3D output by gradually reducing noise and improving detail quality.
Lower values (10-20 steps): Faster generation with basic detail level
Medium values (30-50 steps): Balanced quality and speed for most use cases
Higher values (50+ steps): Maximum detail quality with longer processing times
Recommendation: Start with 30 steps for most generations and test incrementally. Increase only when fine detail is key, as diminishing returns occur beyond 50 steps.

Face Count
Determines the polygon density of the generated mesh. This directly affects both the geometric detail level and file size of your 3D model.
Low (1K-10K faces): Suitable for prototyping, mobile games, VR, or distant background objects
Medium (10K-40K faces): Ideal for most game assets and real-time applications
High (40K+ faces): Best for hero characters, close-up viewing, or high-fidelity renders
Important: Higher face counts create more detailed geometry but result in larger files and increased rendering costs. Consider your target platform's performance requirements when selecting face count.
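If a generated mesh overshoots your platform budget, you can reduce it after export. A minimal sketch using Blender's Decimate modifier, assuming the imported model is the active object:

```python
# Run inside Blender: reduce the polygon count of the active object
# with a Decimate modifier (collapse mode preserves the overall shape).
import bpy

obj = bpy.context.active_object
mod = obj.modifiers.new(name="Decimate", type='DECIMATE')
mod.ratio = 0.25  # keep roughly 25% of the original faces

# Apply the modifier to bake the reduction into the mesh.
bpy.ops.object.modifier_apply(modifier=mod.name)
print(f"{obj.name}: {len(obj.data.polygons)} faces after decimation")
```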

Guidance
Controls how strictly the model adheres to the input image versus allowing creative interpretation to fill missing information.
Low guidance (1.0-3.0): Model has more freedom to interpret and complete unseen areas, potentially creating more plausible 3D geometry but may deviate from the original image
Medium guidance (3.0-7.5): Balanced approach maintaining image fidelity while allowing reasonable 3D interpretation
High guidance (7.5-10): Strict adherence to input image, which may preserve details but can introduce artifacts in areas where 3D interpretation is ambiguous
Recommendation: Use medium guidance (5.0-7.5) for most cases. Increase only when preserving specific visual details is critical.

Step-by-Step Generation Process
Step 1: Access Generate 3D page
You can launch 3D Generation in different ways:
From existing images: Open any image in your Scenario gallery, click the three-dot menu, and select "Convert to 3D"
From the main menu: Navigate to "3D" in the main "Create" menu to open the interface, then upload new images

Step 2: Select Your Generative Model
The interface loads with a default model pre-selected. Click the model name in the top-left corner to browse available options. Consider your specific needs:
Choose Hunyuan3D 2.1 or Direct3D-S2 for high-quality, detailed outputs
Select Trellis or Hunyuan Multi-View if you have multiple angles of your object

Step 3: Configure Input Images
For single-view models, your selected image appears in the input area. For multi-view models like Hunyuan Multi-View, you'll see options to add additional images on the left side of the interface.

When using multi-view:
Ensure all images show the same object at the same proportions
Include different angles (front, left side, right side, and back)
Maintain consistent lighting across images
Keep the images in the correct orientation (a quick pre-upload check is sketched below)
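A quick local sanity check can catch mismatched multi-view sets before upload. A minimal Pillow sketch (file names are placeholders):

```python
# Sanity-check a multi-view image set before upload: same dimensions,
# transparent (background-removed) PNGs. File names are placeholders.
from PIL import Image

views = ["front.png", "left.png", "right.png", "back.png"]
sizes = set()
for path in views:
    img = Image.open(path)
    sizes.add(img.size)
    if img.mode != "RGBA":
        print(f"{path}: no alpha channel - background may not be removed")

if len(sizes) > 1:
    print(f"Inconsistent dimensions across views: {sizes}")
else:
    width, height = sizes.pop()
    print(f"All views are {width}x{height}")
```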

Step 4: Adjust Generation Settings
Configure the parameters based on your requirements:
Step Count: Start with 30 for balanced results, test and adjust as needed
Face Count: Choose based on your intended use (20K-60K for most applications)
Guidance: Begin with 5 for an optimal balance, then test incrementally
Step 5: Generate Your 3D Model
Click "Generate" to begin processing. Generation time varies based on selected model complexity, chosen step count, face count settings and current server load (especially for initializing the model if it’s “cold”)
Step 6: Review and Inspect
Once generation completes, a 3D preview loads directly in the Scenario interface. Use the built-in viewer to inspect the model: orbit, zoom, and pan, or toggle wireframe, contrast, and lighting.

Step 7: Compare and Iterate
Generate multiple versions using different models or settings to compare results. This helps you identify the best approach for your specific image and use case. Pay attention to geometry accuracy and completeness, texture quality, and topology suitability for your intended use.
Step 8: Download and Export
When satisfied with results, download your 3D model in the appropriate format:
GLB: Recommended for most applications; includes geometry and textures in a single file (see the inspection sketch below)
OBJ (soon): Traditional format with separate texture files, widely supported across 3D software
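Once downloaded, a GLB can be inspected programmatically to confirm its face count before it enters your pipeline. A minimal sketch using the open-source trimesh library (the file name is a placeholder):

```python
# Inspect a downloaded GLB with trimesh: per-mesh face counts help
# verify the model fits your platform's polygon budget.
import trimesh

scene = trimesh.load("scenario_model.glb")  # placeholder file name

# GLB files load as a Scene; iterate over its component meshes.
for name, mesh in scene.geometry.items():
    print(f"{name}: {len(mesh.faces)} faces, {len(mesh.vertices)} vertices")
```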
Best Practices for Optimal Results
Remove Backgrounds
Background elements can confuse the 3D reconstruction process, leading to unwanted geometry or texture artifacts. Clean, isolated subjects produce significantly better results than images with complex backgrounds. Even when your image appears to have a simple background, removing it entirely helps the model focus on the primary object.
Implementation: Use Scenario's built-in background removal tool directly from the 3D generation interface, or prepare your images beforehand using Scenario's Remove Background feature.
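If you prefer preparing images locally, the open-source rembg library performs a comparable cutout. A minimal sketch as a local alternative (file names are placeholders):

```python
# Local background removal with the open-source rembg library,
# as an alternative to Scenario's built-in Remove Background tool.
from PIL import Image
from rembg import remove

source = Image.open("concept.png")      # placeholder input
cutout = remove(source)                 # returns an RGBA image
cutout.save("concept_cutout.png")       # transparent background preserved
```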

Upscale Input Images
Increasing your input image resolution to 2x or 4x the original size often dramatically improves texture quality in the final 3D model. Higher resolution inputs provide more texture detail for the model to work with during the texture synthesis stage. This is particularly important because 3D models need to maintain visual quality when viewed from multiple angles and distances.
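Before upscaling, it can help to check the source resolution locally. A small Pillow sketch; the 1024-pixel threshold is an illustrative assumption, not an official requirement:

```python
# Flag low-resolution inputs before generation. The 1024px threshold
# is an illustrative assumption, not an official Scenario requirement.
from PIL import Image

MIN_SIDE = 1024  # assumed minimum for good texture detail

img = Image.open("concept.png")  # placeholder input
width, height = img.size
if min(width, height) < MIN_SIDE:
    print(f"{width}x{height} is small; consider a 2x-4x upscale first")
else:
    print(f"{width}x{height} should provide enough texture detail")
```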

Optimize Image Characteristics
Certain image qualities consistently produce better 3D reconstruction results:
Clean, stylized artwork: Images that already resemble 3D or cel-shaded artwork convert more successfully than rough sketches or ultra-detailed realistic photos
Clear object boundaries: Well-defined edges help the model distinguish between the object and background
Consistent lighting: Avoid images with extreme shadows or multiple light sources that might confuse depth perception
Single, prominent subject: Images focusing on one main object work better than complex scenes with multiple elements

Understanding Output Limitations
Topology Considerations
AI-generated 3D models typically require retopology for production use in animation or game development. The generated topology prioritizes visual accuracy over optimal edge flow for deformation.
Current AI generation tools create visually accurate models but don't (yet) produce the clean, quad-based topology that professional animators require. Plan for retopology workflows if your models need rigging or animation.
Texture Mapping
Generated models include UV mapping, but the layout may not follow traditional texturing conventions. For projects requiring custom texture work, you may need to re-UV map the model.
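Re-unwrapping can be scripted in Blender when the generated layout doesn't suit your texturing workflow. A minimal sketch using Smart UV Project on the active object:

```python
# Run inside Blender: re-unwrap the active object's UVs with
# Smart UV Project when the generated layout is unsuitable.
import bpy

bpy.ops.object.mode_set(mode='EDIT')
bpy.ops.mesh.select_all(action='SELECT')
bpy.ops.uv.smart_project()  # default settings; tune the angle limit as needed
bpy.ops.object.mode_set(mode='OBJECT')
```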
File Size Management
Higher face counts create more detailed models but significantly increase file sizes. Consider your target platform's constraints:
Mobile/VR: Keep face counts under 5K for optimal performance
Desktop games: 10K-20K faces work well for most assets
Rendering/visualization: Higher face counts acceptable for non-real-time use
Integration with Scenario Workflows
Custom Model Integration
Image-to-3D works seamlessly with Scenario's custom-trained models. Generate images using your trained style or character models, then convert them to 3D to maintain visual consistency across your asset pipeline.
Workflow example: Train a style model for your game's art direction → Generate character or prop images → Convert to 3D models → Export for use in your 3D software

Specialized Starting Models
Scenario provides several image generation models optimized for 3D conversion:
3D Blocky Elements: Creates images with clear geometric forms ideal for 3D reconstruction
Toy Box: Generates toy-like objects with simple, clean shapes
Neo3D Realism: Produces realistic objects with good depth cues for 3D conversion
And many others, including foundation models such as Flux and GPT Image

Asset Organization
Generated 3D models integrate with Scenario's content management system. Use Collections to organize your 3D assets alongside their source images, and apply Tags for easy retrieval in larger projects.
Quality Expectations
Image-to-3D is great for creating visually convincing 3D models for concept work, prototyping, and assets viewed from limited angles. For hero assets requiring close inspection or animation, consider the generated model as a starting point for further refinement.
Future Developments
Image-to-3D capabilities continue evolving rapidly. Upcoming improvements include enhanced mesh quality, better texture resolution support, and expanded model options. Check Scenario's product updates and Knowledge Base for the latest features and best practices.