
Introduction to 3D Generation


What is Image-to-3D?

Image-to-3D is Scenario's feature that converts any 2D image into a 3D model (mesh and textures). Whether you're working with images generated in Scenario or uploaded from an external source, this tool transforms flat visuals into usable 3D geometry.

The generated models export in industry-standard GLB format and can integrate directly into Unity, Blender, or any 3D software in your pipeline. This makes Image-to-3D particularly valuable for game developers, concept artists, VFX professionals, and 3D prototyping workflows where rapid asset creation is essential.
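
As a quick sanity check before importing into an engine or DCC tool, you can inspect an exported GLB with a general-purpose mesh library. Here is a minimal sketch using the open-source trimesh Python package; the file name is a placeholder:

    # pip install trimesh
    import trimesh

    # Load the exported GLB as a single mesh (placeholder file name).
    mesh = trimesh.load("generated_asset.glb", force="mesh")

    print("vertices:  ", len(mesh.vertices))
    print("faces:     ", len(mesh.faces))
    print("watertight:", mesh.is_watertight)
    print("bounds:    ", mesh.bounds.tolist())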


How Image-to-3D Works

Image-to-3D uses specialized diffusion models trained on large datasets of 3D objects and their corresponding 2D representations. These models reconstruct three-dimensional geometry by analyzing the visual information in your input image, including depth cues, lighting, shadows, and object boundaries.

The process involves two main stages:

  1. Geometry Generation: The model creates the base 3D mesh structure based on the shape and form visible in your image

  2. Texture Generation: The system generates and applies texture maps that match the visual appearance of the original image

Unlike traditional photogrammetry that requires multiple camera angles, Image-to-3D can work with a single image, though some models support multi-view inputs for enhanced accuracy.


Available 3D Models

Scenario provides access to a range of specialized 3D generation models, each optimized for different use cases and quality requirements. As of June 2025, the following models are available:

Hunyuan3D (Fast, 2.0 and 2.1)

Developed by Tencent, Hunyuan3D uses a two-stage generation pipeline: it first creates a bare mesh using Hunyuan3D-DiT (a flow-based diffusion model), then synthesizes high-resolution texture maps. This model is great for generating detailed geometry with vivid textures and supports both single-image and multi-view inputs. Its settings allow for extensive customization through parameters.

Hunyuan3D 2.1, in particular, produces high-quality models with PBR texture maps and is the recommended option for general use.


Trellis

Built on Microsoft's Structured LATent (SLAT) architecture, Trellis combines both structural and texture information in its latent representation. This approach enables more accurate shape reconstruction and better texture coherence across the 3D surface.

Trellis is especially effective when generating models from multiple images and is well suited to stylized, less realistic assets.


Hunyuan Multi-View (MV)

An enhanced version of Hunyuan3D 2.0, Hunyuan Multi-View accepts multiple input images from different angles of the same subject or object. This multi-view approach significantly improves reconstruction accuracy by giving the model a more complete understanding of the object's 3D structure.

Supplying several views of the same object lets this model generate noticeably higher-quality results than a single image alone.


Direct3D-S2

Developed by NJU-3DV, Direct3D-S2 is a scalable 3D generation framework based on sparse volumes that uses Spatial Sparse Attention (SSA) for efficient high-resolution generation. This model can generate detailed 3D models at 1024³ resolution using significantly fewer computational resources than traditional volumetric approaches.


Understanding Generation Parameters

Understanding these settings is essential for getting the best results from 3D generation, as each parameter directly impacts the quality and fidelity of the final model. "Steps" sets the number of denoising iterations and mainly improves overall quality, especially textures. "Guidance" controls how closely the model's shape and texture follow your reference image. "Num. Faces" sets the polygon count of the mesh, determining how much geometric detail it can hold. The sections below explain each parameter and how to tune it for your project.

Input Images

You can provide one to four input images for 3D generation, depending on the selected model. For optimal results, use high-resolution images with no background. Images that resemble 3D renders or have dimensional qualities produce significantly better reconstruction results than flat, illustrative artwork. The model interprets depth cues, lighting, and form more effectively when the source material already suggests three-dimensional structure.


Step Count

Controls the number of denoising iterations the diffusion model performs during generation. Each step refines the 3D output by gradually reducing noise and improving detail quality.

  • Lower values (10-20 steps): Faster generation with basic detail level

  • Medium values (30-50 steps): Balanced quality and speed for most use cases

  • Higher values (50+ steps): Maximum detail quality with longer processing times

Recommendation: Start with 30 steps for most generations and test incrementally. Increase only when fine detail is key, as diminishing returns occur beyond 50 steps.


Face Count

Determines the polygon density of the generated mesh. This directly affects both the geometric detail level and file size of your 3D model.

  • Low (1K-10K faces): Suitable for prototyping, mobile games, VR, or distant background objects

  • Medium (10K-40K faces): Ideal for most game assets and real-time applications

  • High (40K+ faces): Best for hero characters, close-up viewing, or high-fidelity renders

Important: Higher face counts create more detailed geometry but result in larger files and increased rendering costs. Consider your target platform's performance requirements when selecting face count.


Guidance

Controls how strictly the model adheres to the input image versus allowing creative interpretation to fill missing information.

  • Low guidance (1.0-3.0): Model has more freedom to interpret and complete unseen areas, potentially creating more plausible 3D geometry but may deviate from the original image

  • Medium guidance (3.0-7.5): Balanced approach maintaining image fidelity while allowing reasonable 3D interpretation

  • High guidance (7.5-10): Strict adherence to input image, which may preserve details but can introduce artifacts in areas where 3D interpretation is ambiguous

Recommendation: Use medium guidance (5.0-7.5) for most cases. Increase only when preserving specific visual details is critical.
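
If you drive generation programmatically, these three settings typically travel together in a single request. The sketch below is illustrative only: the endpoint, field names, and authentication shown are placeholders, not Scenario's documented API, so check the official API reference before using it.

    # pip install requests
    import requests

    # Placeholder endpoint and credential -- replace with values from the
    # official API documentation.
    API_URL = "https://example.com/v1/generate-3d"
    API_KEY = "YOUR_API_KEY"

    payload = {
        "model": "hunyuan3d-2.1",   # placeholder model identifier
        "image_id": "asset-123",    # placeholder reference to an input image
        "steps": 30,                # balanced starting point
        "guidance": 5.0,            # medium adherence to the input image
        "face_count": 20000,        # medium polygon budget
    }

    response = requests.post(
        API_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=120,
    )
    response.raise_for_status()
    print(response.json())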


Step-by-Step Generation Process

Step 1: Access Generate 3D page

You can launch 3D Generation in different ways:

  • From existing images: Open any image in your Scenario gallery, click the three-dot menu, and select "Convert to 3D"

  • From main menu: Navigate to "3D" in the main "Create" menu to open the interface and upload new images


Step 2: Select Your Generative Model

The interface loads with a default AI model pre-selected. Click the model name in the top-left corner to browse available options. Consider your specific needs:

  • Choose Hunyuan3D 2.1 or Direct3D-S2 for high-quality, detailed outputs

  • Select Trellis or Hunyuan Multi-View if you have multiple angles of your object


Step 3: Configure Input Images

For single-view models, your selected image appears in the input area. For multi-view models like Hunyuan Multi-View, you'll see options to add additional images on the left side of the interface.

When using multi-view:

  • Ensure all images show the same object with the same proportions

  • Include different angles (front, left side, right side, and back)

  • Maintain consistent lighting across images

  • Keep the images in the correct orientation
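
If you prepare these views locally, it helps to normalize them before upload. The sketch below uses the Pillow library to center each view on a transparent square canvas of the same size, so all images share consistent dimensions and framing; the file names are placeholders:

    # pip install Pillow
    from PIL import Image

    views = ["front.png", "left.png", "right.png", "back.png"]  # placeholder files
    canvas_size = 1024

    for path in views:
        img = Image.open(path).convert("RGBA")
        # Fit the longest edge to the canvas size, preserving aspect ratio.
        img.thumbnail((canvas_size, canvas_size), Image.LANCZOS)
        # Center the view on a transparent square canvas so every image
        # ends up with the same dimensions and framing.
        canvas = Image.new("RGBA", (canvas_size, canvas_size), (0, 0, 0, 0))
        offset = ((canvas_size - img.width) // 2, (canvas_size - img.height) // 2)
        canvas.paste(img, offset, img)
        canvas.save(path.replace(".png", "_prepped.png"))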


Step 4: Adjust Generation Settings

Configure the parameters based on your requirements:

  • Step Count: Start with 30 for balanced results, test and adjust as needed

  • Face Count: Choose based on your intended use (20K-60K for most applications)

  • Guidance: Begin with 5 for optimal balance before testing incrementally


Step 5: Generate Your 3D Model

Click "Generate" to begin processing. Generation time varies based on selected model complexity, chosen step count, face count settings and current server load (especially for initializing the model if it’s “cold”)


Step 6: Review and Inspect

Once generation completes, a 3D preview loads directly in the Scenario interface. Use the built-in viewer to inspect the model: orbit, zoom, and pan, or toggle wireframe, contrast, and lighting.


Step 7: Compare and Iterate

Generate multiple versions using different models or settings to compare results. This helps you identify the best approach for your specific image and use case. Pay attention to geometry accuracy and completeness, texture quality, and topology suitability for your intended use.


Step 8: Download and Export

When satisfied with results, download your 3D model in the appropriate format:

  • GLB: Recommended for most applications, includes geometry and textures in a single file

  • OBJ (soon): Traditional format with separate texture files, widely supported across 3D software
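
Until OBJ export ships, you can convert a downloaded GLB locally if your toolchain requires it. A minimal sketch with the trimesh package; depending on the trimesh version, materials and textures may be written to companion files (.mtl, images) or dropped, so expect some manual cleanup:

    # pip install trimesh
    import trimesh

    # Load the downloaded GLB as a single mesh (placeholder file name).
    mesh = trimesh.load("generated_asset.glb", force="mesh")

    # Write an OBJ; material/texture handling may vary by trimesh version.
    mesh.export("generated_asset.obj")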


Best Practices for Optimal Results

Remove Backgrounds

Background elements can confuse the 3D reconstruction process, leading to unwanted geometry or texture artifacts. Clean, isolated subjects produce significantly better results than images with complex backgrounds. Even when your image appears to have a simple background, removing it entirely helps the model focus on the primary object.

Implementation: Use Scenario's built-in background removal tool directly from the 3D generation interface, or prepare your images beforehand using Scenario's Remove Background feature.
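
If you prefer to prepare images outside Scenario, the open-source rembg package offers a comparable local step. A minimal sketch; the file names are placeholders:

    # pip install rembg Pillow
    from PIL import Image
    from rembg import remove

    # Strip the background; the result is an RGBA image with transparency.
    source = Image.open("concept_art.png")      # placeholder file name
    cutout = remove(source)
    cutout.save("concept_art_nobg.png")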


Upscale Input Images

Increasing your input image resolution to 2x or 4x the original size often dramatically improves texture quality in the final 3D model. Higher resolution inputs provide more texture detail for the model to work with during the texture synthesis stage. This is particularly important because 3D models need to maintain visual quality when viewed from multiple angles and distances.

Recommendation: Use Scenario's Enhance tool before converting to 3D, especially for images smaller than 1024x1024 pixels.
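
To batch-check which source images fall below that threshold before converting, here is a small local sketch with Pillow; the folder path is a placeholder:

    # pip install Pillow
    from pathlib import Path
    from PIL import Image

    MIN_SIDE = 1024  # below this, upscaling is recommended before 3D conversion

    for path in Path("inputs").glob("*.png"):    # placeholder folder
        with Image.open(path) as img:
            width, height = img.size
            if min(width, height) < MIN_SIDE:
                print(f"{path.name}: {width}x{height} -- consider upscaling first")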

Optimize Image Characteristics

Certain image qualities consistently produce better 3D reconstruction results:

  • Clean, stylized artwork: Images that already resemble 3D renders or cel-shaded artwork convert more successfully than rough sketches or ultra-detailed realistic photos

  • Clear object boundaries: Well-defined edges help the model distinguish between the object and background

  • Consistent lighting: Avoid images with extreme shadows or multiple light sources that might confuse depth perception

  • Single, prominent subject: Images focusing on one main object work better than complex scenes with multiple elements


Understanding Output Limitations

Topology Considerations

AI-generated 3D models typically require retopology for production use in animation or game development. The generated topology prioritizes visual accuracy over optimal edge flow for deformation.

Current AI generation tools create visually accurate models but don't (yet) produce the clean, quad-based topology that professional animators require. Plan for retopology workflows if your models need rigging or animation.


Texture Mapping

Generated models include UV mapping, but the layout may not follow traditional texturing conventions. For projects requiring custom texture work, you may need to re-UV map the model.


File Size Management

Higher face counts create more detailed models but significantly increase file sizes. Consider your target platform's constraints:

  • Mobile/VR: Keep face counts under 5K for optimal performance

  • Desktop games: 10K-20K faces work well for most assets

  • Rendering/visualization: Higher face counts acceptable for non-real-time use
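
If a downloaded model overshoots your budget, you can check and reduce its face count locally. The sketch below uses trimesh; quadric decimation needs an optional backend (fast-simplification or open3d, depending on the trimesh version), can discard UVs and textures, and is a quick reduction rather than a substitute for proper retopology:

    # pip install trimesh fast-simplification
    import trimesh

    TARGET_FACES = 5000  # example mobile/VR budget

    mesh = trimesh.load("generated_asset.glb", force="mesh")   # placeholder file
    print("original faces:", len(mesh.faces))

    if len(mesh.faces) > TARGET_FACES:
        # Collapse edges until the target face count is reached.
        # Note: UVs/textures may not survive decimation.
        lowpoly = mesh.simplify_quadric_decimation(face_count=TARGET_FACES)
        print("decimated faces:", len(lowpoly.faces))
        lowpoly.export("generated_asset_lowpoly.glb")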


Integration with Scenario Workflows

Custom Model Integration

Image-to-3D works seamlessly with Scenario's custom-trained models. Generate images using your trained style or character models, then convert them to 3D to maintain visual consistency across your asset pipeline.

Workflow example: Train a style model for your game's art direction → Generate character or prop images → Convert to 3D models → Export for use in your 3D software


Specialized Starting Models

Scenario provides several image generation models optimized for 3D conversion:

  • 3D Blocky Elements: Creates images with clear geometric forms ideal for 3D reconstruction

  • Toy Box: Generates toy-like objects with simple, clean shapes

  • Neo3D Realism: Produces realistic objects with good depth cues for 3D conversion

  • And many more, including foundation models such as Flux and GPT Image


Asset Organization

Generated 3D models integrate with Scenario's content management system. Use Collections to organize your 3D assets alongside their source images, and apply Tags for easy retrieval in larger projects.


Quality Expectations

Image-to-3D is great for creating visually convincing 3D models for concept work, prototyping, and assets viewed from limited angles. For hero assets requiring close inspection or animation, consider the generated model as a starting point for further refinement.


Future Developments

Image-to-3D capabilities continue evolving rapidly. Upcoming improvements include enhanced mesh quality, better texture resolution support, and expanded model options. Check Scenario's product updates and Knowledge Base for the latest features and best practices.

