Introducing Segmentation Mode

Segmentation mode is a setting in ControlNet that extracts compositional information from a provided reference image.

Segmentation mode finds all the objects in an image and outlines their outer edges to create a representational segmented shape.

What are good examples of reference images?

     Segmentation mode is a very versatile mode with a strong ability to detect objects and shapes. No image is inherently bad, but it is important to choose use cases that suit segmentation's intended function.

     The primary function of segmentation mode is to block out areas of space. It is designed to interpret how objects create blocks of space in a reference image, and to use those blocks as a guide for where the content described in your prompt is placed.

     Segmentation mode ignores all linework, coloration, shading, and fine detail in your reference image. This means that the shape of a person could be used to create an altogether different object without confusing segmentation mode.
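
     To make the idea concrete, here is a minimal sketch of how two reference images with the same shape but different coloration and shading collapse to the same block of space. This is a toy stand-in, not ControlNet's actual preprocessor: the function name silhouette_mask, the background-color threshold, and the tiny 3x3 "images" are all assumptions made for illustration.

```python
# Toy illustration only: real segmentation preprocessors use a trained model.
# Here an "image" is a list of rows, and each pixel is an (r, g, b) tuple.

def silhouette_mask(image, background, tolerance=30):
    """Label each pixel 0 (background) or 1 (subject) by distance from the background color."""
    def is_background(pixel):
        return all(abs(c - b) <= tolerance for c, b in zip(pixel, background))
    return [[0 if is_background(p) else 1 for p in row] for row in image]

WHITE = (255, 255, 255)
RED, DARK_RED = (200, 30, 30), (120, 10, 10)
BLUE = (30, 60, 200)

# Same plus-shaped subject, different coloration and shading.
shaded_red = [
    [WHITE, RED, WHITE],
    [RED, DARK_RED, RED],
    [WHITE, RED, WHITE],
]
flat_blue = [
    [WHITE, BLUE, WHITE],
    [BLUE, BLUE, BLUE],
    [WHITE, BLUE, WHITE],
]

mask_a = silhouette_mask(shaded_red, WHITE)
mask_b = silhouette_mask(flat_blue, WHITE)

# Color and shading are discarded; only the block of space remains.
print(mask_a == mask_b)
```

     Because both inputs reduce to the same mask, anything that fills the shape afterwards has to come from the prompt and the model rather than from the reference image's details.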

     Segmentation mode relies very heavily on the prompt and model information for overarching guidance and populates that data into the segments it creates.


As you can see above, no details of the original character's face or body remain in the final output. Instead, the generated images retain only the information that the output should be centered on a solid-color background, with the general shape preserved.

Feature highlights

     Every mode in ControlNet has different features. We’ve shared the primary feature highlights below.

Retaining general shape

     Segmentation is particularly good at outlining the overall shape that should guide an image. This works well with recognizable shapes, such as cars, portraits, buildings, and other easy-to-identify silhouettes.

As you can see below, the prompt and model can easily change a car into a boat or spaceship using segmentation mode, while the silhouette of the car is recognizable enough to create a near-perfect 3D-style example when prompted.



We see this again with the golem example. The shape of the mecha is very recognizable and well defined, so segmentation mode knows to interpret it as the shape of a body. However, the shape includes nuanced details, and depending on the purpose of the model, the results are either very close to a golden golem or, in the case of the gem model, hardly reminiscent of one at all.




Reinterpreting reference images

     Segmentation mode can also be employed creatively to generate images completely unrelated to the original image. In this case, you can write a prompt that has no relationship to the subject of the original image, and segmentation mode will use only the shape provided in the original image as reference.

In the example below, you can see which hard lines and shapes in the final image were drawn simply from segmenting an unrelated concept used as the reference image.

Conclusion

     Segmentation mode is a blunt but useful tool in the image creation process. To review, when you are using segmentation mode, take the following into consideration about your reference images:

  • What is the shape and structure of the input image?
  • Are the shapes in your reference image undeniably recognizable? If so, they may influence which objects are populated in your final image.
  • Be sure to employ prompt engineering tools for the best results.
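
     As a rough aid for the first question in the checklist above, the sketch below summarizes a subject mask by how much of the frame it covers and where its bounding box sits. It assumes you already have a 0/1 mask of the subject; the helper name describe_mask and the mask format are hypothetical and not part of ControlNet.

```python
# Hypothetical helper for inspecting a reference shape; not a ControlNet API.
# A mask is a list of rows where 1 marks the subject and 0 marks background.

def describe_mask(mask):
    """Return the subject's coverage (fraction of pixels) and its bounding box."""
    rows = len(mask)
    cols = len(mask[0])
    filled = [(r, c) for r in range(rows) for c in range(cols) if mask[r][c]]
    if not filled:
        # Nothing was segmented, so there is no shape to guide the output.
        return {"coverage": 0.0, "bbox": None}
    row_ids = [r for r, _ in filled]
    col_ids = [c for _, c in filled]
    return {
        "coverage": len(filled) / (rows * cols),
        # Bounding box as (top, left, bottom, right), inclusive indices.
        "bbox": (min(row_ids), min(col_ids), max(row_ids), max(col_ids)),
    }

mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
summary = describe_mask(mask)
print(summary)
```

     A centered subject with moderate coverage, as in this example, corresponds to the "centered on a solid background" composition described earlier. These numbers describe only the block of space; they say nothing about how recognizable the silhouette is, which still needs a visual check.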

Writing a Good Prompt

Thanks for reading, and enjoy creating with segmentation mode!