Learning ControlNet for Beginners

ControlNet is a set of settings that gives users targeted, nuanced control over their outputs. Users supply a reference image, select a mode, and retain important information from that image that other tools fail to keep.


In the past, it was difficult to communicate to an AI model what it should be referencing within an input image. ControlNet solves this problem by introducing an efficient method for Scenario Models to produce high-quality results with additional input conditions. By employing ControlNet, users can provide reference images to clearly communicate specific details about their image goals.

When Should You Use ControlNet?

     The best time to use ControlNet Modes is when you are trying to emulate the structure, linework, or general architecture of your reference image. Different modes can pick up poses, edges, lines, and even depth!

     Img2Img is recommended instead if you are trying to retain color values, such as a background color or garment color.

ControlNet Modes

     You can read more about our ControlNet Modes in our Advanced User article. However, we'd like to share with you four powerful modes to get you started. During the ControlNet process, images are converted into Mode Maps, which are then used to create your images. We recommend starting with one of these four options, and have provided a quick reference guide to help you choose.
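
     As a rough illustration, the guidance in this article can be condensed into a small lookup. This is only a sketch: the goal keywords and the `recommend` function are hypothetical names chosen for this example and are not part of any Scenario API.

```python
# Hypothetical lookup table: the keys and the function name are illustrative only.
# The values summarize the recommendations given in this article.
RECOMMENDATIONS = {
    "pose": "Pose Mode",              # character poses
    "face": "Pose Mode",              # face detection
    "depth": "Depth Mode",            # foreground vs. background, layered elements
    "edges": "Structure Mode",        # fine edgework of the subject
    "linework": "Structure Mode",
    "regions": "Segmentation Mode",   # areas of space taken up by subjects
    "color": "Img2Img",               # color is retained by Img2Img, not ControlNet
}


def recommend(goal):
    """Return the suggested mode for a goal keyword such as 'pose' or 'color'."""
    key = goal.strip().lower()
    if key not in RECOMMENDATIONS:
        known = ", ".join(sorted(RECOMMENDATIONS))
        raise ValueError(f"Unknown goal {goal!r}; try one of: {known}")
    return RECOMMENDATIONS[key]


print(recommend("pose"))
print(recommend("color"))
```

     Keeping Img2Img in the same table is a deliberate choice here, since preserving color is the main case where a ControlNet mode is not the right tool.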

Pose Mode

     Pose Mode is ideal for character creation. It works best with realistic or semi-realistic human images, as that is what it was trained on. Pose Mode is less useful for non-character work, but it is incredibly powerful at detecting faces and poses!

Depth Mode

     Depth Mode is a wonderful tool for differentiating the background and foreground of your reference image, as well as the layered elements within it. As you can see below, it retains both the outer structure and many of the finer details of the original image.

Structure Mode

     Structure Mode picks out and highlights all the fine edgework in an image, focusing mainly on what it considers the subject. Its output looks the most like the original input but, as is true of ControlNet in general, it will not carry over any of the original reference colors.
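
     To see why an edge-based map keeps structure but loses color, consider the toy sketch below. It is not Scenario's preprocessor, which this article does not describe; it simply averages each pixel's channels into a brightness value and marks pixels where brightness changes sharply. The function names, the threshold value, and the tiny test images are all assumptions made for illustration.

```python
# Toy edge map: a stand-in for a structure-style preprocessor, not Scenario's actual one.

def to_gray(image):
    """Collapse each (r, g, b) pixel to one brightness value, discarding color."""
    return [[sum(pixel) / 3 for pixel in row] for row in image]


def edge_map(image, threshold=40):
    """Return a grid of 0/1 values marking pixels where brightness changes sharply."""
    gray = to_gray(image)
    height, width = len(gray), len(gray[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Compare with the right and lower neighbours, clamped at the border.
            right = gray[y][min(x + 1, width - 1)]
            below = gray[min(y + 1, height - 1)][x]
            gradient = abs(right - gray[y][x]) + abs(below - gray[y][x])
            edges[y][x] = 1 if gradient > threshold else 0
    return edges


def square_image(color, size=6):
    """Build a black image with a 2x2 block of the given color near the centre."""
    black = (0, 0, 0)
    return [
        [color if 2 <= x < 4 and 2 <= y < 4 else black for x in range(size)]
        for y in range(size)
    ]


red_edges = edge_map(square_image((255, 0, 0)))
blue_edges = edge_map(square_image((0, 0, 255)))

for row in red_edges:
    print("".join("#" if value else "." for value in row))
print("Same map for red and blue:", red_edges == blue_edges)
```

     The red and blue squares produce the same map because only the outline survives the conversion, which is why colors have to come from the prompt, the model, or Img2Img rather than from the reference image.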

Segmentation Mode

     Segmentation Mode only notices the areas of space taken up by subjects in an image. It tends to recognize the difference between the foreground and background, and it can typically tell what the different objects in your reference image are. However, it will pull most of its composition information from your model, and less from the image itself.


     ControlNet opens up a whole world of new possibilities, can provide better results from otherwise overfit models, and allows artistic vision to be executed more clearly. We recommend checking out some of our other articles, listed below, for more in-depth information.