Introducing Pose Mode

Pose mode is a setting in ControlNet that extracts compositional information from a provided reference image. It focuses on human body, face, and hand shapes, and uses what it detects in your reference image to guide the generated image.

What are good examples of reference images?

Pose mode produces the most accurate results when the reference images look like realistic humans. It can detect cartoon bodies and faces to a limited degree; however, the less realistic the image, the less detail it can pick up. See some of our examples below:

Examples of good references

     As you can see, pose mode will recognize the pose and face in a reference image, so long as it retains a sense of realism. Pose mode was trained on photos of human beings, so the closer the input image is to a photo in its detail and realism, the more accurate the result.


Examples of bad references

     Every ControlNet mode first pre-processes the reference image into a map. When no map is detected, nothing is generated. For pose mode, this happens with images that have unrealistic body proportions or very cartoon-like faces.
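     To make the idea of a detected (or missing) map concrete, here is a minimal sketch in Python. It is not the actual ControlNet preprocessor: the `PoseDetections` class and the `summarize_pose_map` function are hypothetical names invented for illustration, and they assume the preprocessor reports separate keypoint lists for the body, face, and hands.

```python
from dataclasses import dataclass, field

PARTS = ("body", "face", "hands")


@dataclass
class PoseDetections:
    """Hypothetical stand-in for a pose preprocessor's output (not a real API)."""
    body: list = field(default_factory=list)   # (x, y) keypoints found on the body
    face: list = field(default_factory=list)   # (x, y) keypoints found on the face
    hands: list = field(default_factory=list)  # (x, y) keypoints found on the hands


def summarize_pose_map(detections):
    """Report which parts were detected and whether any map exists at all."""
    detected = [part for part in PARTS if getattr(detections, part)]
    missing = [part for part in PARTS if part not in detected]
    return {"has_map": bool(detected), "detected": detected, "missing": missing}


# A very cartoon-like image might yield no keypoints at all (no map),
# while a stylized one might yield body keypoints but no face.
no_map = summarize_pose_map(PoseDetections())
body_only = summarize_pose_map(PoseDetections(body=[(120, 80), (118, 140)]))
print(no_map["has_map"], body_only["missing"])
```

     Under these assumptions, a check like this could flag a reference image before generating, since a missing part means pose mode has nothing to work from for that part.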


As you can see here, the pose in this image is detected, but the face is not. This often leads to extreme degradation of the face in final images:


Feature highlights

     Every mode in ControlNet has different features. We’ve shared the primary feature highlights below.

Accurate pose consistency

One of the main features of pose mode is, as the name suggests, its ability to accurately identify and recreate poses without carrying over stylistic details of the original reference image. You can check out some examples of this here:


     Even with vastly different character prompts, the outputs can keep the pose from the original image very consistent. We suggest following these tips for best results:

  • Make sure that the full pose and face are visible in your reference image.
  • Pick realistic or semi-realistic reference images.

Face and body structure

Another feature of pose mode is its ability to identify the general proportions and structure of a portrait or full-body image. This means that if you input a particular face, there is a high likelihood that pose mode will map the general structure of that face. Think of it like sharing the X-ray of a person, rather than their full appearance.






     A few more tips, particularly important for head-on portraits, where the face is prominent:

  • If you are attempting to generate images of a specific character, try to find reference poses with a similar body type. Pose mode will carry over some information from the original reference image, such as body proportions, shoulder width, and height.
  • When using a reference image of a face, the structure of the underlying face will be taken into account. You may notice differences in your image outputs if you use reference faces with very different face structures.
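     As a rough illustration of why body type matters, the sketch below compares the shoulder-width-to-height ratio of two sets of pose keypoints. Everything here is an assumption made for illustration: the keypoint names (`left_shoulder`, `right_shoulder`, `head_top`, `feet`), the pixel coordinates, and the function names are hypothetical, and pose mode's internal representation may differ.

```python
import math


def distance(a, b):
    """Euclidean distance between two (x, y) points in pixels."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def shoulder_to_height_ratio(keypoints):
    """Shoulder width divided by overall height, using hypothetical keypoint names."""
    shoulders = distance(keypoints["left_shoulder"], keypoints["right_shoulder"])
    height = distance(keypoints["head_top"], keypoints["feet"])
    return shoulders / height


def proportion_gap(reference, target):
    """Absolute difference between two figures' shoulder-to-height ratios."""
    return abs(shoulder_to_height_ratio(reference) - shoulder_to_height_ratio(target))


# Made-up keypoints for a reference pose and for the character you have in mind.
reference = {"left_shoulder": (200, 150), "right_shoulder": (300, 150),
             "head_top": (250, 50), "feet": (250, 450)}
character = {"left_shoulder": (180, 150), "right_shoulder": (320, 150),
             "head_top": (250, 50), "feet": (250, 450)}

print(proportion_gap(reference, character))
```

     A larger gap suggests the reference's proportions differ more from the intended character, so the generated image is more likely to inherit a body shape you did not want.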

         Pose mode can be a very powerful tool in character creation. To review, when you are using pose mode, consider the following aspects of your reference images:

      • Realism
      • Proportions
      • Face structure
      • Visibility

      Thanks for reading, and enjoy creating with pose mode!