What is the text encoder?

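The text encoder is the part of the model that turns the words in a prompt into the numeric representation the image generator works from. It is based on CLIP, which was originally trained on a very large collection of images paired with text (hundreds of millions to billions of pairs, depending on the version). The tokens it learned get their words and phrases from that accompanying text, such as captions and metatags. Training the text encoder on your own image dataset helps it interpret that dataset more accurately.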

Fine-tuning the text encoder seems to produce the best results, especially with faces:

It generates images that are truer to the training data.
It allows better prompt interpretability, i.e. it can handle more complex prompts.

Overall, fine-tuning the text encoder can improve the performance of your model and enable it to generate more accurate and detailed images from more complex prompts.
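
To make the idea concrete, here is a toy sketch in Python. It is not how CLIP or our trainer is implemented: the ToyTextEncoder class, the placeholder token "sks", and the target vector are all made up for illustration. It only shows the general principle that a text encoder maps tokens to vectors, and that fine-tuning nudges those vectors toward what the training data needs.

```python
import math


class ToyTextEncoder:
    """Toy stand-in for a text encoder: maps each token to a small vector.

    Real text encoders such as CLIP are transformer networks; this class is
    a hypothetical simplification used only to illustrate the concept.
    """

    def __init__(self, dim=4):
        self.dim = dim
        self.table = {}  # token -> embedding vector

    def _vector(self, token):
        # Create a deterministic pseudo-embedding the first time a token is seen.
        if token not in self.table:
            seed = sum(ord(ch) for ch in token)
            self.table[token] = [math.sin(seed * (i + 1)) for i in range(self.dim)]
        return self.table[token]

    def encode(self, prompt):
        # Encode a prompt as the average of its token vectors.
        vectors = [self._vector(tok) for tok in prompt.lower().split()]
        return [sum(col) / len(vectors) for col in zip(*vectors)]

    def fine_tune_step(self, token, target, lr=0.5):
        # Move one token's vector a fraction of the way toward the target.
        current = self._vector(token)
        self.table[token] = [c + lr * (t - c) for c, t in zip(current, target)]


def distance(a, b):
    # Euclidean distance between two vectors of equal length.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


encoder = ToyTextEncoder()
target = [1.0, 0.0, -1.0, 0.5]  # made-up vector standing in for "your dataset"

before = distance(encoder.encode("sks"), target)
for _ in range(5):
    encoder.fine_tune_step("sks", target)
after = distance(encoder.encode("sks"), target)

print(f"distance before: {before:.3f}, after: {after:.3f}")
```

In this toy, each step halves the gap between the token's vector and the target, so the encoder's output for that token ends up much closer to the data it was tuned on. A frozen text encoder would skip the fine-tuning loop and keep the original vector.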

You can learn more about text encoder training parameters in our walkthrough guide to advanced training.

Updated on: 31/01/2023
