Prompt engineering is an important skill set in generative art creation. In this tutorial we will walk through the steps you can take to test your custom finetuned generator, design datasets, and iterate on its output.
Testing without Prompts
The first thing you’ll want to do is to test your generator without any prompts. This can help you identify any issues or areas for improvement, and give you a better understanding of the capabilities and limitations of your generator. This will also allow you to see if your generator is overfit or underfit.
In this tutorial we will be using our Bubbleverse concept art trained generator. You can see the dataset and learn more about the training process over at our training tutorial.
Here you can see a few of my outputs. Many of the examples show an alien landscape, which indicates to me that this part of the dataset is likely slightly overtrained. This would be perfect if I were trying to train that specific style of terrain, but in this case I am attempting to train only the style.
If it were too overfit, the output would look very grainy and the images would be nearly identical. If it were underfit, it would look very primitive and cartoonish. In this case the examples are likely a little overfit; however, we can balance them out with positive and negative prompts. We can also take this opportunity to generate more images, which can be used in a new training session to get more nuanced results.
When a model is extremely overfit you should go through the original dataset, make adjustments, and retrain your Generator. In this example you could try removing a few examples of that landscape style. Then retrain the model from scratch and test again. If it seems underfit, then it most likely had too few images, which led to too few training steps.
One helpful technique for testing your generator is to create a list of prompts that cover a range of different subjects, styles, and compositions. This will allow you to get a sense of how your generator handles different types of input, and identify any areas where it may struggle or perform poorly.
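One simple way to build that kind of list is to combine a handful of subjects, styles, and compositions into every possible pairing, so your test pass covers the space evenly. Here is a minimal sketch of that idea; the category values are hypothetical placeholders, so swap in subjects and styles that match your own dataset.

```python
from itertools import product

# Hypothetical test categories -- replace these with subjects, styles,
# and compositions relevant to your own finetuned generator.
subjects = ["alien landscape", "character portrait", "floating city"]
styles = ["concept art", "watercolor", "flat shading"]
compositions = ["wide shot", "close-up"]

def build_prompt_matrix(subjects, styles, compositions):
    """Return one prompt for every subject/style/composition combination."""
    return [
        f"{subject}, {style}, {composition}"
        for subject, style, composition in product(subjects, styles, compositions)
    ]

prompts = build_prompt_matrix(subjects, styles, compositions)
print(len(prompts))  # 3 subjects x 3 styles x 2 compositions = 18 test prompts
```

Running each of these through the generator and noting which cells of the matrix look weak makes it much easier to spot whether a problem is tied to a subject, a style, or a composition.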
Let's review the images generated by a prompt, and make note of any issues or areas for improvement. This can help identify patterns or trends that may be affecting the quality of the output.
In this case I decided to prompt a character. I know that there are characters in my dataset with distinct features, however I didn't see any in the promptless test we ran above. I want to test this with a prompt to make sure the generator isn't more overfit than I expected.
I also decided to shift a few settings. I bumped up my Sampling Steps to 75 and adjusted the Guidance setting. Guidance tells the AI how closely it should follow the prompt I've put in. Typically it's best to keep the Guidance under 15, and in the case of custom trained models, between 7 and 10 is a good rule of thumb.
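Under the hood, Guidance in Stable Diffusion-style models is usually classifier-free guidance: the model makes one prediction with your prompt and one without, then pushes the result toward the prompted direction by the guidance scale. This is a toy numeric sketch of that blend, not any tool's actual implementation; the values are made-up stand-ins for the model's noise predictions.

```python
def apply_guidance(uncond, cond, guidance_scale):
    """Classifier-free guidance: start from the unconditioned prediction
    and push toward the prompt-conditioned one. The lists here are toy
    floats standing in for the model's per-pixel noise predictions."""
    return [u + guidance_scale * (c - u) for u, c in zip(uncond, cond)]

uncond = [0.10, 0.20]  # prediction with an empty prompt
cond = [0.30, 0.10]    # prediction with your prompt
# A scale of 1.0 just reproduces the conditioned prediction; higher
# values exaggerate the prompt's influence, which is why very large
# Guidance settings tend to produce harsh, oversaturated results.
print(apply_guidance(uncond, cond, 7.5))
```

This is also why the 7-to-10 range works well for finetuned models: the finetuning already pulls outputs toward your dataset, so the prompt term doesn't need to be amplified as aggressively.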
After checking the output, pictured below, I feel very comfortable that the model is not too overfit. Typically I train models like this, in part, to allow me the opportunity to make more assets for hybrid datasets. These will be used to finetune character, object, and landscape generators that can be used in a more nuanced way.
I am not so happy with the face - I can tell that it would be useful to train a few more characters into the dataset with more detailed faces. This also indicates that a larger dataset may be a good choice for this model.
If I had planned to use one single generator to create assets for every aspect of a game world, I would want to use text encoding. It is possible to create highly nuanced, multi-subject and scene datasets. However, it takes a lot of practice to be able to curate that kind of output consistently.
In the case of the Bubbleverse training Generator, I knew that there was some overfitting of space landscapes. So, one prompt exercise I always make sure to try in that kind of situation is to pick a subject similar to the overfit one - in this case another landscape - that looks very different from the overfit version. Here you'll see I decided to generate a landscape of an open plain.
I kept the Guidance and Sampling Steps the same as the last prompt. Typically once I know which parameters work best, I try to keep them as consistent as possible.
I feel good about the design style, but the color isn’t quite there. This is further indication that if I want to create things outside of my original dataset scope, I’ll need to make some adjustments. There are a few ways I could address this.
- I could rewrite "open plain" as "open plain--". The prompt expression "-" tells the model to bring the weight down on that token.
- I could create more assets using this current model with some heavy prompts and use that to train a new dataset.
- I could retrain the dataset with fewer landscape images.
If I’m working in a more restrictive world, it won’t matter, because I’ll never leave the boundary of my generator. However, in this use case I’m assuming that I may want to be able to train many different layers and aspects of the world.
Using CLIP Interrogator
In some cases, you may want to create entirely new and unusual subjects. That might be easy to prompt, but it gets harder when you want to create something very similar to your original dataset subjects with a few notably different characteristics. This is a case where longer prompts can be useful.
Typically, when you’re using a generic model built on Stable Diffusion, you need to add a lot of prompts to get a good result. That’s because the base Stable Diffusion model is not finetuned or guided. When you prompt with a token it uses that information as a reference point in its inference process. However, if you don’t have enough tokens, it needs to grab other points from latent space, which can produce strange results.
Finetuned models solve that problem by creating tokens with a lot of information baked in. So, typically, you need fewer words. The situations where you need longer prompts tend to be when you’re trying to direct your model a little further away from its finetuning.
The CLIP Interrogator is an excellent tool for identifying a good, long, base prompt. I like to use this when I’m trying to emulate a general style, while still avoiding copying an image. It creates a point of reference that is removed from whatever image you put into the CLIP Interrogator.
As you can see, this redefined the character parameters a bit. Now, if I want to continue creating a more nuanced character style hybrid, I can keep prompting differences in characters to create variety.
Retraining or New Trainings
Once you’ve identified any areas for improvement, you can begin to make adjustments to your generator to address these issues. This may involve fine-tuning your dataset, adjusting your regularization settings, or adding additional prompts to provide more context and guidance for the generator. At this time users will need to train a new model, and in the future there will be the capability to add onto existing models.
Overall, the key to successful prompt engineering is to be patient and persistent, and to be willing to make iterative improvements to your generator based on the results of your tests. With time and practice, you’ll be able to develop a strong understanding of how to effectively use prompts to generate high-quality images.