Tutorial: How to Prompt

Prompt Engineering: Concept Art Tutorial

Prompt engineering is an important skill in generative art creation. In this tutorial we will walk through the steps you can take to test your custom finetuned generator, refine its dataset, and iterate on its output.

Testing without Prompts

The first thing you’ll want to do is to test your generator without any prompts. This can help you identify any issues or areas for improvement, and give you a better understanding of the capabilities and limitations of your generator. This will also allow you to see if your generator is overfit or underfit.
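As a rough sketch, unprompted testing can be scripted. The helper below assumes a diffusers-style text-to-image pipeline object (where `num_images_per_prompt` is the parameter name); the model path in the usage comment is hypothetical.

```python
def sample_without_prompt(pipe, n_images=4):
    """Generate images from an empty prompt so that only the model's
    learned bias shows through, which makes over- or underfitting
    easier to spot."""
    result = pipe(prompt="", num_images_per_prompt=n_images)
    return result.images

# Hypothetical usage with the diffusers package (an assumption, not
# part of this tutorial):
# from diffusers import StableDiffusionPipeline
# pipe = StableDiffusionPipeline.from_pretrained("./bubbleverse-model")
# images = sample_without_prompt(pipe, n_images=8)
```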

In this tutorial we will be using our Bubbleverse concept art trained generator. You can see the dataset and learn more about the training process in our training tutorial.

Here you can see a few of my outputs. Many of the examples show an alien landscape, which indicates to me that this part of the dataset is likely slightly overtrained. This would be perfect if I were trying to train that specific style of terrain, but in this case I am only trying to train in the overall style.

If it were too overfit, it would look very grainy and the images would be nearly the same. If it were underfit, it would look very primitive and cartoonish. In this case the examples are likely a little overfit, however we can balance them out with positive and negative prompts. We can also take this opportunity to train more images, which can be used in a new training session to get more nuanced results.

When a model is extremely overfit, you should go through the original dataset, remove a few examples of the overrepresented image style, then retrain the model from scratch and test again. If it seems underfit, it most likely had too few images, which led to too few training steps.

Test Prompting

One helpful technique for testing your generator is to create a list of prompts that cover a range of different subjects, styles, and compositions. This will allow you to get a sense of how your generator handles different types of input, and identify any areas where it may struggle or perform poorly.
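A quick way to build such a list is to cross a few subjects, styles, and compositions programmatically. The subject and style strings below are illustrative examples, not from the tutorial's dataset.

```python
from itertools import product

def build_test_prompts(subjects, styles, compositions):
    """Cross every subject with every style and composition so the
    generator is probed evenly across its whole range."""
    return [f"{s}, {st}, {c}"
            for s, st, c in product(subjects, styles, compositions)]

prompts = build_test_prompts(
    subjects=["alien landscape", "character portrait", "floating city"],
    styles=["concept art", "matte painting"],
    compositions=["wide shot", "close up"],
)
# 3 subjects x 2 styles x 2 compositions = 12 prompts to run and review
```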

It’s also a good idea to review the images generated by your prompts, and make note of any issues or areas for improvement. This can help you identify patterns or trends that may be affecting the quality of your output.

In this case I decided to prompt a character. I know that there are characters in my dataset with distinct features; however, I didn’t see any in my earlier unprompted tests. I wanted to make sure the generator wasn’t more overfit than I expected.

I also decided to shift a few settings. I bumped my sampling steps up to 75 and adjusted my CFG. The CFG is a guidance scale that tells the AI how closely it should follow the prompt I’ve put in. Typically it’s best to keep the CFG under 15, and in the case of custom trained models, between 7 and 10 is a good rule of thumb.
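As a minimal sketch, those settings map onto two pipeline parameters. The names `guidance_scale` and `num_inference_steps` come from the diffusers library; if your platform differs, treat them as assumptions. The range check encodes the rule of thumb above.

```python
def generate(pipe, prompt, cfg=8.0, steps=75):
    """Run a generation with an explicit guidance scale and step count."""
    if not 1.0 <= cfg <= 15.0:
        # Keep CFG under 15; 7-10 works well for custom trained models.
        raise ValueError("CFG should stay between 1 and 15")
    return pipe(prompt=prompt,
                guidance_scale=cfg,
                num_inference_steps=steps).images
```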

After checking the output, pictured below, I feel very comfortable that the model is not too overfit. Typically I train models like this, in part, to give myself the opportunity to make more assets for hybrid datasets. These will be used to finetune character, object, and landscape generators that can be used in a more nuanced way.

I am not so happy with the face - I can tell that it would be useful to train a few more characters with more detailed faces into the dataset. This also indicates that a larger dataset may be a good choice for this model.

If I had planned to use one single generator to create assets for every aspect of a game world, I would want to use text encoding. It is possible to create highly nuanced, multisubject and scene datasets. However, it takes a lot of practice to be able to curate that kind of output consistently.

Negative Prompting

In the case of the Bubbleverse training model, I knew that there was some overfitting of space landscapes. So, one prompt exercise I always make sure to try in that kind of situation is to pick a similar subject - in this case another landscape - which looks very different from the overfit version. Here you’ll see I decided to generate a landscape of an open plain.

I kept the CFG and step parameters the same as the last prompt. Typically once I know which parameters work best, I try to keep them as consistent as possible.
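A negative prompt can also steer the output away from overfit content directly. This sketch again assumes diffusers-style parameter names (`negative_prompt`, `guidance_scale`, `num_inference_steps`), and the prompt strings in the usage comment are hypothetical examples.

```python
def generate_with_negative(pipe, prompt, negative_prompt, cfg=8.0, steps=75):
    """Steer output away from overfit content by naming it in the
    negative prompt, keeping CFG and steps consistent between runs."""
    return pipe(prompt=prompt,
                negative_prompt=negative_prompt,
                guidance_scale=cfg,
                num_inference_steps=steps).images

# Hypothetical usage against the overfit space landscapes:
# generate_with_negative(pipe,
#                        prompt="open plain landscape, concept art",
#                        negative_prompt="alien terrain, stars, space")
```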

I feel good about the design style, but the color isn’t quite there. This is further indication that if I want to create things outside of my original dataset scope, I’ll need to make some adjustments. There are a few ways I could address this.

I could rewrite the open plain as ((open plain)). Wrapping a token in “( )” tells the model to increase the weight on that token, and each extra pair of parentheses emphasizes it further.
I could create more assets using this current model with some heavy prompts and use that to train a new dataset.
I could retrain the dataset with fewer landscape images.
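As a rough sketch of how that emphasis syntax is read, here is a tiny parser. The 1.1-per-parenthesis multiplier follows the convention popularized by the AUTOMATIC1111 Stable Diffusion web UI; treat it as an assumption if your platform weights tokens differently.

```python
def token_weight(token):
    """Return (clean_token, weight), where each surrounding pair of
    parentheses multiplies the weight by 1.1 (web UI convention)."""
    weight = 1.0
    while token.startswith("(") and token.endswith(")"):
        token = token[1:-1]
        weight *= 1.1
    return token, round(weight, 3)

print(token_weight("((open plain))"))  # -> ('open plain', 1.21)
```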

If I’m working in a more restrictive world, it won’t matter, because I’ll never leave the boundary of my generator. However, in this use case I’m assuming that I may want to be able to train many different layers and aspects of the world.

Using CLIP Interrogator

In some cases, you may want to create entirely new and unusual subjects. That might be easy to prompt, but sometimes you want to create something very similar to your original dataset subjects, with a few notably different characteristics. This is a case where longer prompts are useful.

Typically, when you’re using a generic model based on Stable Diffusion, you need to add a lot of prompt detail to get a good result. That’s because the base Stable Diffusion model is not finetuned or guided. When you prompt with a token, it uses that information as a reference point in its inference process. However, if you don’t have enough tokens, it needs to grab other points from latent space, which can produce strange results.

Finetuned models solve that problem by creating tokens with a lot of information baked in. So, typically, you need fewer words. The situations where you need longer prompts tend to be when you’re trying to direct your model a little further away from its finetuning.

The CLIP Interrogator is an excellent tool for identifying a good, long, base prompt. I like to use this when I’m trying to emulate a general style, while still avoiding copying an image. It creates a point of reference that is removed from whatever image you put into the CLIP Interrogator.

As you can see, this redefined the character parameters a bit. Now, if I want to continue creating a more nuanced character style hybrid, I can keep prompting differences in characters to create variety.
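That iteration loop can be sketched as a small helper: keep the long interrogated description as the base, and append the traits you want to vary. The base prompt and traits below are hypothetical stand-ins for real CLIP Interrogator output.

```python
def build_variant_prompt(base_prompt, changes):
    """Keep the interrogated style description intact and append the
    traits that should differ from the reference image."""
    return ", ".join([base_prompt] + list(changes))

variant = build_variant_prompt(
    "a concept art painting of a figure in a bubble helmet, muted palette",
    ["red hair", "ornate armor"],
)
```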

Retraining or New Trainings

Once you’ve identified any areas for improvement, you can begin to make adjustments to your generator to address these issues. This may involve fine-tuning your dataset, adjusting your regularization settings, or adding additional prompts to provide more context and guidance for the generator. At this time users will need to train a new model, and in the future there will be the capability to add onto existing models.

Overall, the key to successful prompt engineering is to be patient and persistent, and to be willing to make iterative improvements to your generator based on the results of your tests. With time and practice, you’ll be able to develop a strong understanding of how to effectively use prompts to generate high-quality images.

Updated on: 30/12/2022
