Dive deeper into understanding Scenario's Regularization and Class categories
Introduction
When training a custom generator on Scenario, you will be prompted to select either a pre-generated regularization class, custom class, or no regularization class. This is a classification system that determines how the model should handle certain types of data during training. There are several regularization classes available, each of which is designed for specific types of images or styles. Sometimes it takes time to find the perfect class, and we encourage training your set on a few options.
Selecting the appropriate regularization class can help to improve the quality of the generated images and prevent overfitting. It is important to consider the characteristics of your dataset when choosing a regularization class. For example, if your images are highly detailed and photorealistic, you may want to select a regularization class that is designed to handle that type of data. On the other hand, if your images are more stylized and abstract, a different regularization class may be more suitable.
For example, this style dataset would work best as concept art or illustration.
By contrast, although this dataset is also a style and would work with illustration or concept art, the ‘sketch’ regularization class is more specific, so a user is more likely to see a result that matches their vision:
Experimenting with different regularization classes can help you to find the one that works best for your specific needs. In this tutorial, we will broadly discuss the various regularization class options and their potential use cases to help you choose the right one for your project.
What is a Regularization Class?
Regularization classes are a way to improve the accuracy of neural networks by introducing noise or randomness to a set of data points, such as pixel or vector images. This helps to prevent overfitting, where a neural network becomes too precise with a specific training set but is unable to make accurate predictions on new data. Using regularization images during training can help a neural network to detect more general patterns and improve its overall accuracy.
Imagine that you are trying to create a detailed, lifelike drawing of a specific person's face. You have a lot of reference photos of the person and are using them to guide your drawing. However, you don't want your drawing to be an exact copy of any one of the reference photos; you want it to be a composite that captures the person's overall appearance.
To do this, you might decide to use a regularization technique in your drawing process. This could be something like "averaging" the different reference photos together or only allowing yourself to use a limited number of fine details from each photo. By adding these constraints to your drawing process, you are regularizing your image generation; you are preventing it from becoming too complex and overfitting to any one reference photo. This can help your drawing to be more representative of the person's overall appearance, rather than being overly influenced by any one reference photo.
What are Regularization Images?
The goal of regularization in image generation (and in machine learning more generally) is to balance the need for detail and accuracy with the need for generalization and flexibility. Just like in the example above, regularization helps to prevent a model from becoming too complex and overfitting to the training data, which can lead to poor performance when the model is applied to new, unseen data.
Regularization images are AI-generated images that specify the general style or subject category that your generator falls into. If you are training a character that you would like to look like a “man”, you would want to provide AI-generated images of men to give the training program the distinct impression that it is not training a “woman” based on its existing CLIP program.
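This article does not describe Scenario's actual training objective. As a rough sketch only, and assuming a DreamBooth-style setup (an assumption, not something confirmed here), the role of regularization images is often written as a second loss term added to the loss on your own training images:

```latex
% Assumed, simplified form; all symbols here are illustrative.
% L_instance: reconstruction loss on your own training images
% L_prior:    the same loss computed on the regularization images
% lambda:     a weight controlling how strongly the class prior is preserved
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{instance}} + \lambda \, \mathcal{L}_{\text{prior}}
```

Under this reading, a larger weight keeps the model closer to its general idea of the class, while a smaller weight lets the training images dominate.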
There are two methods for using regularization images. You can use generic regularization images, such as a wide variety of images of people, animals, cars, etc. Or, you can use a multi-step approach to model training, and use your imperfect model to create a unique set of regularization images. We will get into this second option further in our tutorial on Custom Classes.
You can view a commonly used regularization image collection here. This gives more context to the range of images one might need. Scenario uses more specialized regularization images in our categories.
Identifying What You’re Training
In our article, Creating a Concept Art Generator, we discussed the different general categories that a custom generator might fall under: a style, hybrid, or subject. Here is a quick recap:
When creating a custom generator for a specific style, the focus is on replicating the overall aesthetic, medium, and quality of the desired output. This type of generator is useful for creating full game concept art and defining specific stylization elements. It is not intended to preserve specific individual characters or objects, and is considered a general finetuned model.
A custom generator that creates subjects is not concerned with style and focuses on the specific elements that identify an object or person. An avatar AI program that creates multiple styles of images based on one person's face is an example of a subject generator. These generators are characterized by their ability to identify and replicate specific features that distinguish a subject.
Subject generators are ideal for creating a large number of different assets for a game or other project, as they can be used to generate images of specific characters or objects in a variety of styles. They are particularly useful for creating assets for games with procedurally generated content, as they can be used to generate a large number of different variations of a given character or object.
A hybrid is just as it sounds: a mix of both style and subject elements. The majority of users tend to create hybrid generators for gaming assets, and issues arise when they fail to realize that they are training not just one element, but both. Hybrid generators can be challenging to work with, as they require a good understanding of both style and subject elements, and they can take some trial and error to get right.
Scenario Regularization Classes and Categories
Scenario has a number of specialized regularization classes and categories. Below we have listed them, with some suggested uses and examples. In this guide green categories are suitable for beginners, blue are suitable for intermediate users, and red are suitable for more advanced users.
Training art styles is generally easier and more forgiving compared to other types of models, and they tend to be a good starting point if you plan to create many curated custom generators that are more closely tailored to specific subjects or settings. It is also easier to check your training model for overfitting when working with art styles. If you notice that several of the same subjects appear frequently in your image outputs, you may want to remove similar pictures and retrain your model. If you are new to training machine learning models, we recommend considering starting with an art style. This will give you a chance to become familiar with the process and learn how to tune your model to achieve the best results.
This regularization class is ideal for capturing the general aesthetic and style of your characters. It is able to accommodate both a wide range of outputs and a very narrow output scope. However, it may not be the best choice if you are trying to recreate the same character multiple times, as it is designed to produce varied results. Instead, you may want to consider using a different regularization class that is better suited to producing consistent, predictable results.
This regularization class is well-suited for general style training and world building, as it can accommodate a wide range of art mediums and styles, and recognizes many different types of subjects and landscapes. This makes it a versatile and powerful tool for creating diverse, detailed, and realistic images. If your goal is to build a rich and immersive world or to create a wide variety of art styles, this regularization class may be an excellent choice for your project.
The drawing regularization class is specifically designed for creating hand-drawn style images. It is less polished and refined than the general illustration style, but also not as rough and unfinished as the sketch style. If you are looking to create hand-drawn images with a medium level of detail and realism, this regularization class may be a good fit for your project. It can help you to achieve a specific artistic look and feel.
The illustration regularization class is similar to the concept art class in its adaptability, and it tends to be a good default choice for those who are training a general style. It is often a great starting point for new users who are just getting started with machine learning models and want to create a wide range of images with a high level of detail and realism. This regularization class can accommodate a wide range of art styles and mediums.
The painting regularization class is specifically designed for creating images in a painted style. If you want your image output to have the look and feel of a painted work of art, this regularization class may be a good choice for your project. It is important to keep in mind that this class is specific to a particular medium, and the regularization images in your dataset will be compared to images in a similar style. This can help to ensure that your model produces results that are consistent with the painted style you are trying to achieve.
The pattern regularization class is well-suited for creating repetitive imagery and textures, such as a clothing pattern or a specific type of design. For example, you might use this class to create images of a plaid pattern or a repeating geometric design. Building a dataset for this regularization class can be slightly trickier than others, as the dataset will typically be narrower in scope and may require more careful selection of images. This is not necessarily a class that you should start with if you are new to machine learning and image generation.
The sketch regularization class is designed for creating basic, rough, and unfinished images in a sketch-like style. This class is more basic and simplified than the drawing or illustration classes, and it is well-suited for creating quick, rough sketches or rough drafts of an image. The narrowness of the input that is typically associated with this class makes it slightly more intermediate in terms of training, but most users are likely to find it intuitive to work with.
Avatars / People
The character regularization class is best suited for creating specific characters or avatars. It is ideal for creating player characters in an RPG game or important main NPCs, and it can be used to create detailed and lifelike images of individual characters. While you can impose a particular style on your generator when using this class, you can also achieve impressive results from datasets that do not have a consistent style. This can help to ensure that your model is able to produce a wide range of character designs and appearances.
It is important to keep in mind that if you train your model using only one style, your characters may struggle to break out of that imposed style. Instead, it is generally best to use a diverse and varied dataset that allows your model to learn a wide range of styles and characteristics. This can help to make your characters more versatile and flexible, and give you more creative control over the appearance of your characters.
Characters / NPCs / Mobs
While not all of these classes are intermediate in difficulty, it is generally true that hybrid datasets that have a moderately narrow scope of output can be somewhat finicky to work with. While you do not need to have a lot of experience with finetuning to use these classes, it may take a few training rounds to get the results you want.
One of the key aspects to keep in mind when working with regularization classes is the difference between the categories. For example, if you are creating characters in your game that have features of animals, but stand on two legs and wear clothes, it is likely best to use the character regularization class rather than the animal class. Paying attention to the specific characteristics and features of your subjects can help you to select the most appropriate regularization class and achieve the best results.
The animal regularization class is specifically designed for creating images of non-anthropomorphized animals, such as dogs, cats, horses, and pigs. This class is generally easier to work with than some of the other classes in this general category, and it is well-suited for creating detailed and lifelike images of a wide range of animal subjects.
Characters are a hybrid class similar to the avatar/character category. However, this is a case where it’s better to lean more into style than subject. It is also perfectly suitable to use this category for characters whose style you want to maintain.
Crawling Creatures / Creatures / Flying Creatures / Underwater Creatures
You need to discern which of the creature categories your dataset falls into, but they are fairly straightforward. Creatures are good categories for monsters, non-humanoid beings, and the middle space between animal and character.
Mechas / Robots
The mecha and robot regularization classes are designed for creating images of robotic or machine subjects and hybrids. These classes can be somewhat challenging to work with, due to the level of detail involved in creating these types of images. It is important to be aware of this detail when building your dataset, as it can help to ensure that your model is able to produce high-quality and accurate images. However, it is also important to avoid overfitting, which can occur if you have too many images, with too little range.
The digital assets regularization class is a hybrid class that is designed for creating images of digital assets such as icons, logos, and other graphical elements. This class is generally less complex to work with than some of the other hybrid classes, and it is more focused on creating a specific style of image rather than a particular subject.
App Icons / Badges
While these categories are technically a hybrid class, they are far easier to train than others. There is a relatively standard format for app icons and badges, and it is one that the base model has a lot of context for. You should not need a very large dataset for an app icon model, and likely 8-12 images will do. However, if there is a lot of diversity in your dataset, or you are trying to communicate a very nuanced style, you can create a dataset of as many as 30. Just bear in mind that if you have too many images in your set that look similar, you run the risk of overfitting.
Cards are another style of image that tends to follow an easy to replicate format. When training cards, it is good to have your card images sized down, inside a 1:1 square with a little border on the edge. Don’t worry, you can easily use our background removal tool before and after you create your generator and new images, to get rid of any empty borders.
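The sizing step above is simple arithmetic, and can be sketched as follows. This is a minimal sketch, not Scenario tooling: the function name, the 512-pixel canvas, and the 32-pixel border are all assumptions chosen for illustration.

```python
def fit_card_in_square(width, height, canvas=512, border=32):
    """Return (new_width, new_height, left, top) for a card centered in a
    square canvas with a border on every side.

    The canvas and border sizes are illustrative assumptions.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    if border < 0 or 2 * border >= canvas:
        raise ValueError("border must leave room for the card")

    inner = canvas - 2 * border  # usable area inside the border
    # Only ever shrink the card (scale <= 1.0), preserving its aspect ratio.
    scale = min(1.0, inner / width, inner / height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))

    # Offsets that center the resized card on the canvas.
    left = (canvas - new_w) // 2
    top = (canvas - new_h) // 2
    return new_w, new_h, left, top


# A 750x1050 portrait card on the assumed 512x512 canvas.
print(fit_card_in_square(750, 1050))
```

The returned size and offsets can then be passed to whichever image editor or library you use to resize and paste the card.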
Textures / Tiling Textures
Textures can be incredibly straightforward or incredibly complex, depending on a user’s goals. If you are doing a very specialized texture, you may only need 6-8 images. However, like many custom generators, a texture model can be as complex as you choose to make it.
Photography is a good class category for anything that needs added layers of depth and realism. We find that photography can also work well for realistic 3D concept art in some cases. Except for highly specialized applications, most people who train with a photography class will be training a more general style.
Photography is not conceptually more complex than style, but we’ve found that users tend to be less aware of where one style ends and another begins when it comes to realism. If you find you are struggling to get consistent outputs, take another look at your dataset and make sure that your images aren’t too different from one another in terms of style.
This style of photography is a great class for zoomed in, textural photography. In fact, one might use this class for many of the same cases they would use a texture class. The main difference might be that they want the particular depth and linework highlighted in a specific way, that might get lost in a texture class training.
Drone style tends to be a little easier to train, as it is so distinct. Users should need somewhere between 10 and 20 images.
Food / Landscape / Nature / Underwater
We’ve included these general styles together, because they tend to all have a similar approach. Be sure not to feed your custom generator too wildly different styles, and you will do just fine. Like drone, 10-20 images should suffice in most cases.
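The image counts suggested so far in this guide can be collected into a small lookup for checking a dataset before training. This is only a sketch: the dictionary keys and the function name are hypothetical, and the ranges are the guide's rough suggestions rather than hard limits.

```python
# Suggested image counts quoted in this guide. Key names are illustrative.
SUGGESTED_IMAGE_COUNTS = {
    "app_icons": (8, 12),           # up to 30 for diverse or nuanced sets
    "specialized_texture": (6, 8),  # broader texture models may need more
    "drone": (10, 20),
    "food": (10, 20),
    "landscape": (10, 20),
    "nature": (10, 20),
    "underwater": (10, 20),
}


def check_dataset_size(category, image_count):
    """Compare a dataset's size with the suggested range for its category."""
    low, high = SUGGESTED_IMAGE_COUNTS[category]
    if image_count < low:
        return f"below suggested range ({low}-{high})"
    if image_count > high:
        return f"above suggested range ({low}-{high})"
    return "within suggested range"


print(check_dataset_size("drone", 15))
print(check_dataset_size("app_icons", 5))
```

A count outside the range is not necessarily a problem; it is a prompt to look again at how diverse the dataset is.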
The other category is fairly self explanatory - the images contained within this regularization set include a much broader range. This is ideal for training a photography aesthetic, rather than a more focused food or nature style.
Props / Resources / Vehicles
While these classes have many sub-categories, we found that their similarities were more important than their differences. Like NPCs, these general game assets usually fall into the category of a hybrid class. That is, most developers will likely be designing them within the specific style parameters of a game world, and also want certain subject features to remain consistent.
As we’ve explained before, it can be very easy to overfit a hybrid, and we recommend being mindful of how often bright colors or specific shapes are present within your dataset. If you are training a game asset for which there is an incredibly wide variety, like a class of vehicles, you will want a very diverse, and possibly larger, dataset. If you do not want your assets to have much range, keep your dataset small, and show as much difference between the images as you can to avoid one particular image taking over the design.
Wearable assets can be the trickiest. They may not be as difficult if what you are training is a very standard item, but we’re identifying them as more advanced, because there is more to take into consideration.
A wearable or held asset, like a suit of armor or sword, needs to make sense both in the context of being worn and not being worn. Generic concepts, classic fantasy boots for example, might already be contextualized for the AI enough that you don’t need to provide an overwhelming amount of guidance. However, if you are directing a unique wearable, you will want to provide both a sample of images of the item worn, and unworn.
The biggest issue we see with wearables is too few images being provided. It’s not unusual for someone to have only a handful of examples of a desired wearable that are also not too similar to one another. Feel free to flip your images so you have more than one facing in different directions to fill out your dataset. Regardless, you will need at least 8 images to get a good training result.
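As a minimal sketch of the flipping suggestion, the helper below works out which images would need a mirrored copy to reach the minimum of 8. It only plans file names; the mirroring itself is left to your image editor or library. The function name and the `_flipped` suffix are assumptions for illustration.

```python
from pathlib import Path


def plan_flipped_copies(filenames, minimum=8):
    """Return (source, flipped_name) pairs needed to reach `minimum` images.

    Flipping can at most double the set, so a very small dataset may still
    fall short of the minimum after every image has been mirrored once.
    """
    missing = max(0, minimum - len(filenames))
    plan = []
    for name in filenames[:missing]:
        path = Path(name)
        # Hypothetical naming scheme: boots_01.png -> boots_01_flipped.png
        flipped = path.with_name(f"{path.stem}_flipped{path.suffix}")
        plan.append((name, str(flipped)))
    return plan


images = [f"boots_0{i}.png" for i in range(1, 6)]  # five source images
for source, flipped in plan_flipped_copies(images):
    print(f"mirror {source} -> {flipped}")
```

Keep in mind that mirrored copies add orientation variety but not new designs, so they do not replace genuinely different examples.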
Our last class certainly falls back into a style heavy hybrid. However, worldbuilding asset generators tend to be on the easier end of hybrid generators. Like styles, they are a bit more forgiving. They do tend to need larger datasets, depending on the amount of range you are attempting to train.
2D Maps/DND Maps
2D maps are closer to a classic world map, whereas DND maps tend to fit a battle map style. Both are fairly straightforward. We cannot stress enough how important it is to have a lot of diversity if you are trying to train a range of different styles, even if your dataset is small.
Use this category for anything that needs to have a board with specific steps on it. Your regularization images here will reflect a broad variety of board game styles, to ensure that your generator fits that intention.
Indoor is also quite straightforward, with more of a hybrid style. It is best to stay within a certain indoor category - house, temple, school - unless you are training a very broad style or using text encoding. Indoor settings tend to have a lot of detail, so if you have a lot of staged decorating elements, try for a larger dataset so it can more easily learn the important details.
Isometric style tends to be very straightforward, and is easily trained. Similar to indoor, if your isometric style is very ranged and contains lots of details, ensure that you are providing enough instance images and steps for it to properly learn all the details.
This category is quite straightforward and tends to be more general. This is a particularly good generator style to start with, as it is a little more challenging than the general styles, while still being very approachable.
Castles / Skyscrapers / Temples / Other buildings
Buildings are a little more specific, as they more closely resemble props or vehicles in their hybrid style. They land in the middle, between subject and style training. Like any hybrid, just pay close attention to your dataset, and be sure to test afterwards to see if anything has overfit.
As Scenario continues to release new features, users will have the ability to text-encode and use custom classes, which will make them less reliant on selecting the perfect predetermined class. However, predetermined regularization classes can be useful in producing consistent image outputs, and they tend to be easier for beginners to use with GenAi tools.
Regularization classes provide a set of guidelines or rules for the model to follow during the learning process, helping to improve its performance and encourage it to synthesize high-quality, realistic images. Using predetermined regularization classes, rather than custom classes, can make it easier for beginners to get started with GenAi tools and achieve consistent results, even if they are not yet familiar with the full range of options available to them.