What makes a good training dataset for a generator varies greatly depending on your desired output. To curate one, there are a few key steps you can follow.
First, identify the specific style or subject that the generator will be trained on.
Next, gather a dataset that is relevant to the style or subject.
- The dataset should contain a minimum of 5 images and a maximum of 30 to 100 images, depending on your plan.
- The dataset should be large enough for the generator to learn the relevant patterns and relationships ("consistency"), while still having some variability between the images.
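If you prepare your images in a local folder before uploading, a quick script can confirm the count is within these limits. The sketch below is a minimal, hypothetical helper, not part of the Scenario app: the list of image extensions, the function names, and the `plan_max` parameter are assumptions. The 5-image minimum and the plan-dependent maximum come from this guide, and the 15 to 25 "typical" range comes from the tips later in this guide.

```python
from pathlib import Path

# Assumption: these extensions are what you treat as training images.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}

MIN_IMAGES = 5            # minimum stated in this guide
TYPICAL_RANGE = (15, 25)  # typical dataset size stated in this guide


def list_images(folder: str) -> list[Path]:
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def check_dataset_size(count: int, plan_max: int) -> list[str]:
    """Return notes about a dataset of `count` images.

    `plan_max` is your plan's upper limit (between 30 and 100 images).
    """
    notes = []
    if count < MIN_IMAGES:
        notes.append(f"only {count} images; at least {MIN_IMAGES} are required")
    if count > plan_max:
        notes.append(f"{count} images exceeds your plan's limit of {plan_max}")
    low, high = TYPICAL_RANGE
    # Soft check: within the hard limits but outside the typical range.
    if MIN_IMAGES <= count <= plan_max and not low <= count <= high:
        notes.append(f"{count} images is outside the typical {low}-{high} range")
    return notes


# Example (hypothetical folder name and plan limit):
# for note in check_dataset_size(len(list_images("my_dataset")), plan_max=30):
#     print("note:", note)
```

An empty list means the count fits both the hard limits and the typical range; a note about the typical range is only advisory.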
A few good guidelines to follow:
- The dataset should not be so large that it becomes unwieldy.
- Whichever aspects of subject, composition, or aesthetics you want your custom generator to maintain should be consistent across the dataset. In most cases, avoid aiming for too much variety in the output when you are just beginning.
- Any elements you do not want the AI to specifically remember should be represented as diversely as possible. For example, if you are training an illustration style, be aware that if you include too many images that contain elephants, the AI may assume that elephants are part of the desired output.
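One way to catch the "too many elephants" problem before training is to jot down the main elements in each image and count how often each incidental element appears. The sketch below assumes you have written such labels by hand; the label format, the function name, and the 30% threshold are hypothetical choices for illustration, not something this guide prescribes.

```python
from collections import Counter


def overrepresented_elements(
    image_tags: dict[str, set[str]],
    keep: set[str],
    max_share: float = 0.3,  # hypothetical threshold; tune for your dataset
) -> dict[str, float]:
    """Return incidental elements that appear in more than `max_share` of images.

    image_tags: maps each image name to the elements you noted in it.
    keep:       elements you *want* the generator to learn (e.g. the style).
    """
    total = len(image_tags)
    if total == 0:
        return {}
    # Count how many images mention each element.
    counts = Counter(tag for tags in image_tags.values() for tag in tags)
    return {
        tag: n / total
        for tag, n in counts.items()
        if tag not in keep and n / total > max_share
    }


# Example: elephants appear in 3 of 4 images of an illustration-style dataset.
# overrepresented_elements(
#     {"01.png": {"ink outlines", "elephant"},
#      "02.png": {"ink outlines", "elephant", "tree"},
#      "03.png": {"ink outlines", "elephant"},
#      "04.png": {"ink outlines", "house"}},
#     keep={"ink outlines"},
# )  # -> {"elephant": 0.75}
```

Anything the function returns is a candidate for swapping out some images, so the element stops looking like part of the desired output.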
Building a good dataset is a craft, and it takes time to learn. We recommend checking out our tutorials, looking at the sample training sets we’ve included, and testing a few different datasets when you train a new generator.
Tips on Training Datasets
Curating a good training dataset is key to creating quality generators that reliably produce amazing output.
A typical training dataset contains 15-25 images. We recommend starting with small datasets before exploring larger ones.
Additionally, make sure your images are of sufficient quality and resolution, as low-quality or low-resolution images can negatively impact the performance of the AI model.
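To spot undersized images before uploading, you can check each image's shorter side against a minimum. This guide does not give an exact resolution, so the 512-pixel value below is a hypothetical placeholder. The sketch reads PNG dimensions from the file header using only the standard library; for JPEG or WebP files, an imaging library such as Pillow (`Image.open(path).size` returns `(width, height)`) is the simpler route.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MIN_SIDE_PX = 512  # hypothetical threshold; the guide gives no exact number


def png_size(header: bytes) -> tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file."""
    if header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG header")
    # Width and height are big-endian 32-bit integers right after the IHDR tag.
    return struct.unpack(">II", header[16:24])


def low_resolution(sizes: dict[str, tuple[int, int]],
                   min_side: int = MIN_SIDE_PX) -> list[str]:
    """Return names of images whose shorter side is below `min_side` pixels."""
    return sorted(name for name, (w, h) in sizes.items() if min(w, h) < min_side)


# Example (hypothetical file name):
# with open("photo.png", "rb") as f:
#     print(low_resolution({"photo.png": png_size(f.read(24))}))
```

Checking the shorter side matters because images are typically cropped to a square, so the shorter side becomes the final resolution.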
A training dataset preferably consists of square images (these can be cropped and resized directly in the Scenario web app, via the uploader).
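If you would rather square your images before uploading, the arithmetic for a centered crop is short. The sketch below computes the largest centered square for a given width and height; it is a generic technique, not a description of how the Scenario uploader crops. The function name is hypothetical.

```python
def center_square_box(width: int, height: int) -> tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of the largest centered square."""
    side = min(width, height)
    # Split the excess evenly on both sides (integer pixels).
    left = (width - side) // 2
    top = (height - side) // 2
    return (left, top, left + side, top + side)


# Example: a 1920x1080 landscape image keeps its middle 1080x1080 region.
# center_square_box(1920, 1080)  # -> (420, 0, 1500, 1080)
```

The resulting box matches the `(left, upper, right, lower)` order that Pillow's `Image.crop` expects, and `Image.resize` can then scale the square to your chosen size. A center crop can cut off an off-center subject, so cropping manually in the uploader is still the safer option for such images.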