The breakthroughs in the performance of deep neural networks (DNNs) since the beginning of this decade have produced machine learning models that often significantly outperform established methods on application-relevant metrics.
This is particularly evident in computer vision: tasks such as image classification, object detection, and object pose estimation lend themselves remarkably well to learning the statistical distributions of image features and are now solved almost exclusively with DNNs, in particular convolutional neural networks (CNNs).
An essential prerequisite for the breakthrough of this approach was the availability of large amounts of data (keyword: big data) from which the relevant image features and their distributions can be identified. However, suitable data collections do not exist for every problem of interest, and even where one is available, preparing and labeling the data remains an expensive, time-consuming, and error-prone undertaking.
To address these problems, one attempts to generate the relevant training data automatically from other available data. A particular advantage of this approach is that manual labeling can be omitted entirely, since the required annotations (such as object poses) are produced as a by-product of generating the training images.
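A minimal sketch in Python may illustrate the idea. The `render` function below is a hypothetical stand-in for an actual rendering pipeline (none is specified here); the key point is that the pose label is known exactly because it is sampled before rendering, so no annotation step is needed.

```python
import json
import numpy as np

rng = np.random.default_rng(0)

def render(pose):
    """Hypothetical stand-in for a real renderer (e.g. a game engine
    or ray tracer); here it just returns a random image of the right shape."""
    return rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

dataset = []
for i in range(1000):
    # Sample a random object pose: 3 translation + 3 rotation (Euler) parameters.
    pose = {
        "translation": rng.uniform(-1.0, 1.0, size=3).tolist(),
        "rotation_euler": rng.uniform(0.0, 2 * np.pi, size=3).tolist(),
    }
    image = render(pose)
    # In a real pipeline, `image` would be written to disk alongside the label.
    # The label itself comes for free: it is the pose we sampled above.
    dataset.append({"image_id": i, "pose": pose})

with open("labels.json", "w") as f:
    json.dump(dataset, f)
```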
A problem that arises, however, is the discrepancy between the distributions of image features in real and in synthetically generated data, referred to as domain shift or domain gap.
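One common way to quantify such a distribution discrepancy (not a method prescribed by this text, merely an illustration) is the Maximum Mean Discrepancy (MMD). A minimal sketch with a Gaussian kernel on randomly generated stand-in feature vectors:

```python
import numpy as np

def mmd2(X, Y, sigma=1.0):
    """Biased estimate of the squared Maximum Mean Discrepancy
    between sample sets X and Y under a Gaussian (RBF) kernel."""
    def k(A, B):
        # Pairwise squared Euclidean distances, then RBF kernel values.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

rng = np.random.default_rng(0)
real = rng.normal(0.0, 1.0, size=(500, 64))       # stand-in for real image features
synthetic = rng.normal(0.3, 1.2, size=(500, 64))  # shifted stand-in for synthetic features

# A clearly positive value relative to same-domain comparisons indicates a domain gap.
print(mmd2(real, synthetic))
```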
While researchers around the globe have presented solutions, most of them belonging to either the domain randomization or the domain adaptation family, there is as yet no elegant and efficient solution that works for arbitrary use cases.
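To make the first of these families concrete: domain randomization deliberately varies rendering parameters (lighting, colors, noise, backgrounds) so widely that the real domain appears as just another variation. A minimal, hypothetical augmentation sketch in that spirit follows; the parameter ranges are illustrative assumptions, not values from any particular method.

```python
import numpy as np

rng = np.random.default_rng(0)

def randomize(image):
    """Apply simple domain-randomization-style perturbations:
    random contrast, brightness, color cast, and sensor-like noise."""
    img = image.astype(np.float32)
    img = img * rng.uniform(0.5, 1.5)                 # random contrast
    img = img + rng.uniform(-40.0, 40.0)              # random brightness
    img = img + rng.uniform(-20.0, 20.0, size=(1, 1, 3))  # random color cast
    img = img + rng.normal(0.0, 10.0, size=img.shape)     # sensor-like noise
    return np.clip(img, 0, 255).astype(np.uint8)

synthetic = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
augmented = randomize(synthetic)
```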