Style-transfer GANs for Bridging the Domain Gap in Synthetic Pose Estimator Training
2020 IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR). Proceedings
IEEE International Conference on Artificial Intelligence and Virtual Reality (AIVR) <2020, online>
Given the dependency of current CNN architectures on a large training set, the possibility of using synthetic data is alluring, as it allows generating a virtually infinite amount of labeled training data. However, producing such data is a nontrivial task, as current CNN architectures are sensitive to the domain gap between real and synthetic data. We propose to adopt general-purpose GAN models for pixel-level image translation, which allows the domain gap itself to be formulated as a learning problem. The obtained models are then used either during training or during inference to bridge the domain gap. Here, we focus on training the single-stage YOLO6D object pose estimator on synthetic CAD geometry only, where not even approximate surface information is available. When employing paired GAN models, we use an edge-based intermediate domain and introduce different mappings to represent the unknown surface properties. Our evaluation shows a considerable improvement in model performance when compared to a model trained with the same degree of domain randomization, while requiring only very little additional effort.
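To illustrate what an edge-based intermediate domain can look like, the following minimal sketch converts a grayscale image into a binary edge map that both a rendering and a photograph could be mapped to. The abstract does not state which edge detector is used; the Sobel operator, the function name `sobel_edge_map`, and the `threshold` parameter are assumptions made for illustration only. The code is deliberately dependency-free, whereas a practical pipeline would more likely rely on an image-processing library.

```python
from math import hypot
from typing import List

Image = List[List[float]]  # rows of grayscale intensities


def sobel_edge_map(gray: Image, threshold: float = 0.25) -> List[List[int]]:
    """Return a binary edge map (1 = edge) of a grayscale image.

    Hypothetical helper: the Sobel operator and the relative threshold
    are illustrative choices, not taken from the paper.
    """
    h, w = len(gray), len(gray[0])

    def px(r: int, c: int) -> float:
        # Clamp coordinates so that border pixels are replicated.
        return gray[min(max(r, 0), h - 1)][min(max(c, 0), w - 1)]

    magnitude = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # Horizontal and vertical Sobel responses.
            gx = (px(r - 1, c + 1) + 2 * px(r, c + 1) + px(r + 1, c + 1)) - (
                px(r - 1, c - 1) + 2 * px(r, c - 1) + px(r + 1, c - 1)
            )
            gy = (px(r + 1, c - 1) + 2 * px(r + 1, c) + px(r + 1, c + 1)) - (
                px(r - 1, c - 1) + 2 * px(r - 1, c) + px(r - 1, c + 1)
            )
            magnitude[r][c] = hypot(gx, gy)

    peak = max(max(row) for row in magnitude)
    if peak == 0.0:
        # A perfectly flat image contains no edges.
        return [[0] * w for _ in range(h)]
    # Keep pixels whose gradient is at least `threshold` times the peak.
    return [[1 if m / peak >= threshold else 0 for m in row] for row in magnitude]
```

Because such an edge map discards color and shading, it carries no surface information, which is consistent with the abstract's point that the unknown surface properties have to be represented by separate mappings.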
STYLE: Style Transfer for Synthetic Training of a YoLo6D Pose Estimator
Darmstadt, TU, Master Thesis, 2020
Supervised training of deep neural networks requires a large amount of training data. Since labeling is time-consuming and error-prone, and many applications lack data sets of adequate size, research soon became interested in generating this data synthetically, e.g. by rendering images, which makes the annotation free and allows utilizing other sources of available data, for example CAD models. However, unless much effort is invested, synthetically generated data usually does not exhibit exactly the same properties as real-world data. In the context of images, there is a difference in the distribution of image features between synthetic and real imagery: a domain gap. This domain gap reduces the transferability of synthetically trained models, hurting their real-world inference performance. Current state-of-the-art approaches that try to mitigate this problem concentrate on domain randomization: overwhelming the model’s feature extractor with enough variation to force it to learn more meaningful features, effectively rendering real-world images nothing more than one additional variation. The main problem with most domain randomization approaches is that they require the practitioner to decide on the amount of randomization required, a fact research calls "blind" randomization. Domain adaptation, in contrast, directly tackles the domain gap without the assistance of the practitioner, which makes this approach seem superior. This work deals with the training of a DNN-based object pose estimator in three scenarios: first, a small number of real-world images of the objects of interest is available; second, no images are available, but object-specific texture is given; and third, neither images nor textures are available. Instead of copying successful randomization techniques, these three problems are tackled mainly with domain adaptation techniques.
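The "blind" aspect of domain randomization mentioned above can be made concrete with a small sketch in which a single knob controls how strongly the rendering parameters are perturbed. The class `RenderParams`, its three fields, the value ranges, and the `strength` parameter are all hypothetical and chosen only for illustration; the thesis abstract does not specify which properties are randomized.

```python
import random
from dataclasses import dataclass


@dataclass
class RenderParams:
    """Hypothetical per-image rendering settings (illustrative only)."""
    light_intensity: float  # multiplicative factor, 1.0 = nominal lighting
    hue_shift_deg: float    # rotation of surface hue in degrees
    noise_sigma: float      # standard deviation of additive sensor noise


def sample_render_params(strength: float, rng: random.Random) -> RenderParams:
    """Draw randomized rendering settings.

    `strength` in [0, 1] is the amount of randomization that the
    practitioner has to pick by hand: 0 disables it, 1 uses the full
    (assumed) ranges below.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must lie in [0, 1]")
    return RenderParams(
        light_intensity=1.0 + strength * rng.uniform(-0.5, 0.5),
        hue_shift_deg=strength * rng.uniform(-180.0, 180.0),
        noise_sigma=strength * rng.uniform(0.0, 0.1),
    )
```

Nothing in such a sampler indicates which value of `strength` brings the synthetic images closest to the real ones, which is the gap that learning the domain mapping directly is meant to close.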
The main proposition is the adaptation of general-purpose, widely available, pixel-level style transfer to directly tackle the differences in features found in images from different domains. To that end, several approaches are introduced and tested, corresponding to the three different scenarios. It is demonstrated that in scenarios one and two, conventional conditional GANs can drastically reduce the domain gap, thereby improving performance by a large margin when compared to non-photo-realistic renderings. More importantly, ready-to-use style transfer solutions improve performance significantly when compared to a model trained with the same degree of randomization, even when no real-world data of the target objects is available (scenario three), thereby reducing the reliance on domain randomization.