Applied Food Recognition for Vision-Based Self-Checkout Systems
Darmstadt, TU, Bachelor Thesis, 2019
Food recognition has been around almost as long as object detection itself. However, it is still an immensely complicated task due to the nature of food. The same food can come in different shapes, colors and arrangements, while different foods can look almost identical. Therefore, it is crucial to find efficient systems to evaluate images of food. The main goal of this thesis is to provide a way for canteens to use self-checkout systems. Such a system should be able to identify and differentiate between food items on the basis of pictures of the food, calculate a total price and provide a method of payment. This report addresses these problems by building on a master’s thesis. We collected a database that was used to train two neural networks. A CNN based on the Inception architecture achieved an equal error rate of 9% and is responsible for identifying the main dishes on the user’s tray. A second network, a Faster R-CNN, was set up to identify side components with a precision of 99.98%. A prototype was set up that is able to classify food on canteen trays from two images. It was equipped with an efficient camera setup and interfaces to a back-end server that handles classifications. The system is ready to be used in canteens. The following report describes how the system works and which steps were taken to achieve the given accuracies.
A Look at Feet: Recognizing Tailgating via Capacitive Sensing
Distributed, Ambient, and Pervasive Interactions: Technologies and Contexts
International Conference on Distributed, Ambient and Pervasive Interactions (DAPI) <6, 2018, Las Vegas, NV, USA>
In many everyday places, the ability to reliably determine how many individuals are within an automated access control area is of great importance. Especially in high-security areas such as banks and at country borders, access systems like mantraps or drop-arm turnstiles serve this purpose. These automated systems are designed to ensure that only one person can pass through a particular transit area at a time. State-of-the-art systems use cameras mounted in the ceiling to detect people sneaking in behind authorized individuals to pass through the transit space (tailgating attacks). Our novel method is inspired by recent results in capacitive indoor localization. Instead of estimating the position of humans, the pervasive capacitance of feet in the transit space is measured to detect tailgating attacks. We explore suitable sensing techniques and sensor-grid layouts for this application. In contrast to existing work, we use machine learning techniques to classify the sensor’s feature vector. The performance is evaluated at the hardware level by defining its physical effectiveness. Tests with simulated attacks show its performance in comparison with competitive camera-image methods. Our method provides verification of tailgating attacks with an equal error rate of 3.5%, which outperforms other methods. We conclude with an evaluation of the amount of data needed for classification and highlight the usefulness of this method when combined with other imaging techniques.
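The equal error rate quoted in this abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. As a minimal sketch, not taken from the paper, the following Python function estimates it from two lists of classifier scores. The function name, the score convention (a higher score means "single authorized person") and the sample values are assumptions made for illustration only.

```python
def equal_error_rate(genuine, impostor):
    """Estimate (eer, threshold) from similarity scores.

    genuine:  scores of legitimate single-person passages (assumed: higher is better)
    impostor: scores of simulated tailgating attacks
    """
    best = None
    # Try every observed score as a decision threshold.
    for t in sorted(set(genuine) | set(impostor)):
        # False rejection rate: legitimate passages scored below the threshold.
        frr = sum(s < t for s in genuine) / len(genuine)
        # False acceptance rate: attacks scored at or above the threshold.
        far = sum(s >= t for s in impostor) / len(impostor)
        gap = abs(far - frr)
        # Keep the threshold where both error rates are closest to each other.
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical scores, purely illustrative.
eer, threshold = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
print(eer, threshold)
```

With a finite set of scores the two error rates rarely match exactly, so this sketch reports the mean of both rates at the threshold where they are closest.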
An Integrated Deep Neural Network for Defect Detection in Dynamic Textile Textures
Progress in Artificial Intelligence and Pattern Recognition
International Workshop on Artificial Intelligence and Pattern Recognition (IWAIPR) <6, 2018, Havana, Cuba>
Lecture Notes in Computer Science (LNCS), 11047
This paper presents a comprehensive defect detection method for two common groups of fabric defects. Most existing systems require textiles to be spread out in order to detect defects. This method can be applied when the textiles are not spread out and does not require any preprocessing. The deep learning architecture we present is based on transfer learning and localizes and recognizes cuts, holes and stain defects. Classification and localization are combined into a single system that joins two different networks. The experiments presented in this paper show that, even without added depth information, the network was able to distinguish between stains and shadows. The method has been successful even for textiles in voluminous shape and is less computationally intensive than other state-of-the-art methods.
An Intuitive and Personal Projection Interface for Enhanced Self-management
Distributed, Ambient, and Pervasive Interactions: Technologies and Contexts
International Conference on Distributed, Ambient and Pervasive Interactions (DAPI) <6, 2018, Las Vegas, NV, USA>
Smart environments offer a high potential to improve intuitive and personal interactions in our everyday life. Nowadays, we often get distracted by interfaces and have to adapt ourselves to the technology, instead of the interfaces focusing on human needs. Especially in work situations, it is important to focus on the essentials in terms of goal setting and to have a far-reaching vision of ourselves. Particularly with regard to self-employment, challenges like efficient self-management, regulated work times and sufficient self-reflection arise. Therefore, we present ‘Selv’, a novel transportable device that is intended to increase user productivity and self-reflection by providing an overview of obligations, targets and successes. ‘Selv’ is an adaptive interface that changes its interactions in order to fit into the user’s everyday routine. Our approach uses a pen on a projected interface. Adapting to our own feeling of naturalness, ‘Selv’ learns usual interactions through handwriting recognition. In order to address users’ needs, it aims to build a mutual relationship and to convey a new feeling of an interface in a personal and natural way. This paper includes an elaborate concept and a prototypical realization within an Internet of Things environment. We conclude with an evaluation of tests and improvements in terms of interactions and hardware.
Cinematic Narration in VR – Rethinking Film Conventions for 360 Degrees
Virtual Augmented and Mixed Reality: Applications in Health, Cultural Heritage, and Industry
International Conference Virtual Augmented and Mixed Reality (VAMR) <10, 2018, Las Vegas, NV, USA>
The rapid development of VR technology in the past three years has allowed artists, filmmakers and other media producers to create great experiences in this new medium. Filmmakers are, however, facing big challenges when it comes to cinematic narration in VR. The old, established rules of filmmaking do not apply to VR films, and important techniques of cinematography and editing must be completely rethought. Possibly, a new filmic language will be found. But even though filmmakers already experiment eagerly with the new medium, there are relatively few scientific studies on the differences between classical filmmaking and filmmaking in 360 degrees and VR. We therefore present this study on cinematic narration in VR, in which we give a comprehensive overview of techniques and concepts that are applied in current VR films and games. We place previous research on narration, film, games and human perception into the context of VR experiences and deduce consequences for cinematic narration in VR. We base our assumptions on an empirical test with 50 participants and on an additional online survey. In the empirical study, we selected 360-degree videos and showed them to a test group while the viewers’ behavior and attention were observed and documented. As a result, we present guidelines that suggest methods of guiding the viewers’ attention as well as approaches to cinematography, staging and editing in VR.
Identifying Cuts and Holes in Fabrics
Darmstadt, TU, Master Thesis, 2018
Quality assurance of fabrics is one of the basic and vital tasks in the textile industry. When performed by human operators, the task can be error-prone. Automatic visual inspection saves considerable time as well as labour cost. Most approaches so far have been implemented for flat, spread-out textiles. The main goal of this thesis is to detect and classify small or fine defects (cuts, holes, stains) as far as possible in inhomogeneous, voluminous fabrics. A further focus is on test-time computation, which is reduced by minimizing the processing steps. To achieve this goal, deep learning (DL) and computer vision techniques are applied, which seem to be effective in the areas of image classification and object localization. This report provides a detailed overview of how an object detection algorithm can be used to detect defects and localize them on the fabric. Most approaches only classify the defects but do not localize them as part of the network itself. To balance classification accuracy and computational cost, I train a Faster R-CNN model with images in which the defects are labelled by bounding boxes, so that defect detection becomes real-time. The feature maps are extracted from the last convolutional layer of the Convolutional Neural Network (CNN) and then classified with the softmax layer (at the end of the fully-connected layer). The regression layer outputs the coordinates of the bounding box. I use different CNN architectures, and good results are obtained with the pretrained VGG16 model without using a disparity map. Details of the methods and the evaluation are presented in the methodology and evaluation chapters, respectively. With the proposed method, classification accuracies of 98.05% and 96.70% were obtained on the test set (at threshold 0.5) and on the validation set (at threshold 0.7), respectively.
Predicting OCR Errors in Natural Scene Images
Darmstadt, TU, Bachelor Thesis, 2018
This thesis explores the use of Image Quality Assessment (IQA) systems to increase the reliability of OCR in natural scenes, based on the principle that OCR accuracy is a function of the quality of the input image. This work focuses on assessing the image quality of video frames in real time in order to pick high-quality images for the OCR process. The IQA system predicts the likelihood of OCR errors and outputs the image quality problems, such as blur and light effects. The key technology developed for this work is an efficient IQA system built on the MobileNet V1 Convolutional Neural Network (CNN) architecture. The approach builds upon past research on CNN-based IQA and mainly on the transfer learning research in the field by Bianco et al., which states that CNNs trained for general object recognition already learn important features for IQA tasks. The final system was pre-trained for object recognition on the ImageNet dataset, and its last fully connected layer was retrained for the new IQA classification task using a database of 180k images created in this work. The database is divided into three classes with 60k images each. The "Good" class contains natural scene text boxes that can be read by the Tesseract engine with no errors. The "Light" class contains natural scene text boxes where light effects caused an OCR error, and the "Blur" class contains natural scene text boxes where blur caused an OCR error. The CNN achieved a classification accuracy of 99.35% on the validation set of 17k text box images and performs as expected on real natural scene text images.
Speech Emotion Recognition as a Wearable Device for Depressed People
Darmstadt, TU, Master Thesis, Jahr
The recognition of emotions from speech was first addressed by Dellaert et al. in 1996, in the first research paper on this topic. The field has been evolving over the last two decades and was revisited with the concepts of Deep Learning (DL) around ten years ago. Communicating with machines has become a hot topic for research and industry and has already seen many advancements, from the assistants found on our smartphones (Cortana, Siri, Google Now) to the very sophisticated artificially intelligent robot "Sophia", with the heart of the technology centered on predicting human emotions. This field of automatically recognizing human emotions and affective (mental) states, commonly known as Speech Emotion Recognition (SER), has seen an amalgamation of different technologies, thereby blurring the line between SER and Artificial Intelligence. Predicting emotions in a human being is a challenging task for machines; even we as humans sometimes fail to recognize the correct emotions in another person. The advancements in DL have led to many breakthroughs in this field, which has come a long way from predicting the basic emotions of happiness, anger, sadness and fear, as described by Paul Ekman, to predicting real-time emotions. Real-time emotions can be defined as spontaneous ones that vary with time, as is the case with real human emotions. A human being exhibits various affective states, i.e. emotions, in his or her speech, governed by various external and internal factors. The idea behind this thesis is to predict such spontaneous emotions in two emotional dimensions: arousal, which is said to be accessible through acoustic features such as the vibrations made by the vocal cords, and valence, which is said to be accessible through linguistic features such as those particular to a language.
These two dimensions are correlated and can be thought of as the coordinates of a particular emotion on a graph plotted against time; for example, happiness can be translated to (positive valence, high (positive) arousal). Emotions are not constant and change with time, place and person. In other words, they have a contextual nature, which this thesis attempts to study. Depression is a major and very common mental illness in today’s world. Loneliness, stress and family loss are, among many others, potential causes. Predicting the mental health of a depressed person using such an SER system is quite a challenging task, and this thesis attempts to predict spontaneous emotions in the two emotional dimensions by adapting a recurrent convolutional DL speech architecture. The acoustic features are extracted using the CNN layers, and the temporal structure of the speech is modelled using the LSTM layers, i.e. they provide the "contextual information" from the audio signal. The resulting continuous quantities are evaluated using DL regression metrics based on their correlations. The outcome of this thesis is an assessment of these metrics for the valence and arousal dimensions predicted by the network for a continuous sequence of audio frames (chunks of 40 ms) provided as training data. The corresponding annotations (one for each 40 ms audio frame) were also prepared manually for each of the audio files. The Concordance Correlation Coefficient is used as the evaluation metric; it takes into account factors such as the covariance and the means and indicates the agreement between the arousal and valence predictions made by the network and the annotations.
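For readers unfamiliar with the Concordance Correlation Coefficient mentioned in this abstract, its standard definition is 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), so it penalizes both weak correlation and a systematic offset between predictions and annotations. The sketch below is a plain-Python illustration of that definition and is not code from the thesis. The function name and the sample arousal values are assumptions.

```python
def concordance_cc(predictions, annotations):
    """Concordance Correlation Coefficient of two equally long sequences."""
    n = len(predictions)
    mean_p = sum(predictions) / n
    mean_a = sum(annotations) / n
    # Population variances and covariance (divide by n, not n - 1).
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_a = sum((a - mean_a) ** 2 for a in annotations) / n
    cov = sum((p - mean_p) * (a - mean_a)
              for p, a in zip(predictions, annotations)) / n
    # The squared mean difference in the denominator penalizes a constant bias.
    return 2 * cov / (var_p + var_a + (mean_p - mean_a) ** 2)


# Hypothetical per-frame arousal values, one per 40 ms frame.
gold = [1.0, 2.0, 3.0]
print(concordance_cc([1.0, 2.0, 3.0], gold))  # perfect agreement
print(concordance_cc([2.0, 3.0, 4.0], gold))  # same shape, constant offset
```

The second call shows why the metric is stricter than Pearson correlation: the shifted prediction is perfectly correlated with the annotation but still scores below 1.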
Text Localization in Born-Digital Images of Advertisements
Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications
Iberoamerican Conference on Pattern Recognition (CIARP) <22, 2017, Valparaíso, Chile>
Localizing text in images is an important step in a number of applications and fundamental for optical character recognition. While born-digital text localization might look similar to other complex tasks in this field, it has certain distinct characteristics. Our novel approach combines the individual strengths of two commonly used methods, stroke width transform and extremal regions, with a method based on edge-based morphological growing. We present a parameter-free method with high flexibility to varying text sizes and colorful image elements. We evaluate our method on a novel image database of different retail brochures containing textual product information. Our results show a higher F-score than competitive methods on that particular task.
The Dark Side of the Face: Exploring the Ultraviolet Spectrum for Face Biometrics
2018 International Conference on Biometrics (ICB)
IAPR International Conference on Biometrics (ICB) <11, 2018, Gold Coast, Australia>
Facial recognition in the visible spectrum is a widely used application, but it is also still a major field of research. In this paper we present melanin face pigmentation (MFP) as a new modality to extend classical face biometrics. Melanin pigmentations are sun-damaged cells that occur as revealed and/or unrevealed patterns on human skin. Most MFP can be found in the faces of some people when using ultraviolet (UV) imaging. To prove the relevance of this feature for biometrics, we present a novel image dataset of 91 multiethnic subjects in both the visible and the UV spectrum. We show a method to extract the MFP features from the UV images using the well-known SURF features and compare it with other techniques. In order to prove its benefits, we use weighted score-level fusion and evaluate the performance in a one-against-all comparison. As a result, we observed a significant improvement in performance where traditional face recognition in the visible spectrum is extended with MFP from UV images. We conclude with a perspective on the use of these features in future research and discuss observed issues and limitations.
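Weighted score-level fusion, as named in this abstract, combines the comparison scores of two matchers into one score before the decision is made. The following sketch shows the general idea under stated assumptions: both score lists are already normalized to the range 0 to 1, a single weight balances the two modalities, and the function name and sample values are hypothetical. The paper's actual weights and normalization are not given here.

```python
def fuse_scores(visible_scores, uv_scores, weight_visible=0.5):
    """Weighted-sum fusion of two score lists, one entry per gallery subject."""
    if not 0.0 <= weight_visible <= 1.0:
        raise ValueError("weight_visible must lie in [0, 1]")
    if len(visible_scores) != len(uv_scores):
        raise ValueError("both matchers must score the same subjects")
    weight_uv = 1.0 - weight_visible
    return [weight_visible * v + weight_uv * u
            for v, u in zip(visible_scores, uv_scores)]


# Hypothetical scores of one probe against three enrolled subjects.
fused = fuse_scores([0.8, 0.2, 0.5], [0.4, 0.6, 0.9], weight_visible=0.75)
# One-against-all decision: pick the subject with the highest fused score.
best_subject = max(range(len(fused)), key=fused.__getitem__)
print(fused, best_subject)
```

In practice the weight would be tuned on held-out data, which is one reason a separate validation split matters for fusion experiments.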
Fiber Defect Detection of Inhomogeneous Voluminous Textiles
Mexican Conference on Pattern Recognition (MCPR) <9, 2017, Huatulco, Mexico>
Quality assurance of dry-cleaned industrial textiles is still a mostly manual task. In this paper, we present how computer vision and machine learning can be used to automate defect detection in this application. Most existing systems require textiles to be spread flat in order to detect defects. In contrast, we present a novel classification method that can be used when textiles are in inhomogeneous, voluminous shape. Normalization and classification methods are combined in a decision-tree model in order to detect different kinds of textile defects. We evaluate the performance of our system in real-world settings with images of piles of textiles, taken using stereo vision. Our results show that our novel classification method using key point pre-selection and convolutional neural networks outperforms competitive methods in classification accuracy.
Separation of Subjects in High-Security Locks by Using Capacitive Sensing
Bremen, Hochschule, Master Thesis, 2017
A reliable distinction between one and more than one person in automated access control is of great importance. Personal interlocks are used to control access to high-security areas, e.g. in banks or at border control. These systems ensure, without human intervention, that only a single individual can pass through a particular transit area (mantrap portal). Existing technical approaches use thermal imaging (body heat), RGB-D images, or camera images with computer vision algorithms to verify whether there are one or more persons in the transit area. Other known systems use weight- or photo-sensor-based methods for verification. In this Master's Thesis, we investigate the use of capacitive sensors for this application. The most suitable capacitive sensing technique, as well as the number of sensors and their positions, will be examined in this work. The performance of the developed system will be measured in empirical tests that include scenarios in which an attacker tries to spoof the system. Receiver Operating Characteristic (ROC) or Detection Error Tradeoff (DET) curves will show how the developed system performs compared with other solutions. The work will conclude with a feasibility analysis of the capacitive sensing technique for possible practical use.
Talis - A Design Study for a Wearable Device to Assist People with Depression
2017 IEEE 41st Annual Computer Software and Applications Conference Workshops
IEEE International COMPSAC Workshop on User Centered Design and Adaptive Systems (UCDAS) <4, 2017, Torino, Italy>
One of the major diseases affecting the global population, depression has a strong emotional impact on its sufferers. In this design study, "Talis" is presented as a wearable device which uses emotion recognition as an interface between patient and machine to support psychotherapeutic treatment. We combine two therapy methods, "Cognitive Behavioral Therapy" and "Well-Being Therapy", with interactive methods thought to increase their practical application potential. In this study, we draw on the results obtained in the area of "affective computing" for the use of emotions in empathic devices. The positive and negative phases experienced by the patient are identified through speech recognition and used for direct communication and later evaluation. After considering the design possibilities and suitable hardware, the future realization of such technology appears feasible. In order to design the wearable, user studies and technical experiments were carried out. The results of these suggest that the device could be beneficial for the treatment of patients with depression.
Vision Based Food Recognition
Darmstadt, TU, Master Thesis, 2017
Food recognition has always been a complex task due to the deformable nature of food items. Many different food items can look almost identical because of their similar shape and colour and the wide range of variety in which they occur. Sometimes it is difficult even for humans to distinguish between different food items based on their visual characteristics. The main goal of this thesis is to determine whether it is possible to provide canteen owners with a viable solution for calculating the prices of their food products, and to analyze different techniques that could be used for food recognition in cashier systems. The cost estimation should be done in a reasonable amount of time (a customer should not spend more than 5 to 10 seconds waiting for the cost evaluation of his or her food). The flow of the system should also be simple so that every customer can adapt to it very easily. The objective is to provide a simple solution in which purchased items can be identified from an image. The customer would only be required to place the tray under a camera; no further interaction from the customer would be required for the cost evaluation. Enrolment of new components in the system should also be easy and quick. To achieve this target, the system relies on cutting-edge Deep Learning (DL) techniques, since DL has shown very promising results in image classification and image segmentation over the past few years (especially after AlexNet, which appeared in 2012). DL has also been very helpful for finding similarities between different images and for maximum likelihood estimation. Readers of this thesis will see how different DL techniques can be combined to achieve the food classification task. This report gives a detailed analysis of how different techniques that are used to solve other complex problems, such as face recognition or signature verification, can be modified to perform food recognition.
To achieve optimum classification results, I examined different Neural Network (NN) architectures and evaluated their results. The best performance was achieved by training a Convolutional Neural Network (CNN) based on the Inception architecture with the triplet loss. This method achieved an HTER of 10.3%. Further investigation indicated that these results could be improved with the application of appropriate supervised image segmentation techniques.
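The two technical terms in this abstract, triplet loss and HTER, can be illustrated in a few lines. The sketch below uses the common formulation of the triplet loss with squared Euclidean distances and a margin, and the usual definition of the half total error rate as the mean of false acceptance and false rejection rates. It is not the thesis implementation: the margin of 0.2, the two-dimensional embeddings and the error rates in the example are assumptions.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the negative is farther from the anchor than the positive by `margin`."""
    return max(0.0,
               squared_distance(anchor, positive)
               - squared_distance(anchor, negative)
               + margin)


def half_total_error_rate(far, frr):
    """HTER: average of false acceptance rate and false rejection rate."""
    return (far + frr) / 2


# Hypothetical embeddings: anchor and positive show the same dish.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # well separated
print(triplet_loss([0.0, 0.0], [1.0, 0.0], [0.0, 1.0]))  # margin violated
print(half_total_error_rate(0.10, 0.106))
```

Training against this loss pulls images of the same dish together in embedding space and pushes different dishes apart, which is why the same idea is used for face and signature verification.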
Attack Detection in an Autonomous Entrance System using Optical Flow
7th International Conference on Imaging for Crime Detection and Prevention
International Conference on Imaging for Crime Detection and Prevention (ICDP) <7, 2016, Madrid, Spain>
Unstaffed access control portals are becoming more common in high security areas. Existing systems require expensive hardware, or are sensitive to changing environmental conditions. We present a single camera system for a mantrap which is able to verify that only one individual is in the designated transit area. Our novel approach combines optical flow and machine-learning classification. A database was created that consists of images of attempted attacks and regular verification. The results show that our approach provides competitive results and outperforms detection rates in several attack scenarios.
Combining Low-level Features of Offline Questionnaires for Handwriting Identification
Image Analysis and Recognition
International Conference on Image Analysis and Recognition (ICIAR) <13, 2016, Póvoa de Varzim, Portugal>
When using anonymous offline questionnaires for reviewing services or products, it is often not guaranteed that a reviewer does this only once as intended. In this paper, an applied combination of different handwriting features and their fusion is presented to expose such manipulations. The presented approach covers the aspects of alignment normalization, segmentation, feature extraction, classification and fusion. Nine features from handwritten text, numbers and checkboxes are extracted and used to recognize duplicate writers. The proposed method has been tested on a novel database containing pages of handwritten text produced by 1,734 writers. Furthermore, we show that the unified biometric decision using a weighted sum combination rule can significantly improve writer identification performance even on low-level features.
Emotional User Interface
Darmstadt, Hochschule, Bachelor Thesis, 2016
Rapid Classification of Textile Fabrics Arranged in Piles
Proceedings of the 13th International Joint Conference on e-Business and Telecommunications Volume 5
International Joint Conference on e-Business and Telecommunications (ICETE) <13, 2016, Lisbon, Portugal>
Research on the quality assurance of textiles has been a subject of much interest, particularly in relation to defect detection and the classification of woven fibers. Known systems require the fabric to be flat and spread out on 2D surfaces in order for it to be classified. Unlike other systems, this system is able to classify textiles when they are presented in piles and in assembly-line-like environments. Technical approaches have been selected with regard to speed and accuracy using 2D camera image data. A patch-based solution was chosen using an entropy-based pre-selection of small image patches. Interest points as well as texture descriptors combined with principal component analysis were part of this evaluation. The results showed that a classification of image patches resulted in less computational cost but reduced accuracy by 3.67%.
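An entropy-based pre-selection of image patches, as described in this abstract, can be sketched as follows. The example computes the Shannon entropy of the grey-level histogram of each non-overlapping patch and keeps the patches above a threshold. This is only an illustration: the abstract does not state which patches are kept, so retaining high-entropy (textured) patches is an assumption, as are the function names, the patch size and the threshold.

```python
import math
from collections import Counter


def patch_entropy(patch):
    """Shannon entropy (in bits) of the grey values in a 2D patch."""
    pixels = [value for row in patch for value in row]
    total = len(pixels)
    counts = Counter(pixels)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def select_patches(image, size, min_entropy):
    """Return (top, left) corners of non-overlapping patches with enough entropy."""
    selected = []
    for top in range(0, len(image) - size + 1, size):
        for left in range(0, len(image[0]) - size + 1, size):
            patch = [row[left:left + size] for row in image[top:top + size]]
            # Assumption: uniform patches carry little texture and are skipped.
            if patch_entropy(patch) >= min_entropy:
                selected.append((top, left))
    return selected


# Hypothetical 2x4 grey image: flat left half, textured right half.
image = [[0, 0, 1, 2],
         [0, 0, 3, 4]]
print(select_patches(image, size=2, min_entropy=1.0))
```

Only the selected patches would then be passed to the more expensive descriptor and classification stages, which is where the reduction in computational cost comes from.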
Stereo-Image Normalization of Voluminous Objects Improves Textile Defect Recognition
Advances in Visual Computing. 12th International Symposium, ISVC 2016
International Symposium on Visual Computing (ISVC) <12, 2016, Las Vegas, NV, USA>
The visual detection of defects in textiles is an important application in the textile industry. Existing systems require textiles to be spread flat so they appear as 2D surfaces in order to detect defects. In contrast, we show classification of textiles and textile feature extraction methods that can be used when textiles are in inhomogeneous, voluminous shape. We present a novel approach to image normalization to be used in stain-defect recognition. The acquired database consists of images of piles of textiles, taken using stereo vision. The results show that a simple classifier using normalized images outperforms other approaches using machine learning in classification accuracy.
Verification of Single-Person Access in a Mantrap Portal Using RGB-D Images
XII Workshop de Visão Computacional. Proceedings
Workshop de Visão Computacional <2016, Campo Grande, Brasil>
Automatic entrance systems are increasingly gaining importance for guaranteeing security, e.g. in critical infrastructure. A pipeline is presented which verifies that only a single, authorized subject can enter a secured area. Verification scenarios are carried out by using a set of RGB-D images. Features invariant to rotation and pose are used and classified by different metrics to be applied in real time. The performance was evaluated by using scenarios in which the system was attacked by a second subject. The results show that the presented approach outperforms competitive methods. The paper concludes with a summary of strengths and weaknesses and gives an outlook for future work.
Verifying Isolation in a Mantrap Portal via Thermal Imaging
IWSSIP 2016. Proceedings
International Conference on Systems, Signals and Image Processing (IWSSIP) <23, 2016, Bratislava, Slovakia>
This work presents a system that can be used to ensure that only one individual can pass through a designated transit area (mantrap portal). The developed technical approach uses thermal images to detect humans based on their body heat. A special focus was on the behaviour of the system under attack, when an intruder tries to overcome the system. The performance was evaluated in empirical testing with a test group selected according to their physical characteristics. The test scenarios cover changing appearances of individuals and objects possibly carried into the mantrap. Receiver Operating Characteristic (ROC) curves show how the developed system performs. This work concludes with a discussion of a number of challenges and gives an outlook for possible solutions.
Prototypical Development of an In-Shop Advertisment System using Body Dimension Recognition
Darmstadt, Hochschule, Master Thesis, 2014
This thesis outlines a system created to give consumers in the fashion industry an idea of how an item of clothing will look on them before trying it on. In the form of a short video, items of clothing are projected virtually onto an image of the user. Through the use of this system, retailers and manufacturers have the chance to immediately display their clothes on potential customers.
Virtual Fitting Pipeline: Body Dimension Recognition, Cloth Modeling, and On-Body Simulation
VRIPHYS 14: 11th Workshop in Virtual Reality Interactions and Physical Simulations
International Workshop in Virtual Reality Interaction and Physical Simulations (VRIPHYS) <11, 2014, Bremen, Germany>
This paper describes a solution for 3D clothes simulation on human avatars. The proposed approach consists of three parts: the collection of anthropometric human body dimensions, clothes scanning, and the simulation on 3D avatars. The simulation and human-machine interaction have been designed for application in a passive In-Shop advertisement system. All parts have been evaluated and adapted with the aim of developing a low-cost automated scanning and post-production system. Human body dimension recognition was achieved with a landmark-detection-based approach using both 2D and 3D cameras for front and profile images. The human silhouette extraction solution based on 2D images is expected to be more robust to multi-textured background surfaces than existing solutions. Eight measurements corresponding to the norm of body dimensions defined in the standard EN-13402 were used to reconstruct a 3D model of the human body. The performance is evaluated against the ground truth of our newly acquired database. For 3D scanning of clothes, different scanning methods have been evaluated with regard to apparel, quality and cost. The chosen approach uses state-of-the-art consumer products and describes how they can be combined to develop an automated system. The scanned clothes can later be simulated on the human avatars, which are created based on the estimation of human body dimensions. This work concludes with software design suggestions for a consumer-oriented solution such as a virtual fitting room using body metrics. A number of future challenges and an outlook for possible solutions are also discussed.