006 Spezielle Computerverfahren
Refine
Departments, institutes and facilities
Document Type
- Doctoral Thesis (4) (remove)
Language
- English (4)
Has Fulltext
- no (4)
Keywords
- Augmented Reality (1)
- Computergrafik (1)
- Human factors (1)
- Machine learning (1)
- Ray tracing (1)
- Virtuelle Realität (1)
- Visuelle Wahrnehmung (1)
- distraction detection (1)
- facial action units (1)
- facial expression analysis (1)
This dissertation presents a probabilistic state estimation framework for integrating data-driven machine learning models and a deformable facial shape model in order to estimate continuous-valued intensities of 22 different facial muscle movements, known as Action Units (AU), defined in the Facial Action Coding System (FACS). A practical approach is proposed and validated for integrating class-wise probability scores from machine learning models within a Gaussian state estimation framework. Furthermore, driven mass-spring-damper models are applied for modelling the dynamics of facial muscle movements. Both facial shape and appearance information are used for estimating AU intensities, making it a hybrid approach. Several features are designed and explored to help the probabilistic framework to deal with multiple challenges involved in automatic AU detection. The proposed AU intensity estimation method and its features are evaluated quantitatively and qualitatively using three different datasets containing either spontaneous or acted facial expressions with AU annotations. The proposed method produced temporally smoother estimates that facilitate a fine-grained analysis of facial expressions. It also performed reasonably well, even though it simultaneously estimates intensities of 22 AUs, some of which are subtle in expression or resemble each other closely. The estimated AU intensities tended to the lower range of values, and were often accompanied by a small delay in onset. This shows that the proposed method is conservative. In order to further improve performance, state-of-the-art machine learning approaches for AU detection could be integrated within the proposed probabilistic AU intensity estimation framework.
Computer graphics research strives to synthesize images of a high visual realism that are indistinguishable from real visual experiences. While modern image synthesis approaches enable to create digital images of astonishing complexity and beauty, processing resources remain a limiting factor. Here, rendering efficiency is a central challenge involving a trade-off between visual fidelity and interactivity. For that reason, there is still a fundamental difference between the perception of the physical world and computer-generated imagery. At the same time, advances in display technologies drive the development of novel display devices. The dynamic range, the pixel densities, and refresh rates are constantly increasing. Display systems enable a larger visual field to be addressed by covering a wider field-of-view, due to either their size or in the form of head-mounted devices. Currently, research prototypes are ranging from stereo and multi-view systems, head-mounted devices with adaptable lenses, up to retinal projection, and lightfield/holographic displays. Computer graphics has to keep step with, as driving these devices presents us with immense challenges, most of which are currently unsolved. Fortunately, the human visual system has certain limitations, which means that providing the highest possible visual quality is not always necessary. Visual input passes through the eye’s optics, is filtered, and is processed at higher level structures in the brain. Knowledge of these processes helps to design novel rendering approaches that allow the creation of images at a higher quality and within a reduced time-frame. This thesis presents the state-of-the-art research and models that exploit the limitations of perception in order to increase visual quality but also to reduce workload alike - a concept we call perception-driven rendering. This research results in several practical rendering approaches that allow some of the fundamental challenges of computer graphics to be tackled. By using different tracking hardware, display systems, and head-mounted devices, we show the potential of each of the presented systems. The capturing of specific processes of the human visual system can be improved by combining multiple measurements using machine learning techniques. Different sampling, filtering, and reconstruction techniques aid the visual quality of the synthesized images. An in-depth evaluation of the presented systems including benchmarks, comparative examination with image metrics as well as user studies and experiments demonstrated that the methods introduced are visually superior or on the same qualitative level as ground truth, whilst having a significantly reduced computational complexity.
This research investigates the efficacy of multisensory cues for locating targets in Augmented Reality (AR). Sensory constraints can impair perception and attention in AR, leading to reduced performance due to factors such as conflicting visual cues or a restricted field of view. To address these limitations, the research proposes head-based multisensory guidance methods that leverage audio-tactile cues to direct users' attention towards target locations. The research findings demonstrate that this approach can effectively reduce the influence of sensory constraints, resulting in improved search performance in AR. Additionally, the thesis discusses the limitations of the proposed methods and provides recommendations for future research.
The detection of human skin in images is a very desirable feature for applications such as biometric face recognition, which is becoming more frequently used for, e.g., automated border or access control. However, distinguishing real skin from other materials based on imagery captured in the visual spectrum alone and in spite of varying skin types and lighting conditions can be dicult and unreliable. Therefore, spoofing attacks with facial disguises or masks are still a serious problem for state of the art face recognition algorithms. This dissertation presents a novel approach for reliable skin detection based on spectral remission properties in the short-wave infrared (SWIR) spectrum and proposes a cross-modal method that enhances existing solutions for face verification to ensure the authenticity of a face even in the presence of partial disguises or masks. Furthermore, it presents a reference design and the necessary building blocks for an active multispectral camera system that implements this approach, as well as an in-depth evaluation. The system acquires four-band multispectral images within T = 50ms. Using a machine-learning-based classifier, it achieves unprecedented skin detection accuracy, even in the presence of skin-like materials used for spoofing attacks. Paired with a commercial face recognition software, the system successfully rejected all evaluated attempts to counterfeit a foreign face.