006 Spezielle Computerverfahren
Refine
Departments, institutes and facilities
- Fachbereich Informatik (60)
- Institute of Visual Computing (IVC) (29)
- Fachbereich Wirtschaftswissenschaften (17)
- Institut für Verbraucherinformatik (IVI) (17)
- Institut für Technik, Ressourcenschonung und Energieeffizienz (TREE) (12)
- Institut für Sicherheitsforschung (ISF) (8)
- Fachbereich Ingenieurwissenschaften und Kommunikation (5)
- Graduierteninstitut (3)
- Institut für KI und Autonome Systeme (A2S) (2)
- Zentrum für Ethik und Verantwortung (ZEV) (2)
Document Type
- Conference Object (46)
- Article (36)
- Part of a Book (7)
- Preprint (5)
- Report (5)
- Contribution to a Periodical (4)
- Doctoral Thesis (4)
- Book (monograph, edited volume) (2)
- Research Data (2)
- Patent (1)
Year of publication
Keywords
- Augmented Reality (5)
- Machine Learning (4)
- Knowledge Graphs (3)
- Machine learning (3)
- Virtual Reality (3)
- deep learning (3)
- facial expression analysis (3)
- haptics (3)
- virtual reality (3)
- 3D user interface (2)
Wie KI Innere Führung lernt
(2022)
Dass sich künstliche Intelligenz (KI) weltweit ausgebreitet hat, ist eine Binsenwahrheit. Die rasche und unaufhaltsame Proliferation von KI der letzten zehn Jahre spricht für sich, und längst ziehen auch Gesetzgeber und Regulierungsbehörden nach, um KI und ihre Technikfolgen einzuhegen. Für Deutschland relevante Gestaltungsanforderungen haben die High-Level Expert Group on Artificial Intelligence der Europäischen Kommission (HLEG AI) und auf nationaler Ebene die Datenethikkommission der Bundesregierung (DEK) und die Enquetekommission Künstliche Intelligenz des Deutschen Bundestags (EKKI) geäußert.
Background: Virtual reality combined with spherical treadmills is used across species for studying neural circuits underlying navigation.
New Method: We developed an optical flow-based method for tracking treadmil ball motion in real-time using a single high-resolution camera.
Results: Tracking accuracy and timing were determined using calibration data. Ball tracking was performed at 500 Hz and integrated with an open source game engine for virtual reality projection. The projection was updated at 120 Hz with a latency with respect to ball motion of 30 ± 8 ms.
Comparison: with Existing Method(s) Optical flow based tracking of treadmill motion is typically achieved using optical mice. The camera-based optical flow tracking system developed here is based on off-the-shelf components and offers control over the image acquisition and processing parameters. This results in flexibility with respect to tracking conditions – such as ball surface texture, lighting conditions, or ball size – as well as camera alignment and calibration.
Conclusions: A fast system for rotational ball motion tracking suitable for virtual reality animal behavior across different scales was developed and characterized.
While humans can effortlessly pick a view from multiple streams, automatically choosing the best view is a challenge. Choosing the best view from multi-camera streams poses a problem regarding which objective metrics should be considered. Existing works on view selection lack consensus about which metrics should be considered to select the best view. The literature on view selection describes diverse possible metrics. And strategies such as information-theoretic, instructional design, or aesthetics-motivated fail to incorporate all approaches. In this work, we postulate a strategy incorporating information-theoretic and instructional design-based objective metrics to select the best view from a set of views. Traditionally, information-theoretic measures have been used to find the goodness of a view, such as in 3D rendering. We adapted a similar measure known as the viewpoint entropy for real-world 2D images. Additionally, we incorporated similarity penalization to get a more accurate measure of the entropy of a view, which is one of the metrics for the best view selection. Since the choice of the best view is domain-dependent, we chose demonstration-based training scenarios as our use case. The limitation of our chosen scenarios is that they do not include collaborative training and solely feature a single trainer. To incorporate instructional design considerations, we included the trainer’s body pose, face, face when instructing, and hands visibility as metrics. To incorporate domain knowledge we included predetermined regions’ visibility as another metric. All of those metrics are taken into account to produce a parameterized view recommendation approach for demonstration-based training. An online study using recorded multi-camera video streams from a simulation environment was used to validate those metrics. Furthermore, the responses from the online study were used to optimize the view recommendation performance with a normalized discounted cumulative gain (NDCG) value of 0.912, which shows good performance with respect to matching user choices.
Neutral buoyancy has been used as an analog for microgravity from the earliest days of human spaceflight. Compared to other options on Earth, neutral buoyancy is relatively inexpensive and presents little danger to astronauts while simulating some aspects of microgravity. Neutral buoyancy removes somatosensory cues to the direction of gravity but leaves vestibular cues intact. Removal of both somatosensory and direction of gravity cues while floating in microgravity or using virtual reality to establish conflicts between them has been shown to affect the perception of distance traveled in response to visual motion (vection) and the perception of distance. Does removal of somatosensory cues alone by neutral buoyancy similarly impact these perceptions? During neutral buoyancy we found no significant difference in either perceived distance traveled nor perceived size relative to Earth-normal conditions. This contrasts with differences in linear vection reported between short- and long-duration microgravity and Earth-normal conditions. These results indicate that neutral buoyancy is not an effective analog for microgravity for these perceptual effects.
Vection underwater
(2022)
Using Visual and Auditory Cues to Locate Out-of-View Objects in Head-Mounted Augmented Reality
(2021)
„Industrie 4.0“ und weitere Schlagwörter wie „Big Data“, „Internet der Dinge“ oder „Cyber-physical Systems“ werden gegenwärtig in der Wirtschaft häufig aufgegriffen. Ausgangspunkt hierfür ist die Vernetzung von IT-Technologien sowie die durchgängige Digitalisierung. Nicht nur die Geschäftsfelder und Business-Modelle der Unternehmen selbst unterliegen dabei ei-nem entsprechend radikalen Wandel, dieser bezieht sich auch auf die Arbeitsumgebungen der Mitarbeiter, sowie den privaten und den öffentlichen Raum (Botthof, 2015; Hartmann, 2015).
It is challenging to provide users with a haptic weight sensation of virtual objects in VR since current consumer VR controllers and software-based approaches such as pseudo-haptics cannot render appropriate haptic stimuli. To overcome these limitations, we developed a haptic VR controller named Triggermuscle that adjusts its trigger resistance according to the weight of a virtual object. Therefore, users need to adapt their index finger force to grab objects of different virtual weights. Dynamic and continuous adjustment is enabled by a spring mechanism inside the casing of an HTC Vive controller. In two user studies, we explored the effect on weight perception and found large differences between participants for sensing change in trigger resistance and thus for discriminating virtual weights. The variations were easily distinguished and associated with weight by some participants while others did not notice them at all. We discuss possible limitations, confounding factors, how to overcome them in future research and the pros and cons of this novel technology.
Towards self-explaining social robots. Verbal explanation strategies for a needs-based architecture
(2019)
In order to establish long-term relationships with users, social companion robots and their behaviors need to be comprehensible. Purely reactive behavior such as answering questions or following commands can be readily interpreted by users. However, the robot's proactive behaviors, included in order to increase liveliness and improve the user experience, often raise a need for explanation. In this paper, we provide a concept to produce accessible “why-explanations” for the goal-directed behavior an autonomous, lively robot might produce. To this end we present an architecture that provides reasons for behaviors in terms of comprehensible needs and strategies of the robot, and we propose a model for generating different kinds of explanations.
This dissertation presents a probabilistic state estimation framework for integrating data-driven machine learning models and a deformable facial shape model in order to estimate continuous-valued intensities of 22 different facial muscle movements, known as Action Units (AU), defined in the Facial Action Coding System (FACS). A practical approach is proposed and validated for integrating class-wise probability scores from machine learning models within a Gaussian state estimation framework. Furthermore, driven mass-spring-damper models are applied for modelling the dynamics of facial muscle movements. Both facial shape and appearance information are used for estimating AU intensities, making it a hybrid approach. Several features are designed and explored to help the probabilistic framework to deal with multiple challenges involved in automatic AU detection. The proposed AU intensity estimation method and its features are evaluated quantitatively and qualitatively using three different datasets containing either spontaneous or acted facial expressions with AU annotations. The proposed method produced temporally smoother estimates that facilitate a fine-grained analysis of facial expressions. It also performed reasonably well, even though it simultaneously estimates intensities of 22 AUs, some of which are subtle in expression or resemble each other closely. The estimated AU intensities tended to the lower range of values, and were often accompanied by a small delay in onset. This shows that the proposed method is conservative. In order to further improve performance, state-of-the-art machine learning approaches for AU detection could be integrated within the proposed probabilistic AU intensity estimation framework.
Towards explaining deep learning networks to distinguish facial expressions of pain and emotions
(2018)
Deep learning networks are successfully used for object and face recognition in images and videos. In order to be able to apply such networks in practice, for example in hospitals as a pain recognition tool, the current procedures are only suitable to a limited extent. The advantage of deep learning methods is that they can learn complex non-linear relationships between raw data and target classes without limiting themselves to a set of hand-crafted features provided by humans. However, the disadvantage is that due to the complexity of these networks, it is not possible to interpret the knowledge that is stored inside the network. It is a black-box learning procedure. Explainable Artificial Intelligence (AI) approaches mitigate this problem by extracting explanations for decisions and representing them in a human-interpretable form. The aim of this paper is to investigate the explainable AI method Layer-wise Relevance Propagation (LRP) and apply it to explain how a deep learning network distinguishes facial expressions of pain from facial expressions of emotions such as happiness and disgust.
Towards an Interaction-Centered and Dynamically Constructed Episodic Memory for Social Robots
(2020)
Self-motion perception is a multi-sensory process that involves visual, vestibular, and other cues. When perception of self-motion is induced using only visual motion, vestibular cues indicate that the body remains stationary, which may bias an observer’s perception. When lowering the precision of the vestibular cue by for example, lying down or by adapting to microgravity, these biases may decrease, accompanied by a decrease in precision. To test this hypothesis, we used a move-to-target task in virtual reality. Astronauts and Earth-based controls were shown a target at a range of simulated distances. After the target disappeared, forward self-motion was induced by optic flow. Participants indicated when they thought they had arrived at the target’s previously seen location. Astronauts completed the task on Earth (supine and sitting upright) prior to space travel, early and late in space, and early and late after landing. Controls completed the experiment on Earth using a similar regime with a supine posture used to simulate being in space. While variability was similar across all conditions, the supine posture led to significantly higher gains (target distance/perceived travel distance) than the sitting posture for the astronauts pre-flight and early post-flight but not late post-flight. No difference was detected between the astronauts’ performance on Earth and onboard the ISS, indicating that judgments of traveled distance were largely unaffected by long-term exposure to microgravity. Overall, this constitutes mixed evidence as to whether non-visual cues to travel distance are integrated with relevant visual cues when self-motion is simulated using optic flow alone.
The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models (KGEMs). However, representations based on a single modality are inherently limited. To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs. This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler (INDRA) consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against two baseline models trained on either one of the modalities (i.e., text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the more challenging tasks with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.083. Additionally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications. Finally, the source code and pre-trained STonKGs models are available at https://github.com/stonkgs/stonkgs and https://huggingface.co/stonkgs/stonkgs-150k.
MOTIVATION
The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models (KGEMs). However, representations based on a single modality are inherently limited.
RESULTS
To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs (KGs). This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations in a shared embedding space. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler (INDRA) consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against three baseline models trained on either one of the modalities (i.e., text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the more challenging tasks with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.084 (i.e., from 0.881 to 0.965). Finally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications.
AVAILABILITY
We make the source code and the Python package of STonKGs available at GitHub (https://github.com/stonkgs/stonkgs) and PyPI (https://pypi.org/project/stonkgs/). The pre-trained STonKGs models and the task-specific classification models are respectively available at https://huggingface.co/stonkgs/stonkgs-150k and https://zenodo.org/communities/stonkgs.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Die Entwicklung intelligenter Technologien zur Unterstützung im Alltag und in den eigenen vier Wänden begleitet unsere Gesellschaft schon seit dem Zeitalter des Personal Computers. Mit dem Aufkommen des Internet der Dinge und begünstigt durch immer kleiner und günstiger werdende Hardware ergeben sich neue Potenziale, die das Thema Smart Home attraktiver als je zuvor werden lassen. Eine Vielzahl der aktuell im Markt verfügbaren Lösungen adressiert die Bedürfnisse Komfort, Sicherheit und effiziente Energienutzung. Die versprochene Intelligenz – smartness, wie sie der Begriff selbst suggeriert – wird vor allem bei Lösungen im privaten Nachrüstbereich überwiegend durch die Interaktion der Nutzer selbst und entsprechende regelbasierte Konfigurationen erzeugt. Diese notwendige Art der Interaktion und die damit verbundenen Aufwände sind jedoch von starker Bedeutung für das gesamte Nutzungserlebnis Smart Home und führen nicht selten zu Frustration oder gar Resignation in der Nutzung.
In recent years, the ability of intelligent systems to be understood by developers and users has received growing attention. This holds in particular for social robots, which are supposed to act autonomously in the vicinity of human users and are known to raise peculiar, often unrealistic attributions and expectations. However, explainable models that, on the one hand, allow a robot to generate lively and autonomous behavior and, on the other, enable it to provide human-compatible explanations for this behavior are missing. In order to develop such a self-explaining autonomous social robot, we have equipped a robot with own needs that autonomously trigger intentions and proactive behavior, and form the basis for understandable self-explanations. Previous research has shown that undesirable robot behavior is rated more positively after receiving an explanation. We thus aim to equip a social robot with the capability to automatically generate verbal explanations of its own behavior, by tracing its internal decision-making routes. The goal is to generate social robot behavior in a way that is generally interpretable, and therefore explainable on a socio-behavioral level increasing users' understanding of the robot's behavior. In this article, we present a social robot interaction architecture, designed to autonomously generate social behavior and self-explanations. We set out requirements for explainable behavior generation architectures and propose a socio-interactive framework for behavior explanations in social human-robot interactions that enables explaining and elaborating according to users' needs for explanation that emerge within an interaction. Consequently, we introduce an interactive explanation dialog flow concept that incorporates empirically validated explanation types. These concepts are realized within the interaction architecture of a social robot, and integrated with its dialog processing modules. We present the components of this interaction architecture and explain their integration to autonomously generate social behaviors as well as verbal self-explanations. Lastly, we report results from a qualitative evaluation of a working prototype in a laboratory setting, showing that (1) the robot is able to autonomously generate naturalistic social behavior, and (2) the robot is able to verbally self-explain its behavior to the user in line with users' requests.
Selection Performance and Reliability of Eye and Head Gaze Tracking Under Varying Light Conditions
(2024)
Dieses Dokument präsentiert eine Zusammenfassung der Dissertation der Autorin. In dieser Dissertation [Ha20] wurde ein neuartiger und hybrider Ansatz für die Scha ̈tzung der Intensität von Gesichtsmuskelbewegungen (Action Unit (AU)) vorgeschlagen und validiert. Dieser Ansatz basiert auf einer Gauß’schen Zustandsschätzung und kombiniert ein verformbares, AU-basiertes Gesichtsformmodell, ein viskoelastisches Modell der Gesichtsmuskelbewegung, mehrere erscheinungsbasierten AU-Klassifikatoren und eine Methode zur Erkennung von Gesichtspunkten. Es wurden mehrere Erweiterungen vorgeschlagen und in das Zustandsschätzungs-Framework integriert, um mit den personenspezifischen Eigenschaften sowie technischen und praktischen Herausforderungen umzugehen.Die mit der vorgeschlagenen Methode erzeugten AU-Intensitätsschätzungen wurden für die automatische Erkennung von Schmerzen und für die Analyse von Fahrerablenkung eingesetzt.
Robust Identification and Segmentation of the Outer Skin Layers in Volumetric Fingerprint Data
(2022)
Despite the long history of fingerprint biometrics and its use to authenticate individuals, there are still some unsolved challenges with fingerprint acquisition and presentation attack detection (PAD). Currently available commercial fingerprint capture devices struggle with non-ideal skin conditions, including soft skin in infants. They are also susceptible to presentation attacks, which limits their applicability in unsupervised scenarios such as border control. Optical coherence tomography (OCT) could be a promising solution to these problems. In this work, we propose a digital signal processing chain for segmenting two complementary fingerprints from the same OCT fingertip scan: One fingerprint is captured as usual from the epidermis (“outer fingerprint”), whereas the other is taken from inside the skin, at the junction between the epidermis and the underlying dermis (“inner fingerprint”). The resulting 3D fingerprints are then converted to a conventional 2D grayscale representation from which minutiae points can be extracted using existing methods. Our approach is device-independent and has been proven to work with two different time domain OCT scanners. Using efficient GPGPU computing, it took less than a second to process an entire gigabyte of OCT data. To validate the results, we captured OCT fingerprints of 130 individual fingers and compared them with conventional 2D fingerprints of the same fingers. We found that both the outer and inner OCT fingerprints were backward compatible with conventional 2D fingerprints, with the inner fingerprint generally being less damaged and, therefore, more reliable.
Over the last decades, different kinds of design guides have been created to maintain consistency and usability in interactive system development. However, in the case of spatial applications, practitioners from research and industry either have difficulty finding them or perceive such guides as lacking relevance, practicability, and applicability. This paper presents the current state of scientific research and industry practice by investigating currently used design recommendations for mixed reality (MR) system development. We analyzed and compared 875 design recommendations for MR applications elicited from 89 scientific papers and documentation from six industry practitioners in a literature review. In doing so, we identified differences regarding four key topics: Focus on unique MR design challenges, abstraction regarding devices and ecosystems, level of detail and abstraction of content, and covered topics. Based on that,we contribute to the MR design research by providing three factors for perceived irrelevance and six main implications for design recommendations that are applicable in scientific and industry practice.
In this paper, we introduce an optical sensor system, which is integrated into an industrial push-button. The sensor allows to classify the type of material that is in contact with the button when pressed into different material categories on the basis of the material's so called "spectral signature". An approach for a safety sensor system at circular table saws on the same base has been introduced previously on SIAS-2007. This contactless working sensor is able to distinguish reliably between skin, textiles, leather and various other kinds of materials. A typical application for this intelligent push-button is the use at possibly dangerous machines, whose operating instructions include either the prohibition or the obligation to wear gloves during the work at the machine. An exemple of machines at which no gloves are allowed are pillar drilling machines, because of the risk of getting caught in the drill chuck and being turned in by the machine. In many cases this causes very serious hand injuries. Depending on the application needs, the sensor system integrated into the push-button can be configured flexibly by software to prevent the operator from accidentally starting a machine with or without gloves, which can decrease the risk of severe accidents significantly. Especially two-hand controls are incentive to manipulation for easier handling. By equipping both push-buttons of a two-hand control with material classification properties, the user is forced to operate the controls with his bare fingers. That limitation disallows the manipulation of a two-hand control by a simple rodding device.
ProtSTonKGs: A Sophisticated Transformer Trained on Protein Sequences, Text, and Knowledge Graphs
(2022)
While most approaches individually exploit unstructured data from the biomedical literature or structured data from biomedical knowledge graphs, their union can better exploit the advantages of such approaches, ultimately improving representations of biology. Using multimodal transformers for such purposes can improve performance on context dependent classication tasks, as demonstrated by our previous model, the Sophisticated Transformer Trained on Biomedical Text and Knowledge Graphs (STonKGs). In this work, we introduce ProtSTonKGs, a transformer aimed at learning all-encompassing representations of protein-protein interactions. ProtSTonKGs presents an extension to our previous work by adding textual protein descriptions and amino acid sequences (i.e., structural information) to the text- and knowledge graph-based input sequence used in STonKGs. We benchmark ProtSTonKGs against STonKGs, resulting in improved F1 scores by up to 0.066 (i.e., from 0.204 to 0.270) in several tasks such as predicting protein interactions in several contexts. Our work demonstrates how multimodal transformers can be used to integrate heterogeneous sources of information, paving the foundation for future approaches that use multiple modalities for biomedical applications.
BACKGROUND
Given the unreliable self-report in patients with dementia, pain assessment should also rely on the observation of pain behaviors, such as facial expressions. Ideal observers should be well trained and should observe the patient continuously in order to pick up any pain-indicative behavior; which are requisitions beyond realistic possibilities of pain care. Therefore, the need for video-based pain detection systems has been repeatedly voiced. Such systems would allow for constant monitoring of pain behaviors and thereby allow for a timely adjustment of pain management in these fragile patients, who are often undertreated for pain.
METHODS
In this road map paper we describe an interdisciplinary approach to develop such a video-based pain detection system. The development starts with the selection of appropriate video material of people in pain as well as the development of technical methods to capture their faces. Furthermore, single facial motions are automatically extracted according to an international coding system. Computer algorithms are trained to detect the combination and timing of those motions, which are pain-indicative.
RESULTS/CONCLUSION
We hope to encourage colleagues to join forces and to inform end-users about an imminent solution of a pressing pain-care problem. For the near future, implementation of such systems can be foreseen to monitor immobile patients in intensive and postoperative care situations.
Modern GPUs come with dedicated hardware to perform ray/triangle intersections and bounding volume hierarchy (BVH) traversal. While the primary use case for this hardware is photorealistic 3D computer graphics, with careful algorithm design scientists can also use this special-purpose hardware to accelerate general-purpose computations such as point containment queries. This article explains the principles behind these techniques and their application to vector field visualization of large simulation data using particle tracing.
Computer graphics research strives to synthesize images of a high visual realism that are indistinguishable from real visual experiences. While modern image synthesis approaches enable to create digital images of astonishing complexity and beauty, processing resources remain a limiting factor. Here, rendering efficiency is a central challenge involving a trade-off between visual fidelity and interactivity. For that reason, there is still a fundamental difference between the perception of the physical world and computer-generated imagery. At the same time, advances in display technologies drive the development of novel display devices. The dynamic range, the pixel densities, and refresh rates are constantly increasing. Display systems enable a larger visual field to be addressed by covering a wider field-of-view, due to either their size or in the form of head-mounted devices. Currently, research prototypes are ranging from stereo and multi-view systems, head-mounted devices with adaptable lenses, up to retinal projection, and lightfield/holographic displays. Computer graphics has to keep step with, as driving these devices presents us with immense challenges, most of which are currently unsolved. Fortunately, the human visual system has certain limitations, which means that providing the highest possible visual quality is not always necessary. Visual input passes through the eye’s optics, is filtered, and is processed at higher level structures in the brain. Knowledge of these processes helps to design novel rendering approaches that allow the creation of images at a higher quality and within a reduced time-frame. This thesis presents the state-of-the-art research and models that exploit the limitations of perception in order to increase visual quality but also to reduce workload alike - a concept we call perception-driven rendering. This research results in several practical rendering approaches that allow some of the fundamental challenges of computer graphics to be tackled. By using different tracking hardware, display systems, and head-mounted devices, we show the potential of each of the presented systems. The capturing of specific processes of the human visual system can be improved by combining multiple measurements using machine learning techniques. Different sampling, filtering, and reconstruction techniques aid the visual quality of the synthesized images. An in-depth evaluation of the presented systems including benchmarks, comparative examination with image metrics as well as user studies and experiments demonstrated that the methods introduced are visually superior or on the same qualitative level as ground truth, whilst having a significantly reduced computational complexity.
Advances in computer graphics enable us to create digital images of astonishing complexity and realism. However, processing resources are still a limiting factor. Hence, many costly but desirable aspects of realism are often not accounted for, including global illumination, accurate depth of field and motion blur, spectral effects, etc. especially in real‐time rendering. At the same time, there is a strong trend towards more pixels per display due to larger displays, higher pixel densities or larger fields of view. Further observable trends in current display technology include more bits per pixel (high dynamic range, wider color gamut/fidelity), increasing refresh rates (better motion depiction), and an increasing number of displayed views per pixel (stereo, multi‐view, all the way to holographic or lightfield displays). These developments cause significant unsolved technical challenges due to aspects such as limited compute power and bandwidth. Fortunately, the human visual system has certain limitations, which mean that providing the highest possible visual quality is not always necessary. In this report, we present the key research and models that exploit the limitations of perception to tackle visual quality and workload alike. Moreover, we present the open problems and promising future research targeting the question of how we can minimize the effort to compute and display only the necessary pixels while still offering a user full visual experience.
Noncooperative Game Theory
(2016)
The perceptual upright results from the multisensory integration of the directions indicated by vision and gravity as well as a prior assumption that upright is towards the head. The direction of gravity is signalled by multiple cues, the predominant of which are the otoliths of the vestibular system and somatosensory information from contact with the support surface. Here, we used neutral buoyancy to remove somatosensory information while retaining vestibular cues, thus "splitting the gravity vector" leaving only the vestibular component. In this way, neutral buoyancy can be used as a microgravity analogue. We assessed spatial orientation using the oriented character recognition test (OChaRT, which yields the perceptual upright, PU) under both neutrally buoyant and terrestrial conditions. The effect of visual cues to upright (the visual effect) was reduced under neutral buoyancy compared to on land but the influence of gravity was unaffected. We found no significant change in the relative weighting of vision, gravity, or body cues, in contrast to results found both in long-duration microgravity and during head-down bed rest. These results indicate a relatively minor role for somatosensation in determining the perceptual upright in the presence of vestibular cues. Short-duration neutral buoyancy is a weak analogue for microgravity exposure in terms of its perceptual consequences compared to long-duration head-down bed rest.
Taste is a complex phenomenon that depends on the individual experience and is a matter of collective negotiation and mediation. On the contrary, it is uncommon to include taste and its many facets in everyday design, particularly online shopping for fresh food products. To realize this unused potential, we conducted two Co-Design workshops. Based on the participants’ results in the workshops, we prototyped and evaluated a click-dummy smart-phone app to explore consumers’ needs for digital taste depiction. We found that emphasizing the natural qualities of food products, external reviews, and personalizing features lead to a reflection on the individual taste experience. The self-reflection through our design enables consumers to develop their taste competencies and thus strengthen their autonomy in decision-making. Ultimately, exploring taste as a social experience adds to a broader understanding of taste beyond a sensory phenomenon.
This research investigates the efficacy of multisensory cues for locating targets in Augmented Reality (AR). Sensory constraints can impair perception and attention in AR, leading to reduced performance due to factors such as conflicting visual cues or a restricted field of view. To address these limitations, the research proposes head-based multisensory guidance methods that leverage audio-tactile cues to direct users' attention towards target locations. The research findings demonstrate that this approach can effectively reduce the influence of sensory constraints, resulting in improved search performance in AR. Additionally, the thesis discusses the limitations of the proposed methods and provides recommendations for future research.
In this paper, we provide a participatory design study of a mobile health platform for older adults that provides an integrative perspective on health data collected from different devices and apps. We illustrate the diversity and complexity of older adults’ perspectives in the context of health and technology use, the challenges which follow on for the design of mobile health platforms that support active and healthy ageing (AHA) and our approach to addressing these challenges through a participatory design (PD) process. Interviews were conducted with older adults aged 65+ in a two-month study with the goal of understanding perspectives on health and technologies for AHA support. We identified challenges and derived design ideas for a mobile health platform called “My-AHA”. For researchers in this field, the structured documentation of our procedures and results, as well as the implications derived provide valuable insights for the design of mobile health platforms for older adults.
It is only a matter of time until autonomous vehicles become ubiquitous; however, human driving supervision will remain a necessity for decades. To assess the drive's ability to take control over the vehicle in critical scenarios, driver distractions can be monitored using wearable sensors or sensors that are embedded in the vehicle, such as video cameras. The types of driving distractions that can be sensed with various sensors is an open research question that this study attempts to answer. This study compared data from physiological sensors (palm electrodermal activity (pEDA), heart rate and breathing rate) and visual sensors (eye tracking, pupil diameter, nasal EDA (nEDA), emotional activation and facial action units (AUs)) for the detection of four types of distractions. The dataset was collected in a previous driving simulation study. The statistical tests showed that the most informative feature/modality for detecting driver distraction depends on the type of distraction, with emotional activation and AUs being the most promising. The experimental comparison of seven classical machine learning (ML) and seven end-to-end deep learning (DL) methods, which were evaluated on a separate test set of 10 subjects, showed that when classifying windows into distracted or not distracted, the highest F1-score of 79%; was realized by the extreme gradient boosting (XGB) classifier using 60-second windows of AUs as input. When classifying complete driving sessions, XGB's F1-score was 94%. The best-performing DL model was a spectro-temporal ResNet, which realized an F1-score of 75%; when classifying segments and an F1-score of 87%; when classifying complete driving sessions. Finally, this study identified and discussed problems, such as label jitter, scenario overfitting and unsatisfactory generalization performance, that may adversely affect related ML approaches.