pub H-BRS | 004 Data processing; Computer science

HyenaPixel: Global Image Context with Convolutions (2024)

Spravil, Julian ; Houben, Sebastian ; Behnke, Sven

In vision tasks, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, convolution requires multiple stacked layers and a hierarchical structure for large context. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to the non-causal two-dimensional image space. We scale the Hyena convolution kernels beyond the feature map size up to 191$\times$191 to maximize the ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 83.0% and 83.5%, respectively, while outperforming other large-kernel networks. Combining HyenaPixel with attention further increases accuracy to 83.6%. We attribute the success of attention to the lack of spatial bias in later stages and support this finding with bidirectional Hyena.
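The abstract does not say how the large kernels are evaluated. A common way to keep a global convolution sub-quadratic in the number of pixels is to multiply in the Fourier domain, and the following minimal sketch illustrates only that building block, not the HyenaPixel operator itself (no gating, no implicit kernel parameterization). All function names are hypothetical; the kernel is deliberately larger than the feature map, and zero padding keeps the convolution non-causal and free of wrap-around.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    sign = 1 if inverse else -1
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def fft2(a, inverse=False):
    """2D FFT: transform every row, then every column."""
    rows = [fft(r, inverse) for r in a]
    cols = [fft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def next_pow2(n):
    p = 1
    while p < n:
        p *= 2
    return p

def pad(a, rows, cols):
    """Copy a into the top-left corner of a zero-filled rows x cols grid."""
    out = [[0j] * cols for _ in range(rows)]
    for i, row in enumerate(a):
        for j, v in enumerate(row):
            out[i][j] = complex(v)
    return out

def conv2d_same_fft(image, kernel):
    """'Same'-size 2D convolution in O(N log N) for N padded pixels."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    p, q = next_pow2(h + kh - 1), next_pow2(w + kw - 1)
    fi, fk = fft2(pad(image, p, q)), fft2(pad(kernel, p, q))
    prod = [[fi[i][j] * fk[i][j] for j in range(q)] for i in range(p)]
    full = fft2(prod, inverse=True)
    # Crop the centered window and apply the 1/(p*q) inverse normalization.
    return [[full[i + kh // 2][j + kw // 2].real / (p * q) for j in range(w)]
            for i in range(h)]

def conv2d_same_direct(image, kernel):
    """Reference implementation: O(N * K) for N pixels and K kernel taps."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            for u in range(kh):
                for v in range(kw):
                    y, x = i + kh // 2 - u, j + kw // 2 - v
                    if 0 <= y < h and 0 <= x < w:
                        out[i][j] += kernel[u][v] * image[y][x]
    return out

# A 7x7 kernel on a 4x4 feature map: every output pixel sees the whole input.
image = [[float(4 * i + j) for j in range(4)] for i in range(4)]
kernel = [[float((7 * u + v) % 5 - 2) for v in range(7)] for u in range(7)]
fast = conv2d_same_fft(image, kernel)
slow = conv2d_same_direct(image, kernel)
max_err = max(abs(a - b) for ra, rb in zip(fast, slow) for a, b in zip(ra, rb))
```

Both functions compute the same result; only the FFT variant avoids a cost that grows with the product of pixel count and kernel size.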

Constructing Realities – Professionals’ Approaches to Prototyping XR Experiences (2024)

Krauß, Veronika

In recent years, eXtended Reality (XR) technologies such as Augmented Reality and Virtual Reality have become both technically feasible and affordable, which has led to a sharp increase in demand for professionally designed and developed applications. However, this demand, combined with a rapid pace of innovation, has revealed a lack of design tool support for professional interaction designers as well as a knowledge gap regarding their approaches and needs. To address this gap, this thesis engages with the work of professional XR interaction designers through qualitative research into XR interaction design approaches. To this end, it applies two complementary lenses stemming from scientific design and social practice theory discourses to observe, describe, analyze, and understand professional XR interaction designers' challenges and approaches, with a focus on application prototyping.

Reflective navigation: individual behaviors and group behaviors (2004)

Kluge, Boris ; Prassler, Erwin

A Thousand Worlds: Scenery Specification and Generation for Simulation-Based Testing of Mobile Robot Navigation Stacks (2023)

Parra, Samuel ; Ortega, Argentina ; Schneider, Sven ; Hochgeschwender, Nico

A Study of Demonstration-Based Learning of Upper-Body Motions in the Context of Robot-Assisted Therapy (2023)

Quiroga, Natalia ; Mitrevski, Alex ; Plöger, Paul G.

Harnessing Sheaf Theory for Enhanced Air Quality Monitoring: Overcoming Conventional Limitations with Topology-Inspired Self-correcting Algorithm (2023)

Pham, Anh-Duy ; Le, An Dinh ; Le, Chuong Dinh ; Pham, Hoang Viet ; Vo, Hien Bich

A Hard- and Software System for Improving Automation in Metal Oxide Gassensor Production and Reliable Operation as Compact and Mobile Multi-Sensor Array (2023)

Hammer, Christof

Automated Testing of Standard Conformance for Robots (2023)

Sohail, Salman Omar ; Schneider, Sven ; Hochgeschwender, Nico

Multi-modal Emotion Categorization in Oral History Interviews (2023)

Viswanath, Anargh

This thesis proposes a multi-label classification approach using the Multimodal Transformer (MulT) [80] to perform multi-modal emotion categorization on a dataset of oral histories archived at the Haus der Geschichte (HdG). Prior uni-modal emotion classification experiments conducted on the novel HdG dataset provided less than satisfactory results. They uncovered issues such as class imbalance, ambiguities in emotion perception between annotators, and a lack of representative training data to perform transfer learning [28]. Hence, the objectives of this thesis were to achieve better results by performing a multi-modal fusion and resolving the problems arising from class imbalance and annotator-induced bias in emotion perception. A further objective was to assess the quality of the novel HdG dataset and benchmark the results using state-of-the-art (SOTA) techniques. Through a literature survey on the challenges, models, and datasets related to multi-modal emotion recognition, we created a methodology utilizing the MulT along with a multi-label classification approach. This approach produced a considerable improvement in overall emotion recognition, obtaining an average AUC of 0.74 and a balanced accuracy of 0.70 on the HdG dataset, which is comparable to SOTA results on other datasets. In this manner, we were also able to benchmark the novel HdG dataset as well as introduce a novel multi-annotator learning approach to understand each annotator’s relative strengths and weaknesses for emotion perception. Our evaluation results highlight the potential benefits of the novel multi-annotator learning approach in improving overall performance by resolving the problems arising from annotator-induced bias and variation in the perception of emotions. Complementing these results, we performed a further qualitative analysis of the HdG annotations with a psychologist to study the ambiguities found in the annotations.
We conclude that the ambiguities in annotations may have resulted from a combination of several socio-psychological factors and systemic issues associated with the process of creating these annotations. As these problems are also present in most multi-modal emotion recognition datasets, we conclude that the domain could benefit from a set of annotation guidelines to create standardized datasets.
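For readers unfamiliar with the two reported metrics, the sketch below shows one way an average AUC and balanced accuracy can be computed for multi-label predictions. It assumes a macro average over emotion labels and a decision threshold of 0.5; the abstract specifies neither, and the toy data and function names are purely illustrative.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def balanced_accuracy(labels, scores, threshold=0.5):
    """Mean of the true-positive rate and the true-negative rate."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)

def macro_average(metric, label_matrix, score_matrix):
    """Average a per-label metric over columns (one column per emotion label)."""
    columns = zip(zip(*label_matrix), zip(*score_matrix))
    per_label = [metric(list(y), list(s)) for y, s in columns]
    return sum(per_label) / len(per_label)

# Hypothetical toy data: 4 samples, 2 emotion labels.
y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
y_score = [[0.9, 0.2], [0.8, 0.7], [0.3, 0.6], [0.1, 0.4]]
avg_auc = macro_average(roc_auc, y_true, y_score)
avg_bacc = macro_average(balanced_accuracy, y_true, y_score)
```

Balanced accuracy weights both classes of each label equally, which is why it is often preferred over plain accuracy under class imbalance of the kind the abstract describes.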

Continual Learning in Object Detection (2023)

Tran Tien, Huy

Object detection concerns the classification and localization of objects in an image. To cope with changes in the environment, such as when new classes are added or a new domain is encountered, the detector needs to update itself with the new information while retaining knowledge learned in the past. Previous works have shown that training the detector solely on new data would produce a severe "forgetting" effect, in which the performance on past tasks deteriorates through each new learning phase. However, in many cases, storing and accessing past data is not possible due to privacy concerns or storage constraints. This project aims to investigate promising continual learning strategies for object detection without storing and accessing past training images and labels. We show that by utilizing the pseudo-background trick to deal with missing labels, and knowledge distillation to deal with missing data, the forgetting effect can be significantly reduced in both class-incremental and domain-incremental scenarios. Furthermore, an integration of a small latent replay buffer can result in a positive backward transfer, indicating the enhancement of past knowledge when new knowledge is learned.
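The abstract does not state which distillation formulation is used. As a generic illustration, the sketch below implements the classic temperature-scaled distillation loss on classification logits: the frozen previous model acts as teacher, and the updated model is penalized for drifting from the teacher's output distribution. The temperature value and function names are assumptions, and a detector would additionally need box regression terms.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl

# The old model (teacher) is confident in class 0; a student that has
# drifted toward a newly added class 2 incurs a positive penalty.
teacher = [4.0, 1.0, 0.0]
drifted = [1.0, 1.0, 4.0]
loss_same = distillation_loss(teacher, teacher)
loss_drift = distillation_loss(drifted, teacher)
```

Because the loss needs only the teacher's outputs on new images, it requires no stored images or labels from earlier tasks, which matches the constraint described in the abstract.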

RECol: Reconstruction Error Columns for Outlier Detection (2023)

Herurkar, Dayananda ; Meier, Mario ; Hees, Jörn

Explainable production planning under partial observability in high-precision manufacturing (2023)

Weichert, Dorina ; Kister, Alexander ; Volbach, Peter ; Houben, Sebastian ; Trost, Marcus ; Wrobel, Stefan

Gesture Recognition Model with Multi-Tracking Capture System for Human-Robot Interaction (2023)

Nguyen, Khang H. V. ; Pham, Anh-Duy ; Minh, Tri Bien ; Phan, Thi Truc Thao ; Do, Xuan Phu

The design space of building user-centered AI user interfaces for smart heating systems (2023)

Jin, Lu ; Boden, Alexander

Smart heating systems are one of the core components of smart homes. A large portion of domestic energy consumption is attributable to HVAC (heating, ventilation and air conditioning) systems, making them a relevant target of efforts to support an energy transition in private housing. For that reason, the technology has attracted attention from both the academic and the industry communities. User interfaces of smart heating systems have evolved from simple adjusting knobs to advanced data visualization interfaces that offer more sophisticated settings, such as timetables, and status information. With the advent of AI, we are interested in exploring how these interfaces will evolve to connect user needs with the underlying AI system. Hence, this paper aims to provide early design implications for an AI-based user interface for smart heating systems.

A Qualitative Exploration of User-Perceived Risks of AI to Inform Design and Policy (2023)

Recki, Lena ; Lawo, Dennis ; Krauß, Veronika ; Pins, Dominik

AI systems pose unknown challenges for designers, policymakers, and users, which complicates the assessment of potential harms and outcomes. Although understanding risks is a requirement for building trust in technology, users are often excluded from legal assessments and explanations of AI hazards. To address this issue, we conducted three focus groups with 18 participants in total and discussed the European proposal for a legal framework for AI. Based on this, we aim to build a (conceptual) model that guides policymakers, designers, and researchers in understanding users’ risk perception of AI systems. In this paper, we provide selected examples based on our preliminary results. Moreover, we argue for the benefits of such a perspective.

User-friendly Explanatory Dialogues (2023)

Alizadeh, Fatemeh ; Pins, Dominik ; Stevens, Gunnar

When dialogues with voice assistants (VAs) fall apart, users often become confused or even frustrated. To address these issues and related privacy concerns, Amazon recently introduced a feature allowing Alexa users to inquire about why it behaved in a certain way. But how do users perceive this new feature? In this paper, we present preliminary results from research conducted as part of a three-year project involving 33 German households. This project utilized interviews, fieldwork, and co-design workshops to identify common unexpected behaviors of VAs, as well as users’ needs and expectations for explanations. Our findings show that, contrary to its intended purpose, the new feature actually exacerbates user confusion and frustration instead of clarifying Alexa's behavior. We argue that such voice interactions should be characterized as explanatory dialogues that account for a VA’s unexpected behavior by providing interpretable information and prompting users to take action to improve their current and future interactions.

9. Usable Security und Privacy Workshop [9th Usable Security and Privacy Workshop] (2023)

Lo Iacono, Luigi ; Schmitt, Hartmut ; Feth, Denis ; Heinemann, Andreas

The ninth edition of the scientific workshop "Usable Security und Privacy" at Mensch und Computer 2023 aims to present current research and practice contributions in this field and to discuss them with the participants. In keeping with the conference motto "Building Bridges", the workshop continues and further develops an established forum in which experts, researchers, and practitioners from different domains can exchange ideas on usable security and privacy across disciplines. Beyond usability and security engineering, the topic touches a range of research areas and professional fields, e.g., computer science, engineering, media design, and psychology. The workshop is aimed at interested researchers from all of these areas, and explicitly also at representatives of business, industry, and public administration.

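Modellbasierte Simulation und Prädiktion der Herzfrequenz im Ausdauersport zur Einschätzung der Leistungsentwicklung [Model-Based Simulation and Prediction of Heart Rate in Endurance Sports for Assessing Performance Development] (2023)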

Ludwig, Melanie

To this day, assessing performance development in cycling requires carrying out specific performance diagnostics using prescribed test protocols. At the same time, the sharp rise in the popularity of wearable devices has made it very easy to record heart rate in everyday life and during sporting activities. However, a suitable heart rate model that allows conclusions to be drawn about performance development has so far been lacking. Using heart rate recordings in combination with a phenomenologically interpretable model to draw conclusions about performance development as directly as possible, and without specific requirements on the training rides, offers the chance to considerably simplify gaining insight into one's own performance development, both in professional cycling and in ambitious amateur cycling. This thesis presents a novel, phenomenologically interpretable model for simulating and predicting heart rate in cycling and validates it in an empirical study. The model makes it possible to simulate heart rate (as well as other strain parameters from respiratory gas analyses) with adequate accuracy and to predict it for a given power load in watts. Furthermore, a method for reducing the number of free model parameters that must be calibrated is presented and validated in two empirical studies. After an individualized parameter reduction, the model can be used with only a single free parameter. This remaining free parameter can then be tracked over time and compared with the course of performance development.
In two different studies, the free model parameter appears to be fundamentally capable of reflecting the course of performance development over time.

Jahresbericht 2022 [Annual Report 2022] (2023)

The new annual report of the Institut für IT-Service (ITS) for 2022.

Adaptive Compliant Robot Control with Failure Recovery for Object Press-Fitting (2023)

Sharma, Ekansh ; Henke, Christoph ; Mitrevski, Alex ; Plöger, Paul G.

Loading of shipping containers for dairy products often includes a press-fit task, which involves manually stacking milk cartons in a container without using pallets or packaging. Automating this task with a mobile manipulator can reduce worker strain, and also enhance the efficiency and safety of the container loading process. This paper proposes an approach called Adaptive Compliant Control with Integrated Failure Recovery (ACCIFR), which enables a mobile manipulator to reliably perform the press-fit task. We base the approach on a compliant control framework that learns from demonstrations and extend it with a monitoring and failure recovery mechanism for successful task execution. Concretely, we monitor the execution through distance and force feedback, detect collisions while the robot is performing the press-fit task, and use wrench measurements to classify the direction of collision; this information informs the subsequent recovery process. We evaluate the method on a miniature container setup, considering variations in the (i) starting position of the end effector, (ii) goal configuration, and (iii) object grasping position. The results demonstrate that the proposed approach outperforms the baseline demonstration-based learning framework regarding adaptability to environmental variations and the ability to recover from collision failures, making it a promising solution for practical press-fit applications.
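The abstract does not describe how the collision direction is classified. One simple possibility, sketched here purely as an assumption, is to look only at the force part of the wrench, flag a collision when its magnitude exceeds a threshold, and report the dominant axis with its sign. The threshold value, axis naming, and function name are hypothetical.

```python
def classify_collision(force, threshold=5.0):
    """Return None if no collision is detected, else a label such as '+x' or '-z'.

    force: (fx, fy, fz) in newtons, taken from a wrench measurement.
    threshold: hypothetical force magnitude above which contact counts as a collision.
    """
    axes = ("x", "y", "z")
    magnitude = sum(f * f for f in force) ** 0.5
    if magnitude < threshold:
        return None
    # The axis with the largest absolute force component dominates the contact.
    idx = max(range(3), key=lambda i: abs(force[i]))
    sign = "+" if force[idx] > 0 else "-"
    return sign + axes[idx]

no_contact = classify_collision((0.5, 0.2, 0.1))
side_hit = classify_collision((1.0, -8.0, 2.0))
front_hit = classify_collision((6.0, 0.0, 0.0))
```

A recovery routine could then map each label to a corrective motion; how ACCIFR does this is not stated in the abstract.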

On the Suitability of Representations for Quality Diversity Optimization of Shapes (2023)

Scarton, Ludovico ; Hagg, Alexander

The representation, or encoding, utilized in evolutionary algorithms has a substantial effect on their performance. Examination of the suitability of widely used representations for quality diversity optimization (QD) in robotic domains has yielded inconsistent results regarding the most appropriate encoding method. Given the domain-dependent nature of QD, additional evidence from other domains is necessary. This study compares the impact of several representations, including direct encoding, a dictionary-based representation, parametric encoding, compositional pattern producing networks, and cellular automata, on the generation of voxelized meshes in an architecture setting. The results reveal that some indirect encodings outperform direct encodings and can generate more diverse solution sets, especially when considering full phenotypic diversity. The paper introduces a multi-encoding QD approach that incorporates all evaluated representations in the same archive. Species of encodings compete on the basis of phenotypic features, leading to an approach that demonstrates similar performance to the best single-encoding QD approach. This is noteworthy, as it does not always require the contribution of the best-performing single encoding.
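The idea of several encodings competing in one archive can be illustrated with a MAP-Elites-style grid. This is a minimal sketch under assumptions, not the paper's implementation: features are taken to lie in [0, 1), each cell keeps the single fittest solution regardless of which encoding produced it, and all names and parameters are hypothetical.

```python
def cell_index(features, bins_per_dim):
    """Map phenotypic features in [0, 1) to a discrete archive cell."""
    return tuple(min(int(f * bins_per_dim), bins_per_dim - 1) for f in features)

def try_insert(archive, encoding, genome, features, fitness, bins_per_dim=10):
    """Place a candidate in its cell if the cell is empty or the candidate is fitter.

    Candidates from different encodings share the same cells, so they compete
    purely on phenotypic features and fitness.
    """
    cell = cell_index(features, bins_per_dim)
    incumbent = archive.get(cell)
    if incumbent is None or fitness > incumbent["fitness"]:
        archive[cell] = {"encoding": encoding, "genome": genome, "fitness": fitness}
        return True
    return False

def encoding_share(archive):
    """Count how many archive cells each encoding currently occupies."""
    counts = {}
    for entry in archive.values():
        counts[entry["encoding"]] = counts.get(entry["encoding"], 0) + 1
    return counts

archive = {}
try_insert(archive, "direct", [0, 1, 1], (0.25, 0.70), fitness=0.4)
# A CPPN-encoded shape with similar features but higher fitness takes the cell.
replaced = try_insert(archive, "cppn", [0.3, -1.2], (0.29, 0.71), fitness=0.6)
# A weaker candidate in a different cell is still kept: it adds diversity.
try_insert(archive, "direct", [1, 0, 0], (0.85, 0.10), fitness=0.2)
share = encoding_share(archive)
```

Tracking the per-encoding share of cells is one way to observe which "species" of encoding dominates which regions of feature space.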

Evaluation of LoRa in a Real-World Smart City: Selected Insights and Findings (2023)

Horstmann, Thorsten ; Rademacher, Michael ; Roobi, Marco

Analysing the Safety and Security of a UV-C Disinfection Robot (2023)

Nurchalifah, Desiana ; Blumenthal, Sebastian ; Lo Iacono, Luigi ; Hochgeschwender, Nico

Domain-specific languages for kinematic chains and their solver algorithms: lessons learned for composable models (2023)

Schneider, Sven ; Hochgeschwender, Nico ; Bruyninckx, Herman