004 Data processing; Computer science
The Information and Communication Technology (ICT) sector is a significant global industry, and addressing climate change is of critical importance. This paper aims to assess the resources utilized by the ICT sector, the associated negative environmental impacts, and potential mitigation measures. In order to understand these aspects, this study attempts to categorize the resources used by ICT, analyze the amount consumed and the resulting negative impacts, and determine what measures exist to mitigate them. An economic and empirical evaluation shows a negative trend in ICT’s resource consumption, mainly due to increased energy consumption and rising carbon emissions from devices such as smartphones and data centers. The investigated countermeasures focus on Green IT strategies that encompass energy efficiency, carbon awareness, and hardware efficiency principles as outlined by the Green Software Foundation. Special attention is given to reducing the environmental footprint of data center operations and smartphones. This paper concludes that Green IT strategies, although promising in theory, are often not implemented at an industry level.
In vision tasks, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, convolution requires multiple stacked layers and a hierarchical structure for large context. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to the non-causal two-dimensional image space. We scale the Hyena convolution kernels beyond the feature map size up to 191$\times$191 to maximize the ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 83.0% and 83.5%, respectively, while outperforming other large-kernel networks. Combining HyenaPixel with attention further increases accuracy to 83.6%. We attribute the success of attention to the lack of spatial bias in later stages and support this finding with bidirectional Hyena.
In recent years, eXtended Reality (XR) technologies such as Augmented Reality and Virtual Reality have become both technically feasible and affordable, which has led to a sharp increase in demand for professionally designed and developed applications. However, this demand, combined with a rapid pace of innovation, has revealed a lack of design tool support for professional interaction designers as well as a knowledge gap regarding their approaches and needs. To address this gap, this thesis engages with the work of professional XR interaction designers through qualitative research into XR interaction design approaches. To this end, it applies two complementary lenses stemming from scientific design and social practice theory discourses to observe, describe, analyze, and understand professional XR interaction designers' challenges and approaches, with a focus on application prototyping.
This paper presents the b-it-bots RoboCup@Work team and its current hardware and functional architecture for the KUKA youBot robot. We describe the underlying software framework and the developed capabilities required for operating in industrial environments including features such as reliable and precise navigation, flexible manipulation, robust object recognition and task planning. New developments include an approach to grasp vertical objects, placement of objects by considering the empty space on a workstation, and the process of porting our code to ROS2.
Neuromorphic computing aims to mimic the computational principles of the brain in silico and has motivated research into event-based vision and spiking neural networks (SNNs). Event cameras (ECs) capture local, independent changes in brightness, and offer superior power consumption, response latencies, and dynamic ranges compared to frame-based cameras. SNNs replicate neuronal dynamics observed in biological neurons and propagate information in sparse sequences of "spikes". Apart from biological fidelity, SNNs have demonstrated potential as an alternative to conventional artificial neural networks (ANNs), such as in reducing energy expenditure and inference time in visual classification. Although potentially beneficial for robotics, the novel event-driven and spike-based paradigms remain scarcely explored outside the domain of aerial robots.
To investigate the utility of brain-inspired sensing and data processing in a robotics application, we developed a neuromorphic approach to real-time, online obstacle avoidance on a manipulator with an onboard camera. Our approach adapts high-level trajectory plans with reactive maneuvers by processing emulated event data in a convolutional SNN, decoding neural activations into avoidance motions, and adjusting plans in a dynamic motion primitive formulation. We conducted simulated and real experiments with a Kinova Gen3 arm performing simple reaching tasks involving static and dynamic obstacles. Our implementation was systematically tuned, validated, and tested in sets of distinct task scenarios, and compared to a non-adaptive baseline through formalized quantitative metrics and qualitative criteria.
The neuromorphic implementation facilitated reliable avoidance of imminent collisions in most scenarios, with 84% and 92% median success rates in simulated and real experiments, where the baseline consistently failed. Adapted trajectories were qualitatively similar to baseline trajectories, indicating low impacts on safety, predictability and smoothness criteria. Among notable properties of the SNN were the correlation of processing time with the magnitude of perceived motions (captured in events) and robustness to different event emulation methods. Preliminary tests with a DAVIS346 EC showed similar performance, validating our experimental event emulation method. These results motivate future efforts to incorporate SNN learning, utilize neuromorphic processors, and target other robot tasks to further explore this approach.
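The spike-based processing described in this abstract can be illustrated with a single leaky integrate-and-fire (LIF) neuron. This is a minimal sketch only: the abstract does not state which neuron model, parameters, or framework the authors used, so the LIF model, the function name `simulate_lif`, and the values of `threshold`, `leak`, and `reset` are all assumptions chosen for illustration.

```python
def simulate_lif(input_current, threshold=1.0, leak=0.9, reset=0.0):
    """Simulate one discrete-time leaky integrate-and-fire neuron.

    Assumed model (not taken from the abstract): the membrane potential
    decays by the factor `leak` each step, accumulates the input, and
    emits a spike (1) when it reaches `threshold`, after which it is
    set back to `reset`.
    """
    potential = reset
    spikes = []
    for current in input_current:
        potential = leak * potential + current
        if potential >= threshold:
            spikes.append(1)
            potential = reset
        else:
            spikes.append(0)
    return spikes


# Hypothetical inputs: weak versus strong stimulation over ten time steps.
weak_spikes = simulate_lif([0.05] * 10)
strong_spikes = simulate_lif([0.6] * 10)
print("weak input spikes:  ", sum(weak_spikes))
print("strong input spikes:", sum(strong_spikes))
```

In this toy model a stronger input produces more spikes, and therefore more downstream work, which is one intuition for why processing time can grow with the magnitude of perceived motion.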
This thesis proposes a multi-label classification approach using the Multimodal Transformer (MulT) [80] to perform multi-modal emotion categorization on a dataset of oral histories archived at the Haus der Geschichte (HdG). Prior uni-modal emotion classification experiments conducted on the novel HdG dataset provided less than satisfactory results. They uncovered issues such as class imbalance, ambiguities in emotion perception between annotators, and lack of representative training data to perform transfer learning [28]. Hence, the objectives of this thesis were to achieve better results by performing a multi-modal fusion and resolving the problems arising from class imbalance and annotator-induced bias in emotion perception. A further objective was to assess the quality of the novel HdG dataset and benchmark the results using state-of-the-art (SOTA) techniques. Through a literature survey on the challenges, models, and datasets related to multi-modal emotion recognition, we created a methodology utilizing the MulT along with a multi-label classification approach. This approach produced a considerable improvement in the overall emotion recognition by obtaining an average AUC of 0.74 and a balanced accuracy of 0.70 on the HdG dataset, which is comparable to SOTA results on other datasets. In this manner, we were also able to benchmark the novel HdG dataset as well as introduce a novel multi-annotator learning approach to understand each annotator’s relative strengths and weaknesses for emotion perception. Our evaluation results highlight the potential benefits of the novel multi-annotator learning approach in improving overall performance by resolving the problems arising from annotator-induced bias and variation in the perception of emotions. Complementing these results, we performed a further qualitative analysis of the HdG annotations with a psychologist to study the ambiguities found in the annotations.
We conclude that the ambiguities in annotations may have resulted from a combination of several socio-psychological factors and systemic issues associated with the process of creating these annotations. As these problems are also present in most multi-modal emotion recognition datasets, we conclude that the domain could benefit from a set of annotation guidelines to create standardized datasets.
Object detection concerns the classification and localization of objects in an image. To cope with changes in the environment, such as when new classes are added or a new domain is encountered, the detector needs to update itself with the new information while retaining knowledge learned in the past. Previous works have shown that training the detector solely on new data would produce a severe "forgetting" effect, in which the performance on past tasks deteriorates through each new learning phase. However, in many cases, storing and accessing past data is not possible due to privacy concerns or storage constraints. This project aims to investigate promising continual learning strategies for object detection without storing and accessing past training images and labels. We show that by utilizing the pseudo-background trick to deal with missing labels, and knowledge distillation to deal with missing data, the forgetting effect can be significantly reduced in both class-incremental and domain-incremental scenarios. Furthermore, an integration of a small latent replay buffer can result in a positive backward transfer, indicating the enhancement of past knowledge when new knowledge is learned.
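The role of knowledge distillation in reducing forgetting can be sketched with a soft-target loss on classification logits. This is a hedged illustration, not the project's method: the abstract does not specify the loss formulation, so the temperature-scaled KL divergence below, the function names, and the example logits are assumptions.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence from the teacher's soft targets to the student's.

    Assumed formulation: the frozen old detector acts as the teacher and
    the detector being updated as the student. Extreme logits could make
    a student probability underflow to zero; this toy version does not
    guard against that.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical class logits for one detected region.
old_model_logits = [2.0, 0.5, -1.0]
drifted_logits = [0.2, 1.8, -0.5]   # student forgot the old class ranking
faithful_logits = [1.9, 0.6, -1.1]  # student stays close to the teacher

loss_drifted = distillation_loss(old_model_logits, drifted_logits)
loss_faithful = distillation_loss(old_model_logits, faithful_logits)
print(f"drifted: {loss_drifted:.4f}, faithful: {loss_faithful:.4f}")
```

Adding such a term to the training objective penalizes the updated detector for diverging from the old detector's predictions on new images, which stands in for the past data that can no longer be accessed.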
The continuously increasing number of biomedical scholarly publications makes it challenging to construct document recommendation algorithms that can efficiently navigate through literature. Such algorithms would help researchers in finding similar, relevant, and related publications that align with their research interests. Natural Language Processing offers various alternatives to compare publications, ranging from entity recognition to document embeddings. In this paper, we present the results of a comparative analysis of vector-based approaches to assess document similarity in the RELISH corpus. We aim to determine the best approach that resembles relevance without the need for further training. Specifically, we employ five different techniques to generate vectors representing the text in the documents. These techniques employ a combination of various Natural Language Processing frameworks such as Word2Vec, Doc2Vec, dictionary-based Named Entity Recognition, and state-of-the-art models based on BERT. To evaluate the document similarity obtained by these approaches, we utilize different evaluation metrics that account for relevance judgment, relevance search, and re-ranking of the relevance search. Our results demonstrate that the most promising approach is an in-house version of document embeddings, starting with word embeddings and using centroids to aggregate them by document.
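The centroid-based document embedding mentioned at the end of this abstract can be sketched as follows. The toy word vectors, tokens, and function names are hypothetical; a real pipeline would load trained word embeddings (for example from Word2Vec) and apply its own tokenization, neither of which is detailed in the abstract.

```python
import math

# Toy 3-dimensional word vectors, invented for illustration only.
WORD_VECTORS = {
    "gene": [0.9, 0.1, 0.0],
    "protein": [0.8, 0.2, 0.1],
    "expression": [0.7, 0.3, 0.0],
    "robot": [0.0, 0.1, 0.9],
    "navigation": [0.1, 0.0, 0.8],
}


def document_centroid(tokens, vectors):
    """Average the vectors of all known tokens; None if no token is known."""
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


doc_a = document_centroid(["gene", "expression", "unknownword"], WORD_VECTORS)
doc_b = document_centroid(["protein", "expression"], WORD_VECTORS)
doc_c = document_centroid(["robot", "navigation"], WORD_VECTORS)

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
print(f"sim(a, b) = {sim_ab:.3f}, sim(a, c) = {sim_ac:.3f}")
```

Because the centroid needs only pretrained word vectors and an average, it requires no further training, which matches the stated aim of the comparison.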
Smart heating systems are one of the core components of smart homes. A large portion of domestic energy consumption is attributable to HVAC (heating, ventilation and air conditioning) systems, making them relevant to efforts to support an energy transition in private housing. For that reason, the technology has attracted attention from both academia and industry. User interfaces of smart heating systems have evolved from simple adjusting knobs to advanced data visualization interfaces that allow for more advanced settings such as timetables and status information. With the advent of AI, we are interested in exploring how these interfaces will evolve to connect user needs with the underlying AI system. Hence, this paper aims to provide early design implications for an AI-based user interface for smart heating systems.
AI systems pose unknown challenges for designers, policymakers, and users, which complicates the assessment of potential harms and outcomes. Although understanding risks is a requirement for building trust in technology, users are often excluded from legal assessments and explanations of AI hazards. To address this issue, we conducted three focus groups with 18 participants in total and discussed the European proposal for a legal framework for AI. Based on this, we aim to build a (conceptual) model that guides policymakers, designers, and researchers in understanding users’ risk perception of AI systems. In this paper, we provide selected examples based on our preliminary results. Moreover, we argue for the benefits of such a perspective.
When dialogues with voice assistants (VAs) fall apart, users often become confused or even frustrated. To address these issues and related privacy concerns, Amazon recently introduced a feature allowing Alexa users to inquire about why it behaved in a certain way. But how do users perceive this new feature? In this paper, we present preliminary results from research conducted as part of a three-year project involving 33 German households. This project utilized interviews, fieldwork, and co-design workshops to identify common unexpected behaviors of VAs, as well as users’ needs and expectations for explanations. Our findings show that, contrary to its intended purpose, the new feature actually exacerbates user confusion and frustration instead of clarifying Alexa's behavior. We argue that such voice interactions should be characterized as explanatory dialogs that account for VA’s unexpected behavior by providing interpretable information and prompting users to take action to improve their current and future interactions.
This thesis investigates the benefit of rubrics for grading short answers using an active learning mechanism. Automating short answer grading using Natural Language Processing (NLP) is an active research area in the education domain. Automation could save evaluators time, which they could instead invest in preparing lectures. Most research on short answer grading has treated it as a similarity task between reference and student answers. However, grading based on reference answers does not account for partial grades and does not provide feedback. Moreover, fully automatic grading attempts to replace the evaluator. Hence, using rubrics for short answer grading with active learning eliminates the drawbacks mentioned earlier.
Initially, the proposed approach is evaluated on the Mohler dataset, which is widely used for benchmarking. This phase is used to determine the parameters for the proposed approach. With the selected parameters, the approach exceeds the performance of current State-Of-The-Art (SOTA) methods, achieving a Pearson correlation of 0.63 and a Root Mean Square Error (RMSE) of 0.85. The proposed approach surpasses the SOTA methods by almost 4%.
Finally, the benchmarked approach is used to grade short answers based on rubrics instead of reference answers. The proposed approach evaluates short answers from the Autonomous Mobile Robot (AMR) dataset to provide scores and feedback (formative assessment) based on the rubrics. On this dataset, it achieves an average Pearson correlation of 0.61 and an RMSE of 0.83. Thus, this research shows that rubrics-based grading achieves formative assessment without compromising performance. In addition, the rubrics have the advantage of generalizability to all answers.
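The two metrics reported in this abstract, Pearson correlation and RMSE, can be computed between predicted and human-assigned grades as in the sketch below. The grade values are invented for illustration and are not taken from the Mohler or AMR datasets.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def rmse(predicted, actual):
    """Root mean square error between predictions and reference values."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical grades on a 0-5 scale.
human_grades = [5.0, 3.5, 2.0, 4.0, 1.0]
model_grades = [4.5, 3.0, 2.5, 4.5, 2.0]

print(f"Pearson r = {pearson_correlation(model_grades, human_grades):.3f}")
print(f"RMSE      = {rmse(model_grades, human_grades):.3f}")
```

The two metrics are complementary: correlation measures whether the model ranks answers as the human grader does, while RMSE measures how far the assigned scores are from the human scores in absolute terms.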
Loading of shipping containers for dairy products often includes a press-fit task, which involves manually stacking milk cartons in a container without using pallets or packaging. Automating this task with a mobile manipulator can reduce worker strain and enhance the efficiency and safety of the container loading process. This paper proposes an approach called Adaptive Compliant Control with Integrated Failure Recovery (ACCIFR), which enables a mobile manipulator to reliably perform the press-fit task. We base the approach on a compliant control framework that learns from demonstration, into which we integrate a monitoring and failure recovery mechanism for successful task execution. Concretely, we monitor the execution through distance and force feedback, detect collisions while the robot is performing the press-fit task, and use wrench measurements to classify the direction of collision; this information informs the subsequent recovery process. We evaluate the method on a miniature container setup, considering variations in the (i) starting position of the end effector, (ii) goal configuration, and (iii) object grasping position. The results demonstrate that the proposed approach outperforms the baseline demonstration-based learning framework regarding adaptability to environmental variations and the ability to recover from collision failures, making it a promising solution for practical press-fit applications.
The representation, or encoding, utilized in evolutionary algorithms has a substantial effect on their performance. Examination of the suitability of widely used representations for quality diversity optimization (QD) in robotic domains has yielded inconsistent results regarding the most appropriate encoding method. Given the domain-dependent nature of QD, additional evidence from other domains is necessary. This study compares the impact of several representations, including direct encoding, a dictionary-based representation, parametric encoding, compositional pattern producing networks, and cellular automata, on the generation of voxelized meshes in an architecture setting. The results reveal that some indirect encodings outperform direct encodings and can generate more diverse solution sets, especially when considering full phenotypic diversity. The paper introduces a multi-encoding QD approach that incorporates all evaluated representations in the same archive. Species of encodings compete on the basis of phenotypic features, leading to an approach that demonstrates similar performance to the best single-encoding QD approach. This is noteworthy, as it does not always require the contribution of the best-performing single encoding.
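The idea of several encodings competing in one archive on the basis of phenotypic features can be sketched with a grid archive in the style of MAP-Elites. The abstract only states that all representations share the same archive, so the grid structure, the assumption that features lie in [0, 1], the function `try_insert`, and the candidate values are all illustrative assumptions.

```python
def try_insert(archive, encoding, features, fitness, bins=4):
    """Insert a candidate into a shared grid archive if it wins its cell.

    Assumed scheme: each phenotypic feature lies in [0, 1] and is
    discretized into `bins` intervals. A candidate replaces the incumbent
    of its cell only if its fitness is higher, regardless of which
    encoding produced either solution.
    """
    cell = tuple(min(int(f * bins), bins - 1) for f in features)
    incumbent = archive.get(cell)
    if incumbent is None or fitness > incumbent["fitness"]:
        archive[cell] = {"encoding": encoding, "fitness": fitness}
        return True
    return False


# Hypothetical candidates: (encoding name, phenotypic features, fitness).
candidates = [
    ("direct", (0.10, 0.20), 0.50),
    ("cppn", (0.12, 0.22), 0.70),      # same cell as above, higher fitness
    ("cellular", (0.80, 0.90), 0.40),
    ("direct", (0.85, 0.95), 0.30),    # same cell as above, lower fitness
]

archive = {}
for encoding, features, fitness in candidates:
    try_insert(archive, encoding, features, fitness)

for cell, elite in sorted(archive.items()):
    print(cell, elite["encoding"], elite["fitness"])
```

Because replacement depends only on the phenotypic cell and the fitness, no encoding is privileged, and the archive can end up holding elites from different encodings in different regions of feature space.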