In the field of accessing and visualizing mobile sensors and their recorded data, different approaches have been realized. The OGC Sensor Observation Service (SOS) provides a standard for accessing this information, which is stored on servers. To access these servers, an interface must be developed and implemented. The result should be a configurable development framework for web-based GIS clients that support the OGC Sensor Observation Service. In particular, the framework should allow continuous position updates of mobile sensors. Visualization features such as charts, bounding boxes of sensors, and data series should be included.
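As a rough illustration of what such a client interface has to produce, the sketch below builds a GetObservation request URL. It assumes the key-value-pair binding of SOS 2.0; the abstract does not say which SOS version or binding the framework targets. The endpoint, offering name, and observed property are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_get_observation_url(endpoint, offering, observed_property,
                              version="2.0.0"):
    """Build a key-value-pair GetObservation URL for an SOS endpoint.

    Assumption: the server accepts SOS 2.0 KVP requests over HTTP GET.
    """
    params = {
        "service": "SOS",
        "version": version,
        "request": "GetObservation",
        "offering": offering,
        "observedProperty": observed_property,
    }
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and identifiers, for illustration only.
url = build_get_observation_url("http://example.org/sos",
                                "mobile_gps_offering", "position")
print(url)
```

A client that needs continuous position updates of a mobile sensor could issue such a request periodically and redraw the sensor's marker from each response.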
This work extends the affordance-inspired robot control architecture introduced in the MACS project [35], and especially its approach to integrating symbolic planning systems given in [24], by providing methods for the automated abstraction of affordances to high-level operators. It discusses how symbolic planning instances can be generated automatically based on these operators and introduces an instantiation method to execute the resulting plans. Preconditions and effects of agent behaviour are learned and represented in Gärdenfors' conceptual spaces framework. Its notion of similarity is used to group behaviours into abstract operators based on the affordance-inspired, function-centred view of the environment. The work also discusses how the capability of conceptual spaces to map subsymbolic to symbolic representations can be used to generate PDDL planning domains that include affordance-based operators. During plan execution, affordance-based operators are instantiated by agent behaviour based on the situation directly before execution. The current situation is compared to past ones, and the behaviour that has been most successful in the past is applied. Execution failures can be repaired by action substitution. The concept of using contexts to dynamically change dimension salience, as introduced by Gärdenfors, is realized with techniques from the field of feature selection. The approach is evaluated using a 3D simulation environment and implementations of several object manipulation behaviours.
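The instantiation step described above can be sketched as follows. This is a minimal sketch, not the thesis's method: it assumes similarity decays exponentially with a salience-weighted Euclidean distance (a common choice in the conceptual spaces literature) and scores each behaviour by its similarity-weighted success rate. The function names, the decay constant `c`, and the example data are hypothetical.

```python
import math


def weighted_distance(x, y, salience):
    """Euclidean distance with one salience weight per quality dimension."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, salience)))


def similarity(x, y, salience, c=1.0):
    """Similarity that decays exponentially with weighted distance."""
    return math.exp(-c * weighted_distance(x, y, salience))


def select_behaviour(current, experiences, salience):
    """Pick the behaviour with the best similarity-weighted success rate.

    experiences: list of (situation, behaviour_name, succeeded) tuples.
    """
    weighted_success, weight_total = {}, {}
    for situation, behaviour, succeeded in experiences:
        s = similarity(current, situation, salience)
        weighted_success[behaviour] = (
            weighted_success.get(behaviour, 0.0) + (s if succeeded else 0.0))
        weight_total[behaviour] = weight_total.get(behaviour, 0.0) + s
    return max(weighted_success,
               key=lambda b: weighted_success[b] / weight_total[b])


# Hypothetical past experiences in a two-dimensional conceptual space.
past = [
    ((0.10, 0.20), "push", True),
    ((0.90, 0.80), "push", False),
    ((0.10, 0.25), "lift", False),
    ((0.85, 0.80), "lift", True),
]
chosen = select_behaviour((0.12, 0.20), past, salience=(1.0, 1.0))
print(chosen)
```

Changing the `salience` weights changes which past situations count as similar, which is the role the abstract assigns to contexts.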
The task of this thesis is to develop an OGC-compliant Sensor Observation Service (SOS), a component of the SWE, for GPS-related sensor data in this context. In contrast to existing implementations, it should support full mobility of the sensors and be configurable with respect to adding different kinds of sensors. In particular, mobile phones should be considered as sensors that transmit their data to the SOS server through the transactional SOS interface.
This report presents an approach to stabilizing quadrotor dynamics based on ICP SLAM. Because the quadrotor lacks the sensory information to detect its horizontal drift, an additional sensor, the Hokuyo-UTM laser scanner, was used to perform on-line ICP-based SLAM. The obtained position estimates were used in control loops to maintain the desired position and orientation of the vehicle. Parameters such as height, yaw, and position in space were controlled based on the laser data. As a result, the quadrotor demonstrated two capabilities that are significant for autonomous navigation: performing on-line SLAM on a flying vehicle and maintaining a desired position in 3D space. A visual approach based on optical flow using the pyramidal Lucas-Kanade algorithm was also explored and tested in different environmental conditions, though it was not integrated into the control loop. In addition, the performance of the Hokuyo laser scanner and the associated ICP SLAM algorithm was tested in different environmental conditions: indoors, outdoors, and in the presence of smoke. Results are presented and discussed. The need to run the SLAM algorithm on-line and to carry the fairly heavy equipment it requires made it necessary to increase both the payload and the computational power of the quadrotor. A new hardware architecture and a distributed software architecture are therefore presented in the report.
This work aims to create a natural language generation (NLG) base for the further development of systems for automatic examination question generation and automatic summarization at Hochschule Bonn-Rhein-Sieg and Fraunhofer IAIS, respectively. Both tasks are highly relevant today: the first can significantly simplify the work of university teachers, and the second can speed up the retrieval of knowledge from the excessively large amounts of information that people often work with. We focus on the search for an efficient and robust approach to the controlled NLG problem. Therefore, although the initial idea of the project was to use generative adversarial networks (GANs), we turned our attention to the more robust and more easily controllable autoencoders. In this work we implement an autoencoder for the unsupervised discovery of latent space representations of text and show that the system can generate new sentences based on this latent space. In addition, we apply Gaussian mixture techniques to obtain meaningful text clusters, with the aim of creating a tool that generates sentences relevant to the semantics of the Gaussian clusters, e.g. positive or negative reviews, or examination questions on a certain topic. The developed system is tested on several datasets and compared to the performance of GANs.
Object detection concerns the classification and localization of objects in an image. To cope with changes in the environment, such as when new classes are added or a new domain is encountered, the detector needs to update itself with the new information while retaining knowledge learned in the past. Previous works have shown that training the detector solely on new data would produce a severe "forgetting" effect, in which the performance on past tasks deteriorates through each new learning phase. However, in many cases, storing and accessing past data is not possible due to privacy concerns or storage constraints. This project aims to investigate promising continual learning strategies for object detection without storing and accessing past training images and labels. We show that by utilizing the pseudo-background trick to deal with missing labels, and knowledge distillation to deal with missing data, the forgetting effect can be significantly reduced in both class-incremental and domain-incremental scenarios. Furthermore, an integration of a small latent replay buffer can result in a positive backward transfer, indicating the enhancement of past knowledge when new knowledge is learned.
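To make the role of knowledge distillation concrete, the following is a minimal sketch of the classic soft-target distillation loss for a single classification output, in which the previous model acts as the teacher. It is an assumption that a loss of this general form applies; the abstract does not specify the distillation variant used for the detector, and the temperature value is illustrative.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between softened teacher and student distributions.

    The factor temperature**2 keeps gradient magnitudes comparable across
    temperatures, as in the usual soft-target formulation.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0.0)
    return temperature ** 2 * kl


# The loss is zero when the new model reproduces the old model's outputs
# and grows as its predictions on old classes drift away.
print(distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))
print(distillation_loss([0.0, 2.0, -1.0], [2.0, 0.5, -1.0]))
```

Because the teacher's outputs are computed on the new images, no past training images or labels need to be stored.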
Neural-network-based object detectors can automate many difficult, tedious tasks. However, they are usually slow and/or require powerful hardware. One main reason is Batch Normalization (BN) [1], an important method for building these detectors. Recent studies present a potential replacement called the Self-normalizing Neural Network (SNN) [2], which at its core relies on a special activation function named the Scaled Exponential Linear Unit (SELU). This replacement seems to have most of BN's benefits while requiring less computational power. Nonetheless, it is uncertain whether SELU and neural-network-based detectors are compatible with one another. An evaluation of SELU-incorporated networks would help clarify that uncertainty. Such an evaluation is performed through a series of tests on different neural networks. The evaluation concludes that SELU, while indeed faster, is still not as good as BN for building complex object detector networks.
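For reference, the two operations being compared can be sketched in a few lines. The SELU constants are the published values from the SNN paper; the batch-normalization function is a simplified sketch that omits the learnable scale and shift and the running statistics used at inference time.

```python
import math

# Fixed constants of the SELU activation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x):
    """Scaled Exponential Linear Unit, applied to a single value."""
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)


def batch_norm(values, eps=1e-5):
    """Normalize one feature across a batch to zero mean, unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / math.sqrt(variance + eps) for v in values]


print([selu(x) for x in (-2.0, 0.0, 2.0)])
print(batch_norm([1.0, 2.0, 3.0]))
```

SELU is purely element-wise, whereas batch normalization needs statistics over the whole batch, which is one way to see why the former is cheaper to compute.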
In the field of domestic service robots, recovery from faults is crucial to promote user acceptance. In this context, this work focuses on some specific faults which arise from the interaction of a robot with its real-world environment. Even a well-modelled robot may fail to perform its tasks successfully due to external faults which occur because of an infinite number of unforeseeable and unmodelled situations. By investigating the most frequent failures in typical scenarios, observed in real-world demonstrations and competitions using the autonomous service robots Care-O-Bot III and youBot, we identified four different fault classes caused by disturbances, imperfect perception, inadequate planning operators, or chaining of action sequences. This thesis then presents two approaches to handle external faults caused by insufficient knowledge about the preconditions of the planning operator. The first approach presents reasoning on detected external faults using knowledge about naive physics. The naive physics knowledge is represented by the physical properties of objects, which are formalized in a logical framework. The proposed approach applies a qualitative version of physical laws to these properties in order to reason. By interpreting the reasoning results, the robot identifies information about the situations which can cause the fault. Applying this approach to simple manipulation tasks like picking and placing objects shows that naive physics holds great possibilities for reasoning on unknown external faults in robotics. The second approach adds missing knowledge about the execution of an action through learning by experimentation. First, it investigates a representation of execution-specific knowledge that can be learned for one particular situation and reused for situations which deviate from the original. The combination of symbolic and geometric models allows us to represent action execution knowledge effectively.
This representation is called the action execution model (AEM) here. The approach provides a learning strategy which uses a physical simulation for generating the training data to learn both symbolic and geometric aspects of the model. The experimental analysis, performed on two physical robots, shows that the AEM can reliably describe execution-specific knowledge and can thereby serve as a potential model for avoiding the occurrence of external faults.
This thesis proposes a multi-label classification approach using the Multimodal Transformer (MulT) [80] to perform multi-modal emotion categorization on a dataset of oral histories archived at the Haus der Geschichte (HdG). Prior uni-modal emotion classification experiments conducted on the novel HdG dataset provided less than satisfactory results. They uncovered issues such as class imbalance, ambiguities in emotion perception between annotators, and lack of representative training data to perform transfer learning [28]. Hence, the objectives of this thesis were to achieve better results by performing a multi-modal fusion and resolving the problems arising from class imbalance and annotator-induced bias in emotion perception. A further objective was to assess the quality of the novel HdG dataset and benchmark the results using SOTA techniques. Through a literature survey on the challenges, models, and datasets related to multi-modal emotion recognition, we created a methodology utilizing the MulT along with a multi-label classification approach. This approach produced a considerable improvement in the overall emotion recognition by obtaining an average AUC of 0.74 and Balanced-accuracy of 0.70 on the HdG dataset, which is comparable to state-of-the-art (SOTA) results on other datasets. In this manner, we were also able to benchmark the novel HdG dataset as well as introduce a novel multi-annotator learning approach to understand each annotator’s relative strengths and weaknesses for emotion perception. Our evaluation results highlight the potential benefits of the novel multi-annotator learning approach in improving overall performance by resolving the problems arising from annotator-induced bias and variation in the perception of emotions. Complementing these results, we performed a further qualitative analysis of the HdG annotations with a psychologist to study the ambiguities found in the annotations. 
We conclude that the ambiguities in annotations may have resulted from a combination of several socio-psychological factors and systemic issues associated with the process of creating these annotations. As these problems are also present in most multi-modal emotion recognition datasets, we conclude that the domain could benefit from a set of annotation guidelines to create standardized datasets.
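Because the abstract reports Balanced-accuracy on a class-imbalanced multi-label task, a short sketch of that metric may help. Balanced accuracy for one binary label is the mean of the true-positive rate and the true-negative rate. Averaging it over labels (macro averaging) is an assumption here; the abstract does not state how the per-emotion scores were combined.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of true-positive rate and true-negative rate for one label."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must occur in y_true")
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))


def macro_balanced_accuracy(rows_true, rows_pred):
    """Average balanced accuracy over labels; each row is one sample."""
    per_label = [balanced_accuracy(t, p)
                 for t, p in zip(zip(*rows_true), zip(*rows_pred))]
    return sum(per_label) / len(per_label)


# A classifier that never predicts the rare label is 80% accurate here,
# but its balanced accuracy is only 0.5.
rare_truth = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10
print(balanced_accuracy(rare_truth, always_negative))
```

This is why the metric is a more informative choice than plain accuracy when some emotion classes are rare.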
Recent advances in Natural Language Processing have substantially improved contextualized representations of language. However, the inclusion of factual knowledge, particularly in the biomedical domain, remains challenging. Hence, many Language Models (LMs) are extended by Knowledge Graphs (KGs), but most approaches require entity linking (i.e., explicit alignment between text and KG entities). Inspired by single-stream multimodal Transformers operating on text, image and video data, this thesis proposes the Sophisticated Transformer trained on biomedical text and Knowledge Graphs (STonKGs). STonKGs incorporates a novel multimodal architecture based on a cross encoder that uses the attention mechanism on a concatenation of input sequences derived from text and KG triples, respectively. Over 13 million so-called text-triple pairs, coming from PubMed and assembled using the Integrated Network and Dynamical Reasoning Assembler (INDRA), were used in an unsupervised pre-training procedure to learn representations of biomedical knowledge in STonKGs. By comparing STonKGs to an NLP- and a KG-baseline (operating on either text or KG data) on a benchmark consisting of eight fine-tuning tasks, the proposed knowledge integration method applied in STonKGs was empirically validated. Specifically, on tasks with a comparatively small dataset size and a larger number of classes, STonKGs resulted in considerable performance gains, beating the F1-score of the best baseline by up to 0.083. Both the source code as well as the code used to implement STonKGs are made publicly available so that the proposed method of this thesis can be extended to many other biomedical applications.
This project focuses on object detection in dense volume data. There are several types of dense volume data, namely Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance Imaging (MRI). This work focuses on CT scans. CT scans are not limited to the medical domain; they are also used in industry, for example in airport baggage screening and on assembly lines, where object detection systems must detect objects quickly. One way to address computational complexity and make object detection systems fast is to use low-resolution images. Low-resolution CT scanning is fast, so the entire process of scanning and detection can be accelerated. Even in the medical domain, the exposure time of the patient should be reduced in order to reduce the radiation dose, and allowing low-resolution CT scans is one way to achieve this. Hence it is essential to find out which object detection model offers better accuracy as well as speed on low-resolution CT scans. However, existing approaches do not provide details about how a model performs when the resolution of the CT scans is varied. The goal of this project is therefore to analyze the impact of varying CT scan resolution on both the speed and the accuracy of the model. Three object detection models, namely RetinaNet, YOLOv3, and YOLOv5, were trained at various resolutions. Among the three models, YOLOv5 was found to have the best mAP and F1 score at multiple resolutions on the DeepLesion dataset, while RetinaNet has the lowest inference time on the DeepLesion dataset. The experiments indicate that sacrificing mean average precision (mAP) to improve inference time by reducing resolution is feasible.
An analysis of sharing string objects with the Java Virtual Machine was conducted; strings are the most used objects in Java programs, and because they are immutable they are read-only and easily identified. While the results are promising, it is clear that sharing more objects would result in better performance. Automatic object selection for sharing is non-trivial because, in the current state, only read-only objects can be shared. An algorithm cannot easily determine this attribute at runtime; the developer, on the other hand, can. This thesis presents the development of an Application Programming Interface (API) that allows programmers to use the internal sharing functionality of the Java Virtual Machine (JVM). Furthermore, we present the usage of the sharing API. Open-source software was used as real-world test cases. The evaluation shows that the ratio between memory savings and start-up time overhead is reasonable.
Segmentation of 3D data (Segmentierung von 3D-Daten)
(2011)
This thesis was written as part of a project at the Fraunhofer Institute IAIS that concerns the development of a new 3D laser scanner. A security application is to be realized on the basis of this 3D laser scanner. For one software component, the segmentation of 3D data, the state of research is examined, and three segmentation methods are selected and implemented. The RANSAC algorithm is used to detect planes. In this thesis it is extended by a termination criterion that reduces the total runtime when segmenting multiple planes.
This thesis develops a method for segmenting outdoor scenes and classifying terrain. To this end, 360-degree laser scans of roads, building facades, and forest paths are recorded. Several 2D visual representations are created from these scans, using the distance information and angular transitions of the polar coordinates, the remission values, and the normal vector. The normal vector is computed with a modern method that has a low runtime. Surface properties within a point cloud are then analyzed and four classes are distinguished: ground, vegetation, obstacle, and sky. Segmentation and classification are performed in a single step. For this, the variance of the normals is computed over a filter mask and a descriptor is created. The descriptor contains the normal vectors and the normal variance for the x-, y-, and z-axes. The results are displayed as an overlay on the remission image. The evaluation uses specially created ground-truth data: the remission image is used, and the ground truth is drawn in with different colors. The classification results are presented in precision-recall diagrams.
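The abstract does not describe the termination criterion that was added to RANSAC. As a point of reference only, the sketch below shows RANSAC plane detection with the standard adaptive stopping rule, which lowers the required number of iterations as the best inlier ratio grows. The threshold, confidence, and iteration cap are illustrative assumptions, and the thesis's own criterion may differ.

```python
import math
import random


def plane_from_points(p1, p2, p3):
    """Return (a, b, c, d) with unit normal, or None if points are collinear."""
    u = [p2[i] - p1[i] for i in range(3)]
    v = [p3[i] - p1[i] for i in range(3)]
    n = (u[1] * v[2] - u[2] * v[1],
         u[2] * v[0] - u[0] * v[2],
         u[0] * v[1] - u[1] * v[0])
    length = math.sqrt(sum(c * c for c in n))
    if length < 1e-12:
        return None
    a, b, c = (comp / length for comp in n)
    return a, b, c, -(a * p1[0] + b * p1[1] + c * p1[2])


def ransac_plane(points, threshold=0.02, confidence=0.99,
                 max_iterations=1000, rng=None):
    """Detect the dominant plane in a list of (x, y, z) points."""
    rng = rng or random.Random(0)
    best_plane, best_inliers = None, []
    needed, iteration = max_iterations, 0
    while iteration < needed:
        iteration += 1
        plane = plane_from_points(*rng.sample(points, 3))
        if plane is None:
            continue  # degenerate sample, try again
        a, b, c, d = plane
        inliers = [p for p in points
                   if abs(a * p[0] + b * p[1] + c * p[2] + d) <= threshold]
        if len(inliers) > len(best_inliers):
            best_plane, best_inliers = plane, inliers
            ratio = len(inliers) / len(points)
            if ratio >= 1.0:
                break  # every point lies on the plane
            # Iterations needed to draw an all-inlier sample with the
            # requested confidence, given the current inlier ratio.
            estimate = math.log(1.0 - confidence) / math.log1p(-ratio ** 3)
            needed = min(max_iterations, math.ceil(estimate))
    return best_plane, best_inliers
```

To segment several planes, one would typically remove the inliers of each detected plane and run the function again on the remaining points; stopping each run early is where a termination criterion saves total runtime.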
Skill generalisation and experience acquisition for predicting and avoiding execution failures
(2023)
For performing tasks in their target environments, autonomous robots usually execute and combine skills. Robot skills in general and learning-based skills in particular are usually designed so that flexible skill acquisition is possible, but without an explicit consideration of execution failures, the impact that failure analysis can have on the skill learning process, or the benefits of introspection for effective coexistence with humans. Particularly in human-centered environments, the ability to understand, explain, and appropriately react to failures can affect a robot's trustworthiness and, consequently, its overall acceptability. Thus, in this dissertation, we study the questions of how parameterised skills can be designed so that execution-level decisions are associated with semantic knowledge about the execution process, and how such knowledge can be utilised for avoiding and analysing execution failures. The first major segment of this work is dedicated to developing a representation for skill parameterisation whose objective is to improve the transparency of the skill parameterisation process and enable a semantic analysis of execution failures. We particularly develop a hybrid learning-based representation for parameterising skills, called an execution model, which combines qualitative success preconditions with a function that maps parameters to predicted execution success. The second major part of this work focuses on applications of the execution model representation to address different types of execution failures. We first present a diagnosis algorithm that, given parameters that have resulted in a failure, finds a failure hypothesis by searching for violations of the qualitative model, as well as an experience correction algorithm that uses the found hypothesis to identify parameters that are likely to correct the failure. 
Furthermore, we present an extension of execution models that allows multiple qualitative execution contexts to be considered so that context-specific execution failures can be avoided. Finally, to enable the avoidance of model generalisation failures, we propose an adaptive ontology-assisted strategy for execution model generalisation between object categories that aims to combine the benefits of model-based and data-driven methods; for this, information about category similarities as encoded in an ontology is integrated with outcomes of model generalisation attempts performed by a robot. The proposed methods are exemplified in terms of various use cases - object and handle grasping, object stowing, pulling, and hand-over - and evaluated in multiple experiments performed with a physical robot. The main contributions of this work include a formalisation of the skill parameterisation problem by considering execution failures as an integral part of the skill design and learning process, a demonstration of how a hybrid representation for parameterising skills can contribute towards improving the introspective properties of robot skills, as well as an extensive evaluation of the proposed methods in various experiments. We believe that this work constitutes a small first step towards more failure-aware robots that are suitable to be used in human-centered environments.
To help journalists investigate large audiovisual archives, such as those maintained by news broadcast agencies, the multimedia data must be indexed by text-based search engines. By automatically creating a transcript through automatic speech recognition (ASR), the spoken word becomes accessible to text search, and keyword queries become possible. Still, important contextual information such as the identity of the speaker is not captured. Especially when gathering original footage in the political domain, the identity of the speaker can be the most important query constraint, although the name may not be prominent in the words spoken. It is thus desirable to provide this information explicitly to the search engine. To do so, the archive must be analyzed by automatic Speaker Identification (SID). While this research topic has seen substantial gains in accuracy and robustness in recent years, it has not yet established itself as a helpful, large-scale tool outside the research community. This thesis sets out to establish a workflow for automatic speaker identification. Its application is to help journalists search speeches given in the German parliament (Bundestag). This is a contribution to the News-Stream 3.0 project, a BMBF-funded research project that addresses the accessibility of various data sources for journalists.