Fachbereich Informatik
Refine
H-BRS Bibliography
- yes (26)
Document Type
- Master's Thesis (19)
- Bachelor Thesis (5)
- Report (1)
- Study Thesis (1)
Year of publication
Language
- English (26) (remove)
Has Fulltext
- no (26) (remove)
Keywords
- Emergency support system (2)
- Mobile sensors (2)
- Alize (1)
- Automation (1)
- Batch Normalization (1)
- Computer Game (1)
- Distributed Systems (1)
- Human Muscle (1)
- ICP (1)
- Interactive visualization (1)
Machine learning-based solutions are frequently adapted in several applications that require big data in operations. The performance of a model that is deployed into operations is subject to degradation due to unanticipated changes in the flow of input data. Hence, monitoring data drift becomes essential to maintain the model’s desired performance. Based on the conducted review of the literature on drift detection, statistical hypothesis testing enables to investigate whether incoming data is drifting from training data. Because Maximum Mean Discrepancy (MMD) and Kolmogorov-Smirnov (KS) have shown to be reliable distance measures between multivariate distributions in the literature review, both were selected from several existing techniques for experimentation. For the scope of this work, the image classification use case was experimented with using the Stream-51 dataset. Based on the results from different drift experiments, both MMD and KS showed high Area Under Curve values. However, KS exhibited faster performance than MMD with fewer false positives. Furthermore, the results showed that using the pre-trained ResNet-18 for feature extraction maintained the high performance of the experimented drift detectors. Furthermore, the results showed that the performance of the drift detectors highly depends on the sample sizes of the reference (training) data and the test data that flow into the pipeline’s monitor. Finally, the results also showed that if the test data is a mixture of drifting and non-drifting data, the performance of the drift detectors does not depend on how the drifting data are scattered with the non-drifting ones, but rather their amount in the test set
This thesis proposes a multi-label classification approach using the Multimodal Transformer (MulT) [80] to perform multi-modal emotion categorization on a dataset of oral histories archived at the Haus der Geschichte (HdG). Prior uni-modal emotion classification experiments conducted on the novel HdG dataset provided less than satisfactory results. They uncovered issues such as class imbalance, ambiguities in emotion perception between annotators, and lack of representative training data to perform transfer learning [28]. Hence, the objectives of this thesis were to achieve better results by performing a multi-modal fusion and resolving the problems arising from class imbalance and annotator-induced bias in emotion perception. A further objective was to assess the quality of the novel HdG dataset and benchmark the results using SOTA techniques. Through a literature survey on the challenges, models, and datasets related to multi-modal emotion recognition, we created a methodology utilizing the MulT along with a multi-label classification approach. This approach produced a considerable improvement in the overall emotion recognition by obtaining an average AUC of 0.74 and Balanced-accuracy of 0.70 on the HdG dataset, which is comparable to state-of-the-art (SOTA) results on other datasets. In this manner, we were also able to benchmark the novel HdG dataset as well as introduce a novel multi-annotator learning approach to understand each annotator’s relative strengths and weaknesses for emotion perception. Our evaluation results highlight the potential benefits of the novel multi-annotator learning approach in improving overall performance by resolving the problems arising from annotator-induced bias and variation in the perception of emotions. Complementing these results, we performed a further qualitative analysis of the HdG annotations with a psychologist to study the ambiguities found in the annotations. We conclude that the ambiguities in annotations may have resulted from a combination of several socio-psychological factors and systemic issues associated with the process of creating these annotations. As these problems are also present in most multi-modal emotion recognition datasets, we conclude that the domain could benefit from a set of annotation guidelines to create standardized datasets.
Object detection concerns the classification and localization of objects in an image. To cope with changes in the environment, such as when new classes are added or a new domain is encountered, the detector needs to update itself with the new information while retaining knowledge learned in the past. Previous works have shown that training the detector solely on new data would produce a severe "forgetting" effect, in which the performance on past tasks deteriorates through each new learning phase. However, in many cases, storing and accessing past data is not possible due to privacy concerns or storage constraints. This project aims to investigate promising continual learning strategies for object detection without storing and accessing past training images and labels. We show that by utilizing the pseudo-background trick to deal with missing labels, and knowledge distillation to deal with missing data, the forgetting effect can be significantly reduced in both class-incremental and domain-incremental scenarios. Furthermore, an integration of a small latent replay buffer can result in a positive backward transfer, indicating the enhancement of past knowledge when new knowledge is learned.
This project focuses on object detection in dense volume data. There are several types of dense volume data, namely Computed Tomography (CT) scan, Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI). This work focuses on CT scans. CT scans are not limited to the medical domain; they are also used in industries. CT scans are used in airport baggage screening, assembly lines, and the object detection systems in these places should be able to detect objects fast. One of the ways to address the issue of computational complexity and make the object detection systems fast is to use low-resolution images. Low-resolution CT scanning is fast. The entire process of scanning and detection can be made faster by using low-resolution images. Even in the medical domain, to reduce the rad iation dose, the exposure time of the patient should be reduced. The exposure time of patients could be reduced by allowing low-resolution CT scans. Hence it is essential to find out which object detection model has better accuracy as well as speed at low-resolution CT scans. However, the existing approaches did not provide details about how the model would perform when the resolution of CT scans is varied. Hence in this project, the goal is to analyze the impact of varying resolution of CT scans on both the speed and accuracy of the model. Three object detection models, namely RetinaNet, YOLOv3, and YOLOv5, were trained at various resolutions. Among the three models, it was found that YOLOv5 has the best mAP and f1 score at multiple resolutions on the DeepLesion dataset. RetinaNet model h as the least inference time on the DeepLesion dataset. From the experiments, it could be asserted that sacrificing mean average precision (mAP) to improve inference time by reducing resolution is feasible.
In the field of autonomous robotics, sensors have played a major role in defining the scope of technology and to a great extent, limitations of it as well. This cycle of constant updates and hence technological advancement has made given birth to some serious industries which were once inconceivable. Industries like autonomous driving which has a serious impact on safety and security of people, also has an equally harsh implication on the dynamics and economics of the market. With sensors like LiDAR and RADAR delivering 3D measurements as point clouds, there is a necessity to process the raw measurements directly and many research groups are working on the same. A sizable research has gone in solving the task of object detection on 2D images. In this thesis we aim to develop a LiDAR based 3D object detection scheme. We combine the ideas of PointPillars and feature pyramid networks from 2D vision to propose Pillar-FPN. The proposed method directly takes 3D point clouds as input and outputs a 3D bounding box. Our pipeline consists of multiple variations of proposed Pillar-FPN at the feature fusion level that are described in the results section. We have trained our model on the KITTI train dataset and evaluated it on KITTI validation dataset.
This work aims to create a natural language generation (NLG) base for further development of systems for automatic examination questions generation and automatic summarization in Hochschule Bonn-Rhein-Sieg and Fraunhofer IAIS, respectively. Nowadays both tasks are very relevant. The first can significantly simplify the university teachers' work and the second to be of assistance for a faster retrieval of knowledge from an excessively large amount of information that people often work with. We focus on the search for an efficient and robust approach to the controlled NLG problem. Therefore, though the initial idea of the project was the usage of the generative adversarial neural networks (GANs), we switched our attention to more robust and easily-controllable autoencoders. Thus, in this work we implement an autoencoder for unsupervised discovery of latent space representations of text, and show the ability of the system to generate new sentences based on this latent space. Apart from that, we apply Gaussian mixture techniques in order to obtain meaningful text clusters and thereby try to create a tool that would allow us to generate sentences relevant to the semantics of the Gaussian clusters, e.g. positive or negative reviews or examination questions on certain topic. The developed system is tested on several datasets and compared to GANs' performance.
Neural network based object detectors are able to automatize many difficult, tedious tasks. However, they are usually slow and/or require powerful hardware. One main reason is called Batch Normalization (BN) [1], which is an important method for building these detectors. Recent studies present a potential replacement called Self-normalizing Neural Network (SNN) [2], which at its core is a special activation function named Scaled Exponential Linear Unit (SELU). This replacement seems to have most of BNs benefits while requiring less computational power. Nonetheless, it is uncertain that SELU and neural network based detectors are compatible with one another. An evaluation of SELU incorporated networks would help clarify that uncertainty. Such evaluation is performed through series of tests on different neural networks. After the evaluation, it is concluded that, while indeed faster, SELU is still not as good as BN for building complex object detector networks.
Estimation of Prediction Uncertainty for Semantic Scene Labeling Using Bayesian Approximation
(2018)
With the advancement in technology, autonomous and assisted driving are close to being reality. A key component of such systems is the understanding of the surrounding environment. This understanding about the environment can be attained by performing semantic labeling of the driving scenes. Existing deep learning based models have been developed over the years that outperform classical image processing algorithms for the task of semantic labeling. However, the existing models only produce semantic predictions and do not provide a measure of uncertainty about the predictions. Hence, this work focuses on developing a deep learning based semantic labeling model that can produce semantic predictions and their corresponding uncertainties. Autonomous driving needs a real-time operating model, however the Full Resolution Residual Network (FRRN) [4] architecture, which is found as the best performing architecture during literature search, is not able to satisfy this condition. Hence, a small network, similar to FRRN, has been developed and used in this work. Based on the work of [13], the developed network is then extended by adding dropout layers and the dropouts are used during testing to perform approximate Bayesian inference. The existing works on uncertainties, do not have quantitative metrics to evaluate the quality of uncertainties estimated by a model. Hence, the area under curve (AUC) of the receiver operating characteristic (ROC) curves is proposed and used as an evaluation metric in this work. Further, a comparative analysis about the influence of dropout layer position, drop probability and the number of samples, on the quality of uncertainty estimation is performed. Finally, based on the insights gained from the analysis, a model with optimal configuration of dropout is developed. It is then evaluated on the Cityscape dataset and shown to be outperforming the baseline model with an AUC-ROC of about 90%, while the latter having AUC-ROC of about 80%.
In order to help journalists investigate inside large audiovisual archives, as maintained by news broadcast agencies, the multimedia data must be indexed by text-based search engies. By automatically creating a transcript through automatic speech recognition (ASR), the spoken word becomes accessible to text search, and queries for keywords are made possible. But stil, important contextual information like the identity of the speaker is not captured. Especially when gathering original footage in the political domain, the identity of the speaker can be the most important query constraint, although this name may not be prominent in the words spoken. It is thus desireable to have this information provided explicitely to the search engine. To provide this information, the archive must be an alyzed by automatic Speaker Identification (SID). While this research topic has seen substantial gains in accuracy and robustness over last years, it has not yet established itself as a helpful, large-scale tool outside the research community. This thesis sets out to establish a workflow to provide automatic speaker identification. Its application is to help journalists searching on speeches given in the German parliament (Bundestag). This is a contribution to the News-Stream 3.0 project, a BMBF funded research project that addresses accessibility of various data sources for journalists.
Scientists and engineers are using a distributed system Remote Component Environment (RCE) to design and simulate complex systems like airplanes, ships and satellites. During the simulation, RCE executes local and remote code. Remote code is classified as untrusted code. The execution of remote code comprises potential security risks for the host system of RCE. Additionally, RCE provides full access to system resources. The objective of this thesis is to implement a sandbox prototype to reduce the vulnerability of RCE during the execution of remote code.