Object detectors have improved considerably in recent years through advanced Convolutional Neural Network (CNN) architectures. However, many detector hyper-parameters are generally not tuned and are used with the values set by the detector authors. Blackbox optimization methods have gained attention in recent years because of their ability to optimize the hyper-parameters of various machine learning algorithms and deep learning models. However, these methods have not been explored for tuning the hyper-parameters of CNN-based object detectors. In this research work, we propose the use of blackbox optimization methods such as Gaussian Process based Bayesian Optimization (BOGP), Sequential Model-based Algorithm Configuration (SMAC), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to tune the hyper-parameters of Faster R-CNN and Single Shot MultiBox Detector (SSD). In Faster R-CNN, tuning the input image size and the prior box anchor scales and ratios using BOGP, SMAC, and CMA-ES increased performance by around 1.5% in terms of Mean Average Precision (mAP) on PASCAL VOC. Tuning the anchor scales of SSD increased the mAP by 3% on the PASCAL VOC and marine debris datasets. On the COCO dataset with SSD, mAP improves for medium and large objects but decreases by 1% for small objects. The experimental results show that blackbox optimization methods increase mAP by optimizing the object detectors' hyper-parameters. Moreover, they achieved better results than the hand-tuned configurations in most cases.
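The blackbox setting described in this abstract can be illustrated with a minimal sketch. It uses plain random search as a stand-in, which is a simpler baseline and not BOGP, SMAC, or CMA-ES. The function `toy_map` and the parameter name `anchor_scale` are hypothetical placeholders for "train the detector with this configuration and return its mAP"; they are not taken from the thesis.

```python
import random
from typing import Callable, Dict, Tuple

Config = Dict[str, float]
Bounds = Dict[str, Tuple[float, float]]


def random_search(
    objective: Callable[[Config], float],
    bounds: Bounds,
    n_trials: int,
    seed: int = 0,
) -> Tuple[Config, float]:
    """Maximize a blackbox objective by sampling configurations uniformly.

    The optimizer only sees (configuration, score) pairs, which is the
    defining property of blackbox optimization. n_trials must be >= 1.
    """
    rng = random.Random(seed)
    best_cfg: Config = {}
    best_score = float("-inf")
    for _ in range(n_trials):
        cfg = {name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
        score = objective(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def toy_map(cfg: Config) -> float:
    # Hypothetical stand-in for detector training and evaluation:
    # a smooth function that peaks at anchor_scale = 0.3.
    return 1.0 - (cfg["anchor_scale"] - 0.3) ** 2


if __name__ == "__main__":
    cfg, score = random_search(toy_map, {"anchor_scale": (0.1, 0.9)}, n_trials=200)
    print(cfg, score)
```

Model-based methods such as those named in the abstract replace the uniform sampling step with a proposal informed by earlier trials, which matters when each evaluation means training a detector.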
In this thesis, the photolytic and photocatalytic degradation of lignin was investigated. The photoreactor used was characterized by potassium ferrioxalate actinometry. To analyze the degraded lignins, an existing method for determining the hydroxyl content was optimized. Accordingly, the hydroxyl contents were determined at room temperature after an acetylation time of 72 h and showed a decrease in hydroxyl content with continued UV irradiation. The same observation was made using ATR-IR spectroscopy. In addition, the formation of carboxylic acids and the degradation of aromatic structures were detected. The degradation of aromatic structures was also shown by UV-VIS spectroscopy. The assumption that the degradation process follows an oxidative mechanism was confirmed by the degradation of hydroxyl groups via the formation of carboxylic acids to carbon dioxide. The release of carbon dioxide was established by determining the inorganic carbon (IC). The results of gel permeation chromatography, together with a TOC analysis, show a reduction in the molar mass of the lignin. Fragments with a molar mass similar to that of the lignin monomers were found. The photocatalyst used was examined by X-ray diffraction and identified as the highly photocatalytically active P25 from Degussa. Despite the use of various catalyst concentrations in the range of 0-0.5 g L^(-1), no influence of the photocatalyst on the degradation of the lignin was observed.
This thesis deals with the development of an encapsulation system suitable for the controlled release of hydrophilic active agents, with the aim of delaying the release of osteospecific P2 ligands in order to ensure the formation of new bone tissue in the treatment of critical-size bone defects. To this end, alginate and κ-carrageenan capsules loaded with the model substances adenosine triphosphate and suramin are coated with chitosan and lignosulfonate using immersive layer-by-layer coating and examined for their release behavior.
This work aims to create a natural language generation (NLG) base for the further development of systems for automatic examination question generation and automatic summarization at Hochschule Bonn-Rhein-Sieg and Fraunhofer IAIS, respectively. Both tasks are highly relevant today. The first can significantly simplify the work of university teachers, and the second can help retrieve knowledge faster from the excessively large amounts of information that people often work with. We focus on the search for an efficient and robust approach to the controlled NLG problem. Therefore, although the initial idea of the project was to use generative adversarial networks (GANs), we switched our attention to the more robust and more easily controllable autoencoders. In this work we implement an autoencoder for the unsupervised discovery of latent space representations of text and show the ability of the system to generate new sentences based on this latent space. In addition, we apply Gaussian mixture techniques to obtain meaningful text clusters and thereby try to create a tool that allows us to generate sentences relevant to the semantics of the Gaussian clusters, e.g. positive or negative reviews or examination questions on a certain topic. The developed system is tested on several datasets and compared to the performance of GANs.
The last two decades have been shaped by the exponential growth of available data. Every day, people and machines produce more and more data, which is often stored in distributed data stores. Application areas can be found, for example, in physics and astronomy, where particle accelerators or satellites generate immense amounts of data that must be stored and processed. Neither humans directly nor traditional analysis methods can gain new insights from these volumes of data. Processing these masses of data requires parallel and distributed data analysis methods. [MTT18,NEKH+18]
Neural network based object detectors can automate many difficult, tedious tasks. However, they are usually slow and/or require powerful hardware. One main reason is Batch Normalization (BN) [1], an important method for building these detectors. Recent studies present a potential replacement called the Self-normalizing Neural Network (SNN) [2], which at its core is a special activation function named Scaled Exponential Linear Unit (SELU). This replacement seems to have most of BN's benefits while requiring less computational power. Nonetheless, it is uncertain whether SELU and neural network based detectors are compatible with one another. An evaluation of SELU-incorporated networks would help clarify that uncertainty. Such an evaluation is performed through a series of tests on different neural networks. After the evaluation, it is concluded that, while indeed faster, SELU is still not as good as BN for building complex object detector networks.
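For readers unfamiliar with SELU, the activation function itself is small enough to state directly. The sketch below is a scalar reference version for illustration only, using the published constants of the SELU definition; it says nothing about how the thesis integrated SELU into its detector networks.

```python
import math

# Constants from the published SELU definition (self-normalizing networks).
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled Exponential Linear Unit for a single scalar input.

    Positive inputs are scaled linearly; non-positive inputs follow a scaled
    exponential that saturates at -SELU_SCALE * SELU_ALPHA.
    """
    if x > 0.0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * math.expm1(x)


if __name__ == "__main__":
    for value in (-2.0, 0.0, 2.0):
        print(value, selu(value))
```

Unlike Batch Normalization, this function keeps no running statistics and needs no extra pass over a batch, which is the source of the lower computational cost mentioned in the abstract.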
Interactive Object Detection
(2019)
The success of state-of-the-art object detection methods depends heavily on the availability of a large amount of annotated image data. Raw image data available from various sources is abundant but not annotated. Annotating image data is often costly, time-consuming, or requires expert help. In this work, a learning paradigm called Active Learning is explored, which uses user interaction to obtain annotations for a subset of the dataset. The goal of active learning is to achieve superior object detection performance with images that are annotated on demand. To realize the active learning method, the trade-off between the effort to annotate unlabeled data (annotation cost) and the performance of the object detection model is minimised.
A Random Forest based method called Hough Forest is chosen as the object detection model, and the annotation cost is calculated as the predicted false positive and false negative rate. The framework is successfully evaluated on two Computer Vision benchmark datasets and two custom Carl Zeiss datasets. An evaluation of RGB, HoG and Deep features for the task is also presented.
Experimental results show that using Deep features with Hough Forest achieves the maximum performance. By employing Active Learning, it is demonstrated that performance comparable to the fully supervised setting can be achieved by annotating just 2.5% of the images. To this end, an annotation tool is developed for user interaction during Active Learning.
Zustandsregelung für ein Mikroflugsystem zur Ansteuerung vorgegebener Wegpunkte in Innenräumen (State Feedback Control of a Micro Aerial Vehicle for Approaching Given Waypoints Indoors)
(2018)
This master's thesis presents the development of a position controller for a micro aerial vehicle. It makes it possible to approach given waypoints automatically in both known and unknown environments. The vehicle is localized using onboard sensors and two laser scanners. If a map of the environment is already available, a path to a given target point can be computed and flown automatically.
The recent explosion of available audio-visual media poses a new challenge for information retrieval research. Automatic speech recognition (ASR) systems translate spoken content to the text domain. There is a need to search and index this data, which has no logical structure. One possible way to structure it at a high level of abstraction is to find topic boundaries. Two unsupervised topic segmentation methods were evaluated with real-world data in the course of this work. The first, TSF, models topic shifts as fluctuations in the similarity function of the transcript. The second, LCSeg, treats topic changes as the places with the fewest overlapping lexical chains. Only LCSeg performed close to the results reported for a similar real-world corpus. Other reported results could not be outperformed. Topic analysis based on models of repeated word usage renders topic changes more ambiguous than expected. This issue has more impact on segmentation quality than the word error rate of state-of-the-art ASR. It is concluded that topic segmentation algorithms should be developed with real-world data to avoid potential biases towards artificial data. Unlike the evaluated approaches based on word usage analysis, methods operating on local contexts can be expected to perform better by emulating semantic dependencies.
Estimation of Prediction Uncertainty for Semantic Scene Labeling Using Bayesian Approximation
(2018)
With advances in technology, autonomous and assisted driving are close to becoming reality. A key component of such systems is the understanding of the surrounding environment. This understanding can be attained by performing semantic labeling of the driving scenes. Deep learning based models developed over the years outperform classical image processing algorithms for the task of semantic labeling. However, the existing models only produce semantic predictions and do not provide a measure of uncertainty about the predictions. Hence, this work focuses on developing a deep learning based semantic labeling model that can produce semantic predictions and their corresponding uncertainties. Autonomous driving needs a model that operates in real time; however, the Full Resolution Residual Network (FRRN) [4] architecture, which was found to be the best performing architecture during the literature search, does not satisfy this condition. Hence, a small network similar to FRRN has been developed and used in this work. Based on the work of [13], the developed network is then extended by adding dropout layers, and dropout is kept active during testing to perform approximate Bayesian inference. Existing works on uncertainty do not have quantitative metrics to evaluate the quality of the uncertainties estimated by a model. Hence, the area under the curve (AUC) of the receiver operating characteristic (ROC) curve is proposed and used as an evaluation metric in this work. Further, a comparative analysis of the influence of dropout layer position, drop probability and the number of samples on the quality of uncertainty estimation is performed. Finally, based on the insights gained from the analysis, a model with an optimal dropout configuration is developed. It is then evaluated on the Cityscapes dataset and shown to outperform the baseline model, with an AUC-ROC of about 90% compared to about 80% for the baseline.
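The two ingredients named in this abstract, uncertainty from repeated stochastic forward passes and AUC-ROC as a quality measure, can be sketched as follows. Two assumptions are made here that the abstract does not confirm: uncertainty is taken as the entropy of the mean class distribution over the dropout samples, and the ROC is built by using that uncertainty as a score for detecting wrong predictions. The function names are illustrative.

```python
import math
from typing import Sequence


def predictive_entropy(samples: Sequence[Sequence[float]]) -> float:
    """Entropy of the mean class distribution over stochastic forward passes.

    `samples` holds one class-probability vector per forward pass with
    dropout active (assumed to come from a model; none is defined here).
    """
    n_classes = len(samples[0])
    mean = [sum(s[c] for s in samples) / len(samples) for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def auc_roc(scores: Sequence[float], is_error: Sequence[bool]) -> float:
    """AUC via pairwise comparison: the probability that a wrong prediction
    receives a higher uncertainty score than a correct one (ties count 0.5)."""
    pos = [s for s, e in zip(scores, is_error) if e]
    neg = [s for s, e in zip(scores, is_error) if not e]
    if not pos or not neg:
        raise ValueError("need at least one wrong and one correct prediction")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    confident = [[0.9, 0.1], [0.95, 0.05], [0.9, 0.1]]
    unsure = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
    scores = [predictive_entropy(confident), predictive_entropy(unsure)]
    print(scores, auc_roc(scores, [False, True]))
```

Under this reading, an AUC-ROC near 1 means high uncertainty reliably flags wrong labels, while 0.5 means the uncertainty carries no information about errors. The pairwise computation is quadratic in the number of predictions, so a rank-based version would be preferable for per-pixel evaluation.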
This report presents an approach to stabilizing quadrotor dynamics based on ICP SLAM. Because the quadrotor lacks the sensory information needed to detect its horizontal drift, an additional sensor, a Hokuyo-UTM laser scanner, was used to perform on-line ICP-based SLAM. The obtained position estimates were used in control loops to maintain the desired position and orientation of the vehicle. Parameters such as height, yaw and position in space were controlled based on the laser data. As a result, the quadrotor demonstrated two capabilities that are significant for autonomous navigation: performing on-line SLAM on a flying vehicle and maintaining a desired position in 3D space. A visual approach using optical flow based on the pyramidal Lucas-Kanade algorithm was explored and tested in different environmental conditions, though it has not been integrated into the control loop. The performance of the Hokuyo laser scanner and the associated ICP SLAM algorithm was also tested in different environmental conditions: indoors, outdoors and in the presence of smoke. Results are presented and discussed. The need to run an on-line SLAM algorithm and to carry the fairly heavy equipment it requires made it necessary to increase both the payload and the computational power of the quadrotor. New hardware and distributed software architectures are therefore presented in the report.
In order to help journalists investigate large audiovisual archives, such as those maintained by news broadcast agencies, the multimedia data must be indexed by text-based search engines. By automatically creating a transcript through automatic speech recognition (ASR), the spoken word becomes accessible to text search, and keyword queries become possible. Still, important contextual information such as the identity of the speaker is not captured. Especially when gathering original footage in the political domain, the identity of the speaker can be the most important query constraint, although the name may not be prominent in the words spoken. It is thus desirable to provide this information explicitly to the search engine. To do so, the archive must be analyzed by automatic Speaker Identification (SID). While this research topic has seen substantial gains in accuracy and robustness in recent years, it has not yet established itself as a helpful, large-scale tool outside the research community. This thesis sets out to establish a workflow for automatic speaker identification. Its application is to help journalists search speeches given in the German parliament (Bundestag). This is a contribution to the News-Stream 3.0 project, a BMBF-funded research project that addresses the accessibility of various data sources for journalists.
This work extends the affordance-inspired robot control architecture introduced in the MACS project [35], and especially its approach to integrating symbolic planning systems given in [24], by providing methods for the automated abstraction of affordances to high-level operators. It discusses how symbolic planning instances can be generated automatically based on these operators and introduces an instantiation method to execute the resulting plans. Preconditions and effects of agent behaviour are learned and represented in Gärdenfors' conceptual spaces framework. Its notion of similarity is used to group behaviours into abstract operators based on the affordance-inspired, function-centred view of the environment. The work discusses how the capability of conceptual spaces to map subsymbolic to symbolic representations can be used to generate PDDL planning domains that include affordance-based operators. During plan execution, affordance-based operators are instantiated by agent behaviour based on the situation directly before execution. The current situation is compared to past ones, and the behaviour that has been most successful in the past is applied. Execution failures can be repaired by action substitution. The concept of using contexts to dynamically change dimension salience, as introduced by Gärdenfors, is realized using techniques from the field of feature selection. The approach is evaluated using a 3D simulation environment and implementations of several object manipulation behaviours.
In this thesis, a method for segmenting outdoor scenes and classifying terrain is developed. For this purpose, 360-degree laser scans of roads, building facades, and forest paths are recorded. Several 2D visual representations are created from these scans, using the distance information and angular transitions of the polar coordinates, the remission values, and the normal vector. The normal vector is computed with a modern method that has a low runtime. Surface properties within a point cloud are then analyzed and four classes are distinguished: ground, vegetation, obstacle, and sky. Segmentation and classification take place in a single step. To this end, the variance of the normals is computed over a filter mask and a descriptor is created. The descriptor contains the normal vectors and the normal variance for the x-, y-, and z-axes. The results are displayed as an overlay on the remission image. The evaluation uses specially created ground-truth data: the remission image is used and the ground truth is drawn in with different colors. The classification results are presented in precision-recall diagrams.
The work done in this thesis enhances the MMD algorithm in multi-core environments. The MMD algorithm, a transformation-based algorithm for reversible logic synthesis, is based on the works introduced by Maslov, Miller and Dueck and their original, sequential implementation. It synthesises a formal function specification, provided by a truth table, into a reversible network and is able to perform several optimization steps after the synthesis. This work concentrates on one of these optimization steps, template matching. This approach reduces the size of the reversible circuit by replacing a sequence of gates that matches a template with one that implements the same function using fewer gates. Smaller circuits have several benefits since they need less area and are less costly. The template matching approach introduced in the original works is computationally expensive since it tries to match a library of templates against the given circuit. For each template at each position in the circuit, a number of different combinations have to be calculated at runtime, resulting in high execution times, especially for large circuits. To make the template matching approach more efficient and usable, it has been reimplemented to take advantage of modern multi-core architectures such as the Cell Broadband Engine or a Graphics Processing Unit. For this work, two algorithmically different approaches that try to exploit each multi-core architecture's strengths have been analyzed and improved. For the analysis, these approaches have been cross-implemented on the two target hardware architectures and compared to the original parallel versions. Important metrics for this analysis are the execution time of the algorithm and the result of the minimization with the template matching approach. It could be shown that the algorithmically different approaches produce the same minimization results, independent of the hardware architecture used.
However, both cross-implementations also show a significantly higher execution time, which makes them practically irrelevant. The results of this first analysis and comparison led to the decision to enhance only the original parallel approaches. Using the same metrics as above, it could be shown that improving the algorithmic concepts and exploiting the capabilities of the hardware leads to better execution times and minimization results compared to the original implementations.
This thesis presents an approach to automatically adjust the parameters of a Java application running on the IBM J9 Virtual Machine in order to improve its performance. It works by analyzing the logfile the VM generates and searching for specific behavioral patterns. These patterns are matched against a list of known patterns for which rules exist that specify how to adapt the VM to the given application. The adaptation is done by adding parameters and changing existing ones, for example to achieve better heap usage. The process is fully automated and carried out by a toolkit developed for this thesis. The toolkit iteratively cycles through multiple possible parameter sets, benchmarks them and proposes the best alternative to the user. The user can thus improve the performance of the deployed application without any prior knowledge of the Java application or the VM.
The Java Virtual Machine (JVM) executes the compiled bytecode version of a Java program and acts as a layer between the program and the operating system. The JVM provides additional features such as process, thread, and memory management to manage the execution of these programs. Garbage Collection (GC) is part of the memory management and affects overall runtime performance because it is responsible for removing dead objects from the heap. Currently, the execution of a program needs to be halted during every GC run. The problem with this stop-the-world approach is that all threads in the JVM need to be suspended. It would be desirable to have a thread-local GC that only blocks the current thread and does not affect any other threads. In particular, this would improve the execution of multi-threaded Java programs. An object that is accessible by more than one thread is called escaped. It is not possible to determine thread-locally whether escaped objects are still alive, so they cannot be handled by a thread-local GC. To gain significant performance improvements with a thread-local GC, it is therefore necessary to determine whether it is possible to reliably predict if a given object will escape. Experimental results show that the escaping of objects can be predicted with high accuracy based on the line of code the object was allocated from. A thread-local GC was developed to minimize the number of stop-the-world GCs. The prototype implementation delivers a proof of concept showing that this goal can be achieved in certain scenarios.
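The idea of predicting escape from the allocation site can be illustrated with a small sketch. It is a hypothetical majority-vote predictor written in Python for brevity, not the thesis prototype or any JVM-internal mechanism. The site labels such as `A.java:10` and the rule of treating unseen sites as escaping are assumptions made for this example.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def train_site_predictor(history: Iterable[Tuple[str, bool]]) -> Dict[str, bool]:
    """Learn, per allocation site, whether most objects allocated there escaped.

    `history` holds (allocation_site, escaped) observations from earlier runs.
    """
    counts: Dict[str, List[int]] = defaultdict(lambda: [0, 0])  # [escaped, total]
    for site, escaped in history:
        counts[site][0] += int(escaped)
        counts[site][1] += 1
    return {site: 2 * esc > total for site, (esc, total) in counts.items()}


def predict_escape(model: Dict[str, bool], site: str) -> bool:
    # Assumption: unseen sites are treated as escaping, so their objects
    # would never be placed under a thread-local collector by mistake.
    return model.get(site, True)


if __name__ == "__main__":
    observed = [
        ("A.java:10", True),
        ("A.java:10", True),
        ("A.java:10", False),
        ("B.java:5", False),
    ]
    model = train_site_predictor(observed)
    print(model, predict_escape(model, "C.java:1"))
```

A misprediction in the "does not escape" direction is the costly one, which is why the sketch defaults to the conservative answer for unknown sites.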
An analysis of sharing string objects with the Java Virtual Machine was conducted; strings are the most used objects in Java programs and they are immutable, so they are read-only and easily identified. While the results are promising, it is clear that sharing more objects would result in better performance. Automatic selection of objects for sharing is non-trivial because, in the current state, only read-only objects can be shared. This attribute cannot easily be determined at runtime by an algorithm; the developer, on the other hand, can determine it. This thesis presents the development of an Application Programmer Interface (API) that allows programmers to use the sharing functionality internal to the Java Virtual Machine (JVM). Furthermore, we present the usage of the sharing API. Open-source software was used for real-world test cases. The evaluation shows that the ratio between memory savings and start-up time overhead is reasonable.