Refine
H-BRS Bibliography
- yes (56) (remove)
Departments, institutes and facilities
Document Type
- Master's Thesis (56) (remove)
Year of publication
Keywords
- Active Learning (2)
- Computer Vision (2)
- Emergency support system (2)
- Mobile sensors (2)
- Object Detection (2)
- deep learning (2)
- object detection (2)
- 0-1-Integer-Problem (1)
- 3D-Lokalisierung (1)
- 3D-Scanner (1)
In order to help journalists investigate inside large audiovisual archives, as maintained by news broadcast agencies, the multimedia data must be indexed by text-based search engies. By automatically creating a transcript through automatic speech recognition (ASR), the spoken word becomes accessible to text search, and queries for keywords are made possible. But stil, important contextual information like the identity of the speaker is not captured. Especially when gathering original footage in the political domain, the identity of the speaker can be the most important query constraint, although this name may not be prominent in the words spoken. It is thus desireable to have this information provided explicitely to the search engine. To provide this information, the archive must be an alyzed by automatic Speaker Identification (SID). While this research topic has seen substantial gains in accuracy and robustness over last years, it has not yet established itself as a helpful, large-scale tool outside the research community. This thesis sets out to establish a workflow to provide automatic speaker identification. Its application is to help journalists searching on speeches given in the German parliament (Bundestag). This is a contribution to the News-Stream 3.0 project, a BMBF funded research project that addresses accessibility of various data sources for journalists.
Das Deutsche Zentrum für Luft- und Raumfahrt (DLR) führt viele Forschungen und Studien im Bereich der Luft- und Raumfahrt durch. Dabei spielen die Studien für die Gesundheit und Medizin auch eine sehr wichtige Rolle bei der DLR. Zu diesem Zweck führt die DLR die Artificial Gravity bed rest study (AGBRESA) im Auftrag der European Space Agency (esa) und in Kooperation der NASA durch. In dieser Studie werden die negativen Auswirkungen der Schwerelosigkeit auf dem Menschen im Weltall simuliert. Dabei werden Experimente durchgeführt, um die negative Auswirkungen entgegenzuwirken. Die Ergebnisse der Experimente werden in der DLR digital, aber auch auf Papier dokumentiert. In diesem Master-Projekt habe ich nun die Aufgabe, die Papierprotokolle für den Bereich der Blutabnahme und der Labordokumentation in eine digitale Form zu ersetzen.
Graphbasierte Diskussionen sind eine Form von Online-Diskussionen, bei denen eine Diskussion als Graph visualisiert wird. Beispielhafte Diskussionsanwendungen sind unter anderem Belvedere [SWCP95], FreeStyler [Gas03] oder Digalo [LK06]. Graphen dieser Art sind, was bestimmte Eigenschaften betrifft, vergleichbar mit Petri-Netzen [Pet62]. So gibt es bei Beiden gewichtete, gerichtete Kanten sowie Knoten verschiedenen Typs, die jeweils bestimmte Eigenschaften besitzen. Im Gegensatz zu einem Petri-Netz, das immer ein bipartiter Graph ist, können bei einem Diskussionsgraphen jedoch prinzipiell alle Knoten miteinander verbunden werden. Moderatoren solcher Diskussionen sind oftmals mit dem Problem konfrontiert, dass sie mehrere Diskussionen gleichzeitig beobachten wollen, was jedoch aufgrund der Komplexität der Struktur von Diskussionsgraphen kaum effizient möglich ist.
Interactive Object Detection
(2019)
The success of state-of-the-art object detection methods depend heavily on the availability of a large amount of annotated image data. The raw image data available from various sources are abundant but non-annotated. Annotating image data is often costly, time-consuming or needs expert help. In this work, a new paradigm of learning called Active Learning is explored which uses user interaction to obtain annotations for a subset of the dataset. The goal of active learning is to achieve superior object detection performance with images that are annotated on demand. To realize active learning method, the trade-off between the effort to annotate (annotation cost) unlabeled data and the performance of object detection model is minimised.
Random Forests based method called Hough Forest is chosen as the object detection model and the annotation cost is calculated as the predicted false positive and false negative rate. The framework is successfully evaluated on two Computer Vision benchmark and two Carl Zeiss custom datasets. Also, an evaluation of RGB, HoG and Deep features for the task is presented.
Experimental results show that using Deep features with Hough Forest achieves the maximum performance. By employing Active Learning, it is demonstrated that performance comparable to the fully supervised setting can be achieved by annotating just 2.5% of the images. To this end, an annotation tool is developed for user interaction during Active Learning.
This thesis proposes a multi-label classification approach using the Multimodal Transformer (MulT) [80] to perform multi-modal emotion categorization on a dataset of oral histories archived at the Haus der Geschichte (HdG). Prior uni-modal emotion classification experiments conducted on the novel HdG dataset provided less than satisfactory results. They uncovered issues such as class imbalance, ambiguities in emotion perception between annotators, and lack of representative training data to perform transfer learning [28]. Hence, the objectives of this thesis were to achieve better results by performing a multi-modal fusion and resolving the problems arising from class imbalance and annotator-induced bias in emotion perception. A further objective was to assess the quality of the novel HdG dataset and benchmark the results using SOTA techniques. Through a literature survey on the challenges, models, and datasets related to multi-modal emotion recognition, we created a methodology utilizing the MulT along with a multi-label classification approach. This approach produced a considerable improvement in the overall emotion recognition by obtaining an average AUC of 0.74 and Balanced-accuracy of 0.70 on the HdG dataset, which is comparable to state-of-the-art (SOTA) results on other datasets. In this manner, we were also able to benchmark the novel HdG dataset as well as introduce a novel multi-annotator learning approach to understand each annotator’s relative strengths and weaknesses for emotion perception. Our evaluation results highlight the potential benefits of the novel multi-annotator learning approach in improving overall performance by resolving the problems arising from annotator-induced bias and variation in the perception of emotions. Complementing these results, we performed a further qualitative analysis of the HdG annotations with a psychologist to study the ambiguities found in the annotations. We conclude that the ambiguities in annotations may have resulted from a combination of several socio-psychological factors and systemic issues associated with the process of creating these annotations. As these problems are also present in most multi-modal emotion recognition datasets, we conclude that the domain could benefit from a set of annotation guidelines to create standardized datasets.
Diese Arbeit beschäftigt sich mit der Entwicklung eines, für die kontrollierte Freisetzung hydrophiler Wirkstoffe geeigneten, Verkapselungssystems mit dem Ziel die Freisetzung osteospezifischer P2-Liganden zu verzögern, um bei der Behandlung von Knochendefekten kritischer Größe die Bildung neuen Knochengewebes zu gewährleisten. Hierfür werden, unter Anwendung der immersiven Layer-by-Layer-Beschichtung, mit den Modell-Substanzen Adenosintriphosphat und Suramin versetzte, Alginat sowie κ-Carrageen-Kapseln mit Chitosan und Lignosulfonat beschichtet und auf ihr Freisetzungsverhalten hin untersucht.
Semantic Image Segmentation Combining Visible and Near-Infrared Channels with Depth Information
(2015)
Image understanding is a vital task in computer vision that has many applications in areas such as robotics, surveillance and the automobile industry. An important precondition for image understanding is semantic image segmentation, i.e. the correct labeling of every image pixel with its corresponding object name or class. This thesis proposes a machine learning approach for semantic image segmentation that uses images from a multi-modal camera rig. It demonstrates that semantic segmentation can be improved by combining different image types as inputs to a convolutional neural network (CNN), when compared to a single-image approach. In this work a multi-channel near-infrared (NIR) image, an RGB image and a depth map are used. The detection of people is further improved by using a skin image that indicates the presence of human skin in the scene and is computed based on NIR information. It is also shown that segmentation accuracy can be enhanced by using a class voting method based on a superpixel pre-segmentation. Models are trained for 10-class, 3-class and binary classification tasks using an original dataset. Compared to the NIR-only approach, average class accuracy is increased by 7% for 10-class, and by 22% for 3-class classification, reaching a total of 48% and 70% accuracy, respectively. The binary classification task, which focuses on the detection of people, achieves a classification accuracy of 95% and true positive rate of 66%. The report at hand describes the proposed approach and the encountered challenges and shows that a CNN can successfully learn and combine features from multi-modal image sets and use them to predict scene labeling.
This report presents an approach on a quadrotor dynamics stabilization based on ICP SLAM. Because the quadrotor lacks sensory information to detect its horizontal drift an additional sensor as Hokuyo-UTM has been used to perform on-line ICP-based SLAM. The obtained position estimates were used in control loops to maintain desired position and orientation of the vehicle. Such attitude parameters as height, yaw and position in space were controlled based on the laser data. As a result the quadrotor demonstrated two significant for autonomous navigation capabilities: performance of on-line SLAMon a flying vehicle and maintaining desired position in 3D space. Visual approach on optical flow based on Pyramid Lucas-Kanade algorithm has been touched and tested in different environmental conditions though hasn't been implemented in the control loop. Also the performance of the Hokuyo laser scanner and the related to it ICP SLAM algorithm have been tested in different environmental conditions indoors, outdoors and in presence of smoke. Results are presented and discussed. The requirement of performing on-line SLAM algorithm and to carry quite heavy equipment for it forced to seek a solution to increase the payload of the quadrotor with its computational power. A new hardware and distributed software architectures are therefore presented in the report.
Object detection concerns the classification and localization of objects in an image. To cope with changes in the environment, such as when new classes are added or a new domain is encountered, the detector needs to update itself with the new information while retaining knowledge learned in the past. Previous works have shown that training the detector solely on new data would produce a severe "forgetting" effect, in which the performance on past tasks deteriorates through each new learning phase. However, in many cases, storing and accessing past data is not possible due to privacy concerns or storage constraints. This project aims to investigate promising continual learning strategies for object detection without storing and accessing past training images and labels. We show that by utilizing the pseudo-background trick to deal with missing labels, and knowledge distillation to deal with missing data, the forgetting effect can be significantly reduced in both class-incremental and domain-incremental scenarios. Furthermore, an integration of a small latent replay buffer can result in a positive backward transfer, indicating the enhancement of past knowledge when new knowledge is learned.
Modern engineering relies heavily on utilizing computer technologies. This is especially true for thermoplastic manufacturing, such as blow molding. A crucial milestone for digitalization is the continuous integration of data in unified or interoperable systems. While new simulation technologies are constantly developed, data management standards such as STEP fail at integrating them. On the other hand, industrial standards such as ”VMAP” manage to improve interoperability for Small and Medium-sized Enterprises. However, they do not provide Simulation Process and Data Management (SPDM) technologies. For SPDM integration of VMAP data, Ontology-Based Data Access is used to allow continuing the digital thread in custom semantic-based open-source solutions. An ontology of the database format (VMAP) was generated alongside an expandable knowledge graph of data access methods. A Python-based software architecture was developed, automatically using the semantic representations of database format and data access to query data and metadata within the VMAP file. The result is a software architecture template that can be adapted for other data standards and integrated into semantic data management systems. It allows semantic queries on simulation data down to element-wise resolution without integrating the whole model information. The architecture can instantiate a file in a knowledge graph, query a file’s metadatum and, in case it is not yet available, find a semantically represented process that allows the creation and instantiation of the required metadatum. See Figure 1. The results of this thesis can be expected to form a basis for semantic SPDM tools.
This project focuses on object detection in dense volume data. There are several types of dense volume data, namely Computed Tomography (CT) scan, Positron Emission Tomography (PET), Magnetic Resonance Imaging (MRI). This work focuses on CT scans. CT scans are not limited to the medical domain; they are also used in industries. CT scans are used in airport baggage screening, assembly lines, and the object detection systems in these places should be able to detect objects fast. One of the ways to address the issue of computational complexity and make the object detection systems fast is to use low-resolution images. Low-resolution CT scanning is fast. The entire process of scanning and detection can be made faster by using low-resolution images. Even in the medical domain, to reduce the rad iation dose, the exposure time of the patient should be reduced. The exposure time of patients could be reduced by allowing low-resolution CT scans. Hence it is essential to find out which object detection model has better accuracy as well as speed at low-resolution CT scans. However, the existing approaches did not provide details about how the model would perform when the resolution of CT scans is varied. Hence in this project, the goal is to analyze the impact of varying resolution of CT scans on both the speed and accuracy of the model. Three object detection models, namely RetinaNet, YOLOv3, and YOLOv5, were trained at various resolutions. Among the three models, it was found that YOLOv5 has the best mAP and f1 score at multiple resolutions on the DeepLesion dataset. RetinaNet model h as the least inference time on the DeepLesion dataset. From the experiments, it could be asserted that sacrificing mean average precision (mAP) to improve inference time by reducing resolution is feasible.
Für die Durchführung größerer Projekte innerhalb des DLR ist es häufig notwendig, dass sich Wissenschaftler fachübergreifend in Themengebiete einarbeiten müssen. Im Rahmen dieser Einarbeitung führen Wissenschaftler Recherchen in fremden Fachbereichen durch. Das DLR hat zu diesem Zweck das Wissensportal KnowledgeFinder entwickelt. Dieses Framework setzt klassische Suchverfahren zum Auffinden von Informationen in beliebigen Datenbeständen ein. Wenn Wissenschaftler in fremden Fachbereichen recherchieren, dann fällt es ihnen aufgrund des oberflächlichen Einblicks oftmals schwer, zielgerichtet nach Informationen zu suchen. Die im KnowledgeFinder eingesetzten klassischen Suchverfahren, die auf textueller und struktureller Ähnlichkeit basieren, können bei diesen unspezifischen Suchanfragen nur bedingt beim Auffinden von relevanten Informationen helfen. Aufgrund von Mehrdeutigkeiten und unterschiedlichen Kontexten stoße solche Verfahren oftmals an ihre Grenzen. Semantische Technologien haben zum Ziel diesen Mangel zu beheben. Hier wird neben der textuellen und strukturellen Ähnlichkeit zusätzlich die Dimension der Bedeutung betrachtet. In dieser Masterthesis wurde untersucht, ob die Suchergebnisqualität des KnowledgeFinder durch den Einsatz semantischer Technologien verbessert werden kann. Innerhalb einer Machbarkeitsstudie wurde dazu das KnowledgeFinder Framework um semantische Suchverfahren erweitert. Diese Verfahren sollen die fachübergreifende Recherche von DLR-Wissenschaftlern erleichtern, indem sie ihnen helfen, passende Suchergebnisse in den entsprechenden Fachbereichen zu finden.
Die Matrix-Vektor-Multiplikation für dünn besetzte Matrizen (SpMV) stellt für weitreichende wissenschaftliche Anwendungen eine der Kernoperationen des High-Performance-Computing-Bereichs dar. Für die verteilte Berechnung mit immer beliebter werdenden hybriden Rechenclustern kommt dabei die Frage nach einer geeigneten Partitionierungsstrategie für die Verteilung von Daten und Berechnung auf. Diese Arbeit beschäftigt sich damit welchen Einfluss die Struktur der Matrix und die unterschiedlichen Prozessortypen auf die Leistung der SpMV haben und schlägt ein Modell vor, um für diese eine lastbalancierte Verteilung zu erreichen. Wesentliche Bestandteile sind dabei die Laufzeitvorhersage für aktuelle CPUs und GPUs basierend auf einem abgewandelten Roofline-Modell sowie die bewährte Methode der Graph-Partitionierung.
In einem Grid steht Benutzern mit entsprechendem Zugang eine Vielzahl verteilter Ressourcen zur Verfügung. Die daraus entstehenden wirtschaftlichen und technischen Vorteile rechtfertigen die Portierung von bestehenden Desktop-Anwendungen. Die vorliegende Arbeit befasst sich mit der Fragestellung, welche Einflussfaktoren bei der Portierung von Desktop-Anwendungen in ein Grid eine Rolle spielen können und wie diese in Hinblick auf die Machbarkeit zu bewerten sind. Basierend auf den zugrunde liegenden Softwarearchitekturen werden Architekturmerkmale von Desktop-Anwendungen identifiziert und Hypothesen darüber entwickelt, welche Aspekte den Portierungsprozess beeinflussen. Am Beispiel der Portierung der Anwendung „DataFinder“ der Abteilung Verteilte Systeme und Komponentensoftware des DLR werden die entwickelten Hypothesen überprüft. Die Erkenntnisse aus der Beispielportierung werden ausführlich dargestellt und anschließend kritisch diskutiert.
In dieser Arbeit wird im Rahmen von FFE+, einem internen Projekt des Deutschen Zentrums für Luft- und Raumfahrt, eine entscheidungsbasierte Fertigungsstrategie für die Herstellung einer Mikrogasturbinenblisk aus oxidkeramischem Faserverbundwerkstoff entwickelt. Hierfür soll das vakuumbasierte Infusionsverfahren der Abteilung Struktur- und Funktionskeramik des Instituts für Werksstoffforschung verwendet werden. Zunächst wird der theoretische Hintergrund des Materials und die davon etablierte Verarbeitung betrachtet. Aus Basis dieser Grundlage können das System und Funktionen der oxidkeramischen Blisk im Sinne der methodischen Prozessentwicklung bestimmt werden. Die darin formulierten Anforderungen und Bewertungskriterien lassen eine aufwandsreduzierte Entwurfsphase von Konzepten oder Lösungsprinzipien zu. Hierbei ist die Faserstruktur der maßgeblicher Einflussfaktor in der Lösungsfindung. Nach der Bewertung, Validierung und Anpassung der Ergebnisse wird die Fertigungsstrategie auf dem best-bewerteten Konzept und den bisherigen Projekten der Abteilung entworfen. Zusätzlich ist in dieser Arbeit eine Machbarkeitsstudie am Institut für Flugzeugbau der Universität Stuttgart von einem bislang unbekannten Verfahren zur Herstellung oxidkeramischer Faserpreforms durchgeführt worden. Da eine Aussage über die Materialkennwerte für eine sichere Funktionsgewährleistung notwendig ist, sind Materialversuche bei Raum- und Hochtemperatur geplant. Das abschließende Ziel einer Prozessketten-Grundlage von Projekten mit dem vakuumbasierten Infusionsverfahren des Instituts für Werkstoffforschung fasst die Ergebnisse von dieser Arbeit und anderen Erfahrungsberichten zusammen.
The aim of this master thesis was to probe the view of Bonn’s citizens on the smart city project of the German city. A literature review helped defining the smart city term and identifying the smart city concept that is mostly used in Germany. This can be summarized as an urban planning concept using information and communication technology to build citizen centric, sustainable cities. According to this, a smart city should include transparent communication and participation of its citizens. The websites and different publications of Bonn were researched to understand its smart city strategy and vision. This revealed inconsistencies. To resolve these inconsistencies, three representatives of the city were inter-viewed. Based on the knowledge gained up to this point, two groups of Bonn’s inhabitants discussed the Smart City Bonn and presented their perception of it. With the help of this methodology, the following results were obtained. Communication and participation of the city are in many cases in line with the current recommendations for a smart city. Bonn has apparently recognized the relevance of these aspects in theory but should also implement them more consistently in practice. Currently the city council publishes contradictory information and does not plan to incorporate the sight of Bonn’s citizens to develop the smart city strat-egy in the first place, as it is recommended in common literature.
The ability to finely segment different instances of various objects in an environment forms a critical tool in the perception tool-box of any autonomous agent. Traditionally instance segmentation is treated as a multi-label pixel-wise classification problem. This formulation has resulted in networks that are capable of producing high-quality instance masks but are extremely slow for real-world usage, especially on platforms with limited computational capabilities. This thesis investigates an alternate regression-based formulation of instance segmentation to achieve a good trade-off between mask precision and run-time. Particularly the instance masks are parameterized and a CNN is trained to regress to these parameters, analogous to bounding box regression performed by an object detection network.
In this investigation, the instance segmentation masks in the Cityscape dataset are approximated using irregular octagons and an existing object detector network (i.e., SqueezeDet) is modified to regresses to the parameters of these octagonal approximations. The resulting network is referred to as SqueezeDetOcta. At the image boundaries, object instances are only partially visible. Due to the convolutional nature of most object detection networks, special handling of the boundary adhering object instances is warranted. However, the current object detection techniques seem to be unaffected by this and handle all the object instances alike. To this end, this work proposes selectively learning only partial, untainted parameters of the bounding box approximation of the boundary adhering object instances. Anchor-based object detection networks like SqueezeDet and YOLOv2 have a discrepancy between the ground-truth encoding/decoding scheme and the coordinate space used for clustering, to generate the prior anchor shapes. To resolve this disagreement, this work proposes clustering in a space defined by two coordinate axes representing the natural log transformations of the width and height of the ground-truth bounding boxes.
When both SqueezeDet and SqueezeDetOcta were trained from scratch, SqueezeDetOcta lagged behind the SqueezeDet network by a massive ≈ 6.19 mAP. Further analysis revealed that the sparsity of the annotated data was the reason for this lackluster performance of the SqueezeDetOcta network. To mitigate this issue transfer-learning was used to fine-tune the SqueezeDetOcta network starting from the trained weights of the SqueezeDet network. When all the layers of the SqueezeDetOcta were fine-tuned, it outperformed the SqueezeDet network paired with logarithmically extracted anchors by ≈ 0.77 mAP. In addition to this, the forward pass latencies of both SqueezeDet and SqueezeDetOcta are close to ≈ 19ms. Boundary adhesion considerations, during training, resulted in an improvement of ≈ 2.62 mAP of the baseline SqueezeDet network. A SqueezeDet network paired with logarithmically extracted anchors improved the performance of the baseline SqueezeDet network by ≈ 1.85 mAP.
In summary, this work demonstrates that if given sufficient fine instance annotated data, an existing object detection network can be modified to predict much finer approximations (i.e., irregular octagons) of the instance annotations, whilst having the same forward pass latency as that of the bounding box predicting network. The results justify the merits of logarithmically extracted anchors to boost the performance of any anchor-based object detection network. The results also showed that the special handling of image boundary adhering object instances produces more performant object detectors.
Auf der einen Seite wird audiovisuellen Medien die Möglichkeit zugeschrieben, ein Abbild der Wirklichkeit zu schaffen – ein Grund dafür, dass sie im Journalismus von zentraler Bedeutung sind. Auf der anderen Seite ermöglichen die technologischen Entwicklungen der letzten Jahre immer einfacher, kostengünstiger und schneller authentisch wirkende Manipulationen zu erstellen. Noch vor zehn Jahren war die Manipulation von Videomaterial, abgesehen von trivialen Operationen auf Bildebene, nur im Rahmen von Filmproduktionen möglich. Das ist inzwischen anders – synthetische Medien, auch als Deepfakes bekannt, sind in aller Munde. So stellen audiovisuelle Manipulationen Redaktionen vor eine zunehmend größere Herausforderung und schaffen es mitunter bereits als vermeintlich authentischer Inhalt in die Berichterstattung. Es stellt sich die Frage: Inwiefern ist und bleibt es möglich, die Authentizität audiovisuellen Materials in Redaktionen sicherzustellen?
Auf der Grundlage von sieben geführten Experteninterviews mit Akteur:innen aus Wissenschaft und Praxis liefert die Arbeit zusätzlich zu einer aktuellen Beschreibung des technischen Sachstandes in Bezug auf Manipulations- und Verifikationsmöglichkeiten eine Beschreibung und Bewertung der existierenden Probleme und potenzieller Lösungen für Redaktionen, sowie eine Einschätzung der zukünftigen Entwicklung relevanter Technologien und den damit verbundenen Auswirkungen. Im Ergebnis zeigt sich, dass technische Hilfsmittel für Verifikationsprozesse in Redaktionen gebraucht werden, es aber kaum möglich ist, allein auf technologischer Ebene die Authentizität audiovisuellen Materials sicherzustellen. Damit einhergehend seien zurzeit nicht fehlende technische Hilfsmittel die größte Herausforderung für Redaktionen bei der Verifikation, sondern vielmehr der Mangel an Zeit.
Interviewt wurden: Dr. Dominique Dresen – Bundesamt für Sicherheit in der Informationstechnik (BSI), Dr. Jutta Jahnel – Karlsruher Institut für Technologie (KIT), Dr. Christian Riess – FAU Erlangen-Nürnberg, Andrea Sauerbier – SPIEGEL, Jochen Spangenberg – u. a. DW Innovation, Johanna Wild – Bellingcat und Dr. Sascha Zmudzinski – Fraunhofer-Institut für Sichere Informationstechnologie (SIT).