Refine
H-BRS Bibliography
- yes (56) (remove)
Departments, institutes and facilities
Document Type
- Master's Thesis (56) (remove)
Year of publication
Keywords
- Active Learning (2)
- Computer Vision (2)
- Emergency support system (2)
- Mobile sensors (2)
- Object Detection (2)
- deep learning (2)
- object detection (2)
- 0-1-Integer-Problem (1)
- 3D-Lokalisierung (1)
- 3D-Scanner (1)
In order to help journalists investigate inside large audiovisual archives, as maintained by news broadcast agencies, the multimedia data must be indexed by text-based search engies. By automatically creating a transcript through automatic speech recognition (ASR), the spoken word becomes accessible to text search, and queries for keywords are made possible. But stil, important contextual information like the identity of the speaker is not captured. Especially when gathering original footage in the political domain, the identity of the speaker can be the most important query constraint, although this name may not be prominent in the words spoken. It is thus desireable to have this information provided explicitely to the search engine. To provide this information, the archive must be an alyzed by automatic Speaker Identification (SID). While this research topic has seen substantial gains in accuracy and robustness over last years, it has not yet established itself as a helpful, large-scale tool outside the research community. This thesis sets out to establish a workflow to provide automatic speaker identification. Its application is to help journalists searching on speeches given in the German parliament (Bundestag). This is a contribution to the News-Stream 3.0 project, a BMBF funded research project that addresses accessibility of various data sources for journalists.
Das Deutsche Zentrum für Luft- und Raumfahrt (DLR) führt viele Forschungen und Studien im Bereich der Luft- und Raumfahrt durch. Dabei spielen die Studien für die Gesundheit und Medizin auch eine sehr wichtige Rolle bei der DLR. Zu diesem Zweck führt die DLR die Artificial Gravity bed rest study (AGBRESA) im Auftrag der European Space Agency (esa) und in Kooperation der NASA durch. In dieser Studie werden die negativen Auswirkungen der Schwerelosigkeit auf dem Menschen im Weltall simuliert. Dabei werden Experimente durchgeführt, um die negative Auswirkungen entgegenzuwirken. Die Ergebnisse der Experimente werden in der DLR digital, aber auch auf Papier dokumentiert. In diesem Master-Projekt habe ich nun die Aufgabe, die Papierprotokolle für den Bereich der Blutabnahme und der Labordokumentation in eine digitale Form zu ersetzen.
Graphbasierte Diskussionen sind eine Form von Online-Diskussionen, bei denen eine Diskussion als Graph visualisiert wird. Beispielhafte Diskussionsanwendungen sind unter anderem Belvedere [SWCP95], FreeStyler [Gas03] oder Digalo [LK06]. Graphen dieser Art sind, was bestimmte Eigenschaften betrifft, vergleichbar mit Petri-Netzen [Pet62]. So gibt es bei Beiden gewichtete, gerichtete Kanten sowie Knoten verschiedenen Typs, die jeweils bestimmte Eigenschaften besitzen. Im Gegensatz zu einem Petri-Netz, das immer ein bipartiter Graph ist, können bei einem Diskussionsgraphen jedoch prinzipiell alle Knoten miteinander verbunden werden. Moderatoren solcher Diskussionen sind oftmals mit dem Problem konfrontiert, dass sie mehrere Diskussionen gleichzeitig beobachten wollen, was jedoch aufgrund der Komplexität der Struktur von Diskussionsgraphen kaum effizient möglich ist.
Interactive Object Detection
(2019)
The success of state-of-the-art object detection methods depend heavily on the availability of a large amount of annotated image data. The raw image data available from various sources are abundant but non-annotated. Annotating image data is often costly, time-consuming or needs expert help. In this work, a new paradigm of learning called Active Learning is explored which uses user interaction to obtain annotations for a subset of the dataset. The goal of active learning is to achieve superior object detection performance with images that are annotated on demand. To realize active learning method, the trade-off between the effort to annotate (annotation cost) unlabeled data and the performance of object detection model is minimised.
Random Forests based method called Hough Forest is chosen as the object detection model and the annotation cost is calculated as the predicted false positive and false negative rate. The framework is successfully evaluated on two Computer Vision benchmark and two Carl Zeiss custom datasets. Also, an evaluation of RGB, HoG and Deep features for the task is presented.
Experimental results show that using Deep features with Hough Forest achieves the maximum performance. By employing Active Learning, it is demonstrated that performance comparable to the fully supervised setting can be achieved by annotating just 2.5% of the images. To this end, an annotation tool is developed for user interaction during Active Learning.
This thesis proposes a multi-label classification approach using the Multimodal Transformer (MulT) [80] to perform multi-modal emotion categorization on a dataset of oral histories archived at the Haus der Geschichte (HdG). Prior uni-modal emotion classification experiments conducted on the novel HdG dataset provided less than satisfactory results. They uncovered issues such as class imbalance, ambiguities in emotion perception between annotators, and lack of representative training data to perform transfer learning [28]. Hence, the objectives of this thesis were to achieve better results by performing a multi-modal fusion and resolving the problems arising from class imbalance and annotator-induced bias in emotion perception. A further objective was to assess the quality of the novel HdG dataset and benchmark the results using SOTA techniques. Through a literature survey on the challenges, models, and datasets related to multi-modal emotion recognition, we created a methodology utilizing the MulT along with a multi-label classification approach. This approach produced a considerable improvement in the overall emotion recognition by obtaining an average AUC of 0.74 and Balanced-accuracy of 0.70 on the HdG dataset, which is comparable to state-of-the-art (SOTA) results on other datasets. In this manner, we were also able to benchmark the novel HdG dataset as well as introduce a novel multi-annotator learning approach to understand each annotator’s relative strengths and weaknesses for emotion perception. Our evaluation results highlight the potential benefits of the novel multi-annotator learning approach in improving overall performance by resolving the problems arising from annotator-induced bias and variation in the perception of emotions. Complementing these results, we performed a further qualitative analysis of the HdG annotations with a psychologist to study the ambiguities found in the annotations. We conclude that the ambiguities in annotations may have resulted from a combination of several socio-psychological factors and systemic issues associated with the process of creating these annotations. As these problems are also present in most multi-modal emotion recognition datasets, we conclude that the domain could benefit from a set of annotation guidelines to create standardized datasets.
Diese Arbeit beschäftigt sich mit der Entwicklung eines, für die kontrollierte Freisetzung hydrophiler Wirkstoffe geeigneten, Verkapselungssystems mit dem Ziel die Freisetzung osteospezifischer P2-Liganden zu verzögern, um bei der Behandlung von Knochendefekten kritischer Größe die Bildung neuen Knochengewebes zu gewährleisten. Hierfür werden, unter Anwendung der immersiven Layer-by-Layer-Beschichtung, mit den Modell-Substanzen Adenosintriphosphat und Suramin versetzte, Alginat sowie κ-Carrageen-Kapseln mit Chitosan und Lignosulfonat beschichtet und auf ihr Freisetzungsverhalten hin untersucht.
Semantic Image Segmentation Combining Visible and Near-Infrared Channels with Depth Information
(2015)
Image understanding is a vital task in computer vision that has many applications in areas such as robotics, surveillance and the automobile industry. An important precondition for image understanding is semantic image segmentation, i.e. the correct labeling of every image pixel with its corresponding object name or class. This thesis proposes a machine learning approach for semantic image segmentation that uses images from a multi-modal camera rig. It demonstrates that semantic segmentation can be improved by combining different image types as inputs to a convolutional neural network (CNN), when compared to a single-image approach. In this work a multi-channel near-infrared (NIR) image, an RGB image and a depth map are used. The detection of people is further improved by using a skin image that indicates the presence of human skin in the scene and is computed based on NIR information. It is also shown that segmentation accuracy can be enhanced by using a class voting method based on a superpixel pre-segmentation. Models are trained for 10-class, 3-class and binary classification tasks using an original dataset. Compared to the NIR-only approach, average class accuracy is increased by 7% for 10-class, and by 22% for 3-class classification, reaching a total of 48% and 70% accuracy, respectively. The binary classification task, which focuses on the detection of people, achieves a classification accuracy of 95% and true positive rate of 66%. The report at hand describes the proposed approach and the encountered challenges and shows that a CNN can successfully learn and combine features from multi-modal image sets and use them to predict scene labeling.
This report presents an approach on a quadrotor dynamics stabilization based on ICP SLAM. Because the quadrotor lacks sensory information to detect its horizontal drift an additional sensor as Hokuyo-UTM has been used to perform on-line ICP-based SLAM. The obtained position estimates were used in control loops to maintain desired position and orientation of the vehicle. Such attitude parameters as height, yaw and position in space were controlled based on the laser data. As a result the quadrotor demonstrated two significant for autonomous navigation capabilities: performance of on-line SLAMon a flying vehicle and maintaining desired position in 3D space. Visual approach on optical flow based on Pyramid Lucas-Kanade algorithm has been touched and tested in different environmental conditions though hasn't been implemented in the control loop. Also the performance of the Hokuyo laser scanner and the related to it ICP SLAM algorithm have been tested in different environmental conditions indoors, outdoors and in presence of smoke. Results are presented and discussed. The requirement of performing on-line SLAM algorithm and to carry quite heavy equipment for it forced to seek a solution to increase the payload of the quadrotor with its computational power. A new hardware and distributed software architectures are therefore presented in the report.
Object detection concerns the classification and localization of objects in an image. To cope with changes in the environment, such as when new classes are added or a new domain is encountered, the detector needs to update itself with the new information while retaining knowledge learned in the past. Previous works have shown that training the detector solely on new data would produce a severe "forgetting" effect, in which the performance on past tasks deteriorates through each new learning phase. However, in many cases, storing and accessing past data is not possible due to privacy concerns or storage constraints. This project aims to investigate promising continual learning strategies for object detection without storing and accessing past training images and labels. We show that by utilizing the pseudo-background trick to deal with missing labels, and knowledge distillation to deal with missing data, the forgetting effect can be significantly reduced in both class-incremental and domain-incremental scenarios. Furthermore, an integration of a small latent replay buffer can result in a positive backward transfer, indicating the enhancement of past knowledge when new knowledge is learned.