pub H-BRS | Search

Einführung eines (Halb-)automatisierten Datenerfassungssystems von Blutproben in die Forschungsumgebung des Deutschen Zentrums für Luft- und Raumfahrt e. V. (DLR) (2019)

Zok, Thomas Tobias

The German Aerospace Center (DLR) conducts numerous research projects and studies in aeronautics and space. Health and medical studies also play a very important role at the DLR. To this end, the DLR is conducting the Artificial Gravity Bed Rest Study (AGBRESA) on behalf of the European Space Agency (ESA) and in cooperation with NASA. The study simulates the negative effects of weightlessness on humans in space, and experiments are carried out to counteract these effects. The results of the experiments are documented at the DLR digitally, but also on paper. In this master's project, my task is to replace the paper protocols for blood sampling and laboratory documentation with a digital form.

Entwicklung einer Steuerung für Löser linearer Gleichungssysteme (2012)

Buschulte, Kai

In this thesis, a control framework for the LAMA library (http://www.libama.org) was developed for configuring solvers of linear systems of equations. To this end, a parser was implemented with the Boost.Spirit library that allows a domain-specific language (DSL) to be interpreted at runtime. The configuration language makes it possible to link solvers via their IDs without restrictions and to assign loggers and logically combined stopping criteria to these solvers.

Entwicklung und Implementierung von Partitionierungsstrategien für dünn besetzte Matrizen auf hybriden Systemen mit verteiltem Speicher (2012)

Schubert, Lauretta

Sparse matrix-vector multiplication (SpMV) is one of the core operations of high-performance computing for a wide range of scientific applications. For distributed computation on increasingly popular hybrid compute clusters, this raises the question of a suitable partitioning strategy for distributing data and computation. This thesis examines how the structure of the matrix and the different processor types affect SpMV performance and proposes a model for achieving a load-balanced distribution for them. Its essential components are runtime prediction for current CPUs and GPUs based on a modified Roofline model and the established method of graph partitioning.
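As a rough illustration of the operation and of what a load-balanced row distribution means, the following minimal Python sketch computes an SpMV on a matrix in CSR format and splits the rows into contiguous blocks whose nonzero counts are roughly proportional to per-worker weights. The function names and the `weights` parameter are hypothetical; the weights stand in for relative processor throughputs, and this simple nonzero-count heuristic is not the Roofline-based model or the graph partitioning used in the thesis.

```python
from typing import List, Sequence, Tuple


def csr_spmv(values: Sequence[float], col_idx: Sequence[int],
             row_ptr: Sequence[int], x: Sequence[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR format."""
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        # Nonzeros of this row live in values[row_ptr[row]:row_ptr[row + 1]].
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def split_rows_by_nonzeros(row_ptr: Sequence[int],
                           weights: Sequence[float]) -> List[Tuple[int, int]]:
    """Split rows into contiguous [start, end) blocks, one per worker.

    Each worker's share of nonzeros is roughly proportional to its weight
    (an assumed relative speed, e.g. CPU = 1.0, GPU = 3.0).
    """
    n_rows = len(row_ptr) - 1
    total_nnz = row_ptr[-1]
    total_weight = sum(weights)
    blocks = []
    start = 0
    cumulative = 0.0
    for i, w in enumerate(weights):
        cumulative += w
        if i == len(weights) - 1:
            end = n_rows  # the last worker takes all remaining rows
        else:
            target = total_nnz * cumulative / total_weight
            end = start
            while end < n_rows and row_ptr[end + 1] <= target:
                end += 1
        blocks.append((start, end))
        start = end
    return blocks
```

For the 3x3 matrix with rows (1, 0, 2), (0, 3, 0), (4, 0, 5), the CSR arrays are `values = [1, 2, 3, 4, 5]`, `col_idx = [0, 2, 1, 0, 2]` and `row_ptr = [0, 2, 3, 5]`.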

Entwicklung und Inbetriebnahme eines Messsystems zur Bestimmung der Flussdichteverteilung konzentrierter Solarstrahlung (2015)

Wittenberg, Benjamin

Entwicklung und Optimierung einer Methode zur Bestimmung der antioxidativen Aktivität von Kraft- und Organosolv-Lignin via Folin-Ciocalteu-Assay (2019)

Rumpf, Jessica

Entwicklung von Aerogelen zur Anwendung in Kreislaufwärmerohren (2020)

Hesse, Anna Mária

In this thesis, resorcinol-formaldehyde aerogels were developed as wick material for use in loop heat pipes (LHP). As a wick material, aerogels offer a good basis for mass and heat transport owing to their high porosity and effective capillary action. These properties can help improve the cooling performance of a heat pump. To this end, aerogels were synthesized in wick form, and their skeletal density, envelope density, porosity and gas permeability were then determined. In addition, a test of swelling behaviour was developed. The samples were also sent to the company Allatherm to check the requirements placed on the developed RF aerogels in wick form. The mechanical workability of the aerogels was improved. The porosity and gas permeability of the aerogels examined were in an optimal range. Only the through-pore size of the aerogels, determined by bubble-point analysis, requires further recipe development and measurements in order to narrow the largest through-pore towards 1 µm.

Entwicklung, Validierung und Anwendung einer Methode zur Untersuchung von Kunststoffemissionen auf landwirtschaftlichen Nutzflächen (2020)

Brenner, Hannah

In this research project, a practice-oriented method was developed that makes it possible to process soil samples after they have been taken in the field and to analyse them for their microplastic content. The extraction method has already been validated for two polymers, PA 12 and PE (mulch film particles), with recovery rates of 100 % each for particles larger than 0.5 mm. For particles larger than 63 μm, the recovery rate is 97 % for PE mulch film particles and 86 % for PA particles. In addition, various spectroscopic detection methods were examined and compared with respect to their potential and limitations. It was found that digital microscopy is very well suited to determining the colour, size, shape and number of particles, but depends heavily on subjective assessment. It should therefore always be combined with a further detection method. In this work, ATR-FTIR spectroscopy was used for this purpose. It additionally allows the polymer type of individual particles to be determined, with a lower detection limit of 500 μm. The method was applied to a total of five agricultural fields, two of which are farmed conventionally and three organically. To gain a first impression of the current microplastic contamination of agricultural soils, the results obtained with the method developed in this work were extrapolated and reported as emission coefficients in various units.

Estimation of Prediction Uncertainty for Semantic Scene Labeling Using Bayesian Approximation (2018)

Ajmera, Anand

With the advancement in technology, autonomous and assisted driving are close to becoming reality. A key component of such systems is the understanding of the surrounding environment. This understanding can be attained by performing semantic labeling of the driving scenes. Deep learning based models developed in recent years outperform classical image processing algorithms for the task of semantic labeling. However, the existing models only produce semantic predictions and do not provide a measure of uncertainty about the predictions. Hence, this work focuses on developing a deep learning based semantic labeling model that can produce semantic predictions and their corresponding uncertainties. Autonomous driving needs a model that operates in real time; however, the Full Resolution Residual Network (FRRN) [4] architecture, which was found to be the best performing architecture during the literature search, is not able to satisfy this condition. Hence, a small network, similar to FRRN, has been developed and used in this work. Based on the work of [13], the developed network is then extended by adding dropout layers, and the dropouts are used during testing to perform approximate Bayesian inference. The existing works on uncertainties do not have quantitative metrics to evaluate the quality of uncertainties estimated by a model. Hence, the area under curve (AUC) of the receiver operating characteristic (ROC) curves is proposed and used as an evaluation metric in this work. Further, a comparative analysis of the influence of dropout layer position, drop probability and the number of samples on the quality of uncertainty estimation is performed. Finally, based on the insights gained from the analysis, a model with an optimal configuration of dropout is developed. It is then evaluated on the Cityscapes dataset and shown to outperform the baseline model with an AUC-ROC of about 90%, whereas the latter has an AUC-ROC of about 80%.
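The idea of keeping dropout active at test time can be illustrated with a minimal Python sketch. It assumes a toy linear classifier with dropout applied to its input features, which is far simpler than the FRRN-like network of the thesis; the function names, the weights and the choice of predictive entropy as the uncertainty measure are illustrative assumptions, not the author's implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_with_dropout(features, weights, drop_prob, rng):
    """One stochastic forward pass: drop input features, then classify."""
    keep = 1.0 - drop_prob
    # Inverted dropout: surviving features are rescaled by 1 / keep.
    dropped = [f / keep if rng.random() < keep else 0.0 for f in features]
    logits = [sum(w * f for w, f in zip(row, dropped)) for row in weights]
    return softmax(logits)


def mc_dropout_predict(features, weights, drop_prob=0.5, n_samples=50, seed=0):
    """Average n_samples stochastic passes and report predictive entropy."""
    rng = random.Random(seed)
    mean = [0.0] * len(weights)
    for _ in range(n_samples):
        probs = forward_with_dropout(features, weights, drop_prob, rng)
        mean = [m + p / n_samples for m, p in zip(mean, probs)]
    # Entropy of the averaged prediction: higher means more uncertain.
    entropy = -sum(p * math.log(p) for p in mean if p > 0.0)
    return mean, entropy
```

In a segmentation setting the same averaging would be done per pixel, giving an uncertainty map alongside the label map.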

Evaluation and generic application scenarios for curved hexahedral adaptive mesh refinement (2022)

Elsweijer, Sandro

In (dynamic) adaptive mesh refinement (AMR), an input mesh is refined or coarsened according to the needs of the numerical application. This refinement happens with no respect to the originally meshed domain and is therefore limited to the geometrical accuracy of the original input mesh. We presented a novel approach to equip this input mesh with additional geometry information, to allow refinement and high-order cells based on the geometry of the original domain. We already showed a limited implementation of this algorithm. Now we evaluate this prototype with a numerical application and demonstrate its influence on the accuracy of certain numerical results. To be as practical as possible, we implement the ability to import meshes generated by Gmsh and equip them with the needed geometry information. Furthermore, we improve the mapping algorithm, which maps the geometry information of the boundary of a cell into the cell's volume. With these preliminary steps done, we use our new approach in a simulation of the advection of a concentration along the boundary of a sphere shell and past the boundary of a rotating cylinder. We evaluate the accuracy of our approach in comparison to the conventional refinement of cells to answer our research question: How do the performance and accuracy of the hexahedral curved domain AMR algorithm compare to linear AMR when solving the advection equation with the linear finite volume method? To answer this question, we show the influence of curved AMR on our simulation results and see that it is even able to outperform far finer linear meshes in terms of accuracy. We also see that the current implementation of this approach is too slow for practical usage. We can therefore demonstrate the benefits of curved AMR in certain geometry-related application scenarios and show possible improvements to make it more feasible and practical in the future.

Evaluation of Drift Detection Techniques for Automated Machine Learning Pipelines (2023)

Abdelwahab, Hammam

Machine learning-based solutions are frequently adopted in applications that require big data in operations. The performance of a model that is deployed into operations is subject to degradation due to unanticipated changes in the flow of input data. Hence, monitoring data drift becomes essential to maintain the model’s desired performance. According to the conducted review of the literature on drift detection, statistical hypothesis testing makes it possible to investigate whether incoming data is drifting from training data. Because Maximum Mean Discrepancy (MMD) and Kolmogorov-Smirnov (KS) have been shown to be reliable distance measures between multivariate distributions in the literature review, both were selected from several existing techniques for experimentation. For the scope of this work, the experiments used an image classification use case with the Stream-51 dataset. Based on the results from different drift experiments, both MMD and KS showed high Area Under Curve values. However, KS ran faster than MMD and produced fewer false positives. Furthermore, the results showed that using the pre-trained ResNet-18 for feature extraction maintained the high performance of the experimented drift detectors. The results also showed that the performance of the drift detectors highly depends on the sample sizes of the reference (training) data and the test data that flow into the pipeline’s monitor. Finally, the results showed that if the test data is a mixture of drifting and non-drifting data, the performance of the drift detectors does not depend on how the drifting data are scattered among the non-drifting ones, but rather on their amount in the test set.
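To make the hypothesis-testing idea concrete, the following minimal Python sketch implements a two-sample Kolmogorov-Smirnov check on a single feature. The function names are hypothetical, and the decision threshold is the standard large-sample approximation of the KS critical value; how the thesis applies KS to multi-dimensional ResNet-18 features (for example, one test per feature dimension with a multiple-testing correction) is not stated in the abstract and is left out here.

```python
import math
from bisect import bisect_right


def ks_statistic(reference, test):
    """Largest gap between the empirical CDFs of two 1-D samples."""
    ref = sorted(reference)
    tst = sorted(test)
    d = 0.0
    # The gap between two step functions can only change at sample points.
    for v in ref + tst:
        cdf_ref = bisect_right(ref, v) / len(ref)
        cdf_tst = bisect_right(tst, v) / len(tst)
        d = max(d, abs(cdf_ref - cdf_tst))
    return d


def drift_detected(reference, test, alpha=0.05):
    """Flag drift when the KS statistic exceeds the approximate critical value."""
    n, m = len(reference), len(test)
    # Large-sample approximation: c(alpha) * sqrt((n + m) / (n * m)).
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(reference, test) > critical
```

Because the critical value shrinks as `n` and `m` grow, the same shift is easier to detect with larger reference and test samples, which is consistent with the sample-size dependence reported above.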

Evaluation von Ansätzen zur skalierten Ausführung von workflowbasierten Anwendungen am Beispiel einer Audio-Mining Anwendung (2019)

Luhmer, David

The last two decades have been shaped by exponential growth in the available data. Every day, people and machines produce more and more data, which is often stored in distributed data stores. Fields of application can be found, for example, in physics and astronomy, where particle accelerators or satellites generate immense amounts of data that must be stored and processed. Neither humans directly nor traditional analysis methods can derive new insights from these volumes of data. Processing them requires parallel and distributed data analysis methods. [MTT18,NEKH+18]

Fertigungsstrategie einer Turbinenblisk aus oxidischer Faserverbundkeramik mittels Infusionsverfahren unter Berücksichtigung materialtechnischer Einflussfaktoren (2023)

Rudat, Jacob

In this thesis, a decision-based manufacturing strategy for producing a micro gas turbine blisk from an oxide ceramic fibre composite is developed as part of FFE+, an internal project of the German Aerospace Center. The vacuum-based infusion process of the Structural and Functional Ceramics department of the Institute of Materials Research is to be used for this purpose. First, the theoretical background of the material and its established processing are considered. On this basis, the system and functions of the oxide ceramic blisk can be determined in the sense of methodical process development. The requirements and evaluation criteria formulated there allow a reduced-effort design phase for concepts or solution principles. Here, the fibre structure is the decisive influencing factor in finding a solution. After evaluation, validation and adaptation of the results, the manufacturing strategy is designed on the basis of the best-rated concept and the department's previous projects. In addition, a feasibility study of a previously unknown process for producing oxide ceramic fibre preforms was carried out in this work at the Institute of Aircraft Design of the University of Stuttgart. Since a statement about the material properties is necessary to guarantee safe functioning, material tests at room temperature and high temperature are planned. The final goal, a process-chain foundation for projects using the vacuum-based infusion process of the Institute of Materials Research, summarizes the results of this work and of other experience reports.

Generalization of SELU to CNN (2019)

Ha, Bach

Neural network based object detectors are able to automate many difficult, tedious tasks. However, they are usually slow and/or require powerful hardware. One main reason is Batch Normalization (BN) [1], which is an important method for building these detectors. Recent studies present a potential replacement called the Self-normalizing Neural Network (SNN) [2], which at its core is a special activation function named Scaled Exponential Linear Unit (SELU). This replacement seems to have most of BN's benefits while requiring less computational power. Nonetheless, it is uncertain whether SELU and neural network based detectors are compatible with one another. An evaluation of SELU incorporated networks would help clarify that uncertainty. Such an evaluation is performed through a series of tests on different neural networks. After the evaluation, it is concluded that, while indeed faster, SELU is still not as good as BN for building complex object detector networks.
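For reference, the SELU activation itself is small enough to write out. The Python sketch below uses the constants published with the self-normalizing network paper; the function and constant names are illustrative, and the sketch shows only the activation, not how the thesis integrates it into convolutional detectors.

```python
import math

# Constants from the self-normalizing neural network formulation.
SELU_ALPHA = 1.6732632423543772
SELU_SCALE = 1.0507009873554805


def selu(x: float) -> float:
    """Scaled Exponential Linear Unit.

    Linear (scaled) for positive inputs, a scaled exponential curve that
    saturates at -SELU_SCALE * SELU_ALPHA for negative inputs.
    """
    if x > 0:
        return SELU_SCALE * x
    return SELU_SCALE * SELU_ALPHA * (math.exp(x) - 1.0)
```

The constants are chosen so that activations of standard-normal inputs keep approximately zero mean and unit variance, which is the property intended to replace the explicit normalization step of BN.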

Human Detection and Action Recognition in Video Sequences: Human Character Recognition in TV-Style Movies (2011)

Kläser, Alexander

This master's thesis describes a supervised approach to the detection and identification of humans in TV-style video sequences. In still images and video sequences, humans appear in different poses and views, fully visible and partly occluded, with varying distances to the camera, at different places, under different illumination conditions, etc. This diversity in appearance makes human detection and identification a particularly challenging problem. A solution to this problem is of interest for a wide range of applications such as video surveillance and content-based image and video processing. In order to detect humans in views ranging from full to close-up view and in the presence of clutter and occlusion, they are modeled by an assembly of several upper body parts. For each body part, a detector is trained based on a Support Vector Machine and on densely sampled, SIFT-like feature points in a detection window. For a more robust human detection, localized body parts are assembled using a learned model for geometric relations based on Gaussians. For a flexible human identification, the outward appearance of humans is captured and learned using the Bag-of-Features approach and non-linear Support Vector Machines. Probabilistic votes for each body part are combined to improve classification results. The combined votes yield an identification accuracy of about 80% in our experiments on episodes of the TV series "Buffy the Vampire Slayer". The Bag-of-Features approach has been used in previous work mainly for object classification tasks. Our results show that this approach can also be applied to the identification of humans in video sequences. Despite the difficulty of the given problem, the overall results are good and encourage future work in this direction.

Interactive Object Detection (2019)

Vokuda, Priyanka Subramanya

The success of state-of-the-art object detection methods depends heavily on the availability of a large amount of annotated image data. The raw image data available from various sources are abundant but non-annotated. Annotating image data is often costly, time-consuming or needs expert help. In this work, a learning paradigm called Active Learning is explored, which uses user interaction to obtain annotations for a subset of the dataset. The goal of active learning is to achieve superior object detection performance with images that are annotated on demand. To realize the active learning method, the trade-off between the effort to annotate unlabeled data (annotation cost) and the performance of the object detection model is minimised. A Random Forest based method called Hough Forest is chosen as the object detection model, and the annotation cost is calculated as the predicted false positive and false negative rate. The framework is successfully evaluated on two Computer Vision benchmark datasets and two Carl Zeiss custom datasets. Also, an evaluation of RGB, HoG and Deep features for the task is presented. Experimental results show that using Deep features with Hough Forest achieves the maximum performance. By employing Active Learning, it is demonstrated that performance comparable to the fully supervised setting can be achieved by annotating just 2.5% of the images. To this end, an annotation tool is developed for user interaction during Active Learning.

Konzeption eines Transaktionsmodells für den WebDAV-Standard (2009)

Jung, Martin

The WebDAV protocol (Web-based Distributed Authoring and Versioning) enables files on a web server to be edited and managed. From a technical point of view, WebDAV is an extension of the HTTP protocol. With the rapid growth and increasing adoption of WebDAV-based applications, such as document management systems, the requirements for their reliability are rising as well. Comprehensive support for transactions, i.e. the grouping of a set of processing steps into one logical unit, would make an important contribution here. The properties required of transactions, which are also their main advantages, are described by the well-known acronym ACID, which stands for atomicity, consistency, isolation and durability. At present, however, the WebDAV protocol supports only consistency and durability; complete and, above all, standards-compliant support for the ACID properties of transactions is not available. In this thesis, a transaction model for the WebDAV standard was developed. The model makes it possible to carry out a set of file operations in a transaction-based manner. To ensure serializability, the model supports both optimistic and pessimistic methods. Support for the optimistic method was confirmed by the IETF (Internet Engineering Task Force) as a permissible and sensible approach to implementing transactions with WebDAV. For the pessimistic methods, this thesis shows how the existing concepts of the WebDAV standard must be extended so that they can be implemented as well. To verify the design decision, a prototypical implementation of the model was carried out.
Following an appropriate evaluation and assessment, optimistic concurrency control was implemented. On the client side, the implementation builds on the Jackrabbit library; the server-side implementation uses the WebDAV server of Subversion as its basis.
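The optimistic approach can be illustrated with a minimal in-memory Python sketch: a transaction records the version of every resource it reads and, at commit time, is rejected if any of those versions has changed in the meantime. The class and method names are hypothetical, and the integer versions merely play the role that entity tags play in conditional HTTP requests; this is not the transaction model or the protocol extension defined in the thesis.

```python
class ConflictError(Exception):
    """Raised when a resource changed after the transaction read it."""


class VersionedStore:
    """Toy file store where every path carries a version counter."""

    def __init__(self):
        self._data = {}  # path -> (version, content)

    def read(self, path):
        """Return (version, content); unknown paths have version 0."""
        return self._data.get(path, (0, None))

    def commit(self, read_versions, writes):
        """Apply all writes atomically if no read resource has changed.

        read_versions: path -> version seen when the transaction read it
        writes:        path -> new content
        """
        # Validation phase: compare remembered versions with current ones.
        for path, seen in read_versions.items():
            current, _ = self.read(path)
            if current != seen:
                raise ConflictError(path)
        # Write phase: reached only if validation passed for every path.
        for path, content in writes.items():
            current, _ = self.read(path)
            self._data[path] = (current + 1, content)
```

If two clients read the same file at version 0 and both try to commit, the first commit succeeds and bumps the version to 1, while the second fails validation and has to re-read and retry. A pessimistic scheme would instead lock the file before editing it.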

Linear segmentation of ASR transcripts and text by topic (2011)

Muryshkin, Peter

The recent explosion of available audio-visual media poses a new challenge for information retrieval research. Automatic speech recognition (ASR) systems translate spoken content to the text domain. There is a need for searching and indexing this data, which possesses no logical structure. One possible way to structure it on a high level of abstraction is by finding topic boundaries. Two unsupervised topic segmentation methods were evaluated with real-world data in the course of this work. The first one, TSF, models topic shifts as fluctuations in the similarity function of the transcript. The second one, LCSeg, approaches topic changes as places with the least overlapping lexical chains. Only LCSeg performed close to the results reported for a similar real-world corpus. Other reported results could not be outperformed. Topic analysis based on repeated word usage models renders topic changes more ambiguous than expected. This issue has more impact on the segmentation quality than the state-of-the-art ASR word error rate. It could be concluded that it is advisable to develop topic segmentation algorithms with real-world data to avoid potential biases towards artificial data. Unlike the evaluated approaches based on word usage analysis, methods operating with local contexts can be expected to perform better through emulation of semantic dependencies.

Markerloses, modellbasiertes Echtzeit-Tracking für AR-Applikationen (2012)

Millberg, Jessica

Augmented reality (AR) is used in a great many application areas today. Because it overlays virtual information on the real environment, the technology is particularly suited to supporting users in technical maintenance or repair procedures. For the virtual data to be overlaid correctly on the real world, the position and orientation of the camera must be determined by a tracking method. In this thesis, a markerless, model-based tracking system was implemented for this purpose. During an initialization phase, the camera pose is determined using calibrated reference images, so-called keyframes. In a subsequent tracking phase, the object to be tracked is followed further. The system was evaluated on the 1:1 training model of the biological research laboratory Biolab, which was provided by the European Space Agency (ESA).

Markush structure reconstruction (2009)

Haupt, Carina S.

Today, publications are digitally available, which enables researchers to search the text and often also the content of tables. Images, in contrast, cannot be searched. This is not a problem for most fields, but in chemistry most of the information is contained in images, especially structure diagrams. Besides the "normal" chemical structures, which represent exactly one molecule, there also exist generic structures, so-called Markush structures. These contain variable parts and additional textual information which enable them to represent several molecules at once. The number can vary from just a few up to thousands or even millions. This ability led to the spread of Markush structures in patents, because it enables patents to protect entire families of molecules at once. Besides avoiding an enumeration of all structures, it also has the advantage that, if a Markush structure is used in a patent, it is much harder to determine whether a specific structure is protected by it or not. To answer the question of whether a structure is protected, it is necessary to search the patents. Appropriate databases for this task already exist, but are filled manually. Automatic processing does not yet exist. In this project, a Markush structure reconstruction prototype is developed which is able to reconstruct bitmaps including Markush structures (meaning a depiction of the structure and a text part describing the generic parts) into a digital format and save them in the newly developed context-free grammar based file format extSMILES. This format is searchable due to its context-free grammar based design. To be able to develop a Markush structure reconstruction prototype, an in-depth analysis of the concept of Markush structures and their requirements for a reconstruction process was performed. It showed that the common connection table concept of the existing file formats is not able to store Markush structures.
Conditions are especially challenging for most of the formats. Thus, a context-free grammar based file format is developed, which extends the SMILES format. This format, called extSMILES, assures the searchability of the results through its context-free grammar based concept and is able to store all information contained in Markush structures. In addition, it is generic, extendable and easily understandable. The developed prototype for the Markush structure reconstruction uses extSMILES as its output format and is based on the chemical structure recognition tool chemoCR and the Unstructured Information Management Architecture UIMA. For chemoCR, modules are developed which enable it to recognize and assemble Markush structures as well as to return the reconstruction result in extSMILES. For UIMA, on the other hand, a pipeline is developed which is able to analyse and translate the input text files to extSMILES. The results of both tools are then combined and presented in chemoCR. An evaluation of the prototype is performed on a representative set of twelve structures of interest and low image quality which contain all typical Markush elements. Trivial structures containing only one R-group are not evaluated. Due to the challenging nature of the images, no Markush structure could be correctly reconstructed. However, under the assumption that R-group definitions described in natural language are excluded from the task, and provided that the core structure reconstruction is improved, the success rate can be increased to 58.4%.

Multi-modal Emotion Categorization in Oral History Interviews (2023)

Viswanath, Anargh

This thesis proposes a multi-label classification approach using the Multimodal Transformer (MulT) [80] to perform multi-modal emotion categorization on a dataset of oral histories archived at the Haus der Geschichte (HdG). Prior uni-modal emotion classification experiments conducted on the novel HdG dataset provided less than satisfactory results. They uncovered issues such as class imbalance, ambiguities in emotion perception between annotators, and lack of representative training data to perform transfer learning [28]. Hence, the objectives of this thesis were to achieve better results by performing a multi-modal fusion and resolving the problems arising from class imbalance and annotator-induced bias in emotion perception. A further objective was to assess the quality of the novel HdG dataset and benchmark the results using SOTA techniques. Through a literature survey on the challenges, models, and datasets related to multi-modal emotion recognition, we created a methodology utilizing the MulT along with a multi-label classification approach. This approach produced a considerable improvement in the overall emotion recognition by obtaining an average AUC of 0.74 and Balanced-accuracy of 0.70 on the HdG dataset, which is comparable to state-of-the-art (SOTA) results on other datasets. In this manner, we were also able to benchmark the novel HdG dataset as well as introduce a novel multi-annotator learning approach to understand each annotator’s relative strengths and weaknesses for emotion perception. Our evaluation results highlight the potential benefits of the novel multi-annotator learning approach in improving overall performance by resolving the problems arising from annotator-induced bias and variation in the perception of emotions. Complementing these results, we performed a further qualitative analysis of the HdG annotations with a psychologist to study the ambiguities found in the annotations. 
We conclude that the ambiguities in annotations may have resulted from a combination of several socio-psychological factors and systemic issues associated with the process of creating these annotations. As these problems are also present in most multi-modal emotion recognition datasets, we conclude that the domain could benefit from a set of annotation guidelines to create standardized datasets.


56 search hits