Statins are a group of hypolipidemic drugs that act by competitive inhibition of the HMGR enzyme. They are generally considered effective and safe but are claimed to have side effects on skeletal muscles. A molecular side effect of statins is the blockade of terpene biosynthesis and hence of dolichol synthesis, which is involved in N-glycosylation and O-mannosylation of proteins. Defects in O-mannosylation lead to α-dystroglycan (α-DG) hypoglycosylation and a series of hereditary dystroglycanopathies. The current project aims to gain insight into the molecular pathomechanisms induced by statins in mammalian muscle cells and to unravel a potential link between these effects and statin-induced decreases of α-DG O-mannosylation. The study was based on mass spectrometric proteomics supported by western blot analysis to reveal Rosuvastatin effects on cellular pathways at high (micromolar) or low (nanomolar) concentrations. Differential proteomics revealed stronger statin effects on muscle cell function at micromolar than at nanomolar concentrations, the latter being the levels reached in patients' plasma. We demonstrated distinct and partially overlapping patterns of fold-changed proteins under high and low statin conditions. Gene ontology term enrichment (GOTE) analyses of fold-changed proteins revealed that cellular pathways related to muscle function and development are affected even under low statin conditions, which are typically reached in patients' plasma during prophylactic medication.
This thesis deals with corporate podcasts. Its aim is to obtain current insights into the state of development in the conception and production of corporate podcasts. The focus is on the perspective of the communicators, in the form of podcast agencies. It is examined whether trends can be identified, whether different podcast agencies have accumulated practical expertise, and whether overlaps between them can be observed. To answer these questions, the study conducts a qualitative survey in the form of expert interviews.
This thesis proposes a multi-label classification approach using the Multimodal Transformer (MulT) [80] to perform multi-modal emotion categorization on a dataset of oral histories archived at the Haus der Geschichte (HdG). Prior uni-modal emotion classification experiments conducted on the novel HdG dataset provided less than satisfactory results. They uncovered issues such as class imbalance, ambiguities in emotion perception between annotators, and a lack of representative training data for transfer learning [28]. Hence, the objectives of this thesis were to achieve better results by performing multi-modal fusion and resolving the problems arising from class imbalance and annotator-induced bias in emotion perception. A further objective was to assess the quality of the novel HdG dataset and benchmark the results using state-of-the-art (SOTA) techniques. Through a literature survey on the challenges, models, and datasets related to multi-modal emotion recognition, we created a methodology utilizing the MulT along with a multi-label classification approach. This approach produced a considerable improvement in overall emotion recognition, obtaining an average AUC of 0.74 and a balanced accuracy of 0.70 on the HdG dataset, which is comparable to SOTA results on other datasets. In this manner, we were also able to benchmark the novel HdG dataset as well as introduce a novel multi-annotator learning approach to understand each annotator's relative strengths and weaknesses for emotion perception. Our evaluation results highlight the potential benefits of the novel multi-annotator learning approach in improving overall performance by resolving the problems arising from annotator-induced bias and variation in the perception of emotions. Complementing these results, we performed a further qualitative analysis of the HdG annotations with a psychologist to study the ambiguities found in the annotations. We conclude that the ambiguities in annotations may have resulted from a combination of several socio-psychological factors and systemic issues associated with the process of creating these annotations. As these problems are also present in most multi-modal emotion recognition datasets, we conclude that the domain could benefit from a set of annotation guidelines to create standardized datasets.
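As an illustration of the multi-label setup and the two metrics reported above, the following sketch shows one common way to attach a multi-label head to a fused multi-modal feature vector and to compute macro AUC and mean balanced accuracy. It is not the thesis code: the emotion label set, feature dimension, and 0.5 threshold are illustrative assumptions, and the dummy batch stands in for MulT's fused audio/video/text representation.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import roc_auc_score, balanced_accuracy_score

EMOTIONS = ["joy", "sadness", "anger", "fear", "surprise"]  # hypothetical label set

class MultiLabelHead(nn.Module):
    """Maps a fused multi-modal feature vector to one logit per emotion."""
    def __init__(self, fused_dim: int, num_labels: int):
        super().__init__()
        self.classifier = nn.Linear(fused_dim, num_labels)

    def forward(self, fused_features: torch.Tensor) -> torch.Tensor:
        return self.classifier(fused_features)  # raw logits

head = MultiLabelHead(fused_dim=128, num_labels=len(EMOTIONS))
criterion = nn.BCEWithLogitsLoss()  # one independent binary decision per emotion

# Dummy batch standing in for the fused multi-modal representation and its labels.
features = torch.randn(32, 128)
targets = torch.randint(0, 2, (32, len(EMOTIONS))).float()

logits = head(features)
loss = criterion(logits, targets)           # would be minimised during training
scores = torch.sigmoid(logits).detach().numpy()

# Evaluation: macro AUC over classes and balanced accuracy averaged over classes.
y_true = targets.numpy()
auc = roc_auc_score(y_true, scores, average="macro")
bal_acc = np.mean([
    balanced_accuracy_score(y_true[:, k], (scores[:, k] >= 0.5).astype(int))
    for k in range(len(EMOTIONS))
])
print(f"macro AUC={auc:.2f}, mean balanced accuracy={bal_acc:.2f}")
```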
Object detection concerns the classification and localization of objects in an image. To cope with changes in the environment, such as when new classes are added or a new domain is encountered, the detector needs to update itself with the new information while retaining knowledge learned in the past. Previous works have shown that training the detector solely on new data would produce a severe "forgetting" effect, in which the performance on past tasks deteriorates through each new learning phase. However, in many cases, storing and accessing past data is not possible due to privacy concerns or storage constraints. This project aims to investigate promising continual learning strategies for object detection without storing and accessing past training images and labels. We show that by utilizing the pseudo-background trick to deal with missing labels, and knowledge distillation to deal with missing data, the forgetting effect can be significantly reduced in both class-incremental and domain-incremental scenarios. Furthermore, an integration of a small latent replay buffer can result in a positive backward transfer, indicating the enhancement of past knowledge when new knowledge is learned.
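The knowledge distillation idea mentioned above can be sketched as a loss term that keeps the updated detector close to a frozen copy of its old self, so that no stored images or labels are needed. This is a generic, hedged sketch rather than the project's implementation; the tensor shapes, temperature, and the smooth-L1 term for the boxes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def distillation_loss(new_cls_logits: torch.Tensor,
                      old_cls_logits: torch.Tensor,
                      new_box_preds: torch.Tensor,
                      old_box_preds: torch.Tensor,
                      temperature: float = 2.0) -> torch.Tensor:
    """Penalise drift of the new detector from the frozen old detector.

    new_cls_logits / old_cls_logits: (num_anchors, num_old_classes)
    new_box_preds  / old_box_preds:  (num_anchors, 4) box regression outputs
    """
    t = temperature
    # Soft-target term on the old classes (Hinton-style distillation).
    cls_term = F.kl_div(
        F.log_softmax(new_cls_logits / t, dim=-1),
        F.softmax(old_cls_logits / t, dim=-1),
        reduction="batchmean",
    ) * (t * t)
    # Keep box predictions close to the old detector's predictions.
    box_term = F.smooth_l1_loss(new_box_preds, old_box_preds)
    return cls_term + box_term

# Usage: on a new-task image, the regular detection loss on the new labels is combined
# with this distillation term computed from the frozen old model's outputs.
new_cls = torch.randn(100, 10, requires_grad=True)
old_cls = torch.randn(100, 10)
new_box = torch.randn(100, 4, requires_grad=True)
old_box = torch.randn(100, 4)
loss = distillation_loss(new_cls, old_cls, new_box, old_box)
loss.backward()
```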
On the one hand, audiovisual media are credited with the ability to create an image of reality, which is one reason they are of central importance in journalism. On the other hand, the technological developments of recent years make it ever easier, cheaper, and faster to create authentic-looking manipulations. As recently as ten years ago, manipulating video material, apart from trivial operations at the image level, was possible only in the context of film productions. That has changed: synthetic media, also known as deepfakes, are on everyone's lips. Audiovisual manipulations thus confront newsrooms with an increasingly serious challenge and at times already make it into reporting as supposedly authentic content. This raises the question: to what extent is it, and will it remain, possible for newsrooms to ensure the authenticity of audiovisual material?
Based on seven guided expert interviews with actors from research and practice, the thesis provides, in addition to an up-to-date description of the technical state of the art regarding manipulation and verification capabilities, a description and assessment of the existing problems and potential solutions for newsrooms, as well as an assessment of the future development of relevant technologies and the associated consequences. The results show that newsrooms need technical tools for verification processes, but that it is hardly possible to ensure the authenticity of audiovisual material on the technological level alone. Accordingly, the experts currently see the greatest challenge for newsrooms in verification not in a lack of technical tools but rather in a lack of time.
The interviewees were: Dr. Dominique Dresen (Bundesamt für Sicherheit in der Informationstechnik, BSI), Dr. Jutta Jahnel (Karlsruher Institut für Technologie, KIT), Dr. Christian Riess (FAU Erlangen-Nürnberg), Andrea Sauerbier (SPIEGEL), Jochen Spangenberg (among others, DW Innovation), Johanna Wild (Bellingcat), and Dr. Sascha Zmudzinski (Fraunhofer-Institut für Sichere Informationstechnologie, SIT).
This thesis investigates the benefit of rubrics for grading short answers using an active learning mechanism. Automating short answer grading with Natural Language Processing (NLP) is an active research area in the education domain; it could save evaluators time, which they could instead invest in preparing lectures. Most research has treated short answer grading as a similarity task between reference answers and student answers. However, grading against reference answers does not account for partial grades and does not provide feedback. Moreover, such grading is fully automatic and tries to replace the evaluator. Using rubrics for short answer grading with active learning eliminates these drawbacks.
Initially, the proposed approach is evaluated on the Mohler dataset, which is widely used to benchmark short answer grading methods. This phase is used to determine the parameters of the proposed approach. With the selected parameters, the approach exceeds the performance of current state-of-the-art (SOTA) methods, achieving a Pearson correlation of 0.63 and a root mean square error (RMSE) of 0.85, surpassing the SOTA methods by almost 4%.
Finally, the benchmarked approach is used to grade short answers based on rubrics instead of reference answers. The proposed approach evaluates short answers from the Autonomous Mobile Robot (AMR) dataset to provide scores and feedback (formative assessment) based on the rubrics. On this dataset the approach achieves an average Pearson correlation of 0.61 and an RMSE of 0.83. This research thus shows that rubric-based grading achieves formative assessment without compromising performance. In addition, rubrics have the advantage of generalizing to all answers.
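To make the rubric-based idea and the reported metrics concrete, the following hedged sketch scores an answer by semantic similarity against weighted rubric items and evaluates predicted scores with Pearson correlation and RMSE. It is not the thesis implementation (which also involves active learning): the embedding model, rubric, threshold, and all score values are illustrative assumptions.

```python
import numpy as np
from scipy.stats import pearsonr
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

rubric = [  # hypothetical rubric for one question: (criterion, points)
    ("Mentions that the robot localises itself with sensor data", 2.0),
    ("Explains that a map of the environment is maintained", 2.0),
    ("Names a concrete algorithm such as a particle filter", 1.0),
]

def grade(answer: str, threshold: float = 0.5) -> float:
    """Award an item's points when the answer is semantically close to that item."""
    ans_emb = model.encode(answer, convert_to_tensor=True)
    score = 0.0
    for criterion, points in rubric:
        crit_emb = model.encode(criterion, convert_to_tensor=True)
        if util.cos_sim(ans_emb, crit_emb).item() >= threshold:
            score += points  # the matched criteria double as feedback
    return score

print("score:", grade("The robot uses lidar scans and a particle filter to localise on its map."))

# Reported metrics over a dataset (hypothetical predicted and instructor scores).
predicted = np.array([4.5, 3.0, 5.0, 2.5, 4.0])
reference = np.array([5.0, 2.5, 5.0, 3.0, 4.5])
r, _ = pearsonr(predicted, reference)
rmse = np.sqrt(np.mean((predicted - reference) ** 2))
print(f"Pearson r = {r:.2f}, RMSE = {rmse:.2f}")
```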
In this thesis, a decision-based manufacturing strategy for producing a micro gas turbine blisk from an oxide ceramic fiber composite is developed within FFE+, an internal project of the Deutsches Zentrum für Luft- und Raumfahrt (DLR). The vacuum-based infusion process of the Structural and Functional Ceramics department of the Institute of Materials Research is to be used for this purpose. First, the theoretical background of the material and its established processing is reviewed. On this basis, the system and the functions of the oxide ceramic blisk can be defined in the sense of methodical process development. The requirements and evaluation criteria formulated there allow a design phase of concepts and solution principles with reduced effort; here, the fiber architecture is the decisive factor in finding a solution. After evaluating, validating, and adapting the results, the manufacturing strategy is drafted on the basis of the best-rated concept and the department's previous projects. In addition, a feasibility study of a hitherto unknown process for producing oxide ceramic fiber preforms was carried out at the Institute of Aircraft Design of the University of Stuttgart. Since statements about the material properties are necessary to reliably guarantee the component's function, material tests at room temperature and at high temperature are planned. The concluding goal, a process-chain baseline for projects using the vacuum-based infusion process of the Institute of Materials Research, summarizes the results of this thesis and of other experience reports.
Modern engineering relies heavily on computer technologies. This is especially true for thermoplastic manufacturing processes such as blow molding. A crucial milestone for digitalization is the continuous integration of data in unified or interoperable systems. While new simulation technologies are constantly being developed, data management standards such as STEP fail to integrate them. Industrial standards such as VMAP, on the other hand, improve interoperability for small and medium-sized enterprises but do not provide Simulation Process and Data Management (SPDM) technologies. For SPDM integration of VMAP data, Ontology-Based Data Access is used to continue the digital thread in custom, semantics-based, open-source solutions. An ontology of the database format (VMAP) was generated alongside an expandable knowledge graph of data access methods. A Python-based software architecture was developed that automatically uses the semantic representations of the database format and of the data access methods to query data and metadata within a VMAP file. The result is a software architecture template that can be adapted to other data standards and integrated into semantic data management systems. It allows semantic queries on simulation data down to element-wise resolution without integrating the whole model information. The architecture can instantiate a file in a knowledge graph, query a file's metadatum and, if it is not yet available, find a semantically represented process that allows the required metadatum to be created and instantiated. The results of this thesis can be expected to form a basis for semantic SPDM tools.
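The core ontology-based data access idea can be illustrated with a small RDF sketch: instantiate a simulation file in a knowledge graph and retrieve a metadatum with a SPARQL query. This is a hedged sketch, not the generated VMAP ontology or the thesis architecture; the namespace, class, and property names are illustrative assumptions.

```python
from rdflib import Graph, Literal, Namespace, RDF

VMAP = Namespace("http://example.org/vmap#")  # hypothetical ontology namespace

g = Graph()
g.bind("vmap", VMAP)

# Instantiate one simulation result file and attach two metadata to it.
result_file = VMAP["file_blowmolding_run42"]
g.add((result_file, RDF.type, VMAP.SimulationFile))
g.add((result_file, VMAP.hasSolver, Literal("ExampleSolver")))
g.add((result_file, VMAP.hasWallThicknessResult, Literal(True)))

# Semantic query: which files carry a wall-thickness result, and which solver produced them?
query = """
PREFIX vmap: <http://example.org/vmap#>
SELECT ?file ?solver WHERE {
    ?file a vmap:SimulationFile ;
          vmap:hasWallThicknessResult true ;
          vmap:hasSolver ?solver .
}
"""
for row in g.query(query):
    print(f"{row.file} was produced by {row.solver}")
```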
Machine learning-based solutions are frequently adopted in applications that operate on big data. The performance of a model deployed in operations is subject to degradation due to unanticipated changes in the incoming data, so monitoring data drift becomes essential to maintain the model's desired performance. Based on the literature review on drift detection conducted here, statistical hypothesis testing makes it possible to check whether incoming data has drifted from the training data. Because Maximum Mean Discrepancy (MMD) and Kolmogorov-Smirnov (KS) have been shown in the literature to be reliable distance measures between multivariate distributions, both were selected from the many existing techniques for experimentation. Within the scope of this work, an image classification use case was studied using the Stream-51 dataset. Across the drift experiments, both MMD and KS achieved high Area Under Curve values; however, KS was faster than MMD and produced fewer false positives. Furthermore, the results showed that using a pre-trained ResNet-18 for feature extraction maintained the high performance of the tested drift detectors, and that detector performance depends strongly on the sample sizes of the reference (training) data and of the test data flowing into the pipeline's monitor. Finally, the results showed that if the test data is a mixture of drifting and non-drifting data, the performance of the drift detectors does not depend on how the drifting samples are scattered among the non-drifting ones, but rather on their share of the test set.
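A minimal sketch of the KS-based monitor described above: extract features with a pre-trained ResNet-18, run a two-sample KS test per feature dimension between reference and incoming batches, and flag drift after a Bonferroni correction. This is an assumption-laden illustration, not the project's pipeline; the input sizes, batch sizes, and the per-dimension aggregation rule are illustrative choices.

```python
import numpy as np
import torch
from scipy.stats import ks_2samp
from torchvision.models import resnet18

backbone = resnet18(weights="IMAGENET1K_V1")
backbone.fc = torch.nn.Identity()        # keep the 512-d penultimate features
backbone.eval()

@torch.no_grad()
def extract_features(images: torch.Tensor) -> np.ndarray:
    """images: (N, 3, 224, 224) normalised tensor batch."""
    return backbone(images).numpy()

def is_drifting(reference: np.ndarray, test: np.ndarray, alpha: float = 0.05) -> bool:
    """Flag drift if any feature dimension fails the KS test (Bonferroni-corrected)."""
    d = reference.shape[1]
    p_values = [ks_2samp(reference[:, i], test[:, i]).pvalue for i in range(d)]
    return min(p_values) < alpha / d

# Dummy batches standing in for reference (training) data and incoming data.
ref_feats = extract_features(torch.randn(64, 3, 224, 224))
new_feats = extract_features(torch.randn(64, 3, 224, 224))
print("drift detected:", is_drifting(ref_feats, new_feats))
```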
The aim of this master's thesis was to probe the view of Bonn's citizens on the smart city project of the German city. A literature review helped define the term smart city and identify the smart city concept most commonly used in Germany. This can be summarized as an urban planning concept that uses information and communication technology to build citizen-centric, sustainable cities. According to this, a smart city should include transparent communication and the participation of its citizens. The websites and various publications of Bonn were examined to understand its smart city strategy and vision, which revealed inconsistencies. To resolve these inconsistencies, three representatives of the city were interviewed. Based on the knowledge gained up to this point, two groups of Bonn's inhabitants discussed the Smart City Bonn and presented their perception of it. With this methodology, the following results were obtained. The city's communication and participation are in many cases in line with current recommendations for a smart city. Bonn has apparently recognized the relevance of these aspects in theory but should implement them more consistently in practice. Currently, the city council publishes contradictory information and does not plan to incorporate the views of Bonn's citizens in developing the smart city strategy in the first place, as is recommended in the literature.
In the field of autonomous robotics, sensors have played a major role in defining the scope of the technology and, to a great extent, its limitations as well. This cycle of constant updates and technological advancement has given birth to industries that were once inconceivable. Autonomous driving, for example, has a serious impact on people's safety and security and equally strong implications for the dynamics and economics of the market. With sensors like LiDAR and RADAR delivering 3D measurements as point clouds, there is a need to process the raw measurements directly, and many research groups are working on exactly this. Considerable research has gone into solving the task of object detection on 2D images. In this thesis we aim to develop a LiDAR-based 3D object detection scheme. We combine the ideas of PointPillars and feature pyramid networks from 2D vision to propose Pillar-FPN. The proposed method directly takes 3D point clouds as input and outputs a 3D bounding box. Our pipeline consists of multiple variations of the proposed Pillar-FPN at the feature fusion level, which are described in the results section. We trained our model on the KITTI training set and evaluated it on the KITTI validation set.
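The feature fusion step named above can be pictured with a small FPN-style top-down pathway over multi-scale maps of the PointPillars pseudo-image. This is a hedged sketch of the general mechanism, not the thesis's Pillar-FPN variants; channel counts, grid sizes, and the nearest-neighbour upsampling are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFPN(nn.Module):
    """Fuse coarse-to-fine feature maps with lateral 1x1 convs and upsampling."""
    def __init__(self, in_channels=(64, 128, 256), out_channels=128):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):
        # feats: list of maps ordered from fine (high resolution) to coarse (low resolution)
        laterals = [lat(f) for lat, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # top-down: add the upsampled coarser map
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]

# Pseudo-image feature maps over a made-up 248x216 pillar grid at three strides.
maps = [torch.randn(1, 64, 248, 216),
        torch.randn(1, 128, 124, 108),
        torch.randn(1, 256, 62, 54)]
fused = TopDownFPN()(maps)
print([m.shape for m in fused])  # all with 128 channels, ready for a 3D box head
```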
This project focuses on object detection in dense volume data. There are several types of dense volume data, namely Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance Imaging (MRI); this work focuses on CT scans. CT scans are not limited to the medical domain; they are also used in industry, for example in airport baggage screening and on assembly lines, where the object detection systems should be able to detect objects fast. One way to address computational complexity and speed up detection is to use low-resolution images: low-resolution CT scanning is fast, so the entire scanning and detection process can be accelerated. In the medical domain, reducing the radiation dose also requires reducing the patient's exposure time, which low-resolution CT scans allow. Hence it is essential to find out which object detection model offers better accuracy as well as speed on low-resolution CT scans. However, existing approaches do not report how models perform when the resolution of CT scans is varied. The goal of this project is therefore to analyze the impact of varying CT scan resolution on both the speed and the accuracy of the model. Three object detection models, namely RetinaNet, YOLOv3, and YOLOv5, were trained at various resolutions. Among the three models, YOLOv5 has the best mAP and F1 score at multiple resolutions on the DeepLesion dataset, while RetinaNet has the lowest inference time. From the experiments it can be concluded that sacrificing mean average precision (mAP) to improve inference time by reducing resolution is feasible.
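The speed side of the resolution trade-off study can be sketched by timing the same pre-trained detector at several input resolutions. This is an illustrative setup, not the project's experiments: the resolutions, the COCO-pretrained RetinaNet, and the dummy inputs are assumptions, and accuracy per resolution would additionally be measured with mAP/F1 on the DeepLesion annotations.

```python
import time
import torch
from torchvision.models.detection import retinanet_resnet50_fpn

model = retinanet_resnet50_fpn(weights="COCO_V1").eval()

@torch.no_grad()
def time_inference(resolution: int, repeats: int = 5) -> float:
    """Average forward-pass time (seconds) for one image at the given resolution."""
    image = [torch.rand(3, resolution, resolution)]   # dummy slice as an RGB tensor
    start = time.perf_counter()
    for _ in range(repeats):
        model(image)
    return (time.perf_counter() - start) / repeats

for res in (256, 384, 512):                           # assumed test resolutions
    print(f"{res}x{res}: {time_inference(res) * 1000:.1f} ms per image")
```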
In (dynamic) adaptive mesh refinement (AMR), an input mesh is refined or coarsened according to the needs of the numerical application. This refinement happens without any reference to the originally meshed domain and is therefore limited to the geometrical accuracy of the original input mesh. We previously presented a novel approach to equip the input mesh with additional geometry information, allowing refinement and high-order cells based on the geometry of the original domain, and showed a limited implementation of this algorithm. Now we evaluate this prototype with a numerical application and demonstrate its influence on the accuracy of numerical results. To be as practical as possible, we implement the ability to import meshes generated by Gmsh and equip them with the needed geometry information. Furthermore, we improve the mapping algorithm, which maps the geometry information of the boundary of a cell into the cell's volume. With these preliminary steps done, we use our new approach in a simulation of the advection of a concentration along the boundary of a spherical shell and past the boundary of a rotating cylinder. We evaluate the accuracy of our approach in comparison to conventional refinement of cells to answer our research question: How do the performance and accuracy of the hexahedral curved-domain AMR algorithm compare to linear AMR when solving the advection equation with the linear finite volume method? To answer this question, we show the influence of curved AMR on our simulation results and see that it can even outperform far finer linear meshes in terms of accuracy. We also see that the current implementation of this approach is too slow for practical usage. We can therefore demonstrate the benefits of curved AMR in certain geometry-related application scenarios and show possible improvements to make it more feasible and practical in the future.
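For readers unfamiliar with the baseline numerical method named in the research question, the following self-contained toy example shows a first-order upwind finite volume step for linear advection on a 1-D periodic grid. The curved-AMR work applies the same building block on (adaptively refined) hexahedral meshes; the grid size, velocity, and end time here are arbitrary.

```python
import numpy as np

n, velocity, end_time = 200, 1.0, 0.5
dx = 1.0 / n
dt = 0.5 * dx / velocity                      # CFL-limited time step
x = (np.arange(n) + 0.5) * dx
u = np.exp(-200 * (x - 0.3) ** 2)             # initial concentration bump

t = 0.0
while t < end_time:
    flux = velocity * u                       # upwind flux: for velocity > 0 take the left cell
    u = u - dt / dx * (flux - np.roll(flux, 1))
    t += dt

print("total concentration (conserved by the finite volume update):", u.sum() * dx)
```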
As cameras are ubiquitous in autonomous systems, object detection is a crucial task. Object detectors are widely used in applications such as autonomous driving, healthcare, and robotics. Given an image, an object detector outputs both the bounding box coordinates as well as classification probabilities for each object detected. The state-of-the-art detectors are treated as black boxes due to their highly non-linear internal computations. Even with unprecedented advancements in detector performance, the inability to explain how their outputs are generated limits their use in safety-critical applications in particular. It is therefore crucial to explain the reason behind each detector decision in order to gain user trust, enhance detector performance, and analyze their failure.
Previous work fails to explain as well as evaluate both bounding box and classification decisions individually for various detectors. Moreover, no tools explain each detector decision, evaluate the explanations, and also identify the reasons for detector failures. This restricts the flexibility to analyze detectors. The main contribution presented here is an open-source Detector Explanation Toolkit (DExT). It is used to explain the detector decisions, evaluate the explanations, and analyze detector errors. The detector decisions are explained visually by highlighting the image pixels that most influence a particular decision. The toolkit implements the proposed approach to generate a holistic explanation for all detector decisions using certain gradient-based explanation methods. To the author’s knowledge, this is the first work to conduct extensive qualitative and novel quantitative evaluations of different explanation methods across various detectors. The qualitative evaluation incorporates a visual analysis of the explanations carried out by the author as well as a human-centric evaluation. The human-centric evaluation includes a user study to understand user trust in the explanations generated across various explanation methods for different detectors. Four multi-object visualization methods are provided to merge the explanations of multiple objects detected in an image as well as the corresponding detector outputs in a single image. Finally, DExT implements the procedure to analyze detector failures using the formulated approach.
The visual analysis illustrates that the ability to explain a model is more dependent on the model itself than the actual ability of the explanation method. In addition, the explanations are affected by the object explained, the decision explained, detector architecture, training data labels, and model parameters. The results of the quantitative evaluation show that the Single Shot MultiBox Detector (SSD) is more faithfully explained compared to other detectors regardless of the explanation methods. In addition, a single explanation method cannot generate more faithful explanations than other methods for both the bounding box and the classification decision across different detectors. Both the quantitative and human-centric evaluations identify that SmoothGrad with Guided Backpropagation (GBP) provides more trustworthy explanations among selected methods across all detectors. Finally, a convex polygon-based multi-object visualization method provides more human-understandable visualization than other methods.
The author expects that DExT will motivate practitioners to evaluate object detectors from the interpretability perspective by explaining both bounding box and classification decisions.
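To illustrate the kind of gradient-based explanation DExT builds on, the sketch below back-propagates one detection's class score to the input image and reads off the absolute gradient as a pixel-importance map. It is not DExT itself: the COCO-pretrained SSD, the random placeholder image, and the choice of the top-scoring detection are assumptions, and DExT additionally explains the box coordinates and supports methods such as SmoothGrad and Guided Backpropagation.

```python
import torch
from torchvision.models.detection import ssd300_vgg16

model = ssd300_vgg16(weights="COCO_V1").eval()

image = torch.rand(3, 300, 300, requires_grad=True)   # placeholder input image
outputs = model([image])[0]                            # dict with boxes, labels, scores

if len(outputs["scores"]) > 0:
    target_score = outputs["scores"][0]                # explain the top-scoring detection
    target_score.backward()                            # gradient of that score w.r.t. pixels
    saliency = image.grad.abs().max(dim=0).values      # (300, 300) pixel-importance map
    print("saliency map shape:", saliency.shape)
```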
This study is the first to simultaneously examine the effects of the three-dimensional construct of procedural, distributive, and communicative pay transparency on employees, also taking into account personal attitudes and actual salary, using a German sample (N = 159). Employees were surveyed in a cross-sectional online questionnaire study about the pay transparency they perceive in their organization and about further employee- and organization-related variables. Regression analyses confirmed, in line with the hypotheses, positive relationships of pay transparency with pay satisfaction, with perceptions of procedural and distributive justice, and with organizational trust. However, only procedural pay transparency, one of the three dimensions, was of substantial relevance for these relationships. Moderator analyses further showed that a low need for informational privacy and a low gross salary partially strengthen the positive relationships between pay transparency and the criterion variables. Finally, implications of the findings for research and practice are discussed in light of the study's limitations.
In this research project, a practice-oriented method was developed that makes it possible to prepare soil samples after they have been taken in the field and to analyze them for their microplastic content. The extraction method has already been validated for two polymers, PA 12 and PE (mulch film particles), with recovery rates of 100 % each for particles larger than 0.5 mm. For particles larger than 63 μm, the recovery rate is 97 % for PE mulch film particles and 86 % for PA particles. Furthermore, various spectroscopic detection methods were examined and compared with regard to their potential and limitations. Digital microscopy proved very well suited to determining the color, size, shape, and number of particles, but it depends heavily on subjective judgment and should therefore always be combined with a further detection method. In this work, ATR-FTIR spectroscopy was used for this purpose; it additionally allows the polymer type of individual particles to be determined, with a lower detection limit of 500 μm. The method was applied to a total of five agricultural fields, two of which are farmed conventionally and three organically. To obtain a first impression of the current microplastic load of agricultural soils, the results obtained with the method developed in this research project were extrapolated and reported as emission coefficients in various units.
In this work, resorcinol-formaldehyde aerogels were developed for use as wick material in loop heat pipes (LHP). Because of their high porosity and effective capillary action, aerogels as wick material provide a good basis for mass and heat transport, and these properties can contribute to improving the cooling performance of a heat pump. For this purpose, aerogels were synthesized in wick form, and their skeletal density, envelope density, porosity, and gas permeability were subsequently determined. In addition, a test for swelling behavior was developed. The samples were also sent to the company Allatherm to check whether the developed RF aerogels in wick form meet the requirements. The machinability of the aerogels was improved, and the porosity and gas permeability of the investigated aerogels were in an optimal range. Only the through-pore size of the aerogels, determined by bubble-point analysis, requires further recipe development and measurements in order to confine the largest through-pore towards 1 µm.
The ability to finely segment different instances of various objects in an environment forms a critical tool in the perception toolbox of any autonomous agent. Traditionally, instance segmentation is treated as a multi-label pixel-wise classification problem. This formulation has resulted in networks that are capable of producing high-quality instance masks but are extremely slow for real-world usage, especially on platforms with limited computational capabilities. This thesis investigates an alternative regression-based formulation of instance segmentation to achieve a good trade-off between mask precision and run-time. In particular, the instance masks are parameterized and a CNN is trained to regress to these parameters, analogous to the bounding box regression performed by an object detection network.
In this investigation, the instance segmentation masks in the Cityscapes dataset are approximated using irregular octagons, and an existing object detector network (i.e., SqueezeDet) is modified to regress to the parameters of these octagonal approximations. The resulting network is referred to as SqueezeDetOcta. At the image boundaries, object instances are only partially visible. Due to the convolutional nature of most object detection networks, special handling of boundary-adhering object instances is warranted; however, current object detection techniques handle all object instances alike. To this end, this work proposes selectively learning only the partial, untainted parameters of the bounding box approximation of boundary-adhering object instances. Anchor-based object detection networks like SqueezeDet and YOLOv2 have a discrepancy between the ground-truth encoding/decoding scheme and the coordinate space used for clustering to generate the prior anchor shapes. To resolve this disagreement, this work proposes clustering in a space whose two coordinate axes are the natural-log transformations of the width and height of the ground-truth bounding boxes.
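A small sketch of the proposed anchor extraction follows: cluster ground-truth box shapes in (log w, log h) space, so that the clustering metric matches the log-space width/height encoding used by anchor-based detectors such as SqueezeDet and YOLOv2, then map the centroids back to pixel sizes. The box statistics below are random stand-ins for Cityscapes, and the number of anchors is an illustrative assumption; this is not the thesis code.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical ground-truth box widths/heights in pixels (stand-in for dataset statistics).
rng = np.random.default_rng(0)
widths = rng.uniform(10, 300, size=1000)
heights = rng.uniform(10, 200, size=1000)

# Cluster in the log space used by the detector's width/height encoding.
log_wh = np.stack([np.log(widths), np.log(heights)], axis=1)
kmeans = KMeans(n_clusters=9, n_init=10, random_state=0).fit(log_wh)

# Map centroids back to pixel space to obtain the prior anchor shapes.
anchors = np.exp(kmeans.cluster_centers_)          # (9, 2) array of (w, h)
for w, h in sorted(anchors.tolist()):
    print(f"anchor: {w:.1f} x {h:.1f}")
```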
When both SqueezeDet and SqueezeDetOcta were trained from scratch, SqueezeDetOcta lagged behind the SqueezeDet network by a substantial ≈ 6.19 mAP. Further analysis revealed that the sparsity of the annotated data was the reason for this lackluster performance of the SqueezeDetOcta network. To mitigate this issue, transfer learning was used to fine-tune the SqueezeDetOcta network starting from the trained weights of the SqueezeDet network. When all layers of SqueezeDetOcta were fine-tuned, it outperformed the SqueezeDet network paired with logarithmically extracted anchors by ≈ 0.77 mAP. In addition, the forward-pass latencies of both SqueezeDet and SqueezeDetOcta are approximately 19 ms. Considering boundary adhesion during training improved the baseline SqueezeDet network by ≈ 2.62 mAP, and a SqueezeDet network paired with logarithmically extracted anchors improved on the baseline by ≈ 1.85 mAP.
In summary, this work demonstrates that if given sufficient fine instance annotated data, an existing object detection network can be modified to predict much finer approximations (i.e., irregular octagons) of the instance annotations, whilst having the same forward pass latency as that of the bounding box predicting network. The results justify the merits of logarithmically extracted anchors to boost the performance of any anchor-based object detection network. The results also showed that the special handling of image boundary adhering object instances produces more performant object detectors.
The Deutsches Zentrum für Luft- und Raumfahrt (DLR) conducts a great deal of research in the field of aeronautics and space, and studies on health and medicine also play a very important role at DLR. To this end, DLR is conducting the Artificial Gravity Bed Rest Study (AGBRESA) on behalf of the European Space Agency (ESA) and in cooperation with NASA. In this study, the negative effects of weightlessness on humans in space are simulated, and experiments are carried out to counteract these negative effects. The results of the experiments are documented at DLR both digitally and on paper. In this master's project, my task is to replace the paper protocols for blood sampling and laboratory documentation with a digital form.