The last two decades have been shaped by exponential growth in the amount of available data. Every day, people and machines produce more and more data, which is often stored in distributed data stores. Application areas include physics and astronomy, where particle accelerators or satellites generate immense volumes of data that must be stored and processed. Neither humans directly nor traditional analysis methods can derive new insights from these volumes of data. Processing them requires parallel and distributed data analysis methods. [MTT18,NEKH+18]
The objective of this thesis is to implement a computer-game-based motivation system for maximal strength testing on the Biodex System 3 Isokinetic Dynamometer. The prototype game has been designed to improve the peak torque produced in an isometric knee extensor strength test. An extensive analysis is performed on a torque data set from a previous study. The torque responses for five-second maximal voluntary contractions of the knee extensor are analyzed to understand the torque response characteristics of different subjects. The parameters identified in the data analysis are used in the implementation of the 'Shark and School of Fish' game. The behavior of the game for different torque responses is analyzed on a different torque data set from the previous study. The evaluation shows that the game rewards and motivates the user continuously over a repetition to reach the peak torque value. The evaluation also shows that the game rewards users more if they overcome a baseline torque value within the first second and then gradually increase the torque to reach peak torque.
This thesis deals with corporate podcasts. Its aim is to gain current insights into the state of development in the conception and production of corporate podcasts. The focus is on the perspective of the communicators, in the form of podcast agencies. The thesis examines whether trends can be identified, whether different podcast agencies possess experiential knowledge, and whether overlaps can be identified. To answer these questions, the study conducts a qualitative survey in the form of expert interviews.
Augmented reality (AR) has a great many fields of application today. Because it overlays virtual information on the real environment, this technology is particularly suited to supporting users during technical maintenance or repair procedures. For the virtual data to be overlaid correctly on the real world, the position and orientation of the camera must be determined by a tracking method. For this purpose, a markerless, model-based tracking system was implemented in this thesis. During an initialization phase, the camera pose is determined using calibrated reference images, so-called keyframes. In a subsequent tracking phase, the object to be tracked is followed further. The system was evaluated on the 1:1 training model of the biological research laboratory Biolab, which was provided by the European Space Agency (ESA).
Statins are a group of hypolipidemic drugs that act by competitive inhibition of the HMGR enzyme. They are generally considered effective and safe but are claimed to have side effects on skeletal muscles. A molecular side effect of statins is the block of terpene biosynthesis and hence of dolichol, which is involved in N-glycosylation and O-mannosylation of proteins. Defects in O-mannosylation lead to α-dystroglycan (α-DG) hypoglycosylation and a series of hereditary dystroglycanopathies. The current project aims to gain insight into molecular pathomechanisms induced by statins in mammalian muscle cells and to unravel a potential link between these effects and statin-induced decreases of α-DG O-mannosylation. The study was based on mass spectrometric proteomics supported by western blot analysis to reveal Rosuvastatin effects on cellular pathways under high (micromolar) or low (nanomolar) concentrations. Differential proteomics revealed stronger statin effects on muscle cell function at micromolar than at nanomolar concentrations, the latter being the range reached in the patient’s plasma. We demonstrated distinct and partially overlapping patterns of fold-changed proteins under high and low statin conditions. Gene ontology term enrichment (GOTE) analyses of fold-changed proteins revealed that cellular pathways related to muscle function and development are affected, even under low statin conditions, which are typically reached in the patient’s plasma during prophylactic medication.
In the field of autonomous robotics, sensors have played a major role in defining the scope of the technology and, to a great extent, its limitations as well. This cycle of constant updates and the resulting technological advancement has given birth to some serious industries that were once inconceivable. Industries like autonomous driving, which have a serious impact on the safety and security of people, also have an equally strong effect on the dynamics and economics of the market. With sensors like LiDAR and RADAR delivering 3D measurements as point clouds, there is a need to process the raw measurements directly, and many research groups are working on this. A sizable amount of research has gone into solving the task of object detection on 2D images. In this thesis we aim to develop a LiDAR-based 3D object detection scheme. We combine the ideas of PointPillars and feature pyramid networks from 2D vision to propose Pillar-FPN. The proposed method directly takes 3D point clouds as input and outputs a 3D bounding box. Our pipeline consists of multiple variations of the proposed Pillar-FPN at the feature fusion level, which are described in the results section. We have trained our model on the KITTI train dataset and evaluated it on the KITTI validation dataset.
The recent explosion of available audio-visual media is the new challenge for information retrieval research. Automatic speech recognition (ASR) systems translate spoken content to the text domain. There is a need for searching and indexing this data, which possesses no logical structure. One possible way to structure it on a high level of abstraction is by finding topic boundaries. Two unsupervised topic segmentation methods were evaluated with real-world data in the course of this work. The first one, TSF, models topic shifts as fluctuations in the similarity function of the transcript. The second one, LCSeg, treats topic changes as the places with the least overlap between lexical chains. Only LCSeg performed close to the results reported for a similar real-world corpus. Other reported results could not be outperformed. Topic analysis based on models of repeated word usage renders topic changes more ambiguous than expected. This issue has more impact on the segmentation quality than the state-of-the-art ASR word error rate. It could be concluded that it is advisable to develop topic segmentation algorithms with real-world data to avoid potential biases towards artificial data. Unlike the evaluated approaches based on word usage analysis, methods operating with local contexts can be expected to perform better through emulation of semantic dependencies.
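The idea of reading topic shifts off a similarity function can be illustrated with a minimal sketch. This is not the TSF or LCSeg implementation from the thesis; it is a generic lexical-cohesion illustration that assumes whitespace tokenization, bag-of-words cosine similarity, and a hypothetical `window` parameter, and it places a single boundary at the gap where adjacent windows are least similar.

```python
from collections import Counter
import math


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_curve(sentences, window=2):
    """Similarity between the word windows left and right of every sentence gap."""
    bags = [Counter(s.lower().split()) for s in sentences]
    curve = []
    for gap in range(1, len(bags)):
        left = sum(bags[max(0, gap - window):gap], Counter())
        right = sum(bags[gap:gap + window], Counter())
        curve.append(cosine(left, right))
    return curve


def find_boundary(sentences, window=2):
    """Index of the sentence that starts the new topic (lowest-similarity gap)."""
    curve = similarity_curve(sentences, window)
    return curve.index(min(curve)) + 1


transcript = [
    "the cat sat",
    "the cat slept",
    "stocks fell sharply",
    "stocks fell again",
]
print(find_boundary(transcript))
```

Because the sketch relies purely on repeated word usage, it shares the weakness the abstract describes: when topics reuse vocabulary, the minimum of the curve becomes ambiguous.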
Object detectors have improved considerably in recent years through advanced Convolutional Neural Network (CNN) architectures. However, many detector hyper-parameters are generally not tuned and are used with the values set by the detector authors. Blackbox optimization methods have gained more attention in recent years because of their ability to optimize the hyper-parameters of various machine learning algorithms and deep learning models. However, these methods have not been explored for improving the hyper-parameters of CNN-based object detectors. In this research work, we propose the use of blackbox optimization methods such as Gaussian Process based Bayesian Optimization (BOGP), Sequential Model-based Algorithm Configuration (SMAC), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to tune the hyper-parameters in Faster R-CNN and Single Shot MultiBox Detector (SSD). In Faster R-CNN, tuning the input image size and the prior box anchor scales and ratios using BOGP, SMAC, and CMA-ES increased the performance by around 1.5% in terms of Mean Average Precision (mAP) on PASCAL VOC. Tuning the anchor scales of SSD increased the mAP by 3% on the PASCAL VOC and marine debris datasets. On the COCO dataset with SSD, an mAP improvement is observed for medium and large objects, but mAP decreases by 1% for small objects. The experimental results show that blackbox optimization methods increase the mAP of the object detectors. Moreover, they achieved better results than the hand-tuned configurations in most cases.
In service robotics, hardly any task is free of object involvement; searching, fetching, and delivering all depend on objects. Service robots are expected to capture object-related information efficiently in real-world scenes, coping with clutter and noise, for instance, while remaining flexible and scalable enough to memorize a large set of objects. Besides object perception tasks such as object recognition, where the object’s identity is analyzed, object categorization is an important visual perception cue that associates unknown object instances with a corresponding category based on, e.g., their appearance or shape. We present a pipeline that runs from the detection of object candidates in a domestic scene, through their description, to the final shape categorization of the detected candidates. To detect object-related information in cluttered domestic environments, an object detection method is proposed that copes with multiple plane and object occurrences, as in cluttered scenes with shelves. Further, a surface reconstruction method based on Growing Neural Gas (GNG) in combination with a shape distribution-based descriptor is proposed to reflect shape characteristics of object candidates. Beneficial properties of the GNG, such as smoothing and denoising effects, support a stable description of the object candidates, which in turn leads to more stable learning of categories. Based on the presented descriptor, a dictionary approach combined with a supervised shape learner is presented to learn prediction models of shape categories.
Experimental results are shown for shapes from object categories commonly found in domestic settings, such as cup, can, box, bottle, bowl, plate, and ball. A classification accuracy of about 90% and a sequential execution time of less than two seconds for the categorization of an unknown object are achieved, which supports the soundness of the proposed system design. Additional results are shown for object tracking and false positive handling to enhance the robustness of the categorization. An initial approach to incremental shape category learning is also proposed, which learns a new category based on the set of previously learned shape categories.
The task of this thesis is to develop an OGC-compliant Sensor Observation Service (SOS), a component of the SWE, for GPS-related sensor data in this context. It should, in contrast to existing implementations, support full mobility of the sensors and be configurable with respect to adding different kinds of sensors. In particular, mobile phones should be considered as sensors, which transmit their data to the SOS server through the transactional SOS interface.
This work aims to create a natural language generation (NLG) base for further development of systems for automatic examination question generation and automatic summarization at Hochschule Bonn-Rhein-Sieg and Fraunhofer IAIS, respectively. Both tasks are highly relevant today. The first can significantly simplify the work of university teachers, and the second can assist in faster retrieval of knowledge from the excessively large amounts of information that people often work with. We focus on the search for an efficient and robust approach to the controlled NLG problem. Therefore, although the initial idea of the project was to use generative adversarial networks (GANs), we switched our attention to more robust and easily controllable autoencoders. Thus, in this work we implement an autoencoder for unsupervised discovery of latent space representations of text, and show the ability of the system to generate new sentences based on this latent space. Apart from that, we apply Gaussian mixture techniques in order to obtain meaningful text clusters and thereby try to create a tool that would allow us to generate sentences relevant to the semantics of the Gaussian clusters, e.g. positive or negative reviews or examination questions on a certain topic. The developed system is tested on several datasets and compared to the performance of GANs.
As cameras are ubiquitous in autonomous systems, object detection is a crucial task. Object detectors are widely used in applications such as autonomous driving, healthcare, and robotics. Given an image, an object detector outputs both the bounding box coordinates as well as classification probabilities for each object detected. The state-of-the-art detectors are treated as black boxes due to their highly non-linear internal computations. Even with unprecedented advancements in detector performance, the inability to explain how their outputs are generated limits their use in safety-critical applications in particular. It is therefore crucial to explain the reason behind each detector decision in order to gain user trust, enhance detector performance, and analyze their failure.
Previous work fails to explain as well as evaluate both bounding box and classification decisions individually for various detectors. Moreover, no tools explain each detector decision, evaluate the explanations, and also identify the reasons for detector failures. This restricts the flexibility to analyze detectors. The main contribution presented here is an open-source Detector Explanation Toolkit (DExT). It is used to explain the detector decisions, evaluate the explanations, and analyze detector errors. The detector decisions are explained visually by highlighting the image pixels that most influence a particular decision. The toolkit implements the proposed approach to generate a holistic explanation for all detector decisions using certain gradient-based explanation methods. To the author’s knowledge, this is the first work to conduct extensive qualitative and novel quantitative evaluations of different explanation methods across various detectors. The qualitative evaluation incorporates a visual analysis of the explanations carried out by the author as well as a human-centric evaluation. The human-centric evaluation includes a user study to understand user trust in the explanations generated across various explanation methods for different detectors. Four multi-object visualization methods are provided to merge the explanations of multiple objects detected in an image as well as the corresponding detector outputs in a single image. Finally, DExT implements the procedure to analyze detector failures using the formulated approach.
The visual analysis illustrates that the ability to explain a model is more dependent on the model itself than the actual ability of the explanation method. In addition, the explanations are affected by the object explained, the decision explained, detector architecture, training data labels, and model parameters. The results of the quantitative evaluation show that the Single Shot MultiBox Detector (SSD) is more faithfully explained compared to other detectors regardless of the explanation methods. In addition, a single explanation method cannot generate more faithful explanations than other methods for both the bounding box and the classification decision across different detectors. Both the quantitative and human-centric evaluations identify that SmoothGrad with Guided Backpropagation (GBP) provides more trustworthy explanations among selected methods across all detectors. Finally, a convex polygon-based multi-object visualization method provides more human-understandable visualization than other methods.
The author expects that DExT will motivate practitioners to evaluate object detectors from the interpretability perspective by explaining both bounding box and classification decisions.
On the one hand, audiovisual media are credited with the ability to create an image of reality, which is one reason why they are of central importance in journalism. On the other hand, the technological developments of recent years make it ever easier, cheaper, and faster to create authentic-looking manipulations. As recently as ten years ago, manipulating video material, apart from trivial image-level operations, was possible only within film productions. That has since changed: synthetic media, also known as deepfakes, are on everyone's lips. Audiovisual manipulations thus pose an increasingly great challenge for newsrooms and in some cases already make it into reporting as supposedly authentic content. This raises the question: to what extent is it, and will it remain, possible to ensure the authenticity of audiovisual material in newsrooms?
Based on seven expert interviews with actors from research and practice, the thesis provides, in addition to a current description of the technical state of affairs regarding manipulation and verification options, a description and assessment of the existing problems and potential solutions for newsrooms, as well as an assessment of the future development of relevant technologies and the associated effects. The results show that technical tools are needed for verification processes in newsrooms, but that it is hardly possible to ensure the authenticity of audiovisual material on a technological level alone. Accordingly, the greatest challenge for newsrooms in verification is currently said to be not a lack of technical tools but rather a lack of time.
The interviewees were: Dr. Dominique Dresen – Bundesamt für Sicherheit in der Informationstechnik (BSI), Dr. Jutta Jahnel – Karlsruher Institut für Technologie (KIT), Dr. Christian Riess – FAU Erlangen-Nürnberg, Andrea Sauerbier – SPIEGEL, Jochen Spangenberg – DW Innovation, among others, Johanna Wild – Bellingcat, and Dr. Sascha Zmudzinski – Fraunhofer-Institut für Sichere Informationstechnologie (SIT).
The ability to finely segment different instances of various objects in an environment forms a critical tool in the perception tool-box of any autonomous agent. Traditionally instance segmentation is treated as a multi-label pixel-wise classification problem. This formulation has resulted in networks that are capable of producing high-quality instance masks but are extremely slow for real-world usage, especially on platforms with limited computational capabilities. This thesis investigates an alternate regression-based formulation of instance segmentation to achieve a good trade-off between mask precision and run-time. Particularly the instance masks are parameterized and a CNN is trained to regress to these parameters, analogous to bounding box regression performed by an object detection network.
In this investigation, the instance segmentation masks in the Cityscapes dataset are approximated using irregular octagons, and an existing object detector network (i.e., SqueezeDet) is modified to regress to the parameters of these octagonal approximations. The resulting network is referred to as SqueezeDetOcta. At the image boundaries, object instances are only partially visible. Due to the convolutional nature of most object detection networks, special handling of the boundary-adhering object instances is warranted. However, current object detection techniques do not account for this and handle all object instances alike. To this end, this work proposes selectively learning only the partial, untainted parameters of the bounding box approximation of the boundary-adhering object instances. Anchor-based object detection networks like SqueezeDet and YOLOv2 have a discrepancy between the ground-truth encoding/decoding scheme and the coordinate space used for clustering to generate the prior anchor shapes. To resolve this disagreement, this work proposes clustering in a space defined by two coordinate axes representing the natural log transformations of the width and height of the ground-truth bounding boxes.
When both SqueezeDet and SqueezeDetOcta were trained from scratch, SqueezeDetOcta lagged behind the SqueezeDet network by a massive ≈ 6.19 mAP. Further analysis revealed that the sparsity of the annotated data was the reason for this lackluster performance of the SqueezeDetOcta network. To mitigate this issue, transfer learning was used to fine-tune the SqueezeDetOcta network starting from the trained weights of the SqueezeDet network. When all the layers of SqueezeDetOcta were fine-tuned, it outperformed the SqueezeDet network paired with logarithmically extracted anchors by ≈ 0.77 mAP. In addition, the forward pass latencies of both SqueezeDet and SqueezeDetOcta are ≈ 19 ms. Boundary adhesion considerations during training resulted in an improvement of ≈ 2.62 mAP over the baseline SqueezeDet network. A SqueezeDet network paired with logarithmically extracted anchors improved on the performance of the baseline SqueezeDet network by ≈ 1.85 mAP.
In summary, this work demonstrates that if given sufficient fine instance annotated data, an existing object detection network can be modified to predict much finer approximations (i.e., irregular octagons) of the instance annotations, whilst having the same forward pass latency as that of the bounding box predicting network. The results justify the merits of logarithmically extracted anchors to boost the performance of any anchor-based object detection network. The results also showed that the special handling of image boundary adhering object instances produces more performant object detectors.
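The anchor extraction proposed in this abstract, clustering ground-truth boxes in a space spanned by the natural logs of width and height, can be sketched as follows. This is a minimal illustration and not the thesis code: it assumes plain k-means with Euclidean distance in log space, random initialization from the data, and hypothetical names (`kmeans_log_anchors`, `boxes`, `k`).

```python
import math
import random


def kmeans_log_anchors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs in (ln w, ln h) space and return k anchor shapes."""
    rng = random.Random(seed)
    # Map every box into log space, where clustering takes place.
    points = [(math.log(w), math.log(h)) for w, h in boxes]
    centers = rng.sample(points, k)
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            groups[nearest].append(p)
        new_centers = []
        for idx, group in enumerate(groups):
            if group:
                new_centers.append(
                    (sum(p[0] for p in group) / len(group),
                     sum(p[1] for p in group) / len(group))
                )
            else:
                new_centers.append(centers[idx])  # keep an empty cluster where it was
        if new_centers == centers:
            break
        centers = new_centers
    # Map the cluster centers back to pixel units; each is a geometric mean of its members.
    return sorted((math.exp(cx), math.exp(cy)) for cx, cy in centers)


boxes = [(10, 20), (12, 22), (11, 19), (100, 50), (110, 55), (95, 48)]
for width, height in kmeans_log_anchors(boxes, k=2):
    print(round(width, 1), round(height, 1))
```

Averaging in log space yields the geometric mean of the member boxes, which matches detectors that encode box sizes as log ratios to the anchor; this is the consistency argument the abstract makes.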
The aim of this master's thesis was to probe the views of Bonn’s citizens on the smart city project of the German city. A literature review helped define the term smart city and identify the smart city concept most commonly used in Germany. This can be summarized as an urban planning concept that uses information and communication technology to build citizen-centric, sustainable cities. Accordingly, a smart city should include transparent communication with and participation of its citizens. The websites and various publications of Bonn were researched to understand its smart city strategy and vision. This revealed inconsistencies. To resolve these inconsistencies, three representatives of the city were interviewed. Based on the knowledge gained up to this point, two groups of Bonn’s inhabitants discussed the Smart City Bonn and presented their perception of it. This methodology yielded the following results. The city's communication and participation measures are in many cases in line with the current recommendations for a smart city. Bonn has apparently recognized the relevance of these aspects in theory but should also implement them more consistently in practice. Currently the city council publishes contradictory information and does not plan to incorporate the views of Bonn’s citizens when developing the smart city strategy in the first place, as is recommended in the literature.
In this thesis, a decision-based manufacturing strategy for producing a micro gas turbine blisk from an oxide ceramic fiber composite is developed within FFE+, an internal project of the Deutsches Zentrum für Luft- und Raumfahrt. The vacuum-based infusion process of the department Struktur- und Funktionskeramik of the Institut für Werkstoffforschung is to be used for this purpose. First, the theoretical background of the material and its established processing are considered. On this basis, the system and functions of the oxide ceramic blisk can be determined in the sense of methodical process development. The requirements and evaluation criteria formulated there allow a reduced-effort design phase for concepts or solution principles. Here, the fiber structure is the decisive influencing factor in finding a solution. After the results have been evaluated, validated, and adapted, the manufacturing strategy is designed on the basis of the best-rated concept and the department's previous projects. In addition, a feasibility study of a previously unknown process for producing oxide ceramic fiber preforms was carried out in this thesis at the Institut für Flugzeugbau of the Universität Stuttgart. Since a statement about the material properties is necessary to guarantee safe functioning, material tests at room temperature and high temperature are planned. The final goal, a process-chain foundation for projects using the vacuum-based infusion process of the Institut für Werkstoffforschung, summarizes the results of this thesis and of other experience reports.
In a grid, users with appropriate access have a multitude of distributed resources at their disposal. The resulting economic and technical advantages justify porting existing desktop applications. This thesis addresses the question of which factors can play a role in porting desktop applications to a grid and how they are to be assessed with regard to feasibility. Based on the underlying software architectures, architectural characteristics of desktop applications are identified, and hypotheses are developed about which aspects influence the porting process. The hypotheses are tested using the example of porting the application "DataFinder" of the DLR department Verteilte Systeme und Komponentensoftware. The findings from the example port are presented in detail and then critically discussed.
Sparse matrix-vector multiplication (SpMV) is one of the core operations of high-performance computing for a wide range of scientific applications. For distributed computation on increasingly popular hybrid compute clusters, the question arises of a suitable partitioning strategy for distributing data and computation. This thesis examines how the structure of the matrix and the different processor types influence SpMV performance and proposes a model for achieving a load-balanced distribution for them. Its essential components are runtime prediction for current CPUs and GPUs based on a modified roofline model and the proven method of graph partitioning.
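The ingredients named in this abstract can be illustrated with a minimal sketch: an SpMV kernel in CSR format, a runtime estimate from the plain roofline model, and a row split proportional to estimated device speed. Everything here is an assumption for illustration. The thesis uses a modified roofline model and graph partitioning, neither of which is reproduced; the traffic model (every value, index, and vector entry moved exactly once) and all function and parameter names are hypothetical.

```python
def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x for a matrix A stored in CSR format."""
    y = [0.0] * (len(row_ptr) - 1)
    for row in range(len(y)):
        acc = 0.0
        for k in range(row_ptr[row], row_ptr[row + 1]):
            acc += values[k] * x[col_idx[k]]
        y[row] = acc
    return y


def roofline_time(nnz, n_rows, n_cols, peak_flops, bandwidth,
                  val_bytes=8, idx_bytes=4):
    """Plain roofline estimate: the slower of the compute and memory limits."""
    flops = 2 * nnz  # one multiply and one add per stored nonzero
    bytes_moved = (nnz * (val_bytes + idx_bytes)   # values and column indices
                   + (n_rows + 1) * idx_bytes      # row pointers
                   + n_cols * val_bytes            # read x once (optimistic)
                   + n_rows * val_bytes)           # write y once
    return max(flops / peak_flops, bytes_moved / bandwidth)


def balanced_row_split(row_ptr, est_times):
    """Contiguous row blocks whose nonzero counts follow each device's speed."""
    rates = [1.0 / t for t in est_times]
    total_rate = sum(rates)
    nnz = row_ptr[-1]
    bounds, row, share = [0], 0, 0.0
    for rate in rates[:-1]:
        share += rate / total_rate
        while row < len(row_ptr) - 1 and row_ptr[row] < share * nnz:
            row += 1
        bounds.append(row)
    bounds.append(len(row_ptr) - 1)
    return bounds


# A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
values, col_idx, row_ptr = [1.0, 2.0, 3.0, 4.0, 5.0], [0, 2, 1, 0, 2], [0, 2, 3, 5]
print(spmv_csr(values, col_idx, row_ptr, [1.0, 1.0, 1.0]))
print(balanced_row_split(row_ptr, est_times=[1.0, 1.0]))
```

SpMV performs only two floating-point operations per stored nonzero, so on typical hardware the memory term dominates the estimate, which is why the matrix structure and the memory bandwidth of each processor type matter for load balancing.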
Carrying out larger projects within the DLR often requires scientists to familiarize themselves with subject areas across disciplines. In the course of this, scientists conduct research in unfamiliar fields. For this purpose, the DLR has developed the knowledge portal KnowledgeFinder. This framework uses classical search methods to find information in arbitrary data collections. When scientists research unfamiliar fields, their superficial insight often makes it hard for them to search for information in a targeted way. The classical search methods used in KnowledgeFinder, which are based on textual and structural similarity, can help only to a limited extent in finding relevant information for such unspecific queries. Because of ambiguities and differing contexts, such methods often reach their limits. Semantic technologies aim to remedy this shortcoming. Here, the dimension of meaning is considered in addition to textual and structural similarity. This master's thesis investigated whether the quality of KnowledgeFinder's search results can be improved by using semantic technologies. Within a feasibility study, the KnowledgeFinder framework was extended with semantic search methods. These methods are intended to facilitate cross-disciplinary research by DLR scientists by helping them find suitable search results in the relevant fields.
This project focuses on object detection in dense volume data. There are several types of dense volume data, namely Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance Imaging (MRI) scans. This work focuses on CT scans. CT scans are not limited to the medical domain; they are also used in industry. CT scans are used in airport baggage screening and on assembly lines, and the object detection systems in these places should be able to detect objects quickly. One way to address the issue of computational complexity and make object detection systems fast is to use low-resolution images. Low-resolution CT scanning is fast, so the entire process of scanning and detection can be made faster by using low-resolution images. Even in the medical domain, the exposure time of the patient should be reduced in order to lower the radiation dose, and this could be achieved by allowing low-resolution CT scans. Hence it is essential to find out which object detection model offers better accuracy as well as speed on low-resolution CT scans. However, existing approaches do not provide details about how the model performs when the resolution of the CT scans is varied. Hence, the goal of this project is to analyze the impact of varying the resolution of CT scans on both the speed and accuracy of the model. Three object detection models, namely RetinaNet, YOLOv3, and YOLOv5, were trained at various resolutions. Among the three models, YOLOv5 was found to have the best mAP and F1 score at multiple resolutions on the DeepLesion dataset. The RetinaNet model has the lowest inference time on the DeepLesion dataset. From the experiments, it could be asserted that sacrificing mean average precision (mAP) to improve inference time by reducing resolution is feasible.
Modern engineering relies heavily on utilizing computer technologies. This is especially true for thermoplastic manufacturing, such as blow molding. A crucial milestone for digitalization is the continuous integration of data in unified or interoperable systems. While new simulation technologies are constantly developed, data management standards such as STEP fail at integrating them. On the other hand, industrial standards such as "VMAP" manage to improve interoperability for Small and Medium-sized Enterprises. However, they do not provide Simulation Process and Data Management (SPDM) technologies. For SPDM integration of VMAP data, Ontology-Based Data Access is used to allow continuing the digital thread in custom semantic-based open-source solutions. An ontology of the database format (VMAP) was generated alongside an expandable knowledge graph of data access methods. A Python-based software architecture was developed, automatically using the semantic representations of database format and data access to query data and metadata within the VMAP file. The result is a software architecture template that can be adapted for other data standards and integrated into semantic data management systems. It allows semantic queries on simulation data down to element-wise resolution without integrating the whole model information. The architecture can instantiate a file in a knowledge graph, query a file’s metadatum and, in case it is not yet available, find a semantically represented process that allows the creation and instantiation of the required metadatum. See Figure 1. The results of this thesis can be expected to form a basis for semantic SPDM tools.
Object detection concerns the classification and localization of objects in an image. To cope with changes in the environment, such as when new classes are added or a new domain is encountered, the detector needs to update itself with the new information while retaining knowledge learned in the past. Previous works have shown that training the detector solely on new data would produce a severe "forgetting" effect, in which the performance on past tasks deteriorates through each new learning phase. However, in many cases, storing and accessing past data is not possible due to privacy concerns or storage constraints. This project aims to investigate promising continual learning strategies for object detection without storing and accessing past training images and labels. We show that by utilizing the pseudo-background trick to deal with missing labels, and knowledge distillation to deal with missing data, the forgetting effect can be significantly reduced in both class-incremental and domain-incremental scenarios. Furthermore, an integration of a small latent replay buffer can result in a positive backward transfer, indicating the enhancement of past knowledge when new knowledge is learned.
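To illustrate the general idea of knowledge distillation mentioned in this abstract, the following is a minimal sketch of the generic temperature-softened distillation loss for classification logits. It is not the thesis's implementation: the function names `softmax` and `distillation_loss`, the temperature value, and the use of plain NumPy are assumptions made for illustration, and a detector would additionally need to handle box regression and the pseudo-background labels.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over the last axis with a temperature."""
    z = np.asarray(logits, dtype=np.float64) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Mean KL(teacher || student) on softened class distributions.

    The frozen old model acts as the teacher, so the new model (student)
    is penalized for drifting away from the old model's predictions.
    The temperature**2 factor is the customary gradient-scale correction.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    eps = 1e-12  # avoids log(0)
    kl = np.sum(p_teacher * (np.log(p_teacher + eps) - np.log(p_student + eps)), axis=-1)
    return float(kl.mean() * temperature ** 2)

# Hypothetical logits for two detections over three old classes.
teacher = np.array([[4.0, 1.0, 0.5], [0.2, 3.0, 0.1]])
same_loss = distillation_loss(teacher, teacher)
drift_loss = distillation_loss(teacher[:, ::-1], teacher)
```

In this sketch, `same_loss` is zero because the student reproduces the teacher exactly, while `drift_loss` is positive because the student's class scores have moved away from the teacher's.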
This report presents an approach to stabilizing quadrotor dynamics based on ICP SLAM. Because the quadrotor lacks the sensory information needed to detect its horizontal drift, an additional sensor, a Hokuyo-UTM laser scanner, was used to perform on-line ICP-based SLAM. The obtained position estimates were used in control loops to maintain the desired position and orientation of the vehicle. Parameters such as height, yaw and position in space were controlled based on the laser data. As a result, the quadrotor demonstrated two capabilities that are significant for autonomous navigation: performing on-line SLAM on a flying vehicle and maintaining a desired position in 3D space. A visual approach to optical flow based on the pyramidal Lucas-Kanade algorithm was also explored and tested in different environmental conditions, although it was not integrated into the control loop. In addition, the performance of the Hokuyo laser scanner and the associated ICP SLAM algorithm was tested in different environmental conditions: indoors, outdoors and in the presence of smoke. Results are presented and discussed. The need to run the on-line SLAM algorithm and to carry the fairly heavy equipment it requires made it necessary to increase both the payload and the computational power of the quadrotor. New hardware and distributed software architectures are therefore presented in the report.
Semantic Image Segmentation Combining Visible and Near-Infrared Channels with Depth Information
(2015)
Image understanding is a vital task in computer vision that has many applications in areas such as robotics, surveillance and the automobile industry. An important precondition for image understanding is semantic image segmentation, i.e. the correct labeling of every image pixel with its corresponding object name or class. This thesis proposes a machine learning approach for semantic image segmentation that uses images from a multi-modal camera rig. It demonstrates that semantic segmentation can be improved by combining different image types as inputs to a convolutional neural network (CNN), when compared to a single-image approach. In this work a multi-channel near-infrared (NIR) image, an RGB image and a depth map are used. The detection of people is further improved by using a skin image that indicates the presence of human skin in the scene and is computed based on NIR information. It is also shown that segmentation accuracy can be enhanced by using a class voting method based on a superpixel pre-segmentation. Models are trained for 10-class, 3-class and binary classification tasks using an original dataset. Compared to the NIR-only approach, average class accuracy is increased by 7% for 10-class, and by 22% for 3-class classification, reaching a total of 48% and 70% accuracy, respectively. The binary classification task, which focuses on the detection of people, achieves a classification accuracy of 95% and true positive rate of 66%. The report at hand describes the proposed approach and the encountered challenges and shows that a CNN can successfully learn and combine features from multi-modal image sets and use them to predict scene labeling.
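The class voting step over a superpixel pre-segmentation can be sketched as a majority vote: every pixel in a superpixel receives the class predicted most often within that superpixel. The sketch below is an illustrative assumption, not the thesis's code; the function name `superpixel_vote`, the toy arrays, and the tie-breaking rule (lowest class index wins) are all hypothetical.

```python
import numpy as np

def superpixel_vote(pixel_labels, superpixels, num_classes):
    """Replace each pixel's label with the majority label of its superpixel."""
    pixel_labels = np.asarray(pixel_labels)
    superpixels = np.asarray(superpixels)
    # Map arbitrary superpixel ids to consecutive indices 0..K-1.
    _, sp_index = np.unique(superpixels.ravel(), return_inverse=True)
    sp_index = sp_index.ravel()
    num_sp = sp_index.max() + 1
    # votes[s, c] counts the pixels in superpixel s predicted as class c.
    votes = np.zeros((num_sp, num_classes), dtype=np.int64)
    np.add.at(votes, (sp_index, pixel_labels.ravel()), 1)
    winners = votes.argmax(axis=1)  # ties go to the lowest class index
    return winners[sp_index].reshape(pixel_labels.shape)

# Toy per-pixel CNN predictions (3 classes) and a two-superpixel segmentation.
pred = np.array([[0, 0, 1, 1],
                 [0, 2, 1, 0],
                 [0, 2, 1, 1]])
segments = np.array([[5, 5, 9, 9],
                     [5, 5, 9, 9],
                     [5, 5, 9, 9]])
smoothed = superpixel_vote(pred, segments, num_classes=3)
```

Here the isolated class-2 and class-0 predictions are overruled by their superpixel's majority, so `smoothed` contains class 0 in the left superpixel and class 1 in the right one.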