H-BRS Bibliography
This thesis proposes a multi-label classification approach using the Multimodal Transformer (MulT) [80] to perform multi-modal emotion categorization on a dataset of oral histories archived at the Haus der Geschichte (HdG). Prior uni-modal emotion classification experiments on the novel HdG dataset produced unsatisfactory results. They uncovered issues such as class imbalance, ambiguities in emotion perception between annotators, and a lack of representative training data for transfer learning [28]. Hence, the objectives of this thesis were to achieve better results through multi-modal fusion and to resolve the problems arising from class imbalance and annotator-induced bias in emotion perception. A further objective was to assess the quality of the novel HdG dataset and benchmark the results using state-of-the-art (SOTA) techniques. Through a literature survey on the challenges, models, and datasets related to multi-modal emotion recognition, we created a methodology that combines the MulT with a multi-label classification approach. This approach considerably improved overall emotion recognition, obtaining an average AUC of 0.74 and a balanced accuracy of 0.70 on the HdG dataset, which is comparable to SOTA results on other datasets. In this manner, we were also able to benchmark the novel HdG dataset and to introduce a novel multi-annotator learning approach for understanding each annotator’s relative strengths and weaknesses in emotion perception. Our evaluation results highlight the potential of the multi-annotator learning approach to improve overall performance by resolving the problems arising from annotator-induced bias and variation in the perception of emotions. Complementing these results, we performed a further qualitative analysis of the HdG annotations with a psychologist to study the ambiguities found in the annotations.
We conclude that the ambiguities in annotations may have resulted from a combination of several socio-psychological factors and systemic issues associated with the process of creating these annotations. As these problems are also present in most multi-modal emotion recognition datasets, we conclude that the domain could benefit from a set of annotation guidelines to create standardized datasets.
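The balanced accuracy reported above can be illustrated with a minimal sketch. It assumes that the multi-label score is obtained by computing the balanced accuracy (mean of sensitivity and specificity) separately for each emotion label and averaging over labels; the abstract does not state how the average was formed, and the function names and toy labels below are hypothetical.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for one binary label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must appear in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def mean_balanced_accuracy(true_rows, pred_rows):
    """Average the per-label balanced accuracy over all labels (assumed aggregation)."""
    n_labels = len(true_rows[0])
    scores = [
        balanced_accuracy([row[k] for row in true_rows],
                          [row[k] for row in pred_rows])
        for k in range(n_labels)
    ]
    return sum(scores) / n_labels


# Toy data: 4 samples, 2 hypothetical emotion labels per sample.
true_rows = [[1, 0], [1, 1], [0, 0], [0, 1]]
pred_rows = [[1, 0], [0, 1], [0, 0], [0, 0]]
score = mean_balanced_accuracy(true_rows, pred_rows)
```

Unlike plain accuracy, this score weights the rare positive class and the frequent negative class equally, which is why it suits imbalanced emotion labels.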
Object detection concerns the classification and localization of objects in an image. To cope with changes in the environment, such as when new classes are added or a new domain is encountered, the detector needs to update itself with the new information while retaining knowledge learned in the past. Previous works have shown that training the detector solely on new data produces a severe "forgetting" effect, in which the performance on past tasks deteriorates with each new learning phase. However, in many cases, storing and accessing past data is not possible due to privacy concerns or storage constraints. This project investigates promising continual learning strategies for object detection that do not store or access past training images and labels. We show that by using the pseudo-background trick to deal with missing labels and knowledge distillation to deal with missing data, the forgetting effect can be significantly reduced in both class-incremental and domain-incremental scenarios. Furthermore, integrating a small latent replay buffer can result in positive backward transfer, indicating that past knowledge is enhanced when new knowledge is learned.
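The knowledge distillation idea mentioned above can be sketched as follows. This is a minimal illustration, not the loss used in the thesis: it assumes that distillation is applied to the class scores of a single detection, that the frozen previous detector acts as teacher, and that a temperature of 2.0 is used. All names and values are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_norm for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence KL(teacher || student) on softened class distributions."""
    log_p = log_softmax(teacher_logits, temperature)  # old (frozen) detector
    log_q = log_softmax(student_logits, temperature)  # detector being updated
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


# The loss is zero when the student reproduces the teacher's scores
# and grows as the student drifts away from them.
unchanged = distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0])
drifted = distillation_loss([2.0, 0.5, -1.0], [-1.0, 0.5, 2.0])
```

Adding such a term to the training loss on new data penalizes changes in the outputs for old classes, so no past images need to be stored.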
This thesis investigates the benefit of rubrics for grading short answers using an active learning mechanism. Automating short answer grading using Natural Language Processing (NLP) is an active research area in the education domain. It could save evaluators time, which they could instead invest in preparing lectures. Most research has treated short answer grading as a similarity task between reference and student answers. However, grading based on reference answers does not account for partial grades and does not provide feedback. Moreover, fully automatic grading tries to replace the evaluator. Using rubrics for short answer grading together with active learning addresses these drawbacks.
Initially, the proposed approach is evaluated on the Mohler dataset, which is popularly used for benchmarking. This phase is used to determine the parameters of the proposed approach. With the selected parameters, the approach exceeds the performance of current state-of-the-art (SOTA) methods, achieving a Pearson correlation of 0.63 and a Root Mean Square Error (RMSE) of 0.85. It surpasses the SOTA methods by almost 4%.
Finally, the benchmarked approach is used to grade short answers based on rubrics instead of reference answers. The proposed approach evaluates short answers from the Autonomous Mobile Robot (AMR) dataset to provide scores and feedback (formative assessment) based on the rubrics. The average performance on this dataset is a Pearson correlation of 0.61 and an RMSE of 0.83. Thus, this research shows that rubric-based grading achieves formative assessment without compromising performance. In addition, rubrics have the advantage of generalizing to all answers.
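For readers unfamiliar with the two metrics reported above, the following sketch shows how the Pearson correlation and the RMSE between predicted and human-assigned grades are computed. The grade values are made up for illustration and are not taken from the Mohler or AMR datasets.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rmse(x, y):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical grades on a 0-5 scale.
human_grades = [5.0, 3.0, 4.0, 1.0, 2.0]
model_grades = [4.5, 3.5, 4.0, 2.0, 1.5]
correlation = pearson(human_grades, model_grades)
error = rmse(human_grades, model_grades)
```

A higher correlation means the model ranks answers similarly to the human grader, while a lower RMSE means the predicted scores are numerically closer to the human scores.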
Modern engineering relies heavily on computer technologies. This is especially true for thermoplastic manufacturing, such as blow molding. A crucial milestone for digitalization is the continuous integration of data in unified or interoperable systems. While new simulation technologies are constantly being developed, data management standards such as STEP fail to integrate them. On the other hand, industrial standards such as "VMAP" improve interoperability for small and medium-sized enterprises. However, they do not provide Simulation Process and Data Management (SPDM) technologies. For SPDM integration of VMAP data, Ontology-Based Data Access is used to continue the digital thread in custom semantic-based open-source solutions. An ontology of the database format (VMAP) was generated alongside an expandable knowledge graph of data access methods. A Python-based software architecture was developed that automatically uses the semantic representations of the database format and of data access to query data and metadata within a VMAP file. The result is a software architecture template that can be adapted for other data standards and integrated into semantic data management systems. It allows semantic queries on simulation data down to element-wise resolution without integrating the whole model information. The architecture can instantiate a file in a knowledge graph, query a file’s metadatum and, in case it is not yet available, find a semantically represented process that allows the creation and instantiation of the required metadatum. See Figure 1. The results of this thesis can be expected to form a basis for semantic SPDM tools.
Machine learning-based solutions are frequently adopted in applications that require big data in operations. The performance of a deployed model is subject to degradation due to unanticipated changes in the flow of input data. Hence, monitoring data drift becomes essential to maintain the model’s desired performance. According to the reviewed literature on drift detection, statistical hypothesis testing makes it possible to investigate whether incoming data is drifting from the training data. Because Maximum Mean Discrepancy (MMD) and Kolmogorov-Smirnov (KS) were shown in the literature review to be reliable distance measures between multivariate distributions, both were selected from several existing techniques for experimentation. Within the scope of this work, experiments were conducted on an image classification use case using the Stream-51 dataset. Across the different drift experiments, both MMD and KS showed high Area Under Curve values. However, KS ran faster than MMD and produced fewer false positives. Furthermore, the results showed that using the pre-trained ResNet-18 for feature extraction maintained the high performance of the drift detectors. The results also showed that the performance of the drift detectors depends strongly on the sample sizes of the reference (training) data and of the test data that flow into the pipeline’s monitor. Finally, if the test data is a mixture of drifting and non-drifting data, the performance of the drift detectors depends not on how the drifting data are interspersed with the non-drifting data, but on their amount in the test set.
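A KS-based drift check of the kind described above might look like the following minimal sketch. It makes several assumptions that the abstract does not confirm: the KS test is applied to each feature dimension separately, the per-feature p-values use the asymptotic Kolmogorov series, and they are combined with a Bonferroni correction. The function names and the toy feature vectors are hypothetical; in the thesis the features would come from a pre-trained ResNet-18.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    gap = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        gap = max(gap, abs(i / n - j / m))
    return gap


def ks_p_value(d, n, m, terms=100):
    """Asymptotic two-sided p-value (an approximation for finite samples)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:  # the series converges slowly here; the p-value is ~1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def detect_drift(reference, test, alpha=0.05):
    """Flag drift if any feature rejects at the Bonferroni-corrected level."""
    n_features = len(reference[0])
    p_values = []
    for k in range(n_features):
        ref_col = [row[k] for row in reference]
        test_col = [row[k] for row in test]
        d = ks_statistic(ref_col, test_col)
        p_values.append(ks_p_value(d, len(ref_col), len(test_col)))
    return min(p_values) < alpha / n_features


# Toy feature vectors with two dimensions; the first one is shifted in `shifted`.
reference = [[i / 100, (7 * i % 100) / 100] for i in range(100)]
shifted = [[x + 0.5, y] for x, y in reference]
no_drift = detect_drift(reference, reference)
drift = detect_drift(reference, shifted)
```

Because the test compares sample distributions, its power depends directly on the sizes of the reference and test samples, which is consistent with the sensitivity to sample size reported above.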
In the field of autonomous robotics, sensors have played a major role in defining the scope of the technology and, to a great extent, its limitations as well. This cycle of constant updates, and hence technological advancement, has given birth to serious industries that were once inconceivable. Industries like autonomous driving, which have a serious impact on the safety and security of people, also have equally strong implications for the dynamics and economics of the market. With sensors like LiDAR and RADAR delivering 3D measurements as point clouds, there is a need to process the raw measurements directly, and many research groups are working on this. A sizable body of research has gone into solving the task of object detection on 2D images. In this thesis, we aim to develop a LiDAR-based 3D object detection scheme. We combine the ideas of PointPillars and of feature pyramid networks from 2D vision to propose Pillar-FPN. The proposed method takes 3D point clouds directly as input and outputs a 3D bounding box. Our pipeline consists of multiple variations of the proposed Pillar-FPN at the feature fusion level, which are described in the results section. We trained our model on the KITTI training set and evaluated it on the KITTI validation set.
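The pillar idea borrowed from PointPillars can be sketched as follows: the point cloud is divided into vertical columns over a grid in the x-y plane, and each column collects the points that fall into it. This is only an illustration of the grouping step under assumed values (a cell size of 0.5 m, hypothetical function name and points); it does not reproduce the Pillar-FPN pipeline, whose feature encoding and fusion are not described in the abstract.

```python
import math
from collections import defaultdict


def group_into_pillars(points, cell_size=0.5):
    """Group (x, y, z) points into vertical pillars over an x-y grid."""
    pillars = defaultdict(list)
    for x, y, z in points:
        # The pillar index ignores z: a pillar spans the full height.
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        pillars[key].append((x, y, z))
    return dict(pillars)


# Three hypothetical LiDAR points; the first two share a pillar.
points = [(0.1, 0.1, 0.0), (0.2, 0.3, 1.0), (1.2, -0.1, 0.5)]
pillars = group_into_pillars(points)
```

Once points are grouped this way, per-pillar features can be arranged as a 2D pseudo-image, which is what allows techniques from 2D vision such as feature pyramid networks to be applied.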
This project focuses on object detection in dense volume data. There are several types of dense volume data, namely Computed Tomography (CT), Positron Emission Tomography (PET), and Magnetic Resonance Imaging (MRI) scans. This work focuses on CT scans. CT scans are not limited to the medical domain; they are also used in industry. CT scans are used in airport baggage screening and on assembly lines, and the object detection systems in these places should be able to detect objects quickly. One way to address the issue of computational complexity and make the object detection systems fast is to use low-resolution images. Low-resolution CT scanning is fast, so the entire process of scanning and detection can be made faster by using low-resolution images. Even in the medical domain, the exposure time of the patient should be reduced in order to reduce the radiation dose. The exposure time of patients could be reduced by allowing low-resolution CT scans. Hence, it is essential to find out which object detection model offers better accuracy as well as speed on low-resolution CT scans. However, existing approaches do not provide details about how the model performs when the resolution of the CT scans is varied. Hence, the goal of this project is to analyze the impact of varying the resolution of CT scans on both the speed and the accuracy of the model. Three object detection models, namely RetinaNet, YOLOv3, and YOLOv5, were trained at various resolutions. Among the three models, YOLOv5 was found to have the best mean average precision (mAP) and F1 score at multiple resolutions on the DeepLesion dataset. The RetinaNet model has the lowest inference time on the DeepLesion dataset. From the experiments, it could be asserted that sacrificing mAP to improve inference time by reducing resolution is feasible.
In (dynamic) adaptive mesh refinement (AMR), an input mesh is refined or coarsened according to the needs of the numerical application. This refinement happens without regard to the originally meshed domain and is therefore limited to the geometrical accuracy of the original input mesh. We previously presented a novel approach to equip this input mesh with additional geometry information, allowing refinement and high-order cells based on the geometry of the original domain. We already showed a limited implementation of this algorithm. Now we evaluate this prototype with a numerical application and demonstrate its influence on the accuracy of certain numerical results. To be as practical as possible, we implement the ability to import meshes generated by Gmsh and equip them with the needed geometry information. Furthermore, we improve the mapping algorithm, which maps the geometry information of the boundary of a cell into the cell's volume. With these preliminary steps done, we use our new approach in a simulation of the advection of a concentration along the boundary of a sphere shell and past the boundary of a rotating cylinder. We evaluate the accuracy of our approach in comparison to the conventional refinement of cells to answer our research question: How do the performance and accuracy of the hexahedral curved domain AMR algorithm compare to linear AMR when solving the advection equation with the linear finite volume method? To answer this question, we show the influence of curved AMR on our simulation results and see that it is even able to outperform far finer linear meshes in terms of accuracy. We also see that the current implementation of this approach is too slow for practical usage. We can therefore demonstrate the benefits of curved AMR in certain geometry-related application scenarios and show possible improvements to make it more feasible and practical in the future.
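To make the numerical setting concrete, the following sketch solves the advection equation with a finite volume scheme in a heavily simplified form. It assumes one spatial dimension, a uniform periodic mesh, a constant positive velocity, and a first-order upwind flux; none of the curved hexahedral cells or the adaptive refinement studied in the thesis are modeled, and all names and values are hypothetical.

```python
def advect_upwind(u, velocity, dx, dt, steps):
    """Advance cell averages `u` with a first-order upwind finite volume scheme.

    Assumes velocity >= 0 and periodic boundaries.
    """
    courant = velocity * dt / dx
    if not 0.0 <= courant <= 1.0:
        raise ValueError("time step violates the CFL condition")
    cells = list(u)
    for _ in range(steps):
        # Flux difference between the left and right face of every cell;
        # index -1 wraps around, which gives the periodic boundary.
        cells = [cells[i] - courant * (cells[i] - cells[i - 1])
                 for i in range(len(cells))]
    return cells


# A unit pulse in cell 2 of a 10-cell mesh.
pulse = [0.0] * 10
pulse[2] = 1.0
shifted = advect_upwind(pulse, velocity=1.0, dx=0.1, dt=0.1, steps=3)
smeared = advect_upwind(pulse, velocity=1.0, dx=0.1, dt=0.05, steps=20)
```

The scheme conserves the total concentration, but with a Courant number below one it smears the pulse. This discretization error is the kind of mesh-dependent inaccuracy that refinement aims to reduce.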
As cameras are ubiquitous in autonomous systems, object detection is a crucial task. Object detectors are widely used in applications such as autonomous driving, healthcare, and robotics. Given an image, an object detector outputs both the bounding box coordinates and the classification probabilities for each object detected. State-of-the-art detectors are treated as black boxes due to their highly non-linear internal computations. Even with unprecedented advancements in detector performance, the inability to explain how their outputs are generated limits their use, particularly in safety-critical applications. It is therefore crucial to explain the reason behind each detector decision in order to gain user trust, enhance detector performance, and analyze detector failures.
Previous work fails to explain as well as evaluate both bounding box and classification decisions individually for various detectors. Moreover, no tools explain each detector decision, evaluate the explanations, and also identify the reasons for detector failures. This restricts the flexibility to analyze detectors. The main contribution presented here is an open-source Detector Explanation Toolkit (DExT). It is used to explain the detector decisions, evaluate the explanations, and analyze detector errors. The detector decisions are explained visually by highlighting the image pixels that most influence a particular decision. The toolkit implements the proposed approach to generate a holistic explanation for all detector decisions using certain gradient-based explanation methods. To the author’s knowledge, this is the first work to conduct extensive qualitative and novel quantitative evaluations of different explanation methods across various detectors. The qualitative evaluation incorporates a visual analysis of the explanations carried out by the author as well as a human-centric evaluation. The human-centric evaluation includes a user study to understand user trust in the explanations generated across various explanation methods for different detectors. Four multi-object visualization methods are provided to merge the explanations of multiple objects detected in an image as well as the corresponding detector outputs in a single image. Finally, DExT implements the procedure to analyze detector failures using the formulated approach.
The visual analysis illustrates that the ability to explain a model depends more on the model itself than on the actual ability of the explanation method. In addition, the explanations are affected by the object explained, the decision explained, the detector architecture, the training data labels, and the model parameters. The results of the quantitative evaluation show that the Single Shot MultiBox Detector (SSD) is explained more faithfully than the other detectors, regardless of the explanation method. In addition, no single explanation method generates more faithful explanations than the other methods for both the bounding box and the classification decision across different detectors. Both the quantitative and the human-centric evaluations identify SmoothGrad with Guided Backpropagation (GBP) as providing the most trustworthy explanations among the selected methods across all detectors. Finally, a convex polygon-based multi-object visualization method provides more human-understandable visualizations than the other methods.
The author expects that DExT will motivate practitioners to evaluate object detectors from the interpretability perspective by explaining both bounding box and classification decisions.
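The SmoothGrad idea referred to above can be sketched in a few lines: the gradient of a decision with respect to the input is averaged over several noisy copies of that input. The sketch below is not DExT code. It assumes a hypothetical `grad_fn` that returns the gradient for a given input, whereas in a real detector this gradient would come from backpropagation (for example Guided Backpropagation) through the network for one bounding box coordinate or class score. The noise level and sample count are also assumptions.

```python
import random


def smoothgrad(grad_fn, x, noise_std=0.1, n_samples=50, seed=0):
    """Average the gradient of a decision over noisy copies of the input."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        noisy = [xi + rng.gauss(0.0, noise_std) for xi in x]
        grad = grad_fn(noisy)
        total = [t + g for t, g in zip(total, grad)]
    return [t / n_samples for t in total]


# Stand-in for a detector decision: f(x) = sum(x_i^2), so the gradient is 2 * x.
def toy_gradient(x):
    return [2.0 * xi for xi in x]


saliency = smoothgrad(toy_gradient, [1.0, -0.5])
```

For an image input, the resulting values would be arranged as a saliency map that highlights the pixels with the strongest influence on the explained decision.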