Technical Report / Hochschule Bonn-Rhein-Sieg University of Applied Sciences. Department of Computer Science
Publisher: Dean Prof. Dr. Sascha Alda
Hochschule Bonn-Rhein-Sieg University of Applied Sciences, Department of Computer Science
Sankt Augustin, Germany
ISSN 1869-5272
Hochschule Bonn-Rhein-Sieg University of Applied Sciences, Department of Computer Science
Sankt Augustin, Germany
ISSN 1869-5272
Refine
H-BRS Bibliography
- yes (5)
Departments, institutes and facilities
Document Type
- Report (3)
- Master's Thesis (2)
Year of publication
- 2020 (5) (remove)
Language
- English (5)
Has Fulltext
- yes (5)
Keywords
- Apprenticeship Learning (1)
- Black-Box Optimization (1)
- Calibration (1)
- Computer Vision (1)
- Deep Learning (1)
- Domestic Robots (1)
- Human-Centered Robotics (1)
- Hyper-parameter Tuning (1)
- Image Classification (1)
- Learning and Adaptive Systems (1)
05-2020
The ability to finely segment different instances of various objects in an environment forms a critical tool in the perception tool-box of any autonomous agent. Traditionally instance segmentation is treated as a multi-label pixel-wise classification problem. This formulation has resulted in networks that are capable of producing high-quality instance masks but are extremely slow for real-world usage, especially on platforms with limited computational capabilities. This thesis investigates an alternate regression-based formulation of instance segmentation to achieve a good trade-off between mask precision and run-time. Particularly the instance masks are parameterized and a CNN is trained to regress to these parameters, analogous to bounding box regression performed by an object detection network.
In this investigation, the instance segmentation masks in the Cityscape dataset are approximated using irregular octagons and an existing object detector network (i.e., SqueezeDet) is modified to regresses to the parameters of these octagonal approximations. The resulting network is referred to as SqueezeDetOcta. At the image boundaries, object instances are only partially visible. Due to the convolutional nature of most object detection networks, special handling of the boundary adhering object instances is warranted. However, the current object detection techniques seem to be unaffected by this and handle all the object instances alike. To this end, this work proposes selectively learning only partial, untainted parameters of the bounding box approximation of the boundary adhering object instances. Anchor-based object detection networks like SqueezeDet and YOLOv2 have a discrepancy between the ground-truth encoding/decoding scheme and the coordinate space used for clustering, to generate the prior anchor shapes. To resolve this disagreement, this work proposes clustering in a space defined by two coordinate axes representing the natural log transformations of the width and height of the ground-truth bounding boxes.
When both SqueezeDet and SqueezeDetOcta were trained from scratch, SqueezeDetOcta lagged behind the SqueezeDet network by a massive ≈ 6.19 mAP. Further analysis revealed that the sparsity of the annotated data was the reason for this lackluster performance of the SqueezeDetOcta network. To mitigate this issue transfer-learning was used to fine-tune the SqueezeDetOcta network starting from the trained weights of the SqueezeDet network. When all the layers of the SqueezeDetOcta were fine-tuned, it outperformed the SqueezeDet network paired with logarithmically extracted anchors by ≈ 0.77 mAP. In addition to this, the forward pass latencies of both SqueezeDet and SqueezeDetOcta are close to ≈ 19ms. Boundary adhesion considerations, during training, resulted in an improvement of ≈ 2.62 mAP of the baseline SqueezeDet network. A SqueezeDet network paired with logarithmically extracted anchors improved the performance of the baseline SqueezeDet network by ≈ 1.85 mAP.
In summary, this work demonstrates that if given sufficient fine instance annotated data, an existing object detection network can be modified to predict much finer approximations (i.e., irregular octagons) of the instance annotations, whilst having the same forward pass latency as that of the bounding box predicting network. The results justify the merits of logarithmically extracted anchors to boost the performance of any anchor-based object detection network. The results also showed that the special handling of image boundary adhering object instances produces more performant object detectors.
04-2020
A Comparative Study of Uncertainty Estimation Methods in Deep Learning Based Classification Models
(2020)
Deep learning models produce overconfident predictions even for misclassified data. This work aims to improve the safety guarantees of software-intensive systems that use deep learning based classification models for decision making by performing comparative evaluation of different uncertainty estimation methods to identify possible misclassifications.
In this work, uncertainty estimation methods applicable to deep learning models are reviewed and those which can be seamlessly integrated to existing deployed deep learning architectures are selected for evaluation. The different uncertainty estimation methods, deep ensembles, test-time data augmentation and Monte Carlo dropout with its variants, are empirically evaluated on two standard datasets (CIFAR-10 and CIFAR-100) and two custom classification datasets (optical inspection and RoboCup@Work dataset). A relative ranking between the methods is provided by evaluating the deep learning classifiers on various aspects such as uncertainty quality, classifier performance and calibration. Standard metrics like entropy, cross-entropy, mutual information, and variance, combined with a rank histogram based method to identify uncertain predictions by thresholding on these metrics, are used to evaluate uncertainty quality.
The results indicate that Monte Carlo dropout combined with test-time data augmentation outperforms all other methods by identifying more than 95% of the misclassifications and representing uncertainty in the highest number of samples in the test set. It also yields a better classifier performance and calibration in terms of higher accuracy and lower Expected Calibration Error (ECE), respectively. A python based uncertainty estimation library for training and real-time uncertainty estimation of deep learning based classification models is also developed.
03-2020
Human and robot tasks in household environments include actions such as carrying an object, cleaning a surface, etc. These tasks are performed by means of dexterous manipulation, and for humans, they are straightforward to accomplish. Moreover, humans perform these actions with reasonable accuracy and precision but with much less energy and stress on the actuators (muscles) than the robots do. The high agility in controlling their forces and motions is actually due to "laziness", i.e. humans exploit the existing natural forces and constraints to execute the tasks.
The above-mentioned properties of the human lazy strategy motivate us to relax the problem of controlling robot motions and forces, and solve it with the help of the environment. Therefore, in this work, we developed a lazy control strategy, i.e. task specification models and control architectures that relax several aspects of robot control by exploiting prior knowledge about the task and environment. The developed control strategy is realized in four different robotics use cases. In this work, the Popov-Vereshchagin hybrid dynamics solver is used as one of the building blocks in the proposed control architectures. An extension of the solver’s interface with the artificial Cartesian force and feed-forward joint torque task-drivers is proposed in this thesis.
To validate the proposed lazy control approach, an experimental evaluation was performed in a simulation environment and on a real robot platform.
02-2020
Object detectors have improved considerably in the last years by using advanced Convolutional Neural Networks (CNNs) architectures. However, many detector hyper-parameters are not generally tuned, and they are used with values set by the detector authors. Blackbox optimization methods have gained more attention in recent years because of its ability to optimize the hyper-parameters of various machine learning algorithms and deep learning models. However, these methods are not explored in improving CNN-based object detector's hyper-parameters. In this research work, we propose the use of blackbox optimization methods such as Gaussian Process based Bayesian Optimization (BOGP), Sequential Model-based Algorithm Configuration (SMAC), and Covariance Matrix Adaptation Evolution Strategy (CMA-ES) to tune the hyper-parameters in Faster R-CNN and Single Shot MultiBox Detector (SSD). In Faster R-CNN, tuning the input image size, prior box anchor scales and ratios using BOGP, SMAC, and CMA-ES has increased the performance around 1.5% in terms of Mean Average Precision (mAP) on PASCAL VOC. Tuning the anchor scales of SSD has increased the mAP by 3% on PASCAL VOC and marine debris datasets. On the COCO dataset with SSD, mAP improvement is observed in the medium and large objects, but mAP decreases by 1% in small objects. The experimental results show that the blackbox optimization methods have proved to increase the mAP performance by optimizing the object detectors. Moreover, it has achieved better results than the hand-tuned configurations in most of the cases.
01-2020
An essential measure of autonomy in service robots designed to assist humans is adaptivity to the various contexts of human-oriented tasks. These robots may have to frequently execute the same action, but subject to subtle variations in task parameters that determine optimal behaviour. Such actions are traditionally executed by robots using pre-determined, generic motions, but a better approach could utilize robot arm maneuverability to learn and execute different trajectories that work best in each context.
In this project, we explore a robot skill acquisition procedure that allows incorporating contextual knowledge, adjusting executions according to context, and improvement through experience, as a step towards more adaptive service robots. We propose an apprenticeship learning approach to achieving context-aware action generalisation on the task of robot-to-human object hand-over. The procedure combines learning from demonstration, with which a robot learns to imitate a demonstrator’s execution of the task, and a reinforcement learning strategy, which enables subsequent experiential learning of contextualized policies, guided by information about context that is integrated into the learning process. By extending the initial, static hand-over policy to a contextually adaptive one, the robot derives and executes variants of the demonstrated action that most appropriately suit the current context. We use dynamic movement primitives (DMPs) as compact motion representations, and a model-based Contextual Relative Entropy Policy Search (C-REPS) algorithm for learning policies that can specify hand-over position, trajectory shape, and execution speed, conditioned on context variables. Policies are learned using simulated task executions, before transferring them to the robot and evaluating emergent behaviours.
We demonstrate the algorithm’s ability to learn context-dependent hand-over positions, and new trajectories, guided by suitable reward functions, and show that the current DMP implementation limits learning context-dependent execution speeds. We additionally conduct a user study involving participants assuming different postures and receiving an object from the robot, which executes hand-overs by either exclusively imitating a demonstrated motion, or selecting hand-over positions based on learned contextual policies and adapting its motion accordingly. The results confirm the hypothesized improvements in the robot’s perceived behaviour when it is context-aware and adaptive, and provide useful insights that can inform future developments.