Force field (FF) based molecular modeling is an often used method to investigate and study structural and dynamic properties of (bio-)chemical substances and systems. When such a system is modeled or refined, the force field parameters need to be adjusted. This force field parameter optimization can be a tedious task and is always a trade-off in terms of errors regarding the targeted properties. To better control the balance of various properties’ errors, in this study we introduce weighting factors for the optimization objectives. Different weighting strategies are compared to fine-tune the balance between bulk-phase density and relative conformational energies (RCE), using n-octane as a representative system. Additionally, a non-linear projection of the individual property-specific parts of the optimized loss function is deployed to further improve the balance between them. The results show that the overall error is reduced. One interesting outcome is a large variety in the resulting optimized force field parameters (FFParams) and corresponding errors, suggesting that the optimization landscape is multi-modal and very dependent on the weighting factor setup. We conclude that adjusting the weighting factors can be a very important feature to lower the overall error in the FF optimization procedure, giving researchers the possibility to fine-tune their FFs.
In vision tasks, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, convolution requires multiple stacked layers and a hierarchical structure for large context. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to the non-causal two-dimensional image space. We scale the Hyena convolution kernels beyond the feature map size up to 191$\times$191 to maximize the ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 83.0% and 83.5%, respectively, while outperforming other large-kernel networks. Combining HyenaPixel with attention further increases accuracy to 83.6%. We attribute the success of attention to the lack of spatial bias in later stages and support this finding with bidirectional Hyena.
This paper presents the b-it-bots RoboCup@Work team and its current hardware and functional architecture for the KUKA youBot robot. We describe the underlying software framework and the developed capabilities required for operating in industrial environments including features such as reliable and precise navigation, flexible manipulation, robust object recognition and task planning. New developments include an approach to grasp vertical objects, placement of objects by considering the empty space on a workstation, and the process of porting our code to ROS2.
This paper addresses the classification of Arabic text data in the field of Natural Language Processing (NLP), with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is considered a resource-poor language, meaning that there are few data sets available, which leads to limited availability of NLP methods. To overcome this limitation, we create a dedicated data set from publicly available resources. Subsequently, transformer-based machine learning models are trained and evaluated. We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches when we apply linguistically informed pre-training methods such as Named Entity Recognition (NER). To our knowledge, this is the first large-scale evaluation for this task in Arabic, as well as the first application of multi-task pre-training in this context.
Loading of shipping containers for dairy products often includes a press-fit task, which involves manually stacking milk cartons in a container without using pallets or packaging. Automating this task with a mobile manipulator can reduce worker strain, and also enhance the efficiency and safety of the container loading process. This paper proposes an approach called Adaptive Compliant Control with Integrated Failure Recovery (ACCIFR), which enables a mobile manipulator to reliably perform the press-fit task. We base the approach on a demonstration learning-based compliant control framework, into which we integrate a monitoring and failure recovery mechanism for successful task execution. Concretely, we monitor the execution through distance and force feedback, detect collisions while the robot is performing the press-fit task, and use wrench measurements to classify the direction of collision; this information informs the subsequent recovery process. We evaluate the method on a miniature container setup, considering variations in the (i) starting position of the end effector, (ii) goal configuration, and (iii) object grasping position. The results demonstrate that the proposed approach outperforms the baseline demonstration-based learning framework regarding adaptability to environmental variations and the ability to recover from collision failures, making it a promising solution for practical press-fit applications.
Saliency methods are frequently used to explain Deep Neural Network-based models. Adebayo et al.'s work on evaluating saliency methods for classification models illustrates that certain explanation methods fail the model and data randomization tests. However, on extending the tests to various state-of-the-art object detectors, we illustrate that the ability to explain a model depends more on the model itself than on the explanation method. We perform sanity checks for object detection and define new qualitative criteria to evaluate the saliency explanations, both for object classification and bounding box decisions, using Guided Backpropagation, Integrated Gradients, and their Smoothgrad versions, together with Faster R-CNN, SSD, and EfficientDet-D0, trained on COCO. In addition, the sensitivity of the explanation method to model parameters and data labels varies class-wise, which motivates performing the sanity checks for each class. We find that EfficientDet-D0 is the most interpretable detector independent of the saliency method, passing the sanity checks with few problems.
Robots applied in therapeutic scenarios, for instance in the therapy of individuals with Autism Spectrum Disorder, are sometimes used for imitation learning activities in which a person needs to repeat motions performed by the robot. To simplify the task of incorporating new types of motions that a robot can perform, it is desirable that the robot has the ability to learn motions by observing demonstrations from a human, such as a therapist. In this paper, we investigate an approach for acquiring motions from skeleton observations of a human, which are collected by a robot-centric RGB-D camera. Given a sequence of observations of various joints, the joint positions are mapped to match the configuration of a robot before being executed by a PID position controller. We evaluate the method, in particular the reproduction error, by performing a study with QTrobot in which the robot acquired different upper-body dance moves from multiple participants. The results indicate the method's overall feasibility, but also indicate that the reproduction quality is affected by noise in the skeleton observations.
The representation, or encoding, utilized in evolutionary algorithms has a substantial effect on their performance. Examination of the suitability of widely used representations for quality diversity optimization (QD) in robotic domains has yielded inconsistent results regarding the most appropriate encoding method. Given the domain-dependent nature of QD, additional evidence from other domains is necessary. This study compares the impact of several representations, including direct encoding, a dictionary-based representation, parametric encoding, compositional pattern producing networks, and cellular automata, on the generation of voxelized meshes in an architecture setting. The results reveal that some indirect encodings outperform direct encodings and can generate more diverse solution sets, especially when considering full phenotypic diversity. The paper introduces a multi-encoding QD approach that incorporates all evaluated representations in the same archive. Species of encodings compete on the basis of phenotypic features, leading to an approach that demonstrates similar performance to the best single-encoding QD approach. This is noteworthy, as it does not always require the contribution of the best-performing single encoding.
Quality diversity algorithms can be used to efficiently create a diverse set of solutions to inform engineers' intuition. But quality diversity is not efficient in very expensive problems, needing 100,000s of evaluations. Even with the assistance of surrogate models, quality diversity needs 100s or even 1000s of evaluations, which can make its use infeasible. In this study we try to tackle this problem by using a pre-optimization strategy on a lower-dimensional optimization problem and then mapping the solutions to a higher-dimensional case. For a use case to design buildings that minimize wind nuisance, we show that we can predict flow features around 3D buildings from 2D flow features around building footprints. For a diverse set of building designs, by sampling the space of 2D footprints with a quality diversity algorithm, a predictive model can be trained that is more accurate than when trained on a set of footprints that were selected with a space-filling algorithm like the Sobol sequence. Simulating only 16 buildings in 3D, a set of 1024 building designs with low predicted wind nuisance is created. We show that we can produce better machine learning models by producing training data with quality diversity instead of using common sampling techniques. The method can bootstrap generative design in a computationally expensive 3D domain and allow engineers to sweep the design space, understanding wind nuisance in early design phases.
Representing 3D surfaces as level sets of continuous functions over $\mathbb{R}^3$ is the common denominator of neural implicit representations, which recently enabled remarkable progress in geometric deep learning and computer vision tasks. In order to represent 3D motion within this framework, it is often assumed (either explicitly or implicitly) that the transformations which a surface may undergo are homeomorphic: this is not necessarily true, for instance, in the case of fluid dynamics. In order to represent more general classes of deformations, we propose to apply this theoretical framework as regularizers for the optimization of simple 4D implicit functions (such as signed distance fields). We show that our representation is capable of capturing both homeomorphic and topology-changing deformations, while also defining correspondences over the continuously-reconstructed surfaces.
State-of-the-art object detectors are treated as black boxes due to their highly non-linear internal computations. Even with unprecedented advancements in detector performance, the inability to explain how their outputs are generated limits their use in safety-critical applications. Previous work fails to produce explanations for both bounding box and classification decisions, and generally makes individual explanations for various detectors. In this paper, we propose an open-source Detector Explanation Toolkit (DExT) which implements the proposed approach to generate a holistic explanation for all detector decisions using certain gradient-based explanation methods. We suggest various multi-object visualization methods to merge the explanations of multiple objects detected in an image as well as the corresponding detections in a single image. The quantitative evaluation shows that the Single Shot MultiBox Detector (SSD) is more faithfully explained compared to other detectors regardless of the explanation methods. Both quantitative and human-centric evaluations identify that SmoothGrad with Guided Backpropagation (GBP) provides more trustworthy explanations among selected methods across all detectors. We expect that DExT will motivate practitioners to evaluate object detectors from the interpretability perspective by explaining both bounding box and classification decisions.
21 pages, with supplementary
Vietnam requires sustainable urbanization, for which city sensing is used in planning and decision-making. Large cities need portable, scalable, and inexpensive digital technology for this purpose. End-to-end air quality monitoring companies such as AirVisual and Plume Air have shown their reliability with portable devices outfitted with superior air sensors. They are pricey, yet homeowners use them to get local air data without evaluating the causal effect. Our air quality inspection system is scalable, reasonably priced, and flexible. The system's minicomputer remotely monitors PMS7003 and BME280 sensor data through a microcontroller processor. The 5-megapixel camera module enables researchers to infer the causal relationship between traffic intensity and dust concentration. The design enables inexpensive, commercial-grade hardware, with Azure Blob storing air pollution data and surrounding-area imagery and preventing the system from physically expanding. In addition, by including an air channel that replenishes and distributes temperature, the design improves ventilation and safeguards electrical components. The gadget allows for the analysis of the correlation between traffic and air quality data, which might aid in the establishment of sustainable urban development plans and policies.
Fatigue strength estimation is a costly manual material characterization process in which state-of-the-art approaches follow a standardized experiment and analysis procedure. In this paper, we examine a modular, Machine Learning-based approach for fatigue strength estimation that is likely to reduce the number of experiments and, thus, the overall experimental costs. Despite its high potential, deployment of a new approach in a real-life lab requires more than the theoretical definition and simulation. Therefore, we study the robustness of the approach against misspecification of the prior and discretization of the specified loads. We identify its applicability and its advantageous behavior over the state-of-the-art methods, potentially reducing the number of costly experiments.
Safety-critical applications like autonomous driving use Deep Neural Networks (DNNs) for object detection and segmentation. The DNNs fail to predict correctly when they observe an Out-of-Distribution (OOD) input, potentially leading to catastrophic consequences. Existing OOD detection methods have been extensively studied for image inputs but have not been explored much for LiDAR inputs. Therefore, in this study we propose two datasets for benchmarking OOD detection in 3D semantic segmentation. We use Maximum Softmax Probability and Entropy scores generated using Deep Ensembles and Flipout versions of RandLA-Net as OOD scores. We observe that Deep Ensembles outperform the Flipout model in OOD detection, with greater AUROC scores for both datasets.
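As a rough illustration of the two OOD scores named in the abstract above, the following sketch computes a Maximum Softmax Probability score and a predictive-entropy score from the averaged output of a small ensemble. It is not the authors' implementation: the logits are made-up values, the helper names are hypothetical, and reporting `1 - max(p)` so that higher means "more likely OOD" is an assumed sign convention.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def ensemble_mean(prob_lists):
    """Average the per-class probabilities of several ensemble members."""
    n = len(prob_lists)
    return [sum(col) / n for col in zip(*prob_lists)]

def msp_ood_score(probs):
    """Assumed convention: 1 - max softmax probability, higher = more OOD-like."""
    return 1.0 - max(probs)

def entropy_ood_score(probs):
    """Shannon entropy (natural log) of the predictive distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

# Hypothetical logits for one LiDAR point from three ensemble members.
confident = ensemble_mean([
    softmax([8.0, 0.5, 0.1]),
    softmax([7.5, 0.2, 0.3]),
    softmax([9.0, 0.1, 0.4]),
])
uncertain = ensemble_mean([
    softmax([1.0, 0.9, 1.1]),
    softmax([0.8, 1.2, 1.0]),
    softmax([1.1, 1.0, 0.9]),
])

print("MSP score:", msp_ood_score(confident), msp_ood_score(uncertain))
print("Entropy score:", entropy_ood_score(confident), entropy_ood_score(uncertain))
```

Thresholding either score, or sweeping the threshold to obtain an AUROC, would then separate in-distribution from OOD points; that evaluation step is omitted here.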
In robot-assisted therapy for individuals with Autism Spectrum Disorder, the workload of therapists during a therapeutic session is increased if they have to control the robot manually. To allow therapists to focus on the interaction with the person instead, the robot should be more autonomous, namely it should be able to interpret the person's state and continuously adapt its actions according to their behaviour. In this paper, we develop a personalised robot behaviour model that can be used in the robot decision-making process during an activity; this behaviour model is trained with the help of a user model that has been learned from real interaction data. We use Q-learning for this task; the results demonstrate that the policy requires about 10,000 iterations to converge. We thus investigate policy transfer for improving the convergence speed; we show that this is a feasible solution, but an inappropriate initial policy can lead to a suboptimal final return.
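For readers unfamiliar with Q-learning, the sketch below shows the standard tabular update rule that the abstract above refers to. It is a generic textbook form, not the paper's behaviour model: the state and action names, the reward, and the values of `alpha` and `gamma` are all hypothetical.

```python
from collections import defaultdict

def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q(state, action).

    Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]

# Hypothetical states and actions for a therapy activity.
q = defaultdict(float)              # unseen (state, action) pairs start at 0.0
actions = ["encourage", "repeat"]

first = q_learning_update(q, "engaged", "repeat", 1.0, "engaged", actions)
second = q_learning_update(q, "engaged", "repeat", 1.0, "engaged", actions)
print(first, second)
```

Policy transfer, as investigated in the abstract, would correspond to initialising `q` from a previously learned table instead of zeros; how that table is chosen is outside this sketch.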
We introduce canonical weight normalization for convolutional neural networks. Inspired by the canonical tensor decomposition, we express the weight tensors in so-called canonical networks as scaled sums of outer vector products. In particular, we train network weights in the decomposed form, where scale weights are optimized separately for each mode. Additionally, similarly to weight normalization, we include a global scaling parameter. We study the initialization of the canonical form by running the power method and by drawing randomly from Gaussian or uniform distributions. Our results indicate that we can replace the power method with cheaper initializations drawn from standard distributions. The canonical re-parametrization leads to competitive normalization performance on the MNIST, CIFAR10, and SVHN data sets. Moreover, the formulation simplifies network compression. Once training has converged, the canonical form allows convenient model-compression by truncating the parameter sums.
TSEM: Temporally Weighted Spatiotemporal Explainable Neural Network for Multivariate Time Series
(2022)
Deep learning has become a one-size-fits-all solution for technical and business domains thanks to its flexibility and adaptability. It is implemented using opaque models, which unfortunately undermines the trustworthiness of its outcomes. In order to better understand the behavior of a system, particularly one driven by time series, it is important to look inside a deep learning model using so-called post-hoc eXplainable Artificial Intelligence (XAI) approaches. There are two major types of XAI for time series data, namely model-agnostic and model-specific. The model-specific approach is considered in this work. While other approaches employ either Class Activation Mapping (CAM) or Attention Mechanism, we merge the two strategies into a single system, simply called the Temporally Weighted Spatiotemporal Explainable Neural Network for Multivariate Time Series (TSEM). TSEM combines the capabilities of RNN and CNN models in such a way that RNN hidden units are employed as attention weights for the temporal axis of the CNN feature maps. The results show that TSEM outperforms XCM. It is similar to STAM in terms of accuracy, while also satisfying a number of interpretability criteria, including causality, fidelity, and spatiotemporality.
Self-supervised learning has proved to be a powerful approach to learn image representations without the need for large labeled datasets. For underwater robotics, it is of great interest to design computer vision algorithms to improve perception capabilities such as sonar image classification. Due to the confidential nature of sonar imaging and the difficulty of interpreting sonar images, it is challenging to create large public labeled sonar datasets to train supervised learning algorithms. In this work, we investigate the potential of three self-supervised learning methods (RotNet, Denoising Autoencoders, and Jigsaw) to learn high-quality sonar image representations without the need for human labels. We present pre-training and transfer learning results on real-life sonar image datasets. Our results indicate that self-supervised pre-training yields classification performance comparable to supervised pre-training in a few-shot transfer learning setup across all three methods. Code and self-supervised pre-trained models are available at https://github.com/agrija9/ssl-sonar-images
It has been well established that deep networks are efficient at extracting features from a given (source) labeled dataset. However, it is not always the case that they can generalize well to other (target) datasets, which very often have a different underlying distribution. In this report, we evaluate four different domain adaptation techniques for image classification tasks: DeepCORAL, DeepDomainConfusion, CDAN and CDAN+E. These techniques are unsupervised given that the target dataset does not carry any labels during the training phase. We evaluate model performance on the office-31 dataset. A link to the github repository of this report can be found here: https://github.com/agrija9/Deep-Unsupervised-Domain-Adaptation.
Recent experimental evidence suggests that mebendazole, a popular antiparasitic drug, binds to heat shock protein 90 (Hsp90) and inhibits acute myeloid leukemia cell growth. In this study we use quantum mechanics (QM), molecular similarity and molecular dynamics (MD) calculations to predict possible binding poses of mebendazole to the adenosine triphosphate (ATP) binding site of Hsp90. Extensive conformational searches and minimization of the five tautomers of mebendazole using the MP2/aug-cc-pVTZ theory level resulted in 152 minima being identified. Mebendazole-Hsp90 complex models were created using the QM optimized conformations and protein coordinates obtained from experimental crystal structures that were chosen through similarity calculations. Nine different poses were identified from a total of 600 ns of explicit solvent, all-atom MD simulations using two different force fields. All simulations support the hypothesis that mebendazole is able to bind to the ATP binding site of Hsp90.
Urban LoRa networks promise to provide a cost-efficient and scalable communication backbone for smart cities. One core challenge in rolling out and operating these networks is radio network planning, i.e., precise predictions about possible new locations and their impact on network coverage. Path loss models aid in this task, but evaluating and comparing different models requires a sufficiently large set of high-quality received packet power samples. In this paper, we report on a corresponding large-scale measurement study covering an urban area of 200 km$^2$ over a period of 230 days using sensors deployed on garbage trucks, resulting in more than 112 thousand high-quality samples for received packet power. Using this data, we compare eleven previously proposed path loss models and additionally provide new coefficients for the Log-distance model. Our results reveal that the Log-distance model and other well-known empirical models such as Okumura or Winner+ provide reasonable estimations in an urban environment, and terrain-based models such as ITM or ITWOM have no advantages. In addition, we derive estimations for the needed sample size in similar measurement campaigns. To stimulate further research in this direction, we make all our data publicly available.
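To make "providing new coefficients for the Log-distance model" concrete, the sketch below evaluates the textbook log-distance path loss formula, PL(d) = PL(d0) + 10 n log10(d / d0), and fits its two coefficients by ordinary least squares. The sample data are synthetic and noise-free, the coefficient values are invented for illustration, and the fitting procedure is an assumption rather than the method used in the study.

```python
import math

def log_distance_path_loss(d_m, pl0_db, n, d0_m=1.0):
    """Path loss in dB at distance d_m, given reference loss pl0_db at d0_m
    and path loss exponent n."""
    return pl0_db + 10.0 * n * math.log10(d_m / d0_m)

def fit_log_distance(distances_m, losses_db, d0_m=1.0):
    """Least-squares estimate of (pl0_db, n) from measured path loss samples."""
    xs = [10.0 * math.log10(d / d0_m) for d in distances_m]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(losses_db) / len(losses_db)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, losses_db))
    n = sxy / sxx                   # slope is the path loss exponent
    pl0_db = mean_y - n * mean_x    # intercept is the loss at d0_m
    return pl0_db, n

# Synthetic samples generated from made-up coefficients (not from the paper).
true_pl0, true_n = 40.0, 2.7
dists = [10.0, 100.0, 1000.0, 5000.0]
losses = [log_distance_path_loss(d, true_pl0, true_n) for d in dists]

pl0_hat, n_hat = fit_log_distance(dists, losses)
print(pl0_hat, n_hat)
```

With real received-power samples, path loss would first be derived from transmit power and antenna gains, and the residuals of the fit would indicate how well the model suits the environment.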
The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models (KGEMs). However, representations based on a single modality are inherently limited. To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs. This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler (INDRA) consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against two baseline models trained on either one of the modalities (i.e., text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the more challenging tasks with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.083. Additionally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications. Finally, the source code and pre-trained STonKGs models are available at https://github.com/stonkgs/stonkgs and https://huggingface.co/stonkgs/stonkgs-150k.
Applications of underwater robots are on the rise. Most of them depend on sonar for underwater vision, but the lack of strong perception capabilities limits them in this task. An important issue in sonar perception is matching image patches, which can enable other techniques like localization, change detection, and mapping. There is a rich literature for this problem in color images, but for acoustic images it is lacking, due to the physics that produce these images. In this paper we improve on our previous results for this problem (Valdenegro-Toro et al, 2017): instead of modeling features manually, a Convolutional Neural Network (CNN) learns a similarity function and predicts if two input sonar images are similar or not. With the objective of improving the sonar image matching problem further, three state-of-the-art CNN architectures are evaluated on the Marine Debris dataset, namely DenseNet, and VGG, with a siamese or two-channel architecture, and contrastive loss. To ensure a fair evaluation of each network, thorough hyper-parameter optimization is executed. We find that the best performing models are the DenseNet Two-Channel network with 0.955 AUC, VGG-Siamese with contrastive loss at 0.949 AUC and DenseNet Siamese with 0.921 AUC. By ensembling the top performing DenseNet two-channel and DenseNet-Siamese models, the overall highest prediction accuracy obtained is 0.978 AUC, showing a large improvement over the 0.91 AUC in the state of the art.
The Covid-19 pandemic has challenged educators across the world to move their teaching and mentoring from in-person to remote. During nonpandemic semesters at their institutes (e.g. universities), educators can directly provide students the software environment needed to support their learning - either in specialized computer laboratories (e.g. computational chemistry labs) or shared computer spaces. These labs are often supported by staff that maintains the operating systems (OS) and software. But how does one provide a specialized software environment for remote teaching? One solution is to provide students a customized operating system (e.g., Linux) that includes open-source software for supporting your teaching goals. However, such a solution should not require students to install the OS alongside their existing one (i.e. dual/multi-booting) or be used as a complete replacement. Such approaches are risky because of a) the students' possible lack of software expertise, b) the possible disruption of an existing software workflow that is needed in other classes or by other family members, and c) the importance of maintaining a working computer when isolated (e.g. societal restrictions). To illustrate possible solutions, we discuss our approach that used a customized Linux OS and a Docker container in a course that teaches computational chemistry and Python3.
Risk-based authentication (RBA) aims to strengthen password-based authentication rather than replacing it. RBA does this by monitoring and recording additional features during the login process. If feature values at login time differ significantly from those observed before, RBA requests an additional proof of identification. Although RBA is recommended in the NIST digital identity guidelines, it has so far been used almost exclusively by major online services. This is partly due to a lack of open knowledge and implementations that would allow any service provider to roll out RBA protection to its users.
To close this gap, we provide a first in-depth analysis of RBA characteristics in a practical deployment. We observed N=780 users with 247 unique features on a real-world online service for over 1.8 years. Based on our collected data set, we provide (i) a behavior analysis of two RBA implementations that were apparently used by major online services in the wild, (ii) a benchmark of the features to extract a subset that is most suitable for RBA use, (iii) a new feature that has not been used in RBA before, and (iv) factors which have a significant effect on RBA performance. Our results show that RBA needs to be carefully tailored to each online service, as even small configuration adjustments can greatly impact RBA's security and usability properties. We provide insights on the selection of features, their weightings, and the risk classification in order to benefit from RBA after a minimum number of login attempts.
Object detectors have improved considerably in the last years by using advanced CNN architectures. However, many detector hyper-parameters are generally manually tuned, or they are used with values set by the detector authors. Automatic hyper-parameter optimization has not been explored for improving the hyper-parameters of CNN-based object detectors. In this work, we propose the use of Black-box optimization methods to tune the prior/default box scales in Faster R-CNN and SSD, using Bayesian Optimization, SMAC, and CMA-ES. We show that by tuning the input image size and prior box anchor scale, mAP increases by 2% on PASCAL VOC 2007 with Faster R-CNN, and by 3% with SSD. On the COCO dataset with SSD there are mAP improvements for medium and large objects, but mAP decreases by 1% for small objects. We also perform a regression analysis to find the significant hyper-parameters to tune.
In this paper we introduce the Perception for Autonomous Systems (PAZ) software library. PAZ is a hierarchical perception library that allows users to manipulate multiple levels of abstraction in accordance with their requirements or skill level. More specifically, PAZ is divided into three hierarchical levels which we refer to as pipelines, processors, and backends. These abstractions allow users to compose functions in a hierarchical modular scheme that can be applied for preprocessing, data-augmentation, prediction and postprocessing of inputs and outputs of machine learning (ML) models. PAZ uses these abstractions to build reusable training and prediction pipelines for multiple robot perception tasks such as: 2D keypoint estimation, 2D object detection, 3D keypoint discovery, 6D pose estimation, emotion classification, face recognition, instance segmentation, and attention mechanisms.
Reinforcement learning (RL) algorithms should learn as much as possible about the environment but not the properties of the physics engines that generate the environment. There are multiple algorithms that solve the task in a physics-engine-based environment, but no work has been done so far to understand whether RL algorithms can generalize across physics engines. In this work, we compare the generalization performance of various deep reinforcement learning algorithms on a variety of control tasks. Our results show that MuJoCo is the best engine to transfer the learning to other engines. On the other hand, none of the algorithms generalize when trained on PyBullet. We also found that various algorithms have promising generalizability if the effect of random seeds on their performance can be minimized.
Comparative Evaluation of Pretrained Transfer Learning Models on Automatic Short Answer Grading
(2020)
Automatic Short Answer Grading (ASAG) is the process of grading student answers by computational approaches, given a question and the desired answer. Previous works implemented the methods of concept mapping and facet mapping, and some used conventional word embeddings for extracting semantic features. They extracted multiple features manually to train on the corresponding datasets. We use pretrained embeddings of the transfer learning models ELMo, BERT, GPT, and GPT-2 to assess their efficiency on this task. We train with a single feature, cosine similarity, extracted from the embeddings of these models. We compare the RMSE scores and correlation measurements of the four models with previous works on the Mohler dataset. Our work demonstrates that ELMo outperformed the other three models. We also briefly describe the four transfer learning models and conclude with the possible causes of poor results of transfer learning models.
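The single feature mentioned in the abstract above, cosine similarity between two embedding vectors, can be sketched as follows. The vectors here are tiny made-up examples; in the described setup they would be sentence embeddings of the student answer and the desired answer produced by a pretrained model, and how those embeddings are pooled is not specified here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors.

    Returns 0.0 if either vector has zero norm (an assumed convention).
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional "embeddings" for illustration only.
reference_answer = [0.2, 0.7, 0.1]
student_answer = [0.25, 0.6, 0.05]
feature = cosine_similarity(reference_answer, student_answer)
print(feature)
```

A regression model trained on this one feature would then map the similarity value to a grade, which is where the RMSE and correlation measurements come in.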
Graph drawing with spring embedders employs a V x V computation phase over the graph's vertex set to compute repulsive forces. Here, the efficacy of forces diminishes with distance: a vertex can effectively only influence other vertices in a certain radius around its position. Therefore, the algorithm lends itself to an implementation using search data structures to reduce the runtime complexity. NVIDIA RT cores implement hierarchical tree traversal in hardware. We show how to map the problem of finding graph layouts with force-directed methods to a ray tracing problem that can subsequently be implemented with dedicated ray tracing hardware. With that, we observe speedups of 4x to 13x over a CUDA software implementation.
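The observation that a vertex only influences vertices within a certain radius can be sketched in software with a uniform grid as the search data structure. This is only an illustrative CPU analogue: it does not use RT cores or ray tracing, and the inverse-distance force law, the function name and the parameters are assumptions rather than the paper's formulation.

```python
import math
from collections import defaultdict

def repulsive_forces(positions, radius, strength=1.0):
    """Repulsive force on each 2D vertex from all vertices within `radius`.

    A uniform grid with cell size `radius` restricts the search to the
    3x3 block of cells around each vertex instead of all V x V pairs.
    """
    def cell_of(x, y):
        return (math.floor(x / radius), math.floor(y / radius))

    grid = defaultdict(list)
    for idx, (x, y) in enumerate(positions):
        grid[cell_of(x, y)].append(idx)

    forces = [[0.0, 0.0] for _ in positions]
    for i, (x, y) in enumerate(positions):
        cx, cy = cell_of(x, y)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j == i:
                        continue
                    rx = x - positions[j][0]
                    ry = y - positions[j][1]
                    d2 = rx * rx + ry * ry
                    if d2 == 0.0 or d2 > radius * radius:
                        continue
                    # Magnitude strength / d along the unit vector (rx, ry) / d.
                    forces[i][0] += strength * rx / d2
                    forces[i][1] += strength * ry / d2
    return [tuple(f) for f in forces]

# Two nearby vertices repel each other; the distant third one is unaffected.
forces = repulsive_forces([(0.0, 0.0), (0.5, 0.0), (10.0, 10.0)], radius=1.0)
```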
Facial emotion recognition is the task of classifying human emotions in face images. It is a difficult task due to high aleatoric uncertainty and visual ambiguity. A large part of the literature aims to show progress by increasing accuracy on this task, but this ignores the inherent uncertainty and ambiguity in the task. In this paper we show that Bayesian Neural Networks, as approximated using MC-Dropout, MC-DropConnect, or an Ensemble, are able to model the aleatoric uncertainty in facial emotion recognition, and produce output probabilities that are closer to what a human expects. We also show that calibration metrics show strange behaviors for this task, due to the multiple classes that can be considered correct, which motivates future work. We believe our work will motivate other researchers to move away from Classical and into Bayesian Neural Networks.
Deep learning models are extensively used in various safety-critical applications. Hence, in addition to being accurate, these models need to be highly reliable. One way of achieving this is by quantifying uncertainty. Bayesian methods for uncertainty quantification (UQ) have been extensively studied for deep learning models applied to images, but have been less explored for 3D modalities such as point clouds, which are often used for robots and autonomous systems. In this work, we evaluate three uncertainty quantification methods, namely Deep Ensembles, MC-Dropout and MC-DropConnect, on the DarkNet21Seg 3D semantic segmentation model and comprehensively analyze the impact of various parameters, such as the number of models in ensembles or forward passes and the drop probability values, on task performance and uncertainty estimate quality. We find that Deep Ensembles outperform the other methods in both performance and uncertainty metrics, by a margin of 2.4% in terms of mIOU and 1.3% in terms of accuracy, while providing reliable uncertainty for decision making.
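A common way to turn several stochastic forward passes (MC-Dropout, MC-DropConnect) or ensemble members into an uncertainty estimate is to average their class probabilities and compute the entropy of the mean. The sketch below shows that step only, on made-up probabilities; whether the paper uses exactly this measure is not stated in the abstract, so treat the function names and the choice of entropy as assumptions.

```python
import math

def mean_prediction(samples):
    """Average class probabilities over forward passes or ensemble members.

    `samples` is a list of probability vectors, one per pass/member.
    """
    n = len(samples)
    n_classes = len(samples[0])
    return [sum(s[c] for s in samples) / n for c in range(n_classes)]

def predictive_entropy(samples):
    """Entropy (in nats) of the averaged prediction; higher = more uncertain."""
    mean = mean_prediction(samples)
    return -sum(p * math.log(p) for p in mean if p > 0.0)

# Hypothetical softmax outputs for a single point and two classes.
agreeing = [[1.0, 0.0], [1.0, 0.0]]     # all passes agree
disagreeing = [[1.0, 0.0], [0.0, 1.0]]  # passes contradict each other

low = predictive_entropy(agreeing)
high = predictive_entropy(disagreeing)
```

When the passes disagree completely, the averaged prediction is uniform and the entropy reaches its two-class maximum of ln 2.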
In complex, expensive optimization domains we often narrowly focus on finding high performing solutions, instead of expanding our understanding of the domain itself. But what if we could quickly understand the complex behaviors that can emerge in said domains instead? We introduce surrogate-assisted phenotypic niching, a quality diversity algorithm which allows the discovery of a large, diverse set of behaviors by using computationally expensive phenotypic features. In this work we discover the types of air flow in a 2D fluid dynamics optimization problem. A fast GPU-based fluid dynamics solver is used in conjunction with surrogate models to accurately predict fluid characteristics from the shapes that produce the air flow. We show that these features can be modeled in a data-driven way while sampling to improve performance, rather than explicitly sampling to improve feature models. Our method can reduce the need to run an infeasibly large set of simulations while still being able to design a large diversity of air flows and the shapes that cause them. Discovering diversity of behaviors helps engineers to better understand expensive domains and their solutions.
Grasp verification is advantageous for autonomous manipulation robots as it provides the feedback required for higher level planning components about successful task completion. However, a major obstacle in doing grasp verification is sensor selection. In this paper, we propose a vision based grasp verification system using machine vision cameras, with the verification problem formulated as an image classification task. Machine vision cameras consist of a camera and a processing unit capable of on-board deep learning inference. The inference on this low-power hardware is done near the data source, reducing the robot's dependence on a centralized server and leading to reduced latency and improved reliability. Machine vision cameras provide the deep learning inference capabilities using different neural accelerators. However, it is not clear from the documentation of these cameras what effect these neural accelerators have on performance metrics such as latency and throughput. To systematically benchmark these machine vision cameras, we propose a parameterized model generator that generates end to end models of Convolutional Neural Networks (CNNs). Using these generated models we benchmark latency and throughput of two machine vision cameras, JeVois A33 and Sipeed Maix Bit. Our experiments demonstrate that the selected machine vision camera and the deep learning models can robustly verify grasps with 97% per-frame accuracy.
In optimization methods that return diverse solution sets, three interpretations of diversity can be distinguished: multi-objective optimization which searches diversity in objective space, multimodal optimization which tries spreading out the solutions in genetic space, and quality diversity which performs diversity maintenance in phenotypic space. We introduce niching methods that provide more flexibility to the analysis of diversity and a simple domain to compare and provide insights about the paradigms. We show that multiobjective optimization does not always produce much diversity, quality diversity is not sensitive to genetic neutrality and creates the most diverse set of solutions, and multimodal optimization produces higher fitness solutions. An autoencoder is used to discover phenotypic features automatically, producing an even more diverse solution set. Finally, we make recommendations about when to use which approach.
The way solutions are represented, or encoded, is usually the result of domain knowledge and experience. In this work, we combine MAP-Elites with Variational Autoencoders to learn a Data-Driven Encoding (DDE) that captures the essence of the highest-performing solutions while still being able to encode a wide array of solutions. Our approach learns this data-driven encoding during optimization by balancing between exploiting the DDE to generalize the knowledge contained in the current archive of elites and exploring new representations that are not yet captured by the DDE. Learning the representation during optimization allows the algorithm to solve high-dimensional problems, and provides a low-dimensional representation which can then be reused. We evaluate the DDE approach by evolving solutions for inverse kinematics of a planar arm (200 joint angles) and for gaits of a 6-legged robot in action space (a sequence of 60 positions for each of the 12 joints). We show that the DDE approach not only accelerates and improves optimization, but produces a powerful encoding that captures a bias for high performance while expressing a variety of solutions.
Modern Monte-Carlo-based rendering systems still suffer from the computational complexity involved in the generation of noise-free images, making it challenging to synthesize interactive previews. We present a framework suited for rendering such previews of static scenes using a caching technique that builds upon a linkless octree. Our approach allows for memory-efficient storage and constant-time lookup to cache diffuse illumination at multiple hitpoints along the traced paths. Non-diffuse surfaces are dealt with in a hybrid way in order to reconstruct view-dependent illumination while maintaining interactive frame rates. By evaluating the visual fidelity against ground truth sequences and by benchmarking, we show that our approach compares well to low-noise path traced results, but with a greatly reduced computational complexity allowing for interactive frame rates. This way, our caching technique provides a useful tool for global illumination previews and multi-view rendering.
In an effort to assist researchers in choosing basis sets for quantum mechanical modeling of molecules (i.e. balancing calculation cost versus desired accuracy), we present a systematic study on the accuracy of computed conformational relative energies and their geometries in comparison to MP2/CBS and MP2/AV5Z data, respectively. In order to do so, we introduce a new nomenclature to unambiguously indicate how a CBS extrapolation was computed. Nineteen minima and transition states of buta-1,3-diene, propan-2-ol and the water dimer were optimized using forty-five different basis sets. Specifically, this includes one Pople (i.e. 6-31G(d)), eight Dunning (i.e. VXZ and AVXZ, X=2-5), twenty-five Jensen (i.e. pc-n, pcseg-n, aug-pcseg-n, pcSseg-n and aug-pcSseg-n, n=0-4) and nine Karlsruhe (e.g. def2-SV(P), def2-QZVPPD) basis sets. The molecules were chosen to represent both common and electronically diverse molecular systems. In comparison to MP2/CBS relative energies computed using the largest Jensen basis sets (i.e. n=2,3,4), the use of smaller sizes (n=0,1,2 and n=1,2,3) provides results that are within 0.11-0.24 and 0.09-0.16 kcal/mol, respectively. To practically guide researchers in their basis set choice, an equation is introduced that ranks basis sets based on a user-defined balance between their accuracy and calculation cost. Furthermore, we explain why the aug-pcseg-2, def2-TZVPPD and def2-TZVP basis sets are very suitable choices to balance speed and accuracy.
Traffic sign recognition is an important component of many advanced driving assistance systems, and it is required for full autonomous driving. Computational performance is usually the bottleneck in using large scale neural networks for this purpose. SqueezeNet is a good candidate for efficient image classification of traffic signs, but in our experiments it does not reach high accuracy, and we believe this is due to lack of data, requiring data augmentation. Generative adversarial networks can learn the high dimensional distribution of empirical data, allowing the generation of new data points. In this paper we apply the pix2pix GAN architecture to generate new traffic sign images and evaluate the use of these images in data augmentation. We were motivated to use pix2pix to translate symbolic sign images to real ones due to the mode collapse in Conditional GANs. Through our experiments we found that data augmentation using GANs can increase classification accuracy for circular traffic signs from 92.1% to 94.0%, and for triangular traffic signs from 93.8% to 95.3%, producing an overall improvement of 2%. However, some traditional augmentation techniques can outperform GAN data augmentation, for example contrast variation in circular traffic signs (95.5%) and displacement on triangular traffic signs (96.7%). Our negative results show that while GANs can be naively used for data augmentation, they are not always the best choice, depending on the problem and variability in the data.
Background: Virtual reality combined with spherical treadmills is used across species for studying neural circuits underlying navigation.
New Method: We developed an optical flow-based method for tracking treadmill ball motion in real-time using a single high-resolution camera.
Results: Tracking accuracy and timing were determined using calibration data. Ball tracking was performed at 500 Hz and integrated with an open source game engine for virtual reality projection. The projection was updated at 120 Hz with a latency with respect to ball motion of 30 ± 8 ms.
Comparison with Existing Method(s): Optical flow based tracking of treadmill motion is typically achieved using optical mice. The camera-based optical flow tracking system developed here is based on off-the-shelf components and offers control over the image acquisition and processing parameters. This results in flexibility with respect to tracking conditions – such as ball surface texture, lighting conditions, or ball size – as well as camera alignment and calibration.
Conclusions: A fast system for rotational ball motion tracking suitable for virtual reality animal behavior across different scales was developed and characterized.
Current robot platforms are being employed to collaborate with humans in a wide range of domestic and industrial tasks. These environments require autonomous systems that are able to classify and communicate anomalous situations such as fires, injured persons, car accidents; or generally, any potentially dangerous situation for humans. In this paper we introduce an anomaly detection dataset for the purpose of robot applications as well as the design and implementation of a deep learning architecture that classifies and describes dangerous situations using only a single image as input. We report a classification accuracy of 97 % and METEOR score of 16.2. We will make the dataset publicly available after this paper is accepted.
In this paper we propose and implement a general convolutional neural network (CNN) building framework for designing real-time CNNs. We validate our models by creating a real-time vision system which accomplishes the tasks of face detection, gender classification and emotion classification simultaneously in one blended step using our proposed CNN architecture. After presenting the details of the training procedure setup we proceed to evaluate on standard benchmark sets. We report accuracies of 96% in the IMDB gender dataset and 66% in the FER-2013 emotion dataset. Along with this we also introduce the very recent real-time enabled guided back-propagation visualization technique. Guided back-propagation uncovers the dynamics of the weight changes and evaluates the learned features. We argue that the careful implementation of modern CNN architectures, the use of the current regularization methods and the visualization of previously hidden features are necessary in order to reduce the gap between slow performances and real-time architectures. Our system has been validated by its deployment on a Care-O-bot 3 robot used during RoboCup@Home competitions. All our code, demos and pre-trained architectures have been released under an open-source license in our public repository.
The MAP-Elites algorithm produces a set of high-performing solutions that vary according to features defined by the user. This technique has the potential to be a powerful tool for design space exploration, but is limited by the need for numerous evaluations. The Surrogate-Assisted Illumination algorithm (SAIL), introduced here, integrates approximative models and intelligent sampling of the objective function to minimize the number of evaluations required by MAP-Elites.
The ability of SAIL to efficiently produce both accurate models and diverse high performing solutions is illustrated on a 2D airfoil design problem. The search space is divided into bins, each holding a design with a different combination of features. In each bin SAIL produces a better performing solution than MAP-Elites, and requires several orders of magnitude fewer evaluations. The CMA-ES algorithm was used to produce an optimal design in each bin: with the same number of evaluations required by CMA-ES to find a near-optimal solution in a single bin, SAIL finds solutions of similar quality in every bin.
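The binned archive that MAP-Elites maintains can be sketched in a few lines. This is plain MAP-Elites on a toy problem, not SAIL: there is no surrogate model and no intelligent sampling, and the toy fitness, the single feature, the Gaussian mutation and all parameter values are assumptions chosen for illustration.

```python
import random

def map_elites(fitness, feature, n_bins, n_init, n_iters, dim, sigma=0.1, seed=0):
    """Minimal MAP-Elites: keep the best genome found in each feature bin.

    Genomes are lists of floats in [0, 1]; `feature` must return a value
    in [0, 1], which is discretised into `n_bins` bins.
    """
    rng = random.Random(seed)
    archive = {}  # bin index -> (fitness, genome)

    def try_insert(genome):
        f = fitness(genome)
        b = min(int(feature(genome) * n_bins), n_bins - 1)
        if b not in archive or f > archive[b][0]:
            archive[b] = (f, genome)

    # Initialise the archive with random genomes.
    for _ in range(n_init):
        try_insert([rng.random() for _ in range(dim)])

    # Mutate a randomly chosen elite and keep the child if it wins its bin.
    for _ in range(n_iters):
        _, parent = rng.choice(list(archive.values()))
        child = [min(1.0, max(0.0, g + rng.gauss(0.0, sigma))) for g in parent]
        try_insert(child)
    return archive

def toy_fitness(genome):
    # Higher is better; the optimum is the centre of the unit cube.
    return -sum((g - 0.5) ** 2 for g in genome)

def toy_feature(genome):
    # The first gene serves as the single user-defined feature.
    return genome[0]

archive = map_elites(toy_fitness, toy_feature, n_bins=10, n_init=20, n_iters=200, dim=2)
```

Every call to `fitness` here stands for one expensive evaluation, which is the cost that a surrogate-assisted approach such as SAIL aims to reduce.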
Humans exhibit flexible and robust behavior in achieving their goals. We make suitable substitutions for objects, actions, or tools to get the job done. When opportunities that would allow us to reach our goals with less effort arise, we often take advantage of them. Robots are not nearly as robust in handling such situations. Enabling a domestic service robot to find ways to get a job done by making substitutions is the goal of our work. In this paper, we highlight the challenges faced in our approach to combine Hierarchical Task Network planning, Description Logics, and the notions of affordances and conceptual similarity. We present open questions in modeling the necessary knowledge, creating planning problems, and enabling the system to handle cases where plan generation fails due to missing/unavailable objects.
TinyECC 2.0 is an open source library for Elliptic Curve Cryptography (ECC) in wireless sensor networks. This paper analyzes the side channel susceptibility of TinyECC 2.0 on a LOTUS sensor node platform. In our work we measured the electromagnetic (EM) emanation during computation of the scalar multiplication using 56 different configurations of TinyECC 2.0. All of them were found to be vulnerable, but to a different degree. The different degrees of leakage include adversary success using (i) Simple EM Analysis (SEMA) with a single measurement, (ii) SEMA using averaging, and (iii) Multiple-Exponent Single-Data (MESD) with a single measurement of the secret scalar. It is extremely critical that in 30 TinyECC 2.0 configurations a single EM measurement of an ECC private key operation is sufficient to simply read out the secret scalar. MESD requires additional adversary capabilities and it affects all TinyECC 2.0 configurations, again with only a single measurement of the ECC private key operation. These findings give evidence that in security applications a configuration of TinyECC 2.0 should be chosen that withstands SEMA with a single measurement and, beyond that, an addition of appropriate randomizing countermeasures is necessary.
In this paper, we describe an approach that enables an autonomous system to infer the semantics of a command (i.e. a symbol sequence representing an action) in terms of the relations between changes in the observations and the action instances. We present a method for inducing a theory (i.e. a semantic description) of the meaning of a command in terms of a minimal set of background knowledge. The only thing we have is a sequence of observations from which we extract what kinds of effects were caused by performing the command. This way, we obtain a description of the semantics of the action and, hence, a definition.
Suppose we have n keys, n access probabilities for the keys, and n+1 access probabilities for the gaps between the keys. Let h_min(n) be the minimal height of a binary search tree for n keys. We consider the problem of constructing an optimal binary search tree with near minimal height, i.e. with height h <= h_min(n) + Delta for some fixed Delta. It is shown that for any fixed Delta optimal binary search trees with near minimal height can be constructed in time O(n^2). This is as fast as in the unrestricted case. So far, the best known algorithms for the construction of height-restricted optimal binary search trees have running time O(L n^2), where L is the maximal permitted height. Compared to these algorithms our algorithm is faster by a factor of at least log n, because L is lower bounded by log n.
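To make the problem statement concrete, the following is a naive dynamic program for the cost of a height-restricted optimal binary search tree. It is a textbook-style baseline that tries every root for every key range and height, so it runs in roughly O(L n^3) time; it is neither the O(n^2) algorithm of the paper nor the O(L n^2) algorithms it is compared against. The cost and height conventions (a key at depth d costs p*(d+1), a gap at external depth d costs q*d, a single-key tree has height 1) are assumptions, since the abstract does not fix them.

```python
import math
from functools import lru_cache

def min_height(n):
    """Minimal height of a BST with n keys, counting levels of keys."""
    return n.bit_length()  # equals ceil(log2(n + 1))

def height_restricted_obst_cost(p, q, max_height):
    """Minimal expected cost of a BST over all keys with height <= max_height.

    p[k] is the access weight of key k+1, q[k] the weight of gap k,
    so len(q) == len(p) + 1.
    """
    n = len(p)
    if len(q) != n + 1:
        raise ValueError("need exactly one more gap weight than key weights")

    # Prefix sums so that w(i, j) is available in O(1).
    P = [0.0]
    for x in p:
        P.append(P[-1] + x)
    Q = [0.0]
    for x in q:
        Q.append(Q[-1] + x)

    def w(i, j):
        # Total weight of keys i+1..j and gaps i..j.
        return (P[j] - P[i]) + (Q[j + 1] - Q[i])

    @lru_cache(maxsize=None)
    def cost(i, j, h):
        if i == j:
            return 0.0       # empty subtree: just a gap
        if h == 0:
            return math.inf  # keys left but no levels allowed
        best = min(cost(i, r - 1, h - 1) + cost(r, j, h - 1)
                   for r in range(i + 1, j + 1))
        return w(i, j) + best

    return cost(0, n, max_height)

# Three keys, the first one accessed far more often than the others.
p = [10, 1, 1]
q = [0, 0, 0, 0]
balanced = height_restricted_obst_cost(p, q, min_height(3))      # Delta = 0
relaxed = height_restricted_obst_cost(p, q, min_height(3) + 1)   # Delta = 1
```

With Delta = 0 the tree is forced to be balanced, which puts the frequent key one level below the root; allowing one extra level lets it move to the root and lowers the cost.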
The development of robot control programs is a complex task. Many robots differ in their electrical and mechanical structure, which is also reflected in the software. Specific robot software environments support the program development, but are mainly text-based and usually applied by experts in the field with profound knowledge of the target robot. This paper presents a graphical programming environment which aims to ease the development of robot control programs. In contrast to existing graphical robot programming environments, our approach focuses on the composition of parallel action sequences. The developed environment allows independent robot actions to be scheduled on parallel execution lines and provides mechanisms to avoid side-effects of parallel actions. The developed environment is platform-independent and based on the model-driven paradigm. The feasibility of our approach is shown by the application of the sequencer to a simulated service robot and a robot for educational purposes.
The prototype of a workflow system for the submission of content to a digital object repository is presented here. It is based entirely on open-source standard components and features a service-oriented architecture. The front-end consists of Java Business Process Management (jBPM), Java Server Faces (JSF), and Java Server Pages (JSP). A Fedora Repository and a MySQL database management system serve as a back-end. The communication between front-end and back-end uses a SOAP minimal binding stub. We describe the design principles and the construction of the prototype and discuss the possibilities and limitations of workflow creation by administrators. The code of the prototype is open-source and can be retrieved in the project escipub at http://sourceforge.net/.