Refine
H-BRS Bibliography
- yes (50) (remove)
Departments, institutes and facilities
- Fachbereich Informatik (50) (remove)
Document Type
- Preprint (50) (remove)
Year of publication
Language
- English (50)
Keywords
Force field (FF) based molecular modeling is an often used method to investigate and study structural and dynamic properties of (bio-)chemical substances and systems. When such a system is modeled or refined, the force field parameters need to be adjusted. This force field parameter optimization can be a tedious task and is always a trade-off in terms of errors regarding the targeted properties. To better control the balance of various properties’ errors, in this study we introduce weighting factors for the optimization objectives. Different weighting strategies are compared to fine-tune the balance between bulk-phase density and relative conformational energies (RCE), using n-octane as a representative system. Additionally, a non-linear projection of the individual property-specific parts of the optimized loss function is deployed to further improve the balance between them. The results show that the overall error is reduced. One interesting outcome is a large variety in the resulting optimized force field parameters (FFParams) and corresponding errors, suggesting that the optimization landscape is multi-modal and very dependent on the weighting factor setup. We conclude that adjusting the weighting factors can be a very important feature to lower the overall error in the FF optimization procedure, giving researchers the possibility to fine-tune their FFs.
In vision tasks, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, convolution requires multiple stacked layers and a hierarchical structure for large context. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to the non-causal two-dimensional image space. We scale the Hyena convolution kernels beyond the feature map size up to 191$\times$191 to maximize the ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 83.0% and 83.5%, respectively, while outperforming other large-kernel networks. Combining HyenaPixel with attention further increases accuracy to 83.6%. We attribute the success of attention to the lack of spatial bias in later stages and support this finding with bidirectional Hyena.
This paper presents the b-it-bots RoboCup@Work team and its current hardware and functional architecture for the KUKA youBot robot. We describe the underlying software framework and the developed capabilities required for operating in industrial environments including features such as reliable and precise navigation, flexible manipulation, robust object recognition and task planning. New developments include an approach to grasp vertical objects, placement of objects by considering the empty space on a workstation, and the process of porting our code to ROS2.
This paper addresses the classification of Arabic text data in the field of Natural Language Processing (NLP), with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is considered a resource-poor language, meaning that there are few data sets available, which leads to limited availability of NLP methods. To overcome this limitation, we create a dedicated data set from publicly available resources. Subsequently, transformer-based machine learning models are being trained and evaluated. We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches, when we apply linguistically informed pre-training methods such as Named Entity Recognition (NER). To our knowledge, this is the first large-scale evaluation for this task in Arabic, as well as the first application of multi-task pre-training in this context.
Loading of shipping containers for dairy products often includes a press-fit task, which involves manually stacking milk cartons in a container without using pallets or packaging. Automating this task with a mobile manipulator can reduce worker strain, and also enhance the efficiency and safety of the container loading process. This paper proposes an approach called Adaptive Compliant Control with Integrated Failure Recovery (ACCIFR), which enables a mobile manipulator to reliably perform the press-fit task. We base the approach on a demonstration learning-based compliant control framework, such that we integrate a monitoring and failure recovery mechanism for successful task execution. Concretely, we monitor the execution through distance and force feedback, detect collisions while the robot is performing the press-fit task, and use wrench measurements to classify the direction of collision; this information informs the subsequent recovery process. We evaluate the method on a miniature container setup, considering variations in the (i) starting position of the end effector, (ii) goal configuration, and (iii) object grasping position. The results demonstrate that the proposed approach outperforms the baseline demonstration-based learning framework regarding adaptability to environmental variations and the ability to recover from collision failures, making it a promising solution for practical press-fit applications.
Saliency methods are frequently used to explain Deep Neural Network-based models. Adebayo et al.'s work on evaluating saliency methods for classification models illustrate certain explanation methods fail the model and data randomization tests. However, on extending the tests for various state of the art object detectors we illustrate that the ability to explain a model is more dependent on the model itself than the explanation method. We perform sanity checks for object detection and define new qualitative criteria to evaluate the saliency explanations, both for object classification and bounding box decisions, using Guided Backpropagation, Integrated Gradients, and their Smoothgrad versions, together with Faster R-CNN, SSD, and EfficientDet-D0, trained on COCO. In addition, the sensitivity of the explanation method to model parameters and data labels varies class-wise motivating to perform the sanity checks for each class. We find that EfficientDet-D0 is the most interpretable method independent of the saliency method, which passes the sanity checks with little problems.
Robots applied in therapeutic scenarios, for instance in the therapy of individuals with Autism Spectrum Disorder, are sometimes used for imitation learning activities in which a person needs to repeat motions by the robot. To simplify the task of incorporating new types of motions that a robot can perform, it is desirable that the robot has the ability to learn motions by observing demonstrations from a human, such as a therapist. In this paper, we investigate an approach for acquiring motions from skeleton observations of a human, which are collected by a robot-centric RGB-D camera. Given a sequence of observations of various joints, the joint positions are mapped to match the configuration of a robot before being executed by a PID position controller. We evaluate the method, in particular the reproduction error, by performing a study with QTrobot in which the robot acquired different upper-body dance moves from multiple participants. The results indicate the method's overall feasibility, but also indicate that the reproduction quality is affected by noise in the skeleton observations.
The representation, or encoding, utilized in evolutionary algorithms has a substantial effect on their performance. Examination of the suitability of widely used representations for quality diversity optimization (QD) in robotic domains has yielded inconsistent results regarding the most appropriate encoding method. Given the domain-dependent nature of QD, additional evidence from other domains is necessary. This study compares the impact of several representations, including direct encoding, a dictionary-based representation, parametric encoding, compositional pattern producing networks, and cellular automata, on the generation of voxelized meshes in an architecture setting. The results reveal that some indirect encodings outperform direct encodings and can generate more diverse solution sets, especially when considering full phenotypic diversity. The paper introduces a multi-encoding QD approach that incorporates all evaluated representations in the same archive. Species of encodings compete on the basis of phenotypic features, leading to an approach that demonstrates similar performance to the best single-encoding QD approach. This is noteworthy, as it does not always require the contribution of the best-performing single encoding.
Quality diversity algorithms can be used to efficiently create a diverse set of solutions to inform engineers' intuition. But quality diversity is not efficient in very expensive problems, needing 100.000s of evaluations. Even with the assistance of surrogate models, quality diversity needs 100s or even 1000s of evaluations, which can make it use infeasible. In this study we try to tackle this problem by using a pre-optimization strategy on a lower-dimensional optimization problem and then map the solutions to a higher-dimensional case. For a use case to design buildings that minimize wind nuisance, we show that we can predict flow features around 3D buildings from 2D flow features around building footprints. For a diverse set of building designs, by sampling the space of 2D footprints with a quality diversity algorithm, a predictive model can be trained that is more accurate than when trained on a set of footprints that were selected with a space-filling algorithm like the Sobol sequence. Simulating only 16 buildings in 3D, a set of 1024 building designs with low predicted wind nuisance is created. We show that we can produce better machine learning models by producing training data with quality diversity instead of using common sampling techniques. The method can bootstrap generative design in a computationally expensive 3D domain and allow engineers to sweep the design space, understanding wind nuisance in early design phases.
Representing 3D surfaces as level sets of continuous functions over R3 is the common denominator of neural implicit representations, which recently enabled remarkable progress in geometric deep learning and computer vision tasks. In order to represent 3D motion within this framework, it is often assumed (either explicitly or implicitly) that the transformations which a surface may undergo are homeomorphic: this is not necessarily true, for instance, in the case of fluid dynamics. In order to represent more general classes of deformations, we propose to apply this theoretical framework as regularizers for the optimization of simple 4D implicit functions (such as signed distance fields). We show that our representation is capable of capturing both homeomorphic and topology-changing deformations, while also defining correspondences over the continuously-reconstructed surfaces.
State-of-the-art object detectors are treated as black boxes due to their highly non-linear internal computations. Even with unprecedented advancements in detector performance, the inability to explain how their outputs are generated limits their use in safety-critical applications. Previous work fails to produce explanations for both bounding box and classification decisions, and generally make individual explanations for various detectors. In this paper, we propose an open-source Detector Explanation Toolkit (DExT) which implements the proposed approach to generate a holistic explanation for all detector decisions using certain gradient-based explanation methods. We suggests various multi-object visualization methods to merge the explanations of multiple objects detected in an image as well as the corresponding detections in a single image. The quantitative evaluation show that the Single Shot MultiBox Detector (SSD) is more faithfully explained compared to other detectors regardless of the explanation methods. Both quantitative and human-centric evaluations identify that SmoothGrad with Guided Backpropagation (GBP) provides more trustworthy explanations among selected methods across all detectors. We expect that DExT will motivate practitioners to evaluate object detectors from the interpretability perspective by explaining both bounding box and classification decisions.
21 pages, with supplementary
Vietnam requires a sustainable urbanization, for which city sensing is used in planning and de-cision-making. Large cities need portable, scalable, and inexpensive digital technology for this purpose. End-to-end air quality monitoring companies such as AirVisual and Plume Air have shown their reliability with portable devices outfitted with superior air sensors. They are pricey, yet homeowners use them to get local air data without evaluating the causal effect. Our air quality inspection system is scalable, reasonably priced, and flexible. Minicomputer of the sys-tem remotely monitors PMS7003 and BME280 sensor data through a microcontroller processor. The 5-megapixel camera module enables researchers to infer the causal relationship between traffic intensity and dust concentration. The design enables inexpensive, commercial-grade hardware, with Azure Blob storing air pollution data and surrounding-area imagery and pre-venting the system from physically expanding. In addition, by including an air channel that re-plenishes and distributes temperature, the design improves ventilation and safeguards electrical components. The gadget allows for the analysis of the correlation between traffic and air quali-ty data, which might aid in the establishment of sustainable urban development plans and poli-cies.
Fatigue strength estimation is a costly manual material characterization process in which state-of-the-art approaches follow a standardized experiment and analysis procedure. In this paper, we examine a modular, Machine Learning-based approach for fatigue strength estimation that is likely to reduce the number of experiments and, thus, the overall experimental costs. Despite its high potential, deployment of a new approach in a real-life lab requires more than the theoretical definition and simulation. Therefore, we study the robustness of the approach against misspecification of the prior and discretization of the specified loads. We identify its applicability and its advantageous behavior over the state-of-the-art methods, potentially reducing the number of costly experiments.
Safety-critical applications like autonomous driving use Deep Neural Networks (DNNs) for object detection and segmentation. The DNNs fail to predict when they observe an Out-of-Distribution (OOD) input leading to catastrophic consequences. Existing OOD detection methods were extensively studied for image inputs but have not been explored much for LiDAR inputs. So in this study, we proposed two datasets for benchmarking OOD detection in 3D semantic segmentation. We used Maximum Softmax Probability and Entropy scores generated using Deep Ensembles and Flipout versions of RandLA-Net as OOD scores. We observed that Deep Ensembles out perform Flipout model in OOD detection with greater AUROC scores for both datasets.
In robot-assisted therapy for individuals with Autism Spectrum Disorder, the workload of therapists during a therapeutic session is increased if they have to control the robot manually. To allow therapists to focus on the interaction with the person instead, the robot should be more autonomous, namely it should be able to interpret the person's state and continuously adapt its actions according to their behaviour. In this paper, we develop a personalised robot behaviour model that can be used in the robot decision-making process during an activity; this behaviour model is trained with the help of a user model that has been learned from real interaction data. We use Q-learning for this task, such that the results demonstrate that the policy requires about 10,000 iterations to converge. We thus investigate policy transfer for improving the convergence speed; we show that this is a feasible solution, but an inappropriate initial policy can lead to a suboptimal final return.
We introduce canonical weight normalization for convolutional neural networks. Inspired by the canonical tensor decomposition, we express the weight tensors in so-called canonical networks as scaled sums of outer vector products. In particular, we train network weights in the decomposed form, where scale weights are optimized separately for each mode. Additionally, similarly to weight normalization, we include a global scaling parameter. We study the initialization of the canonical form by running the power method and by drawing randomly from Gaussian or uniform distributions. Our results indicate that we can replace the power method with cheaper initializations drawn from standard distributions. The canonical re-parametrization leads to competitive normalization performance on the MNIST, CIFAR10, and SVHN data sets. Moreover, the formulation simplifies network compression. Once training has converged, the canonical form allows convenient model-compression by truncating the parameter sums.
TSEM: Temporally Weighted Spatiotemporal Explainable Neural Network for Multivariate Time Series
(2022)
Deep learning has become a one-size-fits-all solution for technical and business domains thanks to its flexibility and adaptability. It is implemented using opaque models, which unfortunately undermines the outcome trustworthiness. In order to have a better understanding of the behavior of a system, particularly one driven by time series, a look inside a deep learning model so-called posthoc eXplainable Artificial Intelligence (XAI) approaches, is important. There are two major types of XAI for time series data, namely model-agnostic and model-specific. Model-specific approach is considered in this work. While other approaches employ either Class Activation Mapping (CAM) or Attention Mechanism, we merge the two strategies into a single system, simply called the Temporally Weighted Spatiotemporal Explainable Neural Network for Multivariate Time Series (TSEM). TSEM combines the capabilities of RNN and CNN models in such a way that RNN hidden units are employed as attention weights for the CNN feature maps temporal axis. The result shows that TSEM outperforms XCM. It is similar to STAM in terms of accuracy, while also satisfying a number of interpretability criteria, including causality, fidelity, and spatiotemporality.
Self-supervised learning has proved to be a powerful approach to learn image representations without the need of large labeled datasets. For underwater robotics, it is of great interest to design computer vision algorithms to improve perception capabilities such as sonar image classification. Due to the confidential nature of sonar imaging and the difficulty to interpret sonar images, it is challenging to create public large labeled sonar datasets to train supervised learning algorithms. In this work, we investigate the potential of three self-supervised learning methods (RotNet, Denoising Autoencoders, and Jigsaw) to learn high-quality sonar image representation without the need of human labels. We present pre-training and transfer learning results on real-life sonar image datasets. Our results indicate that self-supervised pre-training yields classification performance comparable to supervised pre-training in a few-shot transfer learning setup across all three methods. Code and self-supervised pre-trained models are be available at https://github.com/agrija9/ssl-sonar-images
It has been well proved that deep networks are efficient at extracting features from a given (source) labeled dataset. However, it is not always the case that they can generalize well to other (target) datasets which very often have a different underlying distribution. In this report, we evaluate four different domain adaptation techniques for image classification tasks: DeepCORAL, DeepDomainConfusion, CDAN and CDAN+E. These techniques are unsupervised given that the target dataset dopes not carry any labels during training phase. We evaluate model performance on the office-31 dataset. A link to the github repository of this report can be found here: https://github.com/agrija9/Deep-Unsupervised-Domain-Adaptation.
Recent experimental evidence suggest that mebendazole, a popular antiparasitic drug, binds to heat shock protein 90 (Hsp90) and inhibit acute myeloid leukemia cell growth. In this study we use quantum mechanics (QM), molecular similarity and molecular dynamics (MD) calculations to predict possible binding poses of mebendazole to the adenosine triphosphate (ATP) binding site of Hsp90. Extensive conformational searches and minimization of the five tautomers of mebendazole using MP2/aug-cc-pVTZ theory level resulting in 152 minima being identified. Mebendazole-Hsp90 complex models were created using the QM optimized conformations and protein coordinates obtained from experimental crystal structures that were chosen through similarity calculations. Nine different poses were identified from a total of 600 ns of explicit solvent, all-atom MD simulations using two different force fields. All simulations support the hypothesis that mebendazole is able to bind to the ATP binding site of Hsp90.
Urban LoRa networks promise to provide a cost-efficient and scalable communication backbone for smart cities. One core challenge in rolling out and operating these networks is radio network planning, i.e., precise predictions about possible new locations and their impact on network coverage. Path loss models aid in this task, but evaluating and comparing different models requires a sufficiently large set of high-quality received packet power samples. In this paper, we report on a corresponding large-scale measurement study covering an urban area of 200km2 over a period of 230 days using sensors deployed on garbage trucks, resulting in more than 112 thousand high-quality samples for received packet power. Using this data, we compare eleven previously proposed path loss models and additionally provide new coefficients for the Log-distance model. Our results reveal that the Log-distance model and other well-known empirical models such as Okumura or Winner+ provide reasonable estimations in an urban environment, and terrain based models such as ITM or ITWOM have no advantages. In addition, we derive estimations for the needed sample size in similar measurement campaigns. To stimulate further research in this direction, we make all our data publicly available.
The majority of biomedical knowledge is stored in structured databases or as unstructured text in scientific publications. This vast amount of information has led to numerous machine learning-based biological applications using either text through natural language processing (NLP) or structured data through knowledge graph embedding models (KGEMs). However, representations based on a single modality are inherently limited. To generate better representations of biological knowledge, we propose STonKGs, a Sophisticated Transformer trained on biomedical text and Knowledge Graphs. This multimodal Transformer uses combined input sequences of structured information from KGs and unstructured text data from biomedical literature to learn joint representations. First, we pre-trained STonKGs on a knowledge base assembled by the Integrated Network and Dynamical Reasoning Assembler (INDRA) consisting of millions of text-triple pairs extracted from biomedical literature by multiple NLP systems. Then, we benchmarked STonKGs against two baseline models trained on either one of the modalities (i.e., text or KG) across eight different classification tasks, each corresponding to a different biological application. Our results demonstrate that STonKGs outperforms both baselines, especially on the more challenging tasks with respect to the number of classes, improving upon the F1-score of the best baseline by up to 0.083. Additionally, our pre-trained model as well as the model architecture can be adapted to various other transfer learning applications. Finally, the source code and pre-trained STonKGs models are available at https://github.com/stonkgs/stonkgs and https://huggingface.co/stonkgs/stonkgs-150k.
Application of underwater robots are on the rise, most of them are dependent on sonar for underwater vision, but the lack of strong perception capabilities limits them in this task. An important issue in sonar perception is matching image patches, which can enable other techniques like localization, change detection, and mapping. There is a rich literature for this problem in color images, but for acoustic images, it is lacking, due to the physics that produce these images. In this paper we improve on our previous results for this problem (Valdenegro-Toro et al, 2017), instead of modeling features manually, a Convolutional Neural Network (CNN) learns a similarity function and predicts if two input sonar images are similar or not. With the objective of improving the sonar image matching problem further, three state of the art CNN architectures are evaluated on the Marine Debris dataset, namely DenseNet, and VGG, with a siamese or two-channel architecture, and contrastive loss. To ensure a fair evaluation of each network, thorough hyper-parameter optimization is executed. We find that the best performing models are DenseNet Two-Channel network with 0.955 AUC, VGG-Siamese with contrastive loss at 0.949 AUC and DenseNet Siamese with 0.921 AUC. By ensembling the top performing DenseNet two-channel and DenseNet-Siamese models overall highest prediction accuracy obtained is 0.978 AUC, showing a large improvement over the 0.91 AUC in the state of the art.
The Covid-19 pandemic has challenged educators across the world to move their teaching and mentoring from in-person to remote. During nonpandemic semesters at their institutes (e.g. universities), educators can directly provide students the software environment needed to support their learning - either in specialized computer laboratories (e.g. computational chemistry labs) or shared computer spaces. These labs are often supported by staff that maintains the operating systems (OS) and software. But how does one provide a specialized software environment for remote teaching? One solution is to provide students a customized operating system (e.g., Linux) that includes open-source software for supporting your teaching goals. However, such a solution should not require students to install the OS alongside their existing one (i.e. dual/multi-booting) or be used as a complete replacement. Such approaches are risky because of a) the students' possible lack of software expertise, b) the possible disruption of an existing software workflow that is needed in other classes or by other family members, and c) the importance of maintaining a working computer when isolated (e.g. societal restrictions). To illustrate possible solutions, we discuss our approach that used a customized Linux OS and a Docker container in a course that teaches computational chemistry and Python3.
Risk-based authentication (RBA) aims to strengthen password-based authentication rather than replacing it. RBA does this by monitoring and recording additional features during the login process. If feature values at login time differ significantly from those observed before, RBA requests an additional proof of identification. Although RBA is recommended in the NIST digital identity guidelines, it has so far been used almost exclusively by major online services. This is partly due to a lack of open knowledge and implementations that would allow any service provider to roll out RBA protection to its users.
To close this gap, we provide a first in-depth analysis of RBA characteristics in a practical deployment. We observed N=780 users with 247 unique features on a real-world online service for over 1.8 years. Based on our collected data set, we provide (i) a behavior analysis of two RBA implementations that were apparently used by major online services in the wild, (ii) a benchmark of the features to extract a subset that is most suitable for RBA use, (iii) a new feature that has not been used in RBA before, and (iv) factors which have a significant effect on RBA performance. Our results show that RBA needs to be carefully tailored to each online service, as even small configuration adjustments can greatly impact RBA's security and usability properties. We provide insights on the selection of features, their weightings, and the risk classification in order to benefit from RBA after a minimum number of login attempts.