H-BRS Bibliography
Deployment of modern data-driven machine learning methods, most often realized by deep neural networks (DNNs), in safety-critical applications such as health care, industrial plant control, or autonomous driving is highly challenging due to numerous model-inherent shortcomings. These shortcomings are diverse and range from a lack of generalization over insufficient interpretability and implausible predictions to directed attacks by means of malicious inputs. Cyber-physical systems employing DNNs are therefore likely to suffer from so-called safety concerns: properties that preclude their deployment because no argument or experimental setup can help to assess the remaining risk. In recent years, an abundance of state-of-the-art techniques aiming to address these safety concerns has emerged. This chapter provides a structured and broad overview of them. We first identify categories of insufficiencies and then describe research activities aiming at their detection, quantification, or mitigation. Our work addresses machine learning experts and safety engineers alike: the former might profit from the broad range of machine learning topics covered and the discussions on limitations of recent methods, while the latter might gain insights into the specifics of modern machine learning methods. We hope that this contribution fuels discussions on desiderata for machine learning systems and on strategies for advancing existing approaches accordingly.
Fatigue strength estimation is a costly manual material characterization process in which state-of-the-art approaches follow a standardized experiment and analysis procedure. In this paper, we examine a modular, machine-learning-based approach for fatigue strength estimation that is likely to reduce the number of experiments and, thus, the overall experimental cost. Despite its high potential, deploying a new approach in a real-life lab requires more than a theoretical definition and simulation. We therefore study the robustness of the approach against misspecification of the prior and discretization of the specified loads. We demonstrate its applicability and its advantages over state-of-the-art methods, with the potential to reduce the number of costly experiments.
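The abstract above leaves the model unspecified; as a minimal, hypothetical illustration of the two studied robustness factors, a prior over fatigue strength and discretized test loads, the following sketch performs a Bayesian update over a discretized strength grid from binary failure/run-out outcomes. The probit likelihood, the Gaussian prior, and all numbers are assumptions for illustration, not the paper's model.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical sketch: Bayesian estimation of fatigue strength from
# binary test outcomes (failure / run-out) at discretized load levels.
# The probit likelihood and the Gaussian prior are illustrative
# assumptions, not the model from the paper.
loads = np.linspace(200.0, 400.0, 81)            # discretized strength grid (MPa)
prior = norm.pdf(loads, loc=300.0, scale=40.0)   # prior belief about strength
prior /= prior.sum()

def update(posterior, test_load, failed, scatter=15.0):
    """Multiply in the likelihood of one experiment at `test_load`.

    P(failure | true strength s) is modeled with a probit link:
    specimens tend to fail when the load exceeds their (noisy) strength.
    """
    p_fail = norm.cdf((test_load - loads) / scatter)
    likelihood = p_fail if failed else (1.0 - p_fail)
    posterior = posterior * likelihood
    return posterior / posterior.sum()

# Example: one failure at 320 MPa, one run-out at 280 MPa.
post = update(prior, 320.0, failed=True)
post = update(post, 280.0, failed=False)
print("posterior mean strength:", (loads * post).sum())
```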
Accurate forecasting of solar radiation plays an important role in predictive control for energy systems with a high share of photovoltaic (PV) energy. Off-grid microgrids using predictive control can especially benefit from forecasts with a high temporal resolution to address sudden fluctuations of PV power. However, cloud formation processes and movements are subject to ongoing research. For nowcasting applications, all-sky imagers (ASI) are used to provide appropriate forecasts for the aforementioned applications. Recent research aims to achieve these forecasts via deep learning approaches, either as an image segmentation task that generates a DNI forecast through cloud vectoring and translates the DNI to a GHI using ground-based measurements (Fabel et al., 2022; Nouri et al., 2021), or as an end-to-end regression task that generates a GHI forecast directly from the images (Paletta et al., 2021; Yang et al., 2021). While end-to-end regression might be the more attractive approach for off-grid scenarios, the literature reports improved performance compared to smart persistence but does not show satisfactory forecasting patterns (Paletta et al., 2021). This work takes a step back and investigates the possibility of translating ASI images to current GHI in order to deploy the neural network as a feature extractor. An ImageNet pre-trained deep learning model is used to achieve this translation on an openly available dataset from the University of California San Diego (Pedro et al., 2019); the images and measurements were collected in Folsom, California. Results show that the neural network can successfully translate ASI images to GHI for a variety of cloud situations without the need for any external variables. Extending the neural network to a forecasting task also shows promising forecasting patterns, which indicates that the network extracts both temporal and instantaneous features from the images to generate GHI forecasts.
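As a hedged sketch of the described setup, the snippet below reuses an ImageNet pre-trained backbone as a feature extractor with a scalar regression head for current GHI. ResNet-18, the input resolution, and the loss are assumptions; the abstract does not specify the architecture at this level of detail.

```python
import torch
import torch.nn as nn
from torchvision import models

# Hedged sketch: reuse an ImageNet pre-trained backbone as a feature
# extractor and regress current GHI from a single all-sky image.
# ResNet-18 and the single-layer head are illustrative assumptions.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Linear(backbone.fc.in_features, 1)  # GHI in W/m^2

def estimate_ghi(image_batch: torch.Tensor) -> torch.Tensor:
    """image_batch: (N, 3, 224, 224), ImageNet-normalized ASI crops."""
    return backbone(image_batch).squeeze(-1)

# Training would minimize a regression loss against pyranometer
# measurements, e.g.:
loss_fn = nn.MSELoss()
print(estimate_ghi(torch.randn(2, 3, 224, 224)).shape)  # torch.Size([2])
```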
The workshop XAI for U aims to address the critical need for transparency in Artificial Intelligence (AI) systems that integrate into our daily lives through mobile systems, wearables, and smart environments. Despite advances in AI, many of these systems remain opaque, making it difficult for users, developers, and stakeholders to verify their reliability and correctness. This workshop addresses the pressing need for Explainable AI (XAI) tools within Ubiquitous and Wearable Computing and highlights the unique challenges that come with it, such as XAI that deals with time-series and multimodal data, XAI that explains interconnected machine learning (ML) components, and XAI that provides user-centered explanations. The workshop aims to foster collaboration among researchers in related domains, share recent advancements, address open challenges, and propose future research directions to improve the applicability and development of XAI in Ubiquitous, Pervasive, and Wearable Computing. With that, it seeks to enhance user trust, understanding, interaction, and adoption, ensuring that AI-driven solutions are not only more explainable but also better aligned with ethical standards and user expectations.
A company's financial documents use tables alongside text to organize data containing key performance indicators (KPIs), such as profit and loss, and the financial quantities linked to them. The quantity linked to a KPI in a table might not equal the quantity of the similarly described KPI in the text. Auditors spend substantial time manually checking for such inconsistencies, a process called consistency checking. In contrast to existing work, this paper attempts to automate this task with transformer-based models. For consistency checking, it is essential that the table KPIs' embeddings encode both the semantic knowledge of the KPIs and the structural knowledge of the table. This paper therefore proposes a pipeline that uses a tabular model to obtain the table KPIs' embeddings. The pipeline takes table and text KPIs as input, generates their embeddings, and then checks whether these KPIs are identical. The pipeline is evaluated on financial documents in the German language, and a comparative analysis of the quality of the cell embeddings from three tabular models is also presented. In the evaluation, the experiment that used English-translated text and table KPIs together with the Tabbie model to generate the table KPIs' embeddings achieved an accuracy of 72.81% on the consistency checking task, outperforming the benchmark and the other tabular models.
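The abstract does not spell out the matching step; below is a minimal sketch of one plausible realization, comparing a table-KPI embedding with a text-KPI embedding via cosine similarity and flagging mismatched quantities. The threshold and helper names are hypothetical; the paper's pipeline may use a learned classifier instead.

```python
import numpy as np

# Minimal sketch of the final matching step: given an embedding of a
# table KPI (e.g., from a tabular model such as Tabbie) and an embedding
# of a text KPI, decide whether they describe the same indicator.
def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def kpis_match(table_kpi_emb, text_kpi_emb, threshold=0.8) -> bool:
    # Threshold is a placeholder, not a value from the paper.
    return cosine(table_kpi_emb, text_kpi_emb) >= threshold

def consistent(table_emb, text_emb, table_value, text_value) -> bool:
    """Consistency check: if the KPIs match but their linked quantities
    differ, the document is flagged for the auditor."""
    if kpis_match(table_emb, text_emb):
        return bool(np.isclose(table_value, text_value))
    return True  # different KPIs need not agree
```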
In vision tasks, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, convolution requires multiple stacked layers and a hierarchical structure for large context. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to the non-causal two-dimensional image space. We scale the Hyena convolution kernels beyond the feature map size up to 191×191 to maximize the ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 83.0% and 83.5%, respectively, while outperforming other large-kernel networks. Combining HyenaPixel with attention further increases accuracy to 83.6%. We attribute the success of attention to the lack of spatial bias in later stages and support this finding with bidirectional Hyena.
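Hyena-style layers keep large kernels sub-quadratic by evaluating the convolution in the Fourier domain. The sketch below shows that core trick for the two-dimensional case with a kernel larger than the feature map; shapes are illustrative, and Hyena's implicitly parameterized (MLP-generated) kernels are omitted.

```python
import torch

# Sketch of the trick that keeps large-kernel convolution sub-quadratic:
# evaluate the (zero-padded, hence linear) convolution via the FFT.
def fft_conv2d(x: torch.Tensor, kernel: torch.Tensor) -> torch.Tensor:
    """x: (N, C, H, W); kernel: (C, kH, kW) with kH, kW possibly > H, W."""
    n, c, h, w = x.shape
    kh, kw = kernel.shape[-2:]
    fh, fw = h + kh - 1, w + kw - 1          # linear-convolution size
    X = torch.fft.rfft2(x, s=(fh, fw))
    K = torch.fft.rfft2(kernel, s=(fh, fw))
    y = torch.fft.irfft2(X * K, s=(fh, fw))
    # center-crop back to the input resolution
    top, left = (fh - h) // 2, (fw - w) // 2
    return y[..., top:top + h, left:left + w]

x = torch.randn(2, 64, 56, 56)
k = torch.randn(64, 191, 191)                # kernel larger than feature map
print(fft_conv2d(x, k).shape)                # torch.Size([2, 64, 56, 56])
```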
To ensure reliable performance of Question Answering (QA) systems, evaluating their robustness is crucial. Common evaluation benchmarks typically include only performance metrics, such as Exact Match (EM) and the F1 score. However, these benchmarks overlook factors critical for the deployment of QA systems. This oversight can result in systems that are vulnerable to minor perturbations in the input, such as typographical errors. While several methods have been proposed to test the robustness of QA models, there has been minimal exploration of these approaches for languages other than English. This study focuses on the robustness evaluation of German-language QA models, extending methodologies previously applied primarily to English. The objective is to foster the development of robust models by defining an evaluation method specifically tailored to the German language. We assess the applicability of perturbations used for English QA models to German and perform a comprehensive experimental evaluation with eight models. The results show that all models are vulnerable to character-level perturbations. Additionally, the comparison of monolingual and multilingual models suggests that the former are less affected by character- and word-level perturbations.
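As an illustration of the character-level perturbations the study evaluates, the sketch below injects adjacent-character swaps into a German question. The exact perturbation set and rates of the study are not reproduced here; these are merely common examples of such perturbations.

```python
import random

# Illustrative character-level perturbation of the kind used to probe
# QA robustness: swap two adjacent characters in randomly chosen words.
def swap_chars(word: str) -> str:
    """Swap two adjacent characters, e.g. 'Berlin' -> 'Brelin'."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def perturb_question(question: str, rate: float = 0.2) -> str:
    words = question.split()
    return " ".join(swap_chars(w) if random.random() < rate else w
                    for w in words)

random.seed(0)
print(perturb_question("Wann wurde die Hochschule Bonn-Rhein-Sieg gegründet?"))
```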
In computer vision, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, its quadratic complexity limits its applicability to tasks that benefit from high-resolution input. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to bidirectional data and two-dimensional image space. We scale Hyena’s convolution kernels beyond the feature map size, up to 191×191, to maximize ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 84.9% and 85.2%, respectively, with no additional training data, while outperforming other convolutional and large-kernel networks. Combining HyenaPixel with attention further improves accuracy. We attribute the success of bidirectional Hyena to learning the data-dependent geometric arrangement of pixels without a fixed neighborhood definition. Experimental results on downstream tasks suggest that HyenaPixel with large filters and a fixed neighborhood leads to better localization performance.
Contrastive learning (CL) approaches have gained great recognition as a very successful subset of self-supervised learning (SSL) methods. SSL enables learning from unlabeled data, a crucial step in the advancement of deep learning, particularly in computer vision (CV), given the plethora of unlabeled image data. CL works by comparing different random augmentations (e.g., different crops) of the same image, thus achieving self-labeling. Nevertheless, randomly augmenting images, and especially random cropping, can result in an image that is semantically very distant from the original, which leads to false labeling and undermines the efficacy of the methods. In this research, two novel parameterized cropping methods are introduced that increase the robustness of self-labeling and consequently the efficacy of the methods. The results show that these methods significantly improve model accuracy on the downstream task of classifying CIFAR-10, by between 2.7% and 12.4% depending on the crop size, compared to the non-parameterized random cropping method.
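The abstract does not describe the two cropping methods; as one hypothetical instance of a parameterized crop, the sketch below biases the crop center toward the image center so that two views of the same image are more likely to share semantic content. This illustrates the general idea only, not the paper's methods.

```python
import random
from PIL import Image

# Hypothetical parameterized crop: instead of sampling the crop position
# uniformly (plain random cropping), draw the crop center from a clipped
# Gaussian around the image center. `sigma` is the tunable parameter.
def center_biased_crop(img: Image.Image, size: int, sigma: float = 0.15):
    """Assumes size <= min(img.size)."""
    w, h = img.size
    cx = min(max(random.gauss(0.5, sigma), 0.0), 1.0) * (w - size)
    cy = min(max(random.gauss(0.5, sigma), 0.0), 1.0) * (h - size)
    left, top = int(cx), int(cy)
    return img.crop((left, top, left + size, top + size))

# Two such crops of the same image would form the positive pair in a
# standard contrastive learning setup.
```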
Quadruped robots excel in traversing complex, unstructured environments where wheeled robots often fail. However, enabling efficient and adaptable locomotion remains challenging due to the quadrupeds' nonlinear dynamics, high degrees of freedom, and the computational demands of real-time control. Optimization-based controllers, such as Nonlinear Model Predictive Control (NMPC), have shown strong performance, but their reliance on accurate state estimation and high computational overhead makes deployment in real-world settings challenging. In this work, we present a Multi-Task Learning (MTL) framework in which expert NMPC demonstrations are used to train a single neural network to predict actions for multiple locomotion behaviors directly from raw proprioceptive sensor inputs. We evaluate our approach extensively on the quadruped robot Go1, both in simulation and on real hardware, demonstrating that it accurately reproduces expert behavior, allows smooth gait switching, and simplifies the control pipeline for real-time deployment. Our MTL architecture enables learning diverse gaits within a unified policy, achieving high $R^{2}$ scores for predicted joint targets across all tasks.
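A hedged sketch of such a multi-task policy follows: a single network maps proprioceptive observations plus a gait identifier to joint targets and is trained by behavior cloning on NMPC demonstrations. Layer sizes, the one-hot task conditioning, and the observation dimension are assumptions; the 12 outputs match the Go1's actuated joints.

```python
import torch
import torch.nn as nn

# Hedged sketch of a multi-task locomotion policy: raw proprioceptive
# observations plus a task (gait) identifier -> joint targets.
class MultiGaitPolicy(nn.Module):
    def __init__(self, obs_dim=48, num_gaits=3, act_dim=12):
        super().__init__()
        self.num_gaits = num_gaits
        self.net = nn.Sequential(
            nn.Linear(obs_dim + num_gaits, 256), nn.ELU(),
            nn.Linear(256, 256), nn.ELU(),
            nn.Linear(256, act_dim),             # joint position targets
        )

    def forward(self, obs: torch.Tensor, gait_id: torch.Tensor):
        task = nn.functional.one_hot(gait_id, self.num_gaits).float()
        return self.net(torch.cat([obs, task], dim=-1))

policy = MultiGaitPolicy()
obs = torch.randn(1, 48)
print(policy(obs, torch.tensor([1])).shape)      # torch.Size([1, 12])
# Training: behavior cloning, e.g. MSE between predicted and expert NMPC
# joint targets across all gaits; switching gaits changes only gait_id.
```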
Unmanned Aerial Vehicles (UAVs) are increasingly used for reforestation and forest monitoring, including seed dispersal in hard-to-reach terrains. However, a detailed understanding of the forest floor remains a challenge due to high natural variability, quickly changing environmental parameters, and ambiguous annotations stemming from unclear definitions. To address this issue, we adapt the Segment Anything Model (SAM), a vision foundation model with strong generalization capabilities, to segment forest floor objects such as tree stumps, vegetation, and woody debris. To this end, we employ parameter-efficient fine-tuning (PEFT) to fine-tune a small subset of additional model parameters while keeping the original weights fixed. We adjust SAM's mask decoder to generate masks corresponding to our dataset categories, allowing for automatic segmentation without manual prompting. Our results show that the adapter-based PEFT method achieves the highest mean intersection over union (mIoU), while Low-rank Adaptation (LoRA), with fewer parameters, offers a lightweight alternative for resource-constrained UAV platforms.
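For reference, a minimal LoRA layer of the kind named in the abstract is sketched below: a frozen linear layer augmented with a trainable low-rank update, so only a small number of additional parameters are trained while the original weights stay fixed. Its exact placement inside SAM is not reproduced here.

```python
import torch
import torch.nn as nn

# Minimal LoRA sketch: y = Wx + (alpha/r) * B(Ax), with W frozen and
# A, B trainable low-rank matrices. B starts at zero, so training
# begins from the unmodified pre-trained behavior.
class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 4.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False              # original weights stay fixed
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)       # start as identity update
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * self.lora_b(self.lora_a(x))

layer = LoRALinear(nn.Linear(256, 256), rank=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)                                 # 2048, vs. 65792 frozen
```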
In Ghana, unreliable public grid infrastructure greatly impacts rural healthcare, where diesel generators are commonly used despite their high financial and environmental costs. Photovoltaic (PV)-hybrid systems offer a sustainable alternative, but require robust, predictive control strategies to ensure reliability. This study proposes a sector-specific Model Predictive Control (MPC) approach, integrating advanced load and meteorological forecasting for optimal energy dispatch. The methodology includes a long short-term memory (LSTM)-based load forecasting model with probabilistic Monte Carlo dropout, a customized Numerical Weather Prediction (NWP) model based on the Weather Research and Forecasting (WRF) framework, and deep learning-based All-Sky Imager (ASI) nowcasting to improve short-term solar predictions. By combining these forecasting methods into a seamless prediction framework, the proposed MPC optimizes system performance while reducing reliance on fossil fuels. This study benchmarks the MPC against a traditional rule-based dispatch system, using data collected from a rural health facility in Kologo, Ghana. Results demonstrate that predictive control greatly reduces both economic and ecological costs. Compared to rule-based dispatch, diesel generator operation and fuel consumption are reduced by up to 61.62% and 47.17%, respectively, leading to economic and ecological cost savings of up to 20.7% and 31.78%. Additionally, system reliability improves, with battery depletion events during blackouts decreasing by up to 99.42%, while wear and tear on the diesel generator and battery are reduced by up to 54.93% and 37.34%, respectively. Furthermore, hyperparameter tuning enhances MPC performance, introducing further optimization potential. These findings highlight the effectiveness of predictive control in improving energy resilience for critical healthcare applications in rural settings.
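As a minimal sketch of the LSTM-based probabilistic load forecasting with Monte Carlo dropout mentioned above: dropout is kept active at inference, and repeated stochastic forward passes yield a predictive mean and an uncertainty estimate. All dimensions, horizons, and the dropout rate are illustrative assumptions.

```python
import torch
import torch.nn as nn

# Sketch of probabilistic load forecasting with Monte Carlo dropout:
# dropout stays active at inference and repeated stochastic forward
# passes approximate a predictive distribution.
class LoadForecaster(nn.Module):
    def __init__(self, n_features=8, hidden=64, horizon=24):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.drop = nn.Dropout(0.2)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):                        # x: (N, T, n_features)
        out, _ = self.lstm(x)
        return self.head(self.drop(out[:, -1]))  # (N, horizon) load forecast

model = LoadForecaster()
model.train()                                    # keep dropout active
x = torch.randn(1, 96, 8)                        # e.g. 96 past time steps
samples = torch.stack([model(x) for _ in range(50)])
mean, std = samples.mean(0), samples.std(0)      # forecast and uncertainty
```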
This work proposes a novel approach for probabilistic end-to-end all-sky-imager-based nowcasting with horizons of up to 30 min using an ImageNet pre-trained deep neural network. The method involves a two-stage approach: first, a backbone model is trained to estimate the irradiance from all-sky imager (ASI) images; the model is then extended and retrained on image and parameter sequences for forecasting. An open-access dataset is used for training and evaluation. We investigated the impact of simultaneously considering global horizontal (GHI), direct normal (DNI), and diffuse horizontal irradiance (DHI) on training time and forecast performance, as well as the effect of adding parameters describing the irradiance variability proposed in the literature. The backbone model estimates current GHI with an RMSE and MAE of 58.06 W m⁻² and 29.33 W m⁻², respectively. When extended for forecasting, the model achieves an overall positive skill score reaching 18.6% compared to a smart persistence forecast. Minor modifications to the deterministic backbone and forecasting models enable the architecture to output an asymmetric probability distribution and reduce training time while leading to similar errors for the backbone models. Investigating the impact of the variability parameters shows that they reduce training time but have no significant impact on GHI forecasting performance for either deterministic or probabilistic forecasting, while simultaneously forecasting GHI, DNI, and DHI reduces forecast performance.
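A hedged sketch of the probabilistic extension: a head that emits a location and two scales of a two-sided (asymmetric) Gaussian can be trained with the corresponding negative log-likelihood, and forecast quality is summarized by the skill score relative to smart persistence. The concrete distribution used in the paper may differ; the functions below are illustrative.

```python
import math
import torch

def asymmetric_gaussian_nll(mu, sigma_l, sigma_r, y):
    """Negative log-likelihood of a two-piece Gaussian.

    Density: f(y) = 2 / (sqrt(2*pi) * (sigma_l + sigma_r))
                    * exp(-0.5 * ((y - mu) / sigma)^2),
    where sigma = sigma_l for y < mu and sigma_r otherwise. The network
    would output (mu, sigma_l, sigma_r) per forecast step.
    """
    sigma = torch.where(y < mu, sigma_l, sigma_r)
    z = (y - mu) / sigma
    return (0.5 * z**2
            + torch.log((sigma_l + sigma_r) * math.sqrt(math.pi / 2))).mean()

def skill_score(rmse_model: float, rmse_smart_persistence: float) -> float:
    """Forecast skill relative to smart persistence; 0.186 = 18.6%."""
    return 1.0 - rmse_model / rmse_smart_persistence
```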