pub H-BRS | Search

Improving Natural Language Inference in Arabic Using Transformer Models and Linguistically Informed Pre-Training (2024)

Al Deen, Mohammad Majd Saad ; Pielka, Maren ; Hees, Jörn ; Abdou, Bouthaina Soulef ; Sifa, Rafet

Knowledge Base Question Answering by Transformer-Based Graph Pattern Scoring (2023)

Lamott, Marcel ; Hees, Jörn ; Ulges, Adrian

Question Answering (QA) has gained significant attention in recent years, with transformer-based models improving natural language processing. However, issues of explainability remain, as it is difficult to determine whether an answer is based on a true fact or a hallucination. Knowledge-based question answering (KBQA) methods can address this problem by retrieving answers from a knowledge graph. This paper proposes a hybrid approach to KBQA called FRED, which combines pattern-based entity retrieval with a transformer-based question encoder. The method uses an evolutionary approach to learn SPARQL patterns, which retrieve candidate entities from a knowledge base. The transformer-based regressor is then trained to estimate each pattern’s expected F1 score for answering the question, resulting in a ranking ofcandidate entities. Unlike other approaches, FRED can attribute results to learned SPARQL patterns, making them more interpretable. The method is evaluated on two datasets and yields MAP scores of up to 73 percent, with the transformer-based interpretation falling only 4 pp short of an oracle run. Additionally, the learned patterns successfully complement manually generated ones and generalize well to novel questions.

Improving Natural Language Inference in Arabic using Transformer Models and Linguistically Informed Pre-Training (2023)

Deen, Mohammad Majd Saad Al ; Pielka, Maren ; Hees, Jörn ; Abdou, Bouthaina Soulef ; Sifa, Rafet

This paper addresses the classification of Arabic text data in the field of Natural Language Processing (NLP), with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is considered a resource-poor language, meaning that there are few data sets available, which leads to limited availability of NLP methods. To overcome this limitation, we create a dedicated data set from publicly available resources. Subsequently, transformer-based machine learning models are being trained and evaluated. We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches, when we apply linguistically informed pre-training methods such as Named Entity Recognition (NER). To our knowledge, this is the first large-scale evaluation for this task in Arabic, as well as the first application of multi-task pre-training in this context.

TreeSatAI Benchmark Archive : a multi-sensor, multi-label dataset for tree species classification in remote sensing (2023)

Ahlswede, Steve ; Schulz, Christian ; Gava, Christiano ; Helber, Patrick ; Bischke, Benjamin ; Förster, Michael ; Arias, Florencia ; Hees, Jörn ; Demir, Begüm ; Kleinschmit, Birgit

Airborne and spaceborne platforms are the primary data sources for large-scale forest mapping, but visual interpretation for individual species determination is labor-intensive. Hence, various studies focusing on forests have investigated the benefits of multiple sensors for automated tree species classification. However, transferable deep learning approaches for large-scale applications are still lacking. This gap motivated us to create a novel dataset for tree species classification in central Europe based on multi-sensor data from aerial, Sentinel-1 and Sentinel-2 imagery. In this paper, we introduce the TreeSatAI Benchmark Archive, which contains labels of 20 European tree species (i.e., 15 tree genera) derived from forest administration data of the federal state of Lower Saxony, Germany. We propose models and guidelines for the application of the latest machine learning techniques for the task of tree species classification with multi-label data. Finally, we provide various benchmark experiments showcasing the information which can be derived from the different sensors including artificial neural networks and tree-based machine learning methods. We found that residual neural networks (ResNet) perform sufficiently well with weighted precision scores up to 79 % only by using the RGB bands of aerial imagery. This result indicates that the spatial content present within the 0.2 m resolution data is very informative for tree species classification. With the incorporation of Sentinel-1 and Sentinel-2 imagery, performance improved marginally. However, the sole use of Sentinel-2 still allows for weighted precision scores of up to 74 % using either multi-layer perceptron (MLP) or Light Gradient Boosting Machine (LightGBM) models. Since the dataset is derived from real-world reference data, it contains high class imbalances. We found that this dataset attribute negatively affects the models' performances for many of the underrepresented classes (i.e., scarce tree species). However, the class-wise precision of the best-performing late fusion model still reached values ranging from 54 % (Acer) to 88 % (Pinus). Based on our results, we conclude that deep learning techniques using aerial imagery could considerably support forestry administration in the provision of large-scale tree species maps at a very high resolution to plan for challenges driven by global environmental change. The original dataset used in this paper is shared via Zenodo (https://doi.org/10.5281/zenodo.6598390, Schulz et al., 2022). For citation of the dataset, we refer to this article.

RECol: Reconstruction Error Columns for Outlier Detection (2023)

Herurkar, Dayananda ; Meier, Mario ; Hees, Jörn

Cross-Domain Transformation for Outlier Detection on Tabular Datasets (2023)

Herurkar, Dayananda ; Sattarov, Timur ; Hees, Jörn ; Palacio, Sebastian ; Raue, Federico ; Dengel, Andreas

Author(s)
Title
Other Person(s)
Referee(s)
Abstract
Fulltext

Open Access

Refine

H-BRS Bibliography

Departments, institutes and facilities

Document Type

Year of publication

Language

Has Fulltext

Keywords

6 search hits