
Improving Natural Language Inference in Arabic using Transformer Models and Linguistically Informed Pre-Training

  • This paper addresses the classification of Arabic text data in the field of Natural Language Processing (NLP), with a particular focus on Natural Language Inference (NLI) and Contradiction Detection (CD). Arabic is considered a resource-poor language, meaning that few data sets are available, which limits the availability of NLP methods. To overcome this limitation, we create a dedicated data set from publicly available resources. Subsequently, transformer-based machine learning models are trained and evaluated. We find that a language-specific model (AraBERT) performs competitively with state-of-the-art multilingual approaches when we apply linguistically informed pre-training methods such as Named Entity Recognition (NER). To our knowledge, this is the first large-scale evaluation for this task in Arabic, as well as the first application of multi-task pre-training in this context.
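
As a rough illustration of the setup the abstract describes, the sketch below scores an Arabic premise/hypothesis pair for three-way NLI with AraBERT via Hugging Face Transformers. The checkpoint name, label order, and example sentences are assumptions for illustration and are not taken from the paper; in practice the classification head would first be fine-tuned on the NLI data set described above.

```python
# Minimal sketch (not the authors' code): 3-way NLI scoring with AraBERT.
# Checkpoint name and label order are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

MODEL_NAME = "aubmindlab/bert-base-arabertv2"  # assumed AraBERT checkpoint
LABELS = ["entailment", "neutral", "contradiction"]  # assumed label set

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
# Loads the pre-trained encoder with a freshly initialized
# classification head; fine-tune this head on NLI data before use.
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)
model.eval()

premise = "الطقس مشمس اليوم"     # "The weather is sunny today."
hypothesis = "الطقس ممطر اليوم"  # "The weather is rainy today."

# BERT-style models encode a sentence pair as one sequence:
# [CLS] premise [SEP] hypothesis [SEP]
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(LABELS[logits.argmax(dim=-1).item()])
```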

Metadata
Document Type: Preprint
Language: English
Author: Mohammad Majd Saad Al Deen, Maren Pielka, Jörn Hees, Bouthaina Soulef Abdou, Rafet Sifa
DOI: https://doi.org/10.48550/arXiv.2307.14666
ArXiv Id: http://arxiv.org/abs/2307.14666
Publisher: arXiv
Date of first publication: 2023/07/27
Departments, institutes and facilities: Fachbereich Informatik
Institut für Technik, Ressourcenschonung und Energieeffizienz (TREE)
Dewey Decimal Classification (DDC): 0 Computer science, information & general works / 00 Computer science, knowledge & systems / 006 Special computer methods
Entry in this database: 2023/08/07