Multimodal Transformers for Biomedical Text and Knowledge Graph Data

Recent advances in Natural Language Processing have substantially improved contextualized representations of language. However, the inclusion of factual knowledge, particularly in the biomedical domain, remains challenging. Hence, many Language Models (LMs) are extended with Knowledge Graphs (KGs), but most approaches require entity linking (i.e., explicit alignment between text and KG entities). Inspired by single-stream multimodal Transformers operating on text, image and video data, this thesis proposes the Sophisticated Transformer trained on biomedical text and Knowledge Graphs (STonKGs). STonKGs incorporates a novel multimodal architecture based on a cross encoder that applies the attention mechanism to a concatenation of input sequences derived from text and KG triples, respectively. Over 13 million so-called text-triple pairs, extracted from PubMed and assembled using the Integrated Network and Dynamical Reasoning Assembler (INDRA), were used in an unsupervised pre-training procedure to learn representations of biomedical knowledge in STonKGs. The proposed knowledge integration method was empirically validated by comparing STonKGs to an NLP baseline and a KG baseline (operating on text only and on KG data only, respectively) on a benchmark consisting of eight fine-tuning tasks. Specifically, on tasks with a comparatively small dataset size and a larger number of classes, STonKGs yielded considerable performance gains, exceeding the F1-score of the best baseline by up to 0.083. The source code used to implement STonKGs is made publicly available so that the method proposed in this thesis can be extended to many other biomedical applications.
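
To illustrate the idea of a single-stream input built from a text-triple pair, the following is a minimal sketch in plain Python. It is not the implementation from the thesis: the function `build_pair_input`, the class `PairInput`, the special tokens, the segment-id convention, and the example tokens (`protein_A`, `activates`, `increases`, `protein_B`) are all assumptions made for illustration. It only shows how a text sequence and a KG triple could be concatenated into one sequence that a cross encoder would then attend over jointly.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical special tokens, following the common BERT-style convention.
CLS, SEP, PAD = "[CLS]", "[SEP]", "[PAD]"


@dataclass
class PairInput:
    tokens: List[str]          # concatenated text + KG sequence
    segment_ids: List[int]     # 0 = text part (and padding), 1 = KG part
    attention_mask: List[int]  # 1 = real token, 0 = padding


def build_pair_input(
    text_tokens: Sequence[str],
    triple: Tuple[str, str, str],
    max_len: int = 16,
) -> PairInput:
    """Concatenate a tokenized sentence and one KG triple into a single sequence."""
    head, relation, tail = triple
    text_part = [CLS, *text_tokens, SEP]
    kg_part = [head, relation, tail, SEP]

    tokens = text_part + kg_part
    if len(tokens) > max_len:
        raise ValueError("sequence longer than max_len; truncate the text first")

    # Segment ids let the model tell the two modalities apart.
    segment_ids = [0] * len(text_part) + [1] * len(kg_part)

    # Pad to a fixed length; padded positions are masked out of attention.
    n_pad = max_len - len(tokens)
    attention_mask = [1] * len(tokens) + [0] * n_pad
    tokens = tokens + [PAD] * n_pad
    segment_ids = segment_ids + [0] * n_pad

    return PairInput(tokens, segment_ids, attention_mask)


# Made-up placeholder pair, not taken from PubMed or INDRA.
example = build_pair_input(
    ["protein_A", "activates", "protein_B"],
    ("protein_A", "increases", "protein_B"),
    max_len=12,
)
print(example.tokens)
```

In a sketch like this, no entity linking is needed at the token level: the text and the triple are simply placed side by side, and any alignment between them is left to the attention mechanism.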

Document Type: Report
Author: Helena Balabin
Number of pages: xiii, 98
Referee: Paul Gerhard Plöger, Martin Hofmann-Apitius, Daniel Domingo-Fernández
Publishing Institution: Hochschule Bonn-Rhein-Sieg
Date of first publication: 2022/01/28
Series (Volume): Technical Report / Hochschule Bonn-Rhein-Sieg University of Applied Sciences. Department of Computer Science (02-2022)
Keyword: Bioinformatics; Deep Learning; Knowledge Graphs; Machine Learning; Natural Language Processing; Transformers; representation learning
Departments, institutes and facilities: Department of Computer Science
Dewey Decimal Classification (DDC): 0 Computer science, information, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Series: Technical Report / University of Applied Sciences Bonn-Rhein-Sieg. Department of Computer Science
Entry in this database: 2022/01/28
Licence (Multiple languages): In Copyright - Educational Use Permitted