TY  - RPRT
U1  - Verschiedenartige Texte
A1  - Balabin, Helena
T1  - Multimodal Transformers for Biomedical Text and Knowledge Graph Data
N2  - Recent advances in Natural Language Processing have substantially improved contextualized representations of language. However, the inclusion of factual knowledge, particularly in the biomedical domain, remains challenging. Hence, many Language Models (LMs) are extended by Knowledge Graphs (KGs), but most approaches require entity linking (i.e., explicit alignment between text and KG entities). Inspired by single-stream multimodal Transformers operating on text, image and video data, this thesis proposes the Sophisticated Transformer trained on biomedical text and Knowledge Graphs (STonKGs). STonKGs incorporates a novel multimodal architecture based on a cross encoder that uses the attention mechanism on a concatenation of input sequences derived from text and KG triples, respectively. Over 13 million so-called text-triple pairs, coming from PubMed and assembled using the Integrated Network and Dynamical Reasoning Assembler (INDRA), were used in an unsupervised pre-training procedure to learn representations of biomedical knowledge in STonKGs. By comparing STonKGs to an NLP- and a KG-baseline (operating on either text or KG data) on a benchmark consisting of eight fine-tuning tasks, the proposed knowledge integration method applied in STonKGs was empirically validated. Specifically, on tasks with a comparatively small dataset size and a larger number of classes, STonKGs resulted in considerable performance gains, beating the F1-score of the best baseline by up to 0.083. Both the source code as well as the code used to implement STonKGs are made publicly available so that the proposed method of this thesis can be extended to many other biomedical applications.
T3  - Technical Report / Hochschule Bonn-Rhein-Sieg University of Applied Sciences. Department of Computer Science - 02-2022
KW  - Knowledge Graphs
KW  - representation learning
KW  - Natural Language Processing
KW  - Transformers
KW  - Bioinformatics
KW  - Machine Learning
KW  - Deep Learning
Y2  - 2022
UN  - https://nbn-resolving.org/urn:nbn:de:hbz:1044-opus-60816
SN  - 1869-5272
SS  - 1869-5272
SN  - 978-3-96043-100-8
SB  - 978-3-96043-100-8
U6  - https://doi.org/10.18418/978-3-96043-100-8
DO  - https://doi.org/10.18418/978-3-96043-100-8
SP  - xiii, 98
S1  - xiii, 98
ER  - 