
ProtSTonKGs: A Sophisticated Transformer Trained on Protein Sequences, Text, and Knowledge Graphs

  • Most approaches exploit either unstructured data from the biomedical literature or structured data from biomedical knowledge graphs in isolation; combining the two can draw on the advantages of both, ultimately improving representations of biology. Using multimodal transformers for this purpose can improve performance on context-dependent classification tasks, as demonstrated by our previous model, the Sophisticated Transformer Trained on Biomedical Text and Knowledge Graphs (STonKGs). In this work, we introduce ProtSTonKGs, a transformer aimed at learning all-encompassing representations of protein-protein interactions. ProtSTonKGs extends our previous work by adding textual protein descriptions and amino acid sequences (i.e., structural information) to the text- and knowledge graph-based input sequence used in STonKGs. We benchmark ProtSTonKGs against STonKGs and observe F1 score improvements of up to 0.066 (i.e., from 0.204 to 0.270) on several tasks, such as predicting protein interactions in several contexts. Our work demonstrates how multimodal transformers can be used to integrate heterogeneous sources of information, laying the foundation for future approaches that use multiple modalities for biomedical applications.
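
To put the reported gain in perspective, the following minimal sketch shows how an F1 score is computed from confusion counts and how the absolute and relative improvement follow from the two figures in the abstract. The function name `f1_score` and the example counts are hypothetical, and the abstract does not state which averaging scheme (e.g., macro or weighted) was used, so this covers only the basic binary definition.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Binary F1: harmonic mean of precision and recall, from raw counts."""
    denominator = 2 * tp + fp + fn
    # Convention assumed here: F1 is 0.0 when there are no positives at all.
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical counts, for illustration only (not taken from the paper).
example_f1 = f1_score(tp=8, fp=2, fn=2)

# The two F1 values reported in the abstract.
baseline_f1 = 0.204  # STonKGs
improved_f1 = 0.270  # ProtSTonKGs

absolute_gain = improved_f1 - baseline_f1    # difference in F1 points
relative_gain = absolute_gain / baseline_f1  # gain as a fraction of the baseline

print(f"example F1:    {example_f1:.3f}")
print(f"absolute gain: {absolute_gain:.3f}")
print(f"relative gain: {relative_gain:.1%}")
```

The absolute gain of 0.066 corresponds to roughly a one-third relative improvement over the baseline, although both scores remain low in absolute terms.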

Document Type:Conference Object
Author:Helena Balabin, Charles Tapley Hoyt, Benjamin M. Gyori, John Bachman, Alpha Tom Kodamullil, Martin Hofmann-Apitius, Daniel Domingo-Fernández
Parent Title (English):SWAT4HCLS 2022: Semantic Web Applications and Tools for Health Care and Life Sciences. 13th International Conference on Semantic Web Applications and Tools for Health Care and Life Sciences, Virtual Event, Leiden, Netherlands, January 10th to 14th, 2022
Number of pages:5
First Page:103
Last Page:107
Publisher:RWTH Aachen
Place of publication:Aachen, Germany
Publishing Institution:Hochschule Bonn-Rhein-Sieg
Date of first publication:2022/04/21
Copyright:Copyright © 2022 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Keyword:Bioinformatics; Knowledge Graphs; Machine Learning; Natural Language Processing; Transformers
Departments, institutes and facilities:Department of Computer Science (Fachbereich Informatik)
Dewey Decimal Classification (DDC):0 Computer science, information, general works / 00 Computer science, knowledge, systems / 006 Special computer methods
Entry in this database:2022/05/05
Licence (German):Creative Commons - CC BY - Attribution 4.0 International