OntoClue, a framework to compare vector-based approaches for document relatedness using the RELISH corpus

The continuous growth in biomedical scholarly publications makes it challenging to build document recommendation algorithms that help researchers navigate the literature and keep up with relevant publications. Understanding semantic relatedness and similarity between two documents could improve document recommendations. The objective of this study is to perform a comparative analysis of vector-based approaches for assessing document similarity in the RELISH corpus. Here we present our approach to comparing five different techniques for generating vectors that represent the text of the documents. These techniques combine various Natural Language Processing frameworks, such as Word2Vec, Doc2Vec, and dictionary-based Named Entity Recognition, as well as state-of-the-art models based on BERT.
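
As a rough illustration of what a vector-based document similarity comparison can look like, the sketch below averages word vectors into a document vector and compares documents with cosine similarity. The abstract does not state which similarity measure or pooling strategy the study uses, so cosine similarity and mean pooling are assumptions here, and the toy vectors and function names are hypothetical stand-ins for trained embeddings such as those produced by Word2Vec.

```python
import math

# Hypothetical toy word vectors; a real pipeline would load trained embeddings.
TOY_VECTORS = {
    "protein": [0.9, 0.1, 0.0],
    "gene": [0.8, 0.2, 0.1],
    "binding": [0.7, 0.3, 0.0],
    "market": [0.0, 0.1, 0.9],
    "price": [0.1, 0.0, 0.8],
}


def document_vector(tokens, vectors=TOY_VECTORS):
    """Average the vectors of known tokens (assumed pooling strategy).

    Returns None when no token has a vector.
    """
    known = [vectors[t] for t in tokens if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention for zero vectors
    return dot / (norm_a * norm_b)


# Two documents on a related topic and one unrelated document.
doc_a = document_vector(["protein", "binding"])
doc_b = document_vector(["gene", "protein"])
doc_c = document_vector(["market", "price"])

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

With these toy vectors, `sim_ab` comes out higher than `sim_ac`, which is the behaviour a relatedness measure should show for topically close versus unrelated documents. Swapping in a different vector generator only requires replacing `document_vector`.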

Metadata
Document Type: Conference Object
Language: English
Author: Rohitha Ravinder, Tim Fellerhoff, Vishnu Dadi, Lukas Geist, Guillermo Rocamora, Muhammad Talha, Dietrich Rebholz-Schuhmann, Leyla Jael Castro
Parent Title (English): Proceedings Semantic Web Applications and Tools for Healthcare and Life Sciences, February 13–16, 2023, Basel, Switzerland
Number of pages: 2
First Page: 159
Last Page: 160
ISSN: 1613-0073
URN: urn:nbn:de:hbz:1044-opus-73854
URL: https://ceur-ws.org/Vol-3415/#paper-38
URL: https://nbn-resolving.org/urn:nbn:de:0074-3415-0
Publisher: RWTH Aachen
Place of publication: Aachen, Germany
Publishing Institution: Hochschule Bonn-Rhein-Sieg
Date of first publication: 2023/06/22
Copyright: © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
Funding: This work was partially supported by the STELLA project funded by DFG (project no. 407518790), the NFDI4DataScience project funded by GWK and DFG (no. NFDI 34/1), and the BMBF-funded de.NBI Cloud within the German Network for Bioinformatics Infrastructure (de.NBI) (031A532B, 031A533A, 031A533B, 031A534A, 031A535A, 031A537A, 031A537B, 031A537C, 031A537D, 031A538A)
Keyword: Named Entity Recognition; Word embeddings; document similarity
Dewey Decimal Classification (DDC): 0 Computer science, information science, general works / 00 Computer science, knowledge, systems / 004 Data processing; computer science
Entry in this database: 2023/07/04
Licence: Creative Commons - CC BY - Attribution 4.0 International