Visual Latent Captioning - Towards Verbalizing Vision Transformer Encoders

Metadata
Document Type: Conference Object
Language: English
Author: Sogol Haghighat, Tim Daniel Metzler, Santosh Thoduka, Sebastian Houben
Parent Title (English): Hauff, Macdonald et al. (Eds.): Advances in Information Retrieval. 47th European Conference on Infor
Number of pages: 14
First Page: 393
Last Page: 406
ISBN: 978-3-031-88710-9
DOI: https://doi.org/10.1007/978-3-031-88711-6_25
Publisher: Springer
Place of publication: Cham
Date of first publication: 2025/04/04
Copyright: © 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
Keywords: Interpretability; Large Language Models; Multimodal Models; Transformer Vision Encoder; Vision-Language Models
Departments, institutes and facilities: Fachbereich Informatik; Institut für Technik, Ressourcenschonung und Energieeffizienz (TREE); Institut für KI und Autonome Systeme (A2S)
Dewey Decimal Classification (DDC): 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 006 Special computer methods
Entry in this database: 2025/04/22