Visual Latent Captioning - Towards Verbalizing Vision Transformer Encoders
Document Type: | Conference Object |
---|---|
Language: | English |
Author: | Sogol Haghighat, Tim Daniel Metzler, Santosh Thoduka, Sebastian Houben |
Parent Title (English): | Hauff, Macdonald et al. (Eds.): Advances in Information Retrieval. 47th European Conference on Infor |
Number of pages: | 14 |
First Page: | 393 |
Last Page: | 406 |
ISBN: | 978-3-031-88710-9 |
DOI: | https://doi.org/10.1007/978-3-031-88711-6_25 |
Publisher: | Springer |
Place of publication: | Cham |
Date of first publication: | 2025/04/04 |
Copyright: | © 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG |
Keywords: | Interpretability; Large Language Models; Multimodal Models; Transformer Vision Encoder; Vision-Language Models |
Departments, institutes and facilities: | Fachbereich Informatik; Institut für Technik, Ressourcenschonung und Energieeffizienz (TREE); Institut für KI und Autonome Systeme (A2S) |
Dewey Decimal Classification (DDC): | 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 006 Special computer methods |
Entry in this database: | 2025/04/22 |