Visual Latent Captioning - Towards Verbalizing Vision Transformer Encoders
| Document Type: | Conference Object |
|---|---|
| Language: | English |
| Author: | Sogol Haghighat, Tim Daniel Metzler, Santosh Thoduka, Sebastian Houben |
| Parent Title (English): | Hauff, Macdonald et al. (Eds.): Advances in Information Retrieval. 47th European Conference on Infor |
| Number of pages: | 14 |
| First Page: | 393 |
| Last Page: | 406 |
| ISBN: | 978-3-031-88710-9 |
| DOI: | https://doi.org/10.1007/978-3-031-88711-6_25 |
| Publisher: | Springer |
| Place of publication: | Cham |
| Date of first publication: | 2025/04/04 |
| Copyright: | © 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG |
| Tag: | Interpretability; Large Language Models; Multimodal Models; Transformer Vision Encoder; Vision-Language Models |
| Departments, institutes and facilities: | Fachbereich Informatik; Institut für Technik, Ressourcenschonung und Energieeffizienz (TREE); Institut für KI und Autonome Systeme (A2S) |
| Dewey Decimal Classification (DDC): | 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 006 Special computer methods |
| Entry in this database: | 2025/04/22 |


