HyenaPixel: Global Image Context with Convolutions
- In computer vision, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, its quadratic complexity limits its applicability to tasks that benefit from high-resolution input. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to bidirectional data and two-dimensional image space. We scale Hyena’s convolution kernels beyond the feature map size, up to 191×191, to maximize ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 84.9% and 85.2%, respectively, with no additional training data, while outperforming other convolutional and large-kernel networks. Combining HyenaPixel with attention further improves accuracy. We attribute the success of bidirectional Hyena to learning the data-dependent geometric arrangement of pixels without a fixed neighborhood definition. Experimental results on downstream tasks suggest that HyenaPixel with large filters and a fixed neighborhood leads to better localization performance.
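The abstract's claim that kernels larger than the feature map can stay sub-quadratic in the number of pixels can be illustrated with an FFT-based convolution. The following is a minimal sketch, not the authors' implementation: it assumes the convolution is evaluated in the frequency domain (as in the original Hyena operator), uses a single channel and odd kernel sizes, and the function names `fft_conv2d_same` and `direct_conv2d_same` are hypothetical.

```python
import numpy as np

def fft_conv2d_same(x, k):
    """Zero-padded 2D linear convolution via FFT, cropped to x's shape.

    Assumes odd kernel height and width so the kernel has a center.
    """
    H, W = x.shape
    Kh, Kw = k.shape
    # Pad to the full linear-convolution size to avoid circular wrap-around.
    Fh, Fw = H + Kh - 1, W + Kw - 1
    X = np.fft.rfft2(x, s=(Fh, Fw))
    K = np.fft.rfft2(k, s=(Fh, Fw))
    full = np.fft.irfft2(X * K, s=(Fh, Fw))
    # Keep the centered region that matches the input resolution.
    top, left = (Kh - 1) // 2, (Kw - 1) // 2
    return full[top:top + H, left:left + W]

def direct_conv2d_same(x, k):
    """Reference spatial-domain convolution, O(H*W*Kh*Kw), for checking."""
    H, W = x.shape
    Kh, Kw = k.shape
    ph, pw = (Kh - 1) // 2, (Kw - 1) // 2
    xp = np.pad(x, ((ph, Kh - 1 - ph), (pw, Kw - 1 - pw)))
    kf = k[::-1, ::-1]  # flip the kernel: convolution, not correlation
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[i:i + Kh, j:j + Kw] * kf)
    return out

# Toy sizes: the kernel (15x15) is larger than the feature map (8x8).
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
k = rng.standard_normal((15, 15))
y = fft_conv2d_same(x, k)
assert y.shape == x.shape
assert np.allclose(y, direct_conv2d_same(x, k))
```

With a kernel roughly twice the feature map size, the padded grid has on the order of N entries for N pixels, so the FFT route costs about O(N log N), whereas the direct loop grows with the product of pixel count and kernel area.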
Document Type: | Conference Object |
---|---|
Language: | English |
Author: | Julian Spravil, Sebastian Houben, Sven Behnke |
Parent Title (English): | Endriss, Melo et al. (Eds.): ECAI 2024, 27th European Conference on Artificial Intelligence, 19–24 October 2024, Santiago de Compostela, Spain |
Number of pages: | 8 |
First Page: | 521 |
Last Page: | 528 |
ISBN: | 978-1-64368-548-9 |
URN: | urn:nbn:de:hbz:1044-opus-86391 |
DOI: | https://doi.org/10.3233/FAIA240529 |
Publisher: | IOS Press |
Place of publication: | Amsterdam |
Publishing Institution: | Hochschule Bonn-Rhein-Sieg |
Date of first publication: | 2024/10/16 |
Copyright: | © 2024 The Authors. This article is published online with Open Access by IOS Press and distributed under the terms of the Creative Commons Attribution Non-Commercial License 4.0 (CC BY-NC 4.0). |
Funding: | This research has been funded by the Federal Ministry of Education and Research of Germany under grant no. 01IS22094C WEST-AI. |
Departments, institutes and facilities: | Fachbereich Informatik; Institut für KI und Autonome Systeme (A2S) |
Dewey Decimal Classification (DDC): | 0 Computer science, information and general works / 00 Computer science, knowledge and systems / 006 Special computer methods |
Entry in this database: | 2024/10/23 |
Licence (German): | Creative Commons - CC BY-NC - Namensnennung - Nicht kommerziell 4.0 International |