TY  - CPAPER
U1  - Conference publication
A1  - Spravil, Julian
A1  - Houben, Sebastian
A1  - Behnke, Sven
T1  - HyenaPixel: Global Image Context with Convolutions
T2  - Endriss, Melo et al. (Eds.): ECAI 2024, 27th European Conference on Artificial Intelligence, 19–24 October 2024, Santiago de Compostela, Spain
N2  - In computer vision, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, its quadratic complexity limits its applicability to tasks that benefit from high-resolution input. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to bidirectional data and two-dimensional image space. We scale Hyena’s convolution kernels beyond the feature map size, up to 191×191, to maximize ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 84.9% and 85.2%, respectively, with no additional training data, while outperforming other convolutional and large-kernel networks. Combining HyenaPixel with attention further improves accuracy. We attribute the success of bidirectional Hyena to learning the data-dependent geometric arrangement of pixels without a fixed neighborhood definition. Experimental results on downstream tasks suggest that HyenaPixel with large filters and a fixed neighborhood leads to better localization performance.
Y1  - 2024
UN  - https://nbn-resolving.org/urn:nbn:de:hbz:1044-opus-86391
SN  - 978-1-64368-548-9
SB  - 978-1-64368-548-9
U6  - https://doi.org/10.3233/FAIA240529
DO  - https://doi.org/10.3233/FAIA240529
SP  - 521
EP  - 528
S1  - 8
PB  - IOS Press
CY  - Amsterdam
ER  -