In vision tasks, a larger effective receptive field (ERF) is associated with better performance. While attention natively supports global context, convolution requires multiple stacked layers and a hierarchical structure for large context. In this work, we extend Hyena, a convolution-based attention replacement, from causal sequences to the non-causal two-dimensional image space. We scale the Hyena convolution kernels beyond the feature map size up to 191$\times$191 to maximize the ERF while maintaining sub-quadratic complexity in the number of pixels. We integrate our two-dimensional Hyena, HyenaPixel, and bidirectional Hyena into the MetaFormer framework. For image categorization, HyenaPixel and bidirectional Hyena achieve a competitive ImageNet-1k top-1 accuracy of 83.0% and 83.5%, respectively, while outperforming other large-kernel networks. Combining HyenaPixel with attention further increases accuracy to 83.6%. We attribute the success of attention to the lack of spatial bias in later stages and support this finding with bidirectional Hyena.
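
To illustrate why a kernel larger than the feature map maximizes the receptive field, here is a minimal sketch in plain Python. It is not the HyenaPixel implementation. The helper names `conv2d_same` and `ones`, the 4×4 feature map, and the all-ones kernels are assumptions chosen for illustration. Under zero padding, a kernel of size (2H-1)×(2W-1) is the smallest odd-sized kernel that lets every output pixel see every input pixel. By the same arithmetic, 191 = 2·96 - 1 would correspond to a 96×96 feature map, although the abstract does not state this.

```python
def conv2d_same(image, kernel):
    """Naive zero-padded 2D cross-correlation with 'same' output size.

    Assumes odd kernel height and width. Illustrative only: a direct
    loop like this costs O(H*W*Kh*Kw) and is NOT sub-quadratic.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2  # padding on each side
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for u in range(kh):
                for v in range(kw):
                    y, x = i + u - ph, j + v - pw
                    # Positions outside the image count as zero padding.
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[u][v]
            out[i][j] = acc
    return out


def ones(rows, cols):
    """Hypothetical helper: an all-ones kernel of the given size."""
    return [[1.0] * cols for _ in range(rows)]


H = W = 4  # assumed toy feature-map size

# One-hot image: a single active pixel in the bottom-right corner.
image = [[0.0] * W for _ in range(H)]
image[H - 1][W - 1] = 1.0

small = conv2d_same(image, ones(3, 3))                  # local 3x3 kernel
full = conv2d_same(image, ones(2 * H - 1, 2 * W - 1))   # 7x7 kernel, larger than the map

# The top-left output only responds to the far corner with the large kernel.
top_left_small = small[0][0]
top_left_full = full[0][0]
print(top_left_small, top_left_full)
```

With the 3×3 kernel the top-left output cannot see the bottom-right pixel, whereas with the 7×7 kernel every output position does. The direct loop above scales with the kernel area. The sub-quadratic cost mentioned in the abstract would instead come from evaluating the same convolution in the frequency domain (FFT-based convolution, as used in the original Hyena operator), which this sketch does not show.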