13–15 Oct 2025
Tagungszentrum Alte Mensa Göttingen
Europe/Berlin timezone

Learning Task-Driven Scan Paths with Log‑Polar Foveated Networks

13 Oct 2025, 17:30
1h 30m
Emmy-Noether-Saal (Veranstaltungszentrum Alte Mensa)

Wilhelmsplatz 3, 37073 Göttingen
Poster presentation | Poster session with wine and snacks

Speaker

Valentin Hassler (UMIN)

Description

Understanding how an observer decides where to look is central to both human-vision research and machine perception. We address this question by training a neural network that produces artificial scan paths while performing tasks such as classification, visual search, and counting. At each fixation, the network receives a log-polar, foveated view of the image, which retains high-resolution detail at the point of gaze and compresses the periphery. This sampling mirrors the way retinal acuity falls off with eccentricity. A controller proposes the next fixation, and a task solver integrates the resulting glimpses to answer the query. The controller can be trained with standard back-propagation or, in an alternative configuration, entirely with reinforcement learning (RL).
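As a rough illustration of the log-polar, foveated view described above, the following sketch samples an image on rings whose radii grow geometrically away from the fixation point. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `log_polar_grid` and `sample_glimpse`, the ring and angle counts, the radius range, and the nearest-neighbour lookup are all hypothetical choices made for this example.

```python
import math

def log_polar_grid(fix_x, fix_y, n_rings=8, n_angles=16, r_min=1.0, r_max=112.0):
    """Return (x, y) sample positions on a log-polar grid centred on the fixation.

    All parameter defaults are illustrative assumptions.
    """
    points = []
    for i in range(n_rings):
        # Radii grow geometrically from r_min to r_max, so samples are
        # dense near the fixation and sparse in the periphery.
        r = r_min * (r_max / r_min) ** (i / (n_rings - 1))
        for j in range(n_angles):
            theta = 2.0 * math.pi * j / n_angles
            points.append((fix_x + r * math.cos(theta),
                           fix_y + r * math.sin(theta)))
    return points

def sample_glimpse(image, fix_x, fix_y, **grid_kwargs):
    """Nearest-neighbour glimpse from a 2-D list image; out-of-bounds samples are 0."""
    height, width = len(image), len(image[0])
    glimpse = []
    for x, y in log_polar_grid(fix_x, fix_y, **grid_kwargs):
        col, row = int(round(x)), int(round(y))
        inside = 0 <= row < height and 0 <= col < width
        glimpse.append(image[row][col] if inside else 0.0)
    return glimpse
```

Calling `sample_glimpse(image, 112, 112)` on a 224 × 224 image yields a flat list of `n_rings * n_angles` values, ordered ring by ring. A trained model would more likely use a differentiable interpolation in place of the nearest-neighbour lookup; that choice is outside what the abstract specifies.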

On an MNIST-derived benchmark that places digits on a 224 × 224 canvas with distractor strokes, the model matches full-vision accuracy (99%) when the midpoint of the target digit is supplied as an oracle fixation. The same accuracy is maintained when the controller learns its own fixation sequence, and RL offers a small additional benefit. Under similar perceptual constraints, the model achieves 54% top-1 accuracy on ImageNet. Qualitative inspection shows that the learned fixations cluster around semantically informative regions. Ongoing experiments add an explicit novelty reward to study how curiosity incentives reshape exploration behaviour and downstream performance.
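To make the benchmark setup more concrete, the sketch below pastes a digit patch onto a 224 × 224 canvas, adds a few distractor strokes, and returns the midpoint of the digit as the oracle fixation. Only the canvas size and the idea of an oracle midpoint come from the abstract; the function name `place_digit`, the 28 × 28 patch size, the stroke shapes and lengths, and the uniform random placement are assumptions for illustration and may differ from the actual benchmark.

```python
import random

CANVAS = 224  # canvas side length, as stated in the abstract
DIGIT = 28    # native MNIST patch size (assumed to be pasted unscaled)

def place_digit(digit, n_distractors=5, seed=0):
    """Paste a DIGIT x DIGIT patch at a random position and add line strokes.

    Returns the canvas (list of rows) and the oracle fixation (x, y),
    taken here as the centre of the pasted patch.
    """
    rng = random.Random(seed)
    canvas = [[0.0] * CANVAS for _ in range(CANVAS)]

    # Distractor strokes: short straight segments, a hypothetical
    # stand-in for whatever strokes the real benchmark uses.
    for _ in range(n_distractors):
        x, y = rng.randrange(CANVAS), rng.randrange(CANVAS)
        dx, dy = rng.choice([(1, 0), (0, 1), (1, 1), (1, -1)])
        for step in range(rng.randint(5, 20)):
            cx, cy = x + step * dx, y + step * dy
            if 0 <= cx < CANVAS and 0 <= cy < CANVAS:
                canvas[cy][cx] = 1.0

    # Paste the digit so that it lies fully inside the canvas.
    left = rng.randrange(CANVAS - DIGIT + 1)
    top = rng.randrange(CANVAS - DIGIT + 1)
    for r in range(DIGIT):
        for c in range(DIGIT):
            canvas[top + r][left + c] = max(canvas[top + r][left + c], digit[r][c])

    oracle = (left + DIGIT / 2, top + DIGIT / 2)
    return canvas, oracle
```

With an oracle fixation, the model would receive a single glimpse centred on `oracle`; in the learned setting, the controller has to find a comparable location on its own.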

Author

Valentin Hassler (UMIN)

Co-author

Prof. Alexander Ecker (UMIN)
