Hafsteinn Einarsson

Associate Professor, University of Iceland
Research Scientist, Amgen deCODE

I lead a research group working on natural-language processing for low-resource Germanic languages, specifically Icelandic and Faroese. The group also has active projects in computer vision for the natural sciences, clinical AI, and human genetics.

News

2026-05
Group presenting seven papers at LREC 2026 in Palma de Mallorca.
2026-04
New paper on temporal vision-language features for fish classification (Ecological Informatics).
2026-02
Microsoft Lingua grant funded: developing LLM red-teaming datasets for Icelandic.
2025-09
MazeEval preprint: a benchmark for sequential decision-making in language models.
2025-05
FoQA: A Faroese question-answering dataset (RESOURCEFUL 2025).

Selected publications

Machine Learning · Natural Sciences

Temporal aggregation of vision-language features for high-accuracy fish classification in automated monitoring

Silva Martins, J. R., Bárðarson, H., Guðbrandsson, J., Einarsson, H. · Ecological Informatics · 2025

Aggregates temporal vision-language features from underwater video to classify fish species in automated monitoring streams, combining per-frame embeddings with temporal pooling to handle the noise and occlusion that limit frame-by-frame classifiers in deployed systems. Published in Ecological Informatics (2025) as part of the lab's ongoing computer-vision work with the Marine and Freshwater Research Institute.

NLP for Icelandic & Germanic

MazeEval: a benchmark for testing sequential decision-making in language models

Einarsson, H. · arXiv preprint arXiv:2507.20395 · 2025

MazeEval is a benchmark for testing sequential decision-making in language models through navigation tasks in procedurally generated mazes. The setup probes whether models can maintain a coherent internal map across multiple turns of interaction rather than relying on local pattern matching. Single-author preprint released on arXiv (2025).

arXiv →

NLP for Icelandic & Germanic

FoQA: A Faroese question-answering dataset

Simonsen, A., Nielsen, D. S., Einarsson, H. · Proceedings of the Third Workshop on Resources and Representations for Under-Resourced Languages and Domains (RESOURCEFUL-2025) · 2025

Introduces FoQA, the first dedicated question-answering dataset for Faroese, with extractive and generative variants. The dataset is constructed via translation and adaptation of existing QA benchmarks combined with native-speaker validation, and is accompanied by baseline evaluations of multilingual and Faroese-tuned language models. Establishes a reference benchmark for Faroese reading comprehension. Presented at RESOURCEFUL 2025, a workshop co-located with NoDaLiDa.

PDF →