Research

My group works across five themes. Open-source projects and recent talks are below.

Themes

NLP for Icelandic and other Germanic languages

Language models, datasets, and evaluation resources for Icelandic, Faroese, and other under-resourced Germanic languages. Outputs include the IceBERT family, FoQA, the gender-bias evaluation set, the Hotter and Colder sentiment corpus, and a strand of work on prompt engineering and fine-tuning for low-resource machine translation.

Related publications →

Computer vision for natural and agricultural sciences

Applied vision in collaboration with the Marine and Freshwater Research Institute, agricultural science groups, and food-science partners. Projects include few-shot otolith aging, multispectral nematode segmentation in Atlantic cod, gait classification for five-gaited horses, drought-response analysis on subarctic grasslands, and vision-language features for automated fish monitoring.

Related publications →

Clinical AI and cardiology

Joint work with the Icelandic National Hospital on automated clinical coding (ICD-10 from discharge summaries with Icelandic BERT models) and on the Icelandic Heart Failure Registry. This theme also includes several international surveys on the management of heart failure with preserved ejection fraction (HFpEF).

Related publications →

Human genetics and computational biology

Work from a research-scientist position at deCODE genetics: genome-wide association studies of common conditions where the analysis intersects with NLP-style methods, including a Nature Communications paper on BMI-associated variants and disease risk.

Related publications →

Computational neuroscience

PhD-era and ongoing work on emergent behaviour in models of cortical networks. Topics include bootstrap percolation with inhibition, fast Hebbian spike-latency normalisation, and synfire-chain emergence under spike-timing-dependent plasticity.

Related publications →

Open-source projects

Datasets, models, and tools released by the group. Most live on Hugging Face or GitHub; see the linked publication for context.

FoQA

active

The first dedicated question-answering dataset for Faroese, with extractive and generative variants. Built through translation and adaptation of existing QA benchmarks combined with native-speaker validation, and accompanied by baseline evaluations of multilingual and Faroese-tuned language models. Released alongside the RESOURCEFUL 2025 paper.

GameQA

maintenance

A gamified mobile-app platform for crowdsourcing multi-domain question-answering datasets. Demonstrated at EACL 2023 System Demonstrations and used to collect the RUQuAD-1 reading-comprehension dataset for Icelandic.

Hotter and Colder

maintenance

An Icelandic sentiment corpus of blog comments annotated with nuanced labels for sentiment, emotions, toxicity, sarcasm, hate speech, sympathy, and related categories. Distributed via the University of Iceland, with the companion methodology paper at NoDaLiDa / Baltic-HLT 2025.

IceBERT family

maintenance

A family of Icelandic BERT-style language models pre-trained on the Icelandic Crawled Corpus. Released alongside the LREC 2022 paper with Miðeind, the IceBERT checkpoints have become the foundation for much of the subsequent Icelandic NLP work in our group and elsewhere.

MazeEval

active

A benchmark for testing sequential decision-making in language models through navigation tasks in procedurally generated mazes. Described in a single-author preprint, with the benchmark released alongside the paper.

MIM-GOLD-EL

maintenance

An entity-linking corpus for Icelandic built on top of the MIM-GOLD named-entity collection. Released with the University of Iceland and introduced at LREC 2022; the companion methodology paper appeared at the Dataset Creation for Lower-Resourced Languages workshop.

Icelandic WinoGrande

maintenance

An Icelandic adaptation of the WinoGrande commonsense-reasoning benchmark, built to evaluate Icelandic language models on pronoun disambiguation and world-knowledge tasks. Released by Miðeind ehf.

NQiI - Natural Questions in Icelandic

maintenance

An open-domain question-answering dataset for Icelandic, adapted from the English Natural Questions benchmark. Released in two versions (v1.0 at LREC 2022 and an updated v1.1) and distributed through the University of Iceland for evaluating Icelandic QA systems.

RUQuAD-1

maintenance

The Reykjavik University Question-Answering Dataset for Icelandic, a SQuAD-style reading-comprehension benchmark collected through the GameQA mobile platform. Distributed via the University of Iceland.

Talks & outreach

  • 2025

    Teachers' attitudes and perceptions of AI usefulness in academia

    ICERI 2025 · workshop

  • 2025

    University of Iceland teachers' attitudes and experience with generative AI

    Menntakvika 2025, Reykjavik, Iceland · workshop

  • 2023

    Talk at the University of Akureyri teaching conference

    University of Akureyri teaching conference, Akureyri, Iceland · invited

  • 2023

    Talk at the Icelandic teaching academy conference

    Icelandic Teaching Academy conference · invited

  • 2023

    Talk on AI and ethics

    UNESCO forum on AI and Ethics · outreach

  • 2022

    UTmessan 2022

    UTmessan, Reykjavik, Iceland · outreach

  • 2021

    Nationally broadcast interview on AI

    RÚV (Icelandic National Broadcaster), Reykjavik, Iceland · outreach

  • 2021

    UTmessan 2021

    UTmessan, Reykjavik, Iceland · outreach