University of Sussex

DeformAr: rethinking NER evaluation through component analysis and visual analytics : a comparative case study of Arabic and English NER

thesis
posted on 2025-11-28, 12:18 authored by Ahmed Mostafa Mohamed Ahmed Younes
<p dir="ltr">Transformer models have significantly advanced Natural Language Processing (NLP) and perform strongly in English. However, their effectiveness in Arabic, particularly for Named Entity Recognition (NER), remains limited, even when Arabic models are pre-trained on larger datasets. Several factors may contribute to this performance gap, including tokenisation, dataset quality, and annotation inconsistencies. Previous studies have examined these issues individually, but analysing them in isolation makes it difficult to understand how they interact and jointly affect system behaviour and performance.</p><p dir="ltr">This thesis introduces DeformAr (Debugging and Evaluation Framework for Transformer-based NER Systems), a framework for investigating the performance discrepancy between Arabic and English NER systems and the factors behind it. DeformAr integrates a data extraction library and an interactive dashboard that support two modes of evaluation: cross-component analysis and behavioural analysis. For each language, the framework decomposes the NER system into dataset and model components and examines the interactions between them. During the feature examination phase, we select subcomponents for analysis based on their expected contribution to performance variation.</p><p dir="ltr">The analysis proceeds in two stages. First, cross-component analysis provides systematic and behavioural diagnostic measures across both data and model subcomponents, addressing the “what,” the “how,” and part of the “why” behind the observed performance discrepancies. Building on these findings, the second stage applies behavioural analysis, combining interpretability techniques with token-level metrics, interactive visualisations, and representation space analysis. Together, these stages enable a component-aware diagnostic process that not only detects model behaviours but also explains them by linking them to underlying representational patterns and data-related factors. To our knowledge, DeformAr is the first Arabic-specific, component-based interpretability tool, and it offers a new resource for advancing model analysis in under-resourced languages.</p>
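As a rough illustration of what a token-level metric linking tokenisation to NER errors might look like, the sketch below compares word-level error rates for words that a tokenizer splits into several subword pieces against words it keeps whole, and reports subword fertility (pieces per word). This is not DeformAr's implementation. The function names `toy_tokenize` and `token_level_diagnostics`, the fixed-width toy tokenizer, and the assumption that gold and predicted tags are already aligned one per word are all hypothetical choices made for illustration. In practice, `tokenize` would wrap a real subword tokenizer.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence


def toy_tokenize(word: str, size: int = 4) -> List[str]:
    """Hypothetical stand-in for a subword tokenizer: fixed-width chunks."""
    return [word[i:i + size] for i in range(0, len(word), size)] or [word]


def token_level_diagnostics(
    words: Sequence[str],
    gold: Sequence[str],
    pred: Sequence[str],
    tokenize: Callable[[str], List[str]],
) -> Dict[str, float]:
    """Relate subword segmentation to word-level tagging errors.

    `words`, `gold` and `pred` are assumed to be aligned, one entry per word
    (e.g. BIO tags). Words the tokenizer splits into several pieces are
    bucketed as "split", the rest as "whole".
    """
    if not (len(words) == len(gold) == len(pred)):
        raise ValueError("words, gold and pred must be aligned")
    counts: Counter = Counter()
    total_pieces = 0
    for word, g, p in zip(words, gold, pred):
        n_pieces = len(tokenize(word))
        total_pieces += n_pieces
        bucket = "split" if n_pieces > 1 else "whole"
        counts[bucket] += 1
        if g != p:
            counts[bucket + "_errors"] += 1

    def error_rate(bucket: str) -> float:
        # Share of words in this bucket whose predicted tag is wrong.
        return counts[bucket + "_errors"] / counts[bucket] if counts[bucket] else 0.0

    n = len(words)
    return {
        "fertility": total_pieces / n if n else 0.0,  # subword pieces per word
        "split_ratio": counts["split"] / n if n else 0.0,
        "error_rate_split": error_rate("split"),
        "error_rate_whole": error_rate("whole"),
    }


if __name__ == "__main__":
    words = ["Ahmed", "lives", "in", "Brighton"]
    gold = ["B-PER", "O", "O", "B-LOC"]
    pred = ["B-PER", "O", "O", "O"]
    print(token_level_diagnostics(words, gold, pred, toy_tokenize))
```

Comparing `error_rate_split` with `error_rate_whole` on Arabic and English test sets is one simple way to check whether heavier segmentation coincides with more tagging errors. Such a comparison shows correlation rather than cause, which is why a component-aware analysis would combine it with data- and representation-level evidence.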


File Version

  • Published version

Pages

334

Department affiliated with

  • Informatics Theses

Qualification level

  • doctoral

Qualification name

  • PhD

Language

  • eng

Institution

University of Sussex

Full text available

  • Yes

Supervisor

Julie Weeds and David Weir
