<p dir="ltr">Transformer models have significantly advanced Natural Language Processing (NLP), demonstrating strong performance in English. However, their effectiveness in Arabic, particularly for Named Entity Recognition (NER), remains limited, even when Arabic models are pre-trained on larger datasets. This performance gap may be attributed to multiple factors, including tokenisation, dataset quality, and annotation inconsistencies. While previous studies have examined these issues individually, analysing them in isolation makes it difficult to understand how they interact and jointly affect system behaviour and performance.</p><p dir="ltr">This thesis introduces DeformAr (Debugging and Evaluation Framework for Transformer-based NER Systems), a framework designed to investigate the performance discrepancy between Arabic and English NER systems and to explore the factors behind this gap. DeformAr integrates a data extraction library and an interactive dashboard supporting two modes of evaluation: cross-component analysis and behavioural analysis. The framework divides each language's NER system into dataset and model components and examines the interactions between them. During the feature examination phase, subcomponents are selected for analysis based on their expected contribution to performance variation.</p><p dir="ltr">The analysis proceeds in two stages. First, cross-component analysis provides systematic and behavioural diagnostic measures across both data and model subcomponents, addressing the “what,” the “how,” and parts of the “why” behind observed performance discrepancies. Building on these findings, the second stage applies behavioural analysis by combining interpretability techniques with token-level metrics, interactive visualisations, and representation space analysis. 
Together, these stages enable a component-aware diagnostic process that not only detects model behaviours but also explains them by linking them to underlying representational patterns and data-related factors. To our knowledge, DeformAr is the first Arabic-specific, component-based interpretability tool, offering a novel resource for advancing model analysis in under-resourced languages.</p>