University of Sussex
Unsupervised induction of Arabic root and pattern lexicons using machine learning

presentation
posted on 2023-06-08, 20:35 authored by Bilal Khaliq, John Carroll
We describe an approach to building a morphological analyser of Arabic by inducing a lexicon of root and pattern templates from an unannotated corpus. Using maximum entropy modelling, we capture orthographic features from surface words, and cluster the words based on the similarity of their possible roots or patterns. From these clusters, we extract root and pattern lexicons, which allow us to morphologically analyse words. Further enhancements are then applied, adjusting for morpheme length and structure. The final root extraction accuracy is 87.2%. In contrast to previous work on unsupervised learning of Arabic morphology, our approach is applicable to naturally written, unvowelled Arabic text.
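
To make the root-and-pattern idea concrete, the following minimal sketch enumerates every way a word could split into a three-letter root and a pattern template, then intersects the candidate roots of several words. It is an illustration only, not the method of the paper: it uses no maximum entropy model and no clustering, and the function name `candidate_analyses`, the placeholder symbol `C`, and the simplified Latin transliterations (`ktAb`, `kAtb`, `mktwb`) are assumptions introduced here.

```python
from itertools import combinations


def candidate_analyses(word, root_len=3):
    """Yield every (root, pattern) split of `word`.

    The root is an ordered choice of `root_len` letters; the pattern is the
    word with those letters replaced by the placeholder "C" (an assumed
    convention for this sketch).
    """
    for positions in combinations(range(len(word)), root_len):
        root = "".join(word[i] for i in positions)
        pattern = "".join(
            "C" if i in positions else ch for i, ch in enumerate(word)
        )
        yield root, pattern


# Hypothetical transliterated, unvowelled word forms.
words = ["ktAb", "kAtb", "mktwb"]

# All candidate splits for one word.
analyses = list(candidate_analyses("mktwb"))

# Candidate roots that every word in the list could share.
shared_roots = set.intersection(
    *({root for root, _ in candidate_analyses(w)} for w in words)
)
```

Under these assumptions, `("ktb", "mCCwC")` is one of the candidate splits of `mktwb`, and `ktb` is the only candidate root common to all three words. This kind of overlap is what a similarity-based grouping of words by possible roots could exploit, although the paper's actual features and clustering procedure are not specified in this abstract.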

History

Publication status

  • Published

Page range

350-356

Presentation Type

  • paper

Event name

International Conference on Recent Advances in Natural Language Processing (RANLP)

Event location

Hissar, Bulgaria

Event type

conference

Event date

7-13 September 2013

Department affiliated with

  • Informatics Publications

Full text available

  • Yes

Peer reviewed?

  • Yes

Legacy Posted Date

2015-04-24
