Unsupervised induction of Arabic root and pattern lexicons using machine learning

Khaliq, Bilal; Carroll, John

presentation

posted on 2023-06-08, 20:35 authored by Bilal Khaliq, John Carroll

We describe an approach to building a morphological analyser of Arabic by inducing a lexicon of root and pattern templates from an unannotated corpus. Using maximum entropy modelling, we capture orthographic features from surface words, and cluster the words based on the similarity of their possible roots or patterns. From these clusters, we extract root and pattern lexicons, which allow us to morphologically analyse words. Further enhancements are applied, adjusting for morpheme length and structure. A final root extraction accuracy of 87.2% is achieved. In contrast to previous work on unsupervised learning of Arabic morphology, our approach is applicable to naturally-written, unvowelled Arabic text.
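As a rough illustration of what "possible roots or patterns" of a surface word can mean, the sketch below enumerates every way to pick three letters of an unvowelled word as a candidate root, with the remaining letters forming the candidate pattern. It is not the authors' method: the maximum entropy model, the similarity-based clustering, and the adjustments for morpheme length and structure are not reproduced. The function names, the fixed root length of three, the `-` slot marker, and the ASCII transliteration of the example words are all assumptions made for this example.

```python
from itertools import combinations


def candidate_analyses(word, root_len=3):
    """Enumerate every (root, pattern) split of `word` (hypothetical helper).

    A candidate root is any ordered choice of `root_len` letters; the
    candidate pattern is the word with those letters replaced by '-'.
    """
    analyses = []
    for positions in combinations(range(len(word)), root_len):
        root = "".join(word[i] for i in positions)
        pattern = "".join(
            "-" if i in positions else ch for i, ch in enumerate(word)
        )
        analyses.append((root, pattern))
    return analyses


def group_by_root(words, root_len=3):
    """Map each candidate root to the set of words that could contain it."""
    groups = {}
    for w in words:
        for root, _ in candidate_analyses(w, root_len):
            groups.setdefault(root, set()).add(w)
    return groups


if __name__ == "__main__":
    # Assumed ASCII transliterations of three words built on the root k-t-b.
    words = ["ktb", "kAtb", "mktwb"]
    for root, pattern in candidate_analyses("kAtb"):
        print(root, pattern)
    print(sorted(group_by_root(words)["ktb"]))
```

Grouping by shared candidate root is only a crude stand-in for the clustering step: it shows why words built on the same root end up with overlapping candidate sets, but it makes no attempt to score or rank the candidates.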

History

Publication status

Published

Publisher URL

http://aclweb.org/anthology/R13-1045

Page range

350-356

Presentation Type

paper

Event name

International Conference Recent Advances in Natural Language Processing (RANLP)

Event location

Hissar, Bulgaria

Event type

conference

Event date

7-13 September 2013

Department affiliated with

Informatics Publications

Full text available

Yes

Peer reviewed?

Yes

Legacy Posted Date

2015-04-24
