Integrating character representations into Chinese word embedding
chapter
posted on 2023-06-09, 05:16 authored by Xingyuan Chen, Peng Jin, Diana Frances McCarthy, John Carroll
In this paper we propose a novel word representation for Chinese based on a state-of-the-art word embedding approach. Our main contribution is to integrate distributional representations of Chinese characters into the word embedding. Recent related work on European languages has demonstrated that information from inflectional morphology can reduce the problem of sparse data and improve word representations. Chinese has very little inflectional morphology, but there is potential for incorporating character-level information. Chinese characters are drawn from a fixed set – with just under four thousand in common usage – but a major problem with using characters is their ambiguity. In order to address this problem, we disambiguate the characters according to groupings in a semantic hierarchy. Coupling our character embeddings with word embeddings, we observe improved performance on the tasks of finding synonyms and rating word similarity compared to a model using word embeddings alone, especially for low frequency words.
History
Publication status
- Published
Publisher
Springer International Publishing
Volume
10085
Page range
335-349
Pages
15
Book title
Chinese lexical semantics: 17th workshop, CLSW 2016, Singapore, Singapore, May 20–22, 2016, revised selected papers
ISBN
9783319495071
Series
Lecture notes in computer science
Department affiliated with
- Informatics Publications
Research groups affiliated with
- Data Science Research Group Publications
Full text available
- No
Peer reviewed?
- Yes