This work investigates the variation in a word's distributionally nearest neighbours with respect to the similarity measure used. We identify one type of variation as being the relative frequency of the neighbour words with respect to the frequency of the target word. We then demonstrate a three-way connection between relative frequency of similar words, a concept of distributional generality and the semantic relation of hyponymy. Finally, we consider the impact that this has on one application of distributional similarity methods (judging the compositionality of collocations).
Proceedings of the 20th International Conference on Computational Linguistics
Event location
Geneva, Switzerland
Event type
conference
ISBN
1-932-43248-5
Department affiliated with
Informatics Publications
Notes
Originality: This paper is the first to demonstrate a three-way connection between the relative frequency of similar words, a concept of distributional generality, and the semantic relation of hyponymy. Rigour: The proposals are precisely formulated, and extensive experimental evaluations are performed. Significance: This paper is seen as the first systematic investigation of the relationship between lexical distributional similarity and the notion of hyponymy. As research in the area of context-based semantics gathers pace (for example, in connection with the so-called 'textual entailment' problem), the issue of how to determine lexical semantic relationships such as hyponymy based on context is becoming increasingly important. Impact: In two papers submitted to leading journals that I have recently reviewed, leading researchers exploring the 'textual entailment' problem explicitly adopt the approach introduced in this paper. The paper has 15 citations in Google Scholar.