Finding predominant word senses in untagged text

McCarthy, Diana Frances; Koeling, Rob; Weeds, Julie; Carroll, John

senseranks.pdf (80.94 kB)

presentation

posted on 2023-06-07, 13:59 authored by Diana Frances McCarthy, Rob Koeling, Julie Weeds, John Carroll

In word sense disambiguation (WSD), the heuristic of choosing the most common sense is extremely powerful because the distribution of the senses of a word is often skewed. The problem with using the predominant, or first sense heuristic, aside from the fact that it does not take surrounding context into account, is that it assumes some quantity of hand-tagged data. Whilst there are a few hand-tagged corpora available for some languages, one would expect the frequency distribution of the senses of words, particularly topical words, to depend on the genre and domain of the text under consideration. We present work on the use of a thesaurus acquired from raw textual corpora and the WordNet similarity package to find predominant noun senses automatically. The acquired predominant senses give a precision of 64% on the nouns of the SENSEVAL-2 English all-words task. This is a very promising result given that our method does not require any hand-tagged text, such as SemCor. Furthermore, we demonstrate that our method discovers appropriate predominant senses for words from two domain-specific corpora.
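
The abstract does not give the ranking formula, so the following is only a minimal sketch of how a distributional thesaurus and a WordNet-style similarity measure could be combined to rank senses. It assumes that each thesaurus neighbour contributes its distributional similarity to the target word, shared among the target's senses in proportion to how similar each sense is to that neighbour. The function name `rank_senses`, the sense labels, and all numeric values are hypothetical and chosen purely for illustration; a real system would take the neighbours from an automatically acquired thesaurus and the sense similarities from a package such as WordNet similarity.

```python
from collections import defaultdict


def rank_senses(neighbours, senses, sense_sim):
    """Rank the senses of a target word using its thesaurus neighbours.

    neighbours: dict mapping neighbour word -> distributional similarity
                to the target word (assumed non-negative).
    senses:     list of sense labels for the target word.
    sense_sim:  function (sense, neighbour) -> non-negative similarity
                between that sense and the neighbour word.
    Returns (senses sorted best first, dict of sense -> score).
    """
    scores = defaultdict(float)
    for neighbour, dist_sim in neighbours.items():
        # Normalise over all senses so each neighbour distributes
        # exactly its own distributional weight.
        total = sum(sense_sim(s, neighbour) for s in senses)
        if total == 0:
            continue  # this neighbour supports no sense at all
        for s in senses:
            scores[s] += dist_sim * sense_sim(s, neighbour) / total
    ranking = sorted(senses, key=lambda s: scores[s], reverse=True)
    return ranking, {s: scores[s] for s in senses}


# Hypothetical toy data for the noun "star" (values invented for illustration).
SENSES = ["star#celestial", "star#celebrity"]
NEIGHBOURS = {"planet": 0.4, "galaxy": 0.3, "actor": 0.2}
TOY_SIM = {
    ("star#celestial", "planet"): 0.8, ("star#celebrity", "planet"): 0.2,
    ("star#celestial", "galaxy"): 0.9, ("star#celebrity", "galaxy"): 0.1,
    ("star#celestial", "actor"): 0.1, ("star#celebrity", "actor"): 0.9,
}

ranking, scores = rank_senses(
    NEIGHBOURS, SENSES, lambda s, n: TOY_SIM.get((s, n), 0.0)
)
print(ranking[0])  # the sense treated as predominant under this toy data
```

Because the neighbours come from raw text, rerunning such a procedure on a thesaurus built from a domain-specific corpus would be expected to shift the ranking toward that domain's senses, which is the behaviour the abstract reports for its two domain-specific corpora.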

History

Publication status

Published

External DOI

http://dx.doi.org/10.3115/1218955.1218991

Pages

7

Presentation Type

paper

Event name

42nd Annual Meeting of the Association for Computational Linguistics

Event location

Barcelona, Spain

Event type

conference

Department affiliated with

Informatics Publications

Notes

Publisher's version available freely at the official URL.

Originality: Description of a novel, unsupervised method for acquiring information about predominant senses of words, for use as priors in word sense disambiguation, with accuracy approaching that of a supervised technique.

Rigour: Method evaluated on the standard word sense disambiguation data; also indicative results for domain-specific text from the Reuters corpus.

Significance: Likely to become the backoff method of choice for sense disambiguation of text in specific domains, and for languages other than English.

Outlet/citations: Best Paper Award at the most prestigious annual international conference on natural language processing. First such award to a paper on unsupervised learning. Google Scholar 41 citations.

Full text available

Yes

Peer reviewed?

Yes

Legacy Posted Date

2007-07-19

Licence

Copyright not evaluated
