ASOBEK: Twitter paraphrase identification with simple overlap features and SVMs
chapter
posted on 2023-06-09, 01:51 authored by Asli Eyecioglu, Bill Keller
We present an approach to identifying Twitter paraphrases using simple lexical overlap features. The work is part of ongoing research into the applicability of knowledge-lean techniques to paraphrase identification. We use features based on the overlap of word and character n-grams and train a support vector machine (SVM). Our results demonstrate that character- and word-level overlap features in combination can give performance comparable to methods employing more sophisticated NLP processing tools and external resources. We achieve the highest F-score for identifying paraphrases on the Twitter Paraphrase Corpus as part of SemEval-2015 Task 1.
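The kind of overlap feature described above can be illustrated with a minimal sketch. The abstract does not specify the exact feature set, so the choices here are assumptions made for illustration only: word unigrams and character bigrams, lowercasing, whitespace tokenization, and four set-size counts per n-gram type. The function names are hypothetical.

```python
def char_ngrams(text, n):
    """Return the set of character n-grams of a lowercased string."""
    s = text.lower()
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples), splitting on whitespace."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def set_overlap(a, b):
    """Four simple counts for two sets: |A & B|, |A | B|, |A|, |B|."""
    return [len(a & b), len(a | b), len(a), len(b)]


def overlap_features(s1, s2, word_n=1, char_n=2):
    """Build a feature vector for a sentence pair.

    Assumption: word unigrams and character bigrams are used by default;
    the n-gram orders are parameters so other settings can be tried.
    """
    word_part = set_overlap(word_ngrams(s1, word_n), word_ngrams(s2, word_n))
    char_part = set_overlap(char_ngrams(s1, char_n), char_ngrams(s2, char_n))
    return word_part + char_part


if __name__ == "__main__":
    # Each sentence pair becomes one fixed-length numeric vector.
    print(overlap_features("the cat sat", "the cat ran"))
```

Each sentence pair is reduced to a short numeric vector that needs no parser, tagger, or external lexical resource. Vectors of this form, paired with paraphrase labels, could then be passed to any SVM implementation (for example, `sklearn.svm.SVC` in scikit-learn) for training.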
History
Publication status
Published
File Version
Published version
Journal
Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)