Named Entity Recognition Using FrameNet
Abstract
This project presents the classification of unknown English words using Jaccard
similarity, based on the distribution of similar words. Information from FrameNet, a
lexical database, is used to determine the class of unknown words. Classifying unknown
words matters because it allows Natural Language Processing (NLP) systems to recognize
the meaning of those words. First, a program is designed to extract the words
from a sequence of sentences taken from news articles. Next, a program written in C
computes the Jaccard coefficient to compare the words surrounding each unknown word,
extracted from Document Understanding Conference (DUC) data, with the words in the
example sentences from the FrameNet lexical database. The similarity measures the
overlap between the words occurring around the unknown word and the words of every
FrameNet example sentence. Finally, the performance of this proposed similarity measurement
method is evaluated by measuring the precision, recall, and F-measure of unknown
word identification. The test results also show the advantages and disadvantages of
the proposed similarity measurement method when applied to classifying unknown
English words based on the distribution of similar words.
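
The comparison and evaluation steps summarized above can be illustrated with a
minimal sketch in C, the language the abstract names. It is not the project's actual
program. It assumes each context has already been tokenized into an array of words,
treats each context as a set (repeated words count once), and assumes the F-measure
is the balanced F1 score. The function names jaccard, precision_of, recall_of, and
f_measure are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `word` occurs among the first `n` entries of `set`, else 0. */
static int contains(const char *const *set, size_t n, const char *word)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(set[i], word) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Counts distinct words: a word is counted only at its first occurrence. */
static size_t count_distinct(const char *const *words, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if (!contains(words, i, words[i])) {
            count++;
        }
    }
    return count;
}

/*
 * Jaccard coefficient: size of the intersection divided by size of the union,
 * where A is the set of words around the unknown word and B is the set of
 * words in one example sentence. Returns 0.0 when both sets are empty
 * (an assumption; the source does not say how that case is handled).
 */
double jaccard(const char *const *a, size_t na,
               const char *const *b, size_t nb)
{
    assert(a != NULL || na == 0);
    assert(b != NULL || nb == 0);

    size_t size_a = count_distinct(a, na);
    size_t size_b = count_distinct(b, nb);
    size_t shared = 0;

    for (size_t i = 0; i < na; i++) {
        /* Skip repeats so each distinct word of A is tested exactly once. */
        if (!contains(a, i, a[i]) && contains(b, nb, a[i])) {
            shared++;
        }
    }

    size_t union_size = size_a + size_b - shared;
    return union_size == 0 ? 0.0 : (double)shared / (double)union_size;
}

/* Precision: correct classifications over all classifications made. */
double precision_of(size_t true_pos, size_t false_pos)
{
    size_t total = true_pos + false_pos;
    return total == 0 ? 0.0 : (double)true_pos / (double)total;
}

/* Recall: correct classifications over all words that should be classified. */
double recall_of(size_t true_pos, size_t false_neg)
{
    size_t total = true_pos + false_neg;
    return total == 0 ? 0.0 : (double)true_pos / (double)total;
}

/* Balanced F-measure (F1): harmonic mean of precision and recall. */
double f_measure(double precision, double recall)
{
    double sum = precision + recall;
    return sum > 0.0 ? 2.0 * precision * recall / sum : 0.0;
}
```

Under these assumptions, an unknown word would be assigned the class of the FrameNet
example sentence whose word set gives the highest jaccard value. The linear scans keep
the sketch short; a hash set would be the usual choice for large vocabularies.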