Topic recognition of message threads in social networking
Abstract
This project presents a method for recognizing the topic of English message threads
using TF-IDF, based on the distribution of similar words. First, a program is designed
to extract the words from a sequence of sentences taken from news articles. Next, a
program written in the C programming language uses the TF-IDF coefficient to compare
the similarity between the words around the data and the words around the example
sentences collected from Facebook, Twitter, and blogs. The similarity measures the
occurrence of the words around a given word across all of the collected example sentences.
Finally, the performance of the proposed similarity measurement method is evaluated
by measuring the precision, recall, and F-measure of the word identification.
Furthermore, the test results show the advantages and disadvantages of the proposed
similarity measurement method when it is applied to classify English words based on the
distribution of similar words. Overall, the proposed method performs well for word
classification when used with the three word extraction strategy.
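
The evaluation measures named above follow the standard definitions, which can be
sketched in C as below. The sketch assumes that each identified word is counted as a
true positive, false positive, or false negative; the abstract does not state how the
project counts them, and the names Scores and evaluate are hypothetical.

```c
#include <assert.h>

/* Precision, recall, and F-measure for one evaluation run (hypothetical type). */
typedef struct {
    double precision;
    double recall;
    double f_measure;
} Scores;

/* tp: words correctly identified, fp: words wrongly identified,
 * fn: words that should have been identified but were missed.
 * Any ratio whose denominator is zero is reported as 0. */
Scores evaluate(int tp, int fp, int fn)
{
    assert(tp >= 0 && fp >= 0 && fn >= 0);
    Scores s = {0.0, 0.0, 0.0};
    if (tp + fp > 0)
        s.precision = (double)tp / (tp + fp);
    if (tp + fn > 0)
        s.recall = (double)tp / (tp + fn);
    if (s.precision + s.recall > 0.0)
        s.f_measure = 2.0 * s.precision * s.recall / (s.precision + s.recall);
    return s;
}
```

Here the F-measure is the harmonic mean of precision and recall (often written F1), so
it is high only when both values are high.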