Phrases have been considered more informative feature terms for improving the effectiveness of document clustering. In this work, we propose a phrase-based document similarity measure that computes the pairwise similarities of documents based on the Suffix Tree Document (STD) model. By mapping each node in the suffix tree of the STD model to a unique feature term in the Vector Space Document (VSD) model, the phrase-based document similarity naturally inherits the tf-idf term-weighting scheme when computing document similarity with phrases. We apply this phrase-based document similarity to affinity propagation for document clustering, which yields a much lower error than other methods and requires less time. ...
Authors: Ashish Saxena, Aditi Chaturvedi.
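The mapping from suffix-tree nodes to VSD feature terms can be illustrated with a minimal Python sketch. This is not the authors' implementation, and it rests on a loudly stated simplification: instead of building a compact generalized suffix tree, it enumerates every prefix of every word-level suffix of a document (the nodes of an uncompacted suffix trie) and treats each distinct phrase as one feature term, counting each occurrence as one unit of term frequency. The tf-idf variant (`tf * log(N / df)`), the cosine comparison, and the function names `phrase_features`, `tfidf_vectors`, `cosine`, and `similarity_matrix` are all illustrative assumptions.

```python
import math
from collections import Counter


def phrase_features(doc, max_len=None):
    """Count every contiguous word phrase in `doc`.

    Assumption: each distinct phrase stands in for one node of an
    uncompacted word-level suffix trie (a simplification of the STD model);
    every occurrence adds 1 to that phrase's term frequency.
    """
    words = doc.lower().split()
    n = len(words)
    counts = Counter()
    for i in range(n):  # suffix starting at word i
        end = n if max_len is None else min(n, i + max_len)
        for j in range(i + 1, end + 1):  # each prefix of that suffix is one trie node
            counts[" ".join(words[i:j])] += 1
    return counts


def tfidf_vectors(docs, max_len=None):
    """Map each document to a sparse {phrase: tf * log(N / df)} vector."""
    feats = [phrase_features(d, max_len) for d in docs]
    n_docs = len(docs)
    df = Counter()
    for f in feats:
        df.update(f.keys())  # document frequency: one count per document
    return [
        {p: tf * math.log(n_docs / df[p]) for p, tf in f.items()}
        for f in feats
    ]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(p, 0.0) for p, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(docs, max_len=None):
    """Pairwise phrase-based similarities, one row per document."""
    vecs = tfidf_vectors(docs, max_len)
    return [[cosine(u, v) for v in vecs] for u in vecs]


if __name__ == "__main__":
    docs = [
        "the cat sat on the mat",
        "the cat sat on a mat",
        "stock prices fell sharply today",
    ]
    for row in similarity_matrix(docs):
        print([round(x, 3) for x in row])
```

Enumerating all phrases is quadratic in document length, which is why the optional `max_len` cap exists and why a practical implementation would build a compact suffix tree instead. The resulting matrix could then be handed to an affinity propagation implementation that accepts precomputed similarities, for example scikit-learn's `AffinityPropagation(affinity="precomputed")`.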