Computer and Modernization, 2017, Vol. 0, Issue (11): 13-16+61. doi: 10.3969/j.issn.1006-2475.2017.11.003


Text Categorization Based on Graph Kernel

1. Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Received: 2016-11-24; Online: 2017-11-21; Published: 2017-11-21

Abstract: In text classification, the vector space model offers a simple representation, but it records only the frequencies of feature words and ignores the structural and semantic information carried by word order. As a result, different documents can end up with the same vector representation. To address this problem, this paper uses a graph structure model to represent text: each text is represented as a directed graph (a text graph), which supplies the missing structural information. The paper applies graph kernel techniques to text classification and proposes a graph kernel algorithm suited to computing the similarity between text graphs; a support vector machine is then used to classify the texts. Experimental results on a text corpus show that, compared with the vector space model, the gap walk kernel achieves higher classification accuracy than the other kernel functions evaluated. It is therefore an effective similarity measure for graph structures and can be widely applied in text classification.
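The abstract does not define how text graphs are built or how the gap walk kernel is computed. The following is therefore only a minimal sketch under stated assumptions. It shows why a bag-of-words vector loses word order while a directed text graph keeps it. It also shows how a walk-based graph kernel compares two text graphs. Three choices here are assumptions rather than the paper's method: one node per distinct word, an edge `u -> v` when `v` directly follows `u`, and a generic decay-weighted common-walk kernel that stands in for the gap walk kernel. The function names (`text_graph`, `walk_kernel`, `normalized`) are hypothetical.

```python
from collections import Counter
import math


def bag_of_words(tokens):
    """Vector space model: term frequencies only; word order is discarded."""
    return Counter(tokens)


def text_graph(tokens):
    """Assumed text-graph construction (hypothetical): one node per distinct
    word, and a directed edge u -> v whenever word v immediately follows u."""
    nodes = frozenset(tokens)
    edges = frozenset(zip(tokens, tokens[1:]))
    return nodes, edges


def walk_kernel(g1, g2, max_len=3, decay=0.5):
    """Generic common-walk kernel (a stand-in, NOT the paper's gap walk
    kernel): decay-weighted count of walks of length 0..max_len that occur
    with identical word labels in both graphs."""
    nodes1, edges1 = g1
    nodes2, edges2 = g2
    # Every node has a unique word label, so the direct product graph
    # reduces to the shared words joined by the shared directed edges.
    common_edges = edges1 & edges2
    counts = {w: 1 for w in nodes1 & nodes2}  # walks of length 0 ending at w
    total = float(sum(counts.values()))
    for k in range(1, max_len + 1):
        # Extend every walk of length k-1 by one shared edge.
        nxt = Counter()
        for u, v in common_edges:
            nxt[v] += counts.get(u, 0)
        counts = nxt
        total += decay ** k * sum(counts.values())
    return total


def normalized(kernel, g1, g2):
    """Cosine-style normalization so that normalized(k, g, g) == 1."""
    return kernel(g1, g2) / math.sqrt(kernel(g1, g1) * kernel(g2, g2))


if __name__ == "__main__":
    d1 = "dog bites man".split()
    d2 = "man bites dog".split()
    print(bag_of_words(d1) == bag_of_words(d2))  # True: identical VSM vectors
    g1, g2 = text_graph(d1), text_graph(d2)
    print(g1 == g2)  # False: the text graphs differ
    print(walk_kernel(g1, g1), walk_kernel(g1, g2))  # 4.25 3.0
```

In this example, "dog bites man" and "man bites dog" get the same term-frequency vector. Their text graphs, however, share no edges, so the walk kernel scores them lower than each graph against itself. To classify with a support vector machine as the abstract describes, a kernel like this could be evaluated pairwise to form a Gram matrix. That matrix could then be passed to an SVM implementation that accepts precomputed kernels, such as scikit-learn's `SVC(kernel="precomputed")`. This is an illustration only, not the paper's experimental setup.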

Key words: graph structure, vector space model, gap walk kernel, support vector machine, text categorization

CLC Number: