Computer and Modernization


Text Topic Extraction Based on a Double Linguistic Filter

  

  1. China Mobile Group Guangdong Co. Ltd., Guangzhou 510006, China;

    2. School of Software Engineering, South China University of Technology, Guangzhou 510006, China
  • Received: 2015-09-06 Online: 2015-12-23 Published: 2015-12-30

Abstract: Text topic extraction is widely used to condense textual information. Chinese text is built from basic words that individually carry little semantic information, so representing the semantics of a short text with single words is unreliable in practice. Chinese phrases, in contrast, carry rich, fine-grained semantic information and are better suited to representing the topic of a text. This paper therefore proposes a double linguistic filter (a lexical category filter and a phrase-extending filter) that weeds out redundant information and extracts topic phrases from text. The extracted phrases come close to a refined semantic expression of the text. Experimental results show that the proposed method yields reliable results and may suggest new approaches to text mining.

Key words: phrase extraction, information extraction, rule mining
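
The abstract names the two filters but does not specify their rules, so the following is only a minimal sketch of how such a two-stage pipeline could be arranged, not the authors' implementation. It assumes the input has already been segmented and part-of-speech tagged. The tag set `CONTENT_TAGS`, the thresholds `min_count` and `max_len`, and all function names are hypothetical. English toy tokens stand in for segmented Chinese words.

```python
from collections import Counter

# Assumed tag set: lexical categories allowed inside a topic phrase.
# (Hypothetical; the abstract does not list the categories actually used.)
CONTENT_TAGS = {"n", "v", "a"}  # noun, verb, adjective


def lexical_category_filter(tagged_tokens):
    """Stage 1: drop words with other tags and return the remaining
    words grouped into runs of consecutive content words."""
    runs, current = [], []
    for word, tag in tagged_tokens:
        if tag in CONTENT_TAGS:
            current.append(word)
        elif current:
            runs.append(current)
            current = []
    if current:
        runs.append(current)
    return runs


def _contains(long, short):
    """True if tuple `short` occurs contiguously inside tuple `long`."""
    n = len(short)
    return any(long[i:i + n] == short for i in range(len(long) - n + 1))


def phrase_extending_filter(runs, min_count=2, max_len=4):
    """Stage 2: count word n-grams (2 <= n <= max_len) inside the runs,
    keep the frequent ones, and drop any phrase that a longer phrase
    with the same count already covers."""
    counts = Counter()
    for run in runs:
        for n in range(2, max_len + 1):
            for i in range(len(run) - n + 1):
                counts[tuple(run[i:i + n])] += 1
    frequent = {p: c for p, c in counts.items() if c >= min_count}
    kept = [
        p for p in frequent
        if not any(
            len(q) > len(p) and frequent[q] == frequent[p] and _contains(q, p)
            for q in frequent
        )
    ]
    # Most frequent first; ties broken alphabetically for determinism.
    return sorted(kept, key=lambda p: (-frequent[p], p))


def extract_topic_phrases(tagged_sentences, min_count=2, max_len=4):
    """Run both stages over a list of POS-tagged sentences."""
    runs = []
    for sentence in tagged_sentences:
        runs.extend(lexical_category_filter(sentence))
    return phrase_extending_filter(runs, min_count, max_len)


if __name__ == "__main__":
    sentences = [
        [("text", "n"), ("topic", "n"), ("extraction", "n"),
         ("of", "p"), ("method", "n")],
        [("based", "p"), ("text", "n"), ("topic", "n"), ("extraction", "n")],
        [("topic", "n"), ("extraction", "n"), ("very", "d"), ("useful", "a")],
    ]
    for phrase in extract_topic_phrases(sentences):
        print(" ".join(phrase))
```

In this sketch the first stage decides which words may take part in a phrase, and the second stage decides how far a phrase should extend. A shorter phrase is discarded only when a longer one occurs exactly as often, which is one simple way to prefer the fuller expression without losing phrases that also appear on their own.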
