Computer and Modernization, 2014, Vol. 0, Issue (4): 77-80.


 Research and Design on Topical Crawler Based on Analysis of Content and Link

  

  1. School of Computer Science and Technology, Anhui University of Science and Technology, Huainan 232001, China
  Received: 2013-12-10; Online: 2014-04-17; Published: 2014-04-23

Abstract:  

Existing topical crawler algorithms achieve limited accuracy when fetching topic-relevant webpages. This paper presents a topical webpage fetching method based on evaluating both the text content and the links of webpages. The method first calculates the relevance of the current webpage to the topic, then compares this relevance value with a given threshold to decide whether the page is discarded or stored. The relevance value also determines the priority of URLs in the crawl queue, so the model balances accuracy against coverage of the topical webpages collected. The newly designed topical crawler algorithm improves the accuracy of topical webpage fetching to some extent.
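The abstract describes the crawl loop only at a high level, so the following Python sketch shows one possible way to structure it. It is an illustration, not the authors' implementation. Several parts are assumptions that the abstract does not state. Relevance is modelled here as cosine similarity between term-frequency vectors of the page text and a list of topic terms, which is a stand-in for the paper's unspecified content-and-link evaluation. Each outlink inherits the relevance score of the page it was found on as its queue priority, and outlinks of discarded pages are still queued at that low priority. The names `term_vector`, `cosine_relevance`, `TopicalFrontier`, `crawl`, `fetch`, `threshold`, and `max_fetches` are all hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Lower-cased alphanumeric tokens mapped to term frequencies."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_relevance(page_text, topic_terms):
    """Cosine similarity of page and topic term-frequency vectors (assumed measure)."""
    page = term_vector(page_text)
    topic = Counter(topic_terms)
    dot = sum(page[term] * weight for term, weight in topic.items())
    norm = math.sqrt(sum(v * v for v in page.values())) * math.sqrt(
        sum(v * v for v in topic.values())
    )
    return dot / norm if norm else 0.0


class TopicalFrontier:
    """Hypothetical crawl queue that pops the highest-priority URL first."""

    def __init__(self):
        self._heap = []
        self._seen = set()  # never enqueue the same URL twice
        self._order = 0     # tie-breaker: FIFO among equal priorities

    def push(self, url, priority):
        if url in self._seen:
            return
        self._seen.add(url)
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, self._order, url))
        self._order += 1

    def pop(self):
        neg_priority, _, url = heapq.heappop(self._heap)
        return url, -neg_priority

    def __len__(self):
        return len(self._heap)


def crawl(seeds, fetch, topic_terms, threshold=0.2, max_fetches=100):
    """Hypothetical loop: store pages at or above threshold, rank links by relevance.

    `fetch(url)` must return (text, outlinks) or None; returns (stored, visited).
    """
    frontier = TopicalFrontier()
    for url in seeds:
        frontier.push(url, 1.0)  # seeds get the maximum priority (assumption)
    stored, visited = {}, []
    while frontier and len(visited) < max_fetches:
        url, _ = frontier.pop()
        page = fetch(url)
        visited.append(url)
        if page is None:
            continue
        text, outlinks = page
        score = cosine_relevance(text, topic_terms)
        if score >= threshold:
            stored[url] = score  # relevant page: keep it
        # Outlinks inherit this page's relevance as their queue priority;
        # links found on discarded pages are still queued, just at low priority.
        for link in outlinks:
            frontier.push(link, score)
    return stored, visited
```

For example, with an in-memory dict `web` that maps each URL to `(text, links)`, calling `crawl(["seed"], web.get, ["crawler", "focused", "topic"])` stores only the pages whose score reaches the threshold. The remaining pages are still visited, in order of the relevance of the page that linked to them. A binary heap keeps each push and pop at logarithmic cost, which matters once the frontier grows large.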
