Computer and Modernization


 An Adaptive Focused Crawling Algorithm Based on Link and Content Analysis

  

  1. College of Computer Science, Chongqing University, Chongqing Key Laboratory of 
     Software Theory & Technology, Chongqing 400044, China
  • Received:2015-03-30 Online:2015-09-21 Published:2015-09-24

Abstract:

Focused crawling is a key technique of focused search engines. To address the incomplete set of parameters considered by the On-line Topical Importance
Estimation (OTIE) algorithm, this paper proposes an adaptive algorithm that combines link analysis with content analysis to estimate the priority of unvisited URLs in the frontier.
We also consider the tunneling problem that arises during topical crawling. We select topics and seed pages from the Open Directory Project (ODP) and run comparative
experiments with four crawling algorithms: Best-First, Shark-Search, OTIE, and our algorithm. The experimental results indicate that the proposed method improves the performance
of the focused crawler: it significantly outperforms the other three algorithms on average target recall while maintaining an acceptable harvest rate.

Key words:  focused crawler, OTIE algorithm, Shark-Search algorithm, tunneling
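
The abstract describes ranking unvisited URLs in the frontier by a priority that draws on both link and content evidence. The sketch below illustrates only that general idea and is not the paper's algorithm: the class `Frontier`, the function `combined_priority`, and the mixing weight `alpha` are hypothetical names chosen for illustration, and the linear combination is an assumed stand-in for whatever estimate the authors actually use.

```python
import heapq
import itertools


def combined_priority(link_score, content_score, alpha=0.5):
    """Assumed linear mix of a link-based and a content-based score.

    alpha is a hypothetical weight in [0, 1]; the abstract does not
    state how the two kinds of evidence are combined.
    """
    return alpha * link_score + (1.0 - alpha) * content_score


class Frontier:
    """Priority queue of unvisited URLs; the highest score is popped first."""

    def __init__(self):
        self._heap = []                      # entries: (-score, order, url)
        self._best = {}                      # url -> best score seen so far
        self._counter = itertools.count()    # tie-breaker: insertion order

    def push(self, url, score):
        # Keep only the best score per URL; worse duplicates are ignored.
        if url in self._best and self._best[url] >= score:
            return
        self._best[url] = score
        heapq.heappush(self._heap, (-score, next(self._counter), url))

    def pop(self):
        # Skip stale heap entries left behind by score updates.
        while self._heap:
            neg_score, _, url = heapq.heappop(self._heap)
            if self._best.get(url) == -neg_score:
                del self._best[url]
                return url, -neg_score
        raise IndexError("frontier is empty")

    def __len__(self):
        return len(self._best)


frontier = Frontier()
frontier.push("http://example.org/a", combined_priority(0.2, 0.4))
frontier.push("http://example.org/b", combined_priority(0.9, 0.7))
frontier.push("http://example.org/a", 0.95)  # re-scored after new evidence
first_url, first_score = frontier.pop()
second_url, second_score = frontier.pop()
```

In this sketch a URL that is seen again with a higher score is re-inserted and its older heap entry is skipped on pop, which is one simple way to let priorities adapt as the crawl gathers more evidence. Tracking already-visited URLs is left to the caller.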