Computer and Modernization ›› 2012, Vol. 198 ›› Issue (2): 128-130. doi: 10.3969/j.issn.1006-2475.2012.02.034

• Network and Communication •

A Method of Web Page Purification Based on Single Model

GAN Wen-min1, LI Jun1, LI Jian2   

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 210016, China; 2. Battle Laboratory, Nanchang Army College, Nanchang 330103, China
  • Received: 2011-10-21 Revised: 1900-01-01 Online: 2012-02-24 Published: 2012-02-24

Abstract: To obtain and process the information in Web pages effectively, this paper proposes a Web page purification algorithm based on an improved DOM tree and a BP neural network. The algorithm uses HTMLParser to build a block tree from the DOM tree and the Web page content. Because the sub-blocks of a Web page have evident numerical characteristics, a noise purification model can be established with a BP neural network. As a result, Web page purification becomes more model-driven and produces more effective results.
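
The idea of describing each content block by numerical characteristics can be illustrated with a minimal sketch. It is not the paper's implementation: it uses Python's standard `html.parser` as a stand-in for HTMLParser, and the set of block tags, the class name `BlockFeatureParser`, the function `block_features`, and the chosen features (text length, link count, link density) are all assumptions made for illustration. The abstract does not list the actual features, and the BP neural network that would consume such vectors is not shown.

```python
from html.parser import HTMLParser

# Tags treated as content-block boundaries (an assumption for this sketch).
BLOCK_TAGS = {"div", "td", "p", "ul", "ol", "table"}


class BlockFeatureParser(HTMLParser):
    """Collects simple numerical features for each block-level element."""

    def __init__(self):
        super().__init__()
        self.blocks = []    # finished blocks, in the order they are closed
        self._stack = []    # blocks that are currently open
        self._in_link = 0   # nesting depth of <a> elements

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._stack.append(
                {"tag": tag, "text_len": 0, "link_text_len": 0, "link_count": 0}
            )
        elif tag == "a":
            self._in_link += 1
            if self._stack:
                self._stack[-1]["link_count"] += 1

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link > 0:
            self._in_link -= 1
        elif tag in BLOCK_TAGS and self._stack and self._stack[-1]["tag"] == tag:
            # Only well-nested blocks are closed; implicit closes are ignored.
            self.blocks.append(self._stack.pop())

    def handle_data(self, data):
        text = data.strip()
        if text and self._stack:
            # Text is credited to the innermost open block only.
            block = self._stack[-1]
            block["text_len"] += len(text)
            if self._in_link:
                block["link_text_len"] += len(text)


def block_features(html):
    """Return (tag, text length, link count, link density) for each block."""
    parser = BlockFeatureParser()
    parser.feed(html)
    parser.close()
    rows = []
    for b in parser.blocks:
        density = b["link_text_len"] / b["text_len"] if b["text_len"] else 0.0
        rows.append((b["tag"], b["text_len"], b["link_count"], density))
    return rows
```

For a navigation-like block such as `<div><a href="/a">Home</a><a href="/b">News</a></div>`, the link density is 1.0, while a plain paragraph of body text has a link density of 0.0. A classifier trained on such vectors could plausibly use contrasts of this kind to separate noise blocks from content blocks.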

Key words: Web page purification, DOM tree, content block, neural network
