Computer and Modernization ›› 2012, Vol. 1 ›› Issue (9): 222-224. doi: 10.3969/j.issn.1006-2475.2012.09.058

• Applications and Development •

Construction of Web Database Based on XML (基于XML的Web信息数据库的建立)

HUANG Yu-yang (黄昱阳)1, LI Hui-lun (李慧伦)2

  1. School of Bioscience and Bioengineering, South China University of Technology, Guangzhou 510006, Guangdong, China; 2. School of Life Sciences, Shandong University of Technology, Zibo 255012, Shandong, China
  • Received: 2012-04-17 Online: 2012-09-21 Published: 2012-09-21

Abstract: To extract data effectively from Web pages, this paper builds an XML-based database for collecting Web information. The open-source tool JTidy first tidies each Web page, converting its HTML into XHTML. The Dom4j toolkit then parses the resulting XML file, taking advantage of XML's well-defined structure. The hierarchy of tags in the XML document serves as the basis for organizing the stored data. Finally, the ORM tool Hibernate persists the data in the database, which makes the data convenient to store and query.

Key words: XML, Web, data mining, database
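
The extraction step described in the abstract can be illustrated with a minimal Java sketch. It is not the authors' implementation. To stay self-contained, it swaps in the JDK's built-in DOM parser where the paper uses Dom4j, assumes the page has already been tidied into well-formed XHTML (the job the paper assigns to JTidy), and returns plain in-memory rows where the paper persists data with Hibernate. The names `TagPathExtractor`, `Row`, and `extract` are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Illustrative only: walks well-formed XHTML and records text keyed by its tag path. */
final class TagPathExtractor {

    /** One extracted value; a hypothetical stand-in for a Hibernate-mapped entity. */
    static final class Row {
        final String path; // tag hierarchy, e.g. /html/body/p
        final String text; // trimmed text found directly under that element

        Row(String path, String text) {
            this.path = path;
            this.text = text;
        }
    }

    /** Parses XHTML (assumed already tidied) and returns rows in document order. */
    static List<Row> extract(String xhtml) throws Exception {
        // JDK DOM parser, used here in place of Dom4j.
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));
        List<Row> rows = new ArrayList<>();
        walk(doc.getDocumentElement(), "", rows);
        return rows;
    }

    /** Depth-first walk that uses the chain of enclosing tags as the storage key. */
    private static void walk(Element element, String parentPath, List<Row> rows) {
        String path = parentPath + "/" + element.getTagName();
        NodeList children = element.getChildNodes();

        // Collect only the text that belongs directly to this element.
        StringBuilder ownText = new StringBuilder();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                ownText.append(child.getNodeValue());
            }
        }
        String text = ownText.toString().trim();
        if (!text.isEmpty()) {
            rows.add(new Row(path, text));
        }

        // Recurse into child elements, extending the tag path.
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, path, rows);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>";
        for (Row row : extract(page)) {
            System.out.println(row.path + " -> " + row.text);
        }
    }
}
```

In a pipeline closer to the one the abstract describes, each `Row` would presumably become a mapped entity saved through a Hibernate session, with the tag path available as a query key. That mapping is an assumption, since the abstract does not specify the schema.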