Computer and Modernization (计算机与现代化) ›› 2013, Vol. 1 ›› Issue (5): 172-175. doi: 10.3969/j.issn.1006-2475.2013.05.041

• Applications and Development •

Research on Web Biological Information Extraction Method Based on Ontology

HE Yuan (何源)

  1. School of Information Science and Technology, Hunan Agricultural University (HNAU), Changsha 410128, Hunan, China
  • Received: 2013-01-06 Revised: 1900-01-01 Online: 2013-05-28 Published: 2013-05-28

Abstract: To address the shortcomings of traditional keyword-based search and data retrieval, this paper proposes an ontology-based Web information extraction framework. The framework first fetches a Web page and converts it into a well-formed HTML document. An HTML parser then turns the document into a DOM tree, XPath expressions locate the data blocks of interest to the user, and extraction rules are generated from those blocks. Finally, the OntPMatch algorithm extracts the data, which is stored in RDF format. Using cotton information as the subject of an empirical study, the paper implements a prototype system for extracting Web biological information data, giving users an effective tool for discovering valuable biological information resources on the Web.

Key words: ontology, Web, information extraction
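
The pipeline summarized in the abstract (well-formed page, DOM tree, XPath selection, ontology matching, RDF output) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the sample page, the `http://example.org/cotton#` namespace, the `ONTOLOGY_PROPERTIES` mapping, and the `extract_triples` function are all hypothetical, and the simple header lookup only stands in for OntPMatch, whose details the abstract does not give. The input is assumed to be already well-formed, and the output is written as N-Triples lines.

```python
import xml.etree.ElementTree as ET

# Hypothetical, already well-formed page (stands in for a fetched and tidied Web page).
PAGE = """<html><body>
<table id="cotton">
  <tr><th>Variety</th><th>Fiber length (mm)</th></tr>
  <tr><td>Variety-A</td><td>29.5</td></tr>
  <tr><td>Variety-B</td><td>30.1</td></tr>
</table>
</body></html>"""

# Hypothetical namespace and header-to-property mapping.
# This lookup is an assumed stand-in for OntPMatch, not the algorithm itself.
BASE = "http://example.org/cotton#"
ONTOLOGY_PROPERTIES = {
    "Variety": BASE + "varietyName",
    "Fiber length (mm)": BASE + "fiberLength",
}


def extract_triples(page: str, row_xpath: str) -> list[str]:
    """Parse the page, select table rows by XPath, and emit N-Triples lines."""
    root = ET.fromstring(page)        # build an element tree (DOM-like)
    rows = root.findall(row_xpath)    # ElementTree supports a subset of XPath
    headers = [th.text for th in rows[0].findall("th")]

    triples = []
    for index, row in enumerate(rows[1:], start=1):
        subject = f"<{BASE}record{index}>"
        for header, cell in zip(headers, row.findall("td")):
            prop = ONTOLOGY_PROPERTIES.get(header)
            if prop is None:
                continue              # skip columns with no ontology property
            # Note: literal values are not escaped here; real data would need it.
            triples.append(f'{subject} <{prop}> "{cell.text}" .')
    return triples


triples = extract_triples(PAGE, ".//table[@id='cotton']/tr")
for line in triples:
    print(line)
```

In this sketch the XPath expression plays the role of the extraction rule, and only columns whose headers appear in the mapping are turned into RDF statements. A real system would also need an HTML clean-up step before parsing and a proper RDF library for serialization.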

CLC number: