Computer and Modernization ›› 2011, Vol. 1 ›› Issue (3): 127-130. doi: 10.3969/j.issn.1006-2475.2011.03.036

• Applications and Development •

Construction of Common Document Parser Framework for Lucene

LI Hao

  1. School of Computer Science, South China Normal University, Guangzhou 510631, China
  • Received: 2010-11-09 Revised: 1900-01-01 Online: 2011-03-18 Published: 2011-03-18

Abstract: Lucene is an excellent open-source framework for full-text search. This paper first introduces Lucene as a high-performance full-text retrieval tool and analyzes in detail its system structure, program execution logic, and the functions of its modules, as well as extensions built on Lucene. Then, to address Lucene's shortcomings in parsing documents of different types, it proposes a common document parser framework and gives a concrete application example.

Key words: full-text retrieval, Lucene, open-source framework, document parser

CLC Number:
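
The abstract does not describe the framework's design, so the following is only a minimal sketch of the general idea of a common document parser layer: one shared interface, format-specific parsers behind it, and a registry that picks a parser by file extension. All names here (`DocumentParser`, `ParsedDocument`, `ParserRegistry`, and the two example parsers) are hypothetical and are not taken from the paper. The sketch does not call Lucene; it assumes the resulting title and content strings would then be handed to an indexer as fields.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from html.parser import HTMLParser
from pathlib import PurePath


@dataclass
class ParsedDocument:
    """Plain-text result that an indexer could turn into index fields."""
    title: str
    content: str


class DocumentParser(ABC):
    """Common interface that every format-specific parser implements."""

    @abstractmethod
    def parse(self, name: str, raw: bytes) -> ParsedDocument:
        ...


class PlainTextParser(DocumentParser):
    def parse(self, name, raw):
        # Assumption: plain-text files are UTF-8 encoded.
        return ParsedDocument(title=name, content=raw.decode("utf-8"))


class _TextExtractor(HTMLParser):
    """Collects the <title> text and all other non-blank text nodes."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.chunks = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif data.strip():
            self.chunks.append(data.strip())


class HtmlParser(DocumentParser):
    def parse(self, name, raw):
        extractor = _TextExtractor()
        extractor.feed(raw.decode("utf-8"))
        extractor.close()
        # Fall back to the file name when the page has no <title>.
        return ParsedDocument(title=extractor.title or name,
                              content=" ".join(extractor.chunks))


class ParserRegistry:
    """Maps file extensions to parsers so callers need not know the format."""

    def __init__(self):
        self._parsers = {}

    def register(self, extension, parser):
        self._parsers[extension.lower()] = parser

    def parse(self, name, raw):
        extension = PurePath(name).suffix.lower()
        try:
            parser = self._parsers[extension]
        except KeyError:
            raise ValueError(f"no parser registered for {extension!r}") from None
        return parser.parse(name, raw)


registry = ParserRegistry()
registry.register(".txt", PlainTextParser())
registry.register(".html", HtmlParser())

doc = registry.parse(
    "page.html",
    b"<html><head><title>Hi</title></head>"
    b"<body><p>Hello</p> <p>world</p></body></html>",
)
print(doc.title, "|", doc.content)
```

Supporting a further format in this sketch means writing one more `DocumentParser` subclass and registering it under its extension; the indexing code that consumes `ParsedDocument` would not need to change.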