Computer and Modernization ›› 2011, Vol. 193 ›› Issue (9): 40-42,4. doi: 10.3969/j.issn.1006-2475.2011.09.012

• Network and Communication •

Design and Implementation of Text Search Engine Based on Lucene

ZOU Yan-fei1, YU Cheng-zun2, ZHAO Liang1

  1. College of Information Engineering, Xianyang Normal University, Xianyang 712000, Shaanxi, China; 2. Huawei Xi’an Research Institute, Xi’an 710075, Shaanxi, China
  • Received: 2011-07-01 Revised: 1900-01-01 Online: 2011-09-22 Published: 2011-09-22

Abstract: As the volume of information on local area networks (LANs) grows rapidly, personalized, lightweight search engines have attracted the attention of small and medium-sized enterprises and campuses. Building on a study of the basic principles of search engines, this paper uses Lucene, JSP, and Struts2 to implement retrieval over the text content of multiple file types. Test results show that the system extracts and parses the content of HTML, PDF, Word, and txt files within a LAN, that it is open, extensible, real-time, and secure, and that it meets its design goals.

Key words: search, extraction, parsing, LAN, text

CLC Number:
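
The retrieval principle summarized in the abstract can be illustrated with a toy inverted index, the data structure on which Lucene is built. The sketch below is plain Java and deliberately does not use Lucene's API. The class name `TinyIndex`, its methods, and the sample file names are hypothetical and are not taken from the paper. It assumes the text has already been extracted from the HTML, PDF, Word, or txt file, and it uses a naive tokenizer that would not segment Chinese text properly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: maps each lower-cased term to the sorted set of documents containing it. */
public class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Tokenizes already-extracted text and records which document each term came from. */
    public void add(String docName, String text) {
        // Naive tokenizer: split on any run of characters that are not letters or digits.
        for (String term : text.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docName);
            }
        }
    }

    /** Returns the names of documents that contain every query term (AND semantics), sorted. */
    public List<String> search(String query) {
        Set<String> result = null;
        for (String term : query.toLowerCase().split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            Set<String> docs = postings.getOrDefault(term, Collections.emptySet());
            if (result == null) {
                result = new TreeSet<>(docs);   // first term: start from its posting list
            } else {
                result.retainAll(docs);         // later terms: intersect posting lists
            }
        }
        return result == null ? new ArrayList<>() : new ArrayList<>(result);
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("report.pdf", "Quarterly sales report for the campus bookstore");
        index.add("notes.txt", "Meeting notes: campus network upgrade");
        index.add("index.html", "Welcome to the sales portal");

        System.out.println(index.search("campus"));        // [notes.txt, report.pdf]
        System.out.println(index.search("sales report"));  // [report.pdf]
    }
}
```

A real deployment would replace this class with Lucene's own analyzers, index writer, and searcher, and would put a format-specific text extractor in front of the indexing step for each file type.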