Computer and Modernization, 2015, Vol. 0, Issue (12): 19-. doi: 10.3969/j.issn.1006-2475.2015.12.004

• Algorithm Analysis and Design •

Automatic Summarization Algorithm for Government Documents Based on Sentence Weight and Chapter Structure

  

  1. 1.湖南省产商品质量监督检验研究院,湖南长沙410007;2.湖南师范大学数学与计算机科学学院,湖南长沙410081;
    3.高性能计算与随机信息处理省部共建教育部重点实验室,湖南长沙410081
  • Received: 2015-10-15 Online: 2015-12-23 Published: 2015-12-30
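  • About the authors: MAO Liangwen (1969-), male, from Yiyang, Hunan, is an engineer at the Hunan Testing Institute of Product and Commodity Supervision; research interest: information system security. Corresponding author: XU Liang (1981-), male, from Changsha, Hunan, Ph.D., is an associate professor at the College of Mathematics and Computer Science, Hunan Normal University, and at the Key Laboratory of High Performance Computing and Stochastic Information Processing (Ministry of Education, jointly built with Hunan Province); research interests: Chinese information processing and secure operating systems.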
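  • Funding:
    National Natural Science Foundation of China (61502165); Hunan Provincial Science and Technology Plan Project (2014FJ6030); Scientific Research Project of the Hunan Provincial Department of Education (13C527); Changsha Science and Technology Plan Project (k1403042-11); Hunan Provincial Key Discipline Construction Project (Xiangjiaofa [2011] No. 76); Hunan Normal University Degree and Graduate Education Reform Project (14JG13); Hunan Normal University Teaching Reform Project (Chufa 2015-13-52)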

Automatic Text Summarization Algorithm Based on Sentence Weight and Chapter Structure

  1. Hunan Testing Institute of Product and Commodity Supervision, Changsha 410007, China; 
    2. College of Mathematics and Computer Science, Hunan Normal University, Changsha 410081, China; 
    3. Key Laboratory of High Performance Computing and Stochastic Information Processing, 
    Ministry of Education of China, Changsha 410081, China
  • Received:2015-10-15 Online:2015-12-23 Published:2015-12-30

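Abstract: Improving the accuracy of automatically generated summaries helps people obtain valuable information quickly and effectively. Exploiting the strongly structured nature of government documents, this paper proposes an automatic summarization algorithm for government documents based on sentence weight and chapter structure. First, a cursor-based, character-slicing sentence-splitting algorithm gathers precise statistics on the sentences and words in the document, giving a basic picture of its content and chapter structure. Building on this, the paper proposes structure-based methods for computing word weights and sentence weights, and ranks the sentences by the computed weights. Next, a number of candidate summary sentences is selected according to the required summary size. Finally, the candidate sentences are post-processed and output as the summary. Experimental results show that, compared with automatic summarization algorithms of the same type and with the automatic summarization tool provided by Word 2003, the proposed algorithm achieves considerably higher precision and recall.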

Keywords: government documents, automatic summarization, word weight, sentence weight, chapter structure
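The pipeline described in the abstract (split sentences, weight words and sentences, rank, select by summary size, post-process) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' algorithm: the abstract does not give the weighting formulas, so the term-frequency word weights, the title-word multiplier `title_bonus`, the paragraph-lead multiplier `lead_bonus`, the terminator set, and all function names are hypothetical. The regex tokenizer only handles Latin text; Chinese documents would need a word segmenter in its place.

```python
import re
from collections import Counter

# Sentence terminators (assumed set; the abstract does not list them).
TERMINATORS = set("。！？.!?")


def split_sentences(text):
    """Cursor-based splitting: move a cursor over the characters and slice
    out a sentence each time it reaches a terminator."""
    sentences, start = [], 0
    for cursor, ch in enumerate(text):
        if ch in TERMINATORS:
            piece = text[start:cursor + 1].strip()
            if piece:
                sentences.append(piece)
            start = cursor + 1
    tail = text[start:].strip()
    if tail:  # keep trailing text that has no terminator
        sentences.append(tail)
    return sentences


def tokenize(sentence):
    # Hypothetical tokenizer for Latin text; Chinese needs a word segmenter.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def summarize(title, paragraphs, k, title_bonus=2.0, lead_bonus=1.5):
    """Return the k highest-weighted sentences in document order.
    title_bonus and lead_bonus are hypothetical structure weights."""
    # Split each paragraph, remembering each sentence's index within it.
    sents = [(s_idx, s)
             for para in paragraphs
             for s_idx, s in enumerate(split_sentences(para))]

    # Word weight: frequency in the document, multiplied for title words.
    tf = Counter(w for _, s in sents for w in tokenize(s))
    title_words = set(tokenize(title))
    word_w = {w: c * (title_bonus if w in title_words else 1.0)
              for w, c in tf.items()}

    # Sentence weight: mean word weight, multiplied for paragraph-initial sentences.
    scored = []
    for order, (s_idx, s) in enumerate(sents):
        words = tokenize(s)
        base = sum(word_w[w] for w in words) / len(words) if words else 0.0
        scored.append((base * (lead_bonus if s_idx == 0 else 1.0), order, s))

    # Rank by weight (ties: earlier sentence first) and keep the top k.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]

    # Post-processing: put the chosen sentences back in document order.
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]


title = "Water safety notice"
paragraphs = [
    "This notice covers water safety. All offices must test water monthly.",
    "Staff parking rules are unchanged. Water safety reports go to the bureau.",
]
print(summarize(title, paragraphs, k=2))
# → ['This notice covers water safety.', 'Water safety reports go to the bureau.']
```

Restoring document order in the last step is one plausible form of the post-processing the abstract mentions; the paper's actual post-processing may differ.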

Abstract: Improving the accuracy of automatic text summarization helps people obtain valuable information more simply and efficiently. Exploiting the structural characteristics of government documents, this paper proposes an automatic summarization algorithm based on sentence weight and chapter structure. First, accurate statistics on the sentences and words in the document yield a basic understanding of its content and textual structure. Then word weights and sentence weights are computed, and the sentences are ranked by weight. Candidate summary sentences are chosen according to the target summary size. Finally, after post-processing, the selected sentences are output as the summary. Experimental results show that, compared with similar algorithms, the proposed algorithm considerably improves both precision and recall.

Key words: government documents; automatic text summarization; word weight; sentence weight; chapter structure
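The reported gains are in precision and recall. As a hedged illustration (the abstract does not describe the evaluation protocol), the sketch below assumes sentence-level matching against a human-selected reference extract: precision is the fraction of extracted sentences that appear in the reference, and recall is the fraction of reference sentences that were extracted. The function name and sample data are hypothetical.

```python
def precision_recall(system, reference):
    """Sentence-level precision and recall of an extractive summary.
    Sentences are matched by exact text (an assumption)."""
    sys_set, ref_set = set(system), set(reference)
    hits = len(sys_set & ref_set)  # sentences chosen by both system and annotator
    precision = hits / len(sys_set) if sys_set else 0.0
    recall = hits / len(ref_set) if ref_set else 0.0
    return precision, recall


# Hypothetical example: 2 of the 3 extracted sentences are in the 4-sentence reference.
p, r = precision_recall(["s1", "s3", "s4"], ["s1", "s2", "s3", "s5"])
print(round(p, 3), round(r, 3))  # → 0.667 0.5
```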
