Computer and Modernization

• Database and Data Mining

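Multi-feature Fusion Paragraph Similarity Calculation Related to the First and the Last Paragraph and the First and the Last Sentence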

  

  1. (College of Computer, Beijing University of Technology, Beijing 100124, China)
  • Received: 2016-02-17 Online: 2016-09-12 Published: 2016-09-13
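  • About the authors: JIANG Zongli (蒋宗礼, 1956-), male, from Nanyang, Henan; professor and doctoral supervisor at the College of Computer, Beijing University of Technology; CCF Distinguished Member (E200005392S); research interests: network information search and processing. ZHAO Jie (赵洁, 1990-), female, from Taiyuan, Shanxi; master's student; research interest: information retrieval.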
  • Funding:
    National Natural Science Foundation of China (61133003)

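Abstract: The first and last paragraphs of a text and the first and last sentences of a paragraph contribute heavily to its semantics, so they should be treated as the main factors in judging paragraph similarity. This paper incorporates them into the SiteQ algorithm with appropriate weights and proposes Topic-SiteQ, a multi-feature fusion paragraph similarity algorithm that takes the first and last paragraphs and the first and last sentences into account. Topic-SiteQ uses multi-feature fusion to compute the semantic similarity of the first and last sentences and weights their contribution to the paragraph similarity; it also raises the scores of the first and last paragraphs and ranks the recommended paragraphs by the resulting score. Experiments show that with Topic-SiteQ, the MRR of the relevant-paragraph ranking increases by 0.032 and the F-measure increases by 1.4% on average, indicating that the improvement is effective.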

Key words: automatic question answering system, SiteQ, semantic similarity, multi-feature fusion
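The abstract does not give the Topic-SiteQ weights, the exact features that are fused, or the SiteQ scoring itself, so the following Python sketch only illustrates the general scoring idea under stated assumptions. A precomputed SiteQ-style base score per paragraph is taken as input rather than computed. The hypothetical `fused_similarity` fuses two toy features (word-overlap Jaccard and token-length ratio) with assumed weights 0.7 and 0.3. The edge-sentence weight `alpha` and the first/last paragraph multiplier `boost` are arbitrary placeholders, not values from the paper, and all candidate paragraphs are assumed to come from one document in reading order. MRR is computed by its standard definition.

```python
def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity on whitespace tokens (toy feature)."""
    sa, sb = set(a.split()), set(b.split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def length_ratio(a: str, b: str) -> float:
    """Ratio of the shorter to the longer token count (toy feature)."""
    la, lb = len(a.split()), len(b.split())
    if max(la, lb) == 0:
        return 0.0
    return min(la, lb) / max(la, lb)


# Assumed (feature, weight) pairs for the multi-feature fusion; not from the paper.
FEATURES = [(jaccard, 0.7), (length_ratio, 0.3)]


def fused_similarity(question: str, sentence: str) -> float:
    """Weighted sum of the per-feature similarities."""
    return sum(w * f(question, sentence) for f, w in FEATURES)


def topic_score(question, sentences, base_score, position, n_paragraphs,
                alpha=0.3, boost=1.2):
    """Blend a precomputed SiteQ-style score with the mean similarity of the
    paragraph's first and last sentences, then boost the document's first and
    last paragraphs. alpha and boost are hypothetical placeholder values."""
    if sentences:
        edge_sim = (fused_similarity(question, sentences[0])
                    + fused_similarity(question, sentences[-1])) / 2
    else:
        edge_sim = 0.0
    score = (1 - alpha) * base_score + alpha * edge_sim
    if position in (0, n_paragraphs - 1):
        score *= boost
    return score


def rank_paragraphs(question, paragraphs, alpha=0.3, boost=1.2):
    """paragraphs: (paragraph_id, sentences, base_score) tuples in document
    order. Returns paragraph ids sorted by descending final score."""
    n = len(paragraphs)
    scored = [
        (pid, topic_score(question, sents, base, i, n, alpha, boost))
        for i, (pid, sents, base) in enumerate(paragraphs)
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [pid for pid, _ in scored]


def mean_reciprocal_rank(rankings, relevant):
    """Mean over queries of 1/rank of the first relevant item; a query whose
    ranking contains no relevant item contributes 0."""
    if not rankings:
        return 0.0
    total = 0.0
    for ranked_ids, rel in zip(rankings, relevant):
        for rank, pid in enumerate(ranked_ids, start=1):
            if pid in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)
```

For example, `rank_paragraphs("paragraph similarity", [("p0", ["x y"], 0.1), ("p1", ["paragraph similarity"], 0.9), ("p2", ["z"], 0.1)])` returns `["p1", "p0", "p2"]`: the strongly matching middle paragraph wins, and the boost only reorders paragraphs whose blended scores are close. With these toy inputs, `mean_reciprocal_rank([["a", "b"], ["c", "d"]], [{"b"}, {"c"}])` gives 0.75.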
