Computer and Modernization (计算机与现代化)

• Algorithm Design and Analysis

An Improved Sentence Similarity Calculation Based on Hybrid Multiple Features

  

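  1. College of Computer Science, Beijing University of Technology, Beijing 100124
  • Received: 2015-02-13  Online: 2015-07-23  Published: 2015-07-28
  • About the authors: Wang Quanmin (王全民, 1963-), male, from Beijing, associate professor at the College of Computer Science, Beijing University of Technology, master's supervisor, Ph.D., CCF senior member; research interests: network and information security. Cao Jianqi (曹建奇, 1986-), male, from Handan, Hebei, master's student; research interests: natural language processing, network and information security. Wang Li (王莉, 1989-), female, from Heze, Shandong, master's student; research interests: recommender systems, network and information security.
  • Supported by:
    National Natural Science Foundation of China (61272500)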

An Improvement of Sentence Similarity Calculation Based on Multifeatures

  1. College of Computer Science, Beijing University of Technology, Beijing 100124, China
  • Received: 2015-02-13  Online: 2015-07-23  Published: 2015-07-28

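Abstract:

Sentence similarity calculation is a key problem in natural language processing, and many methods for it exist. This paper studies the problem that the multi-feature sentence similarity model yields similarity scores that are too low. On top of word-level semantics,
it adds a similar-word calculation and a method for computing the similarity of sentence constituent relations. The improved method avoids the need for an additional synonym dictionary while taking full account of multiple sentence features, including word form, sentence length, word order, semantics, and constituent relations, and it raises the computed
sentence similarity. Experimental results show that the method improves sentence similarity calculation to some extent and that it is reasonable, simple, and feasible.

Key words: sentence similarity, similar words, constituent relations, multiple features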

Abstract:

Sentence similarity calculation is a key issue in natural language processing, and many methods exist for it. This paper studies the problem that the
sentence similarity model based on multiple features produces similarity scores that are too low. On the basis of word semantic similarity, it adds a similar-word
calculation and, at the same time, a method for calculating the similarity of sentence constituent relationships. The improved method avoids the need for
an additional synonym dictionary while fully considering word form, sentence length, word order, semantics, and the relationships among sentence constituents.
Experimental results show that the method improves sentence similarity results to some extent and that it
is reasonable, simple, and feasible.

Key words: sentence similarity; similar word; constituent's relationship; multifeatures

CLC number:
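
The abstract describes combining several sentence features into one similarity score. As a rough illustration only, the sketch below combines three of the named features (word form, sentence length, word order) over pre-tokenized sentences. The formulas, the function names, and the weights are assumptions chosen for illustration and are not taken from the paper; the semantic, similar-word, and constituent-relation features are omitted because they would require a lexical resource and a syntactic parser that this record does not specify.

```python
from collections import Counter


def word_form_sim(a, b):
    """Dice-style token overlap: 2 * shared / (len(a) + len(b)). Assumed formula."""
    if not a and not b:
        return 1.0
    shared = sum((Counter(a) & Counter(b)).values())
    return 2.0 * shared / (len(a) + len(b))


def length_sim(a, b):
    """1 minus the normalized length difference. Assumed formula."""
    if not a and not b:
        return 1.0
    return 1.0 - abs(len(a) - len(b)) / (len(a) + len(b))


def word_order_sim(a, b):
    """Share of adjacent pairs of common words that keep their order in b.

    Only words occurring exactly once in both sentences are considered.
    Assumed formula.
    """
    count_a, count_b = Counter(a), Counter(b)
    once = [w for w in a if count_a[w] == 1 and count_b[w] == 1]
    if len(once) < 2:
        return 1.0 if len(once) == 1 else 0.0
    pos_in_b = [b.index(w) for w in once]
    inversions = sum(1 for x, y in zip(pos_in_b, pos_in_b[1:]) if x > y)
    return 1.0 - inversions / (len(once) - 1)


def combined_sim(a, b, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of the three features; the weights are hypothetical."""
    w_form, w_len, w_order = weights
    return (w_form * word_form_sim(a, b)
            + w_len * length_sim(a, b)
            + w_order * word_order_sim(a, b))


if __name__ == "__main__":
    s1 = ["the", "cat", "chased", "the", "dog"]
    s2 = ["the", "dog", "chased", "the", "cat"]
    print(combined_sim(s1, s2))
```

In a fuller version, the semantic and constituent-relation scores would enter the same weighted sum as additional terms, with the weights still summing to 1 so that identical sentences score 1.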