Computer and Modernization

• Network and Communication

Vocabulary Semantic Similarity Computation Based on HowNet and Search Engine

  1.  (School of Information Science and Engineering, Chongqing Jiaotong University, Chongqing 400074, China)
  • Online: 2018-04-28 Published: 2018-05-02
  • About the authors: Wu Kejie (吴克介, born 1993), male, from Chongqing, master's student at the School of Information Science and Engineering, Chongqing Jiaotong University; research interest: traffic big data analysis. Wang Jiawei (王家伟, born 1971), male, from Dazhou, Sichuan, associate professor and master's supervisor; research interests: database technology and software engineering.

Abstract: This paper proposes a method for computing lexical semantic similarity based on HowNet and search engines. Sememe similarity is refined using the depth, density, and information content of each sememe in the sememe hierarchy tree. Search-engine-based similarity is refined by combining pointwise mutual information (PMI) with the normalized Google distance (NGD). The part of speech of each word then serves as a weighting factor for fusing the HowNet-based and search-engine-based similarity results. Experimental results show that, compared with methods based on HowNet alone or on search engines alone, the average similarity produced by the proposed method on the NLPCC test set is closer to the test set's evaluation standard, and that the method performs well when computing word similarity in the bus ticketing domain.

Key words: semantic similarity, HowNet, sememe, search engines

CLC Number:
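
To illustrate the search-engine side of the method described in the abstract, the following Python sketch computes PMI and NGD from page-hit counts using their standard definitions. It is not the authors' implementation: the abstract does not say how the two measures are combined or how the part-of-speech weights are chosen, so the `fuse` helper and its `w_hownet` parameter are hypothetical, as are the hit counts `fx`, `fy`, `fxy` and the index size `n`.

```python
import math


def pmi(fx, fy, fxy, n):
    """Pointwise mutual information (base 2) from hit counts.

    fx, fy: hits for each word alone; fxy: hits for both words together;
    n: assumed total number of indexed pages.
    """
    if min(fx, fy, fxy) <= 0:
        return float("-inf")  # no co-occurrence evidence at all
    return math.log2((fxy * n) / (fx * fy))


def ngd(fx, fy, fxy, n):
    """Normalized Google distance; 0 means the words always co-occur.

    Assumes n is larger than both fx and fy, so the denominator is positive.
    """
    if min(fx, fy, fxy) <= 0:
        return float("inf")
    lx, ly, lxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lx, ly) - lxy) / (math.log(n) - min(lx, ly))


def fuse(sim_hownet, sim_web, w_hownet):
    """Hypothetical linear fusion of two similarity scores.

    w_hownet stands in for a part-of-speech-dependent weight in [0, 1];
    the paper's actual weighting scheme is not given in the abstract.
    """
    return w_hownet * sim_hownet + (1.0 - w_hownet) * sim_web


if __name__ == "__main__":
    # Made-up hit counts, for illustration only.
    print(pmi(1000, 1000, 100, 1_000_000))
    print(ngd(1000, 1000, 100, 1_000_000))
    print(fuse(0.8, 0.4, 0.75))
```

Note that PMI is a similarity (larger means more related) while NGD is a distance (smaller means more related), so any combination of the two has to map them onto a common scale first; how the paper does this is not stated here.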