Computer and Modernization ›› 2021, Vol. 0 ›› Issue (01): 111-119.

Semantic Similarity Calculation of Cilin Based on Logistic Function

  1. (College of Chinese Language and Culture, Beijing Normal University, Beijing 100875, China)
  • Online: 2021-01-28 Published: 2021-01-29
  • About the author: YANG Quan (born 1977), female, from Pingdu, Shandong; associate professor, Ph.D.; research interests: natural language processing and semantic computing. E-mail: yangquan@bnu.edu.cn.
  • Supported by:
    Research Project of the State Language Commission (YB135-91)

Abstract: Automatically computed word semantic similarity scores still differ from human judgments. The main reason is that ontology-based methods generally treat a semantic classification dictionary purely as an object of mathematical computation and do not make full use, from a lexicological perspective, of the linguistic knowledge it encodes. We therefore apply semantic field theory to analyze how words are organized in Tongyici Cilin (《同义词词林》, hereafter Cilin), and argue that depth plays the decisive role in semantic similarity while branch information plays an auxiliary role. Combining the depth and branch information of Cilin, we propose a calculation model based on the Logistic function. On MC30, the Pearson correlation coefficient between the computed similarities and the human-annotated values reaches 0.9540, with a root mean square error of 0.0191; on RG65, the Pearson correlation coefficient reaches 0.9434, with a root mean square error of 0.0193.

Key words: semantic similarity, Cilin, depth, Logistic function
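
The abstract reports a Pearson correlation coefficient and a root mean square error between computed similarities and human ratings, but does not give the model's formula. The sketch below only illustrates how those two metrics are computed, together with a generic logistic curve applied to a depth value. The function names, the parameters `k` and `x0`, the 0 to 5 depth range, and all numbers are assumptions made for illustration; they are not the paper's model, parameters, or data.

```python
import math


def logistic(x, k=1.0, x0=0.0):
    """Generic logistic curve 1 / (1 + exp(-k * (x - x0))).

    k (steepness) and x0 (midpoint) are hypothetical parameters,
    not values taken from the paper.
    """
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def rmse(xs, ys):
    """Root mean square error between two equal-length sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs))


if __name__ == "__main__":
    # Toy inputs: depth of the deepest shared node for four word pairs
    # (assumed range 0..5) and made-up human ratings scaled to [0, 1].
    shared_depths = [0, 2, 4, 5]
    human_ratings = [0.05, 0.40, 0.85, 0.95]

    # Map depth to a similarity score with the hypothetical logistic curve.
    computed = [logistic(d, k=1.5, x0=2.5) for d in shared_depths]

    print("Pearson r:", pearson(computed, human_ratings))
    print("RMSE:", rmse(computed, human_ratings))
```

A logistic curve is monotonic and bounded in (0, 1), so deeper shared nodes always yield higher scores without exceeding the similarity range; how the paper folds branch information into the curve is not stated in the abstract and is left out here.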