Computer and Modernization (计算机与现代化)

• Database and Data Mining

A Chinese Collective Entity Linking Method Fusing Multiple Features (融合多特征的中文集成实体链接方法)

  

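  1. (College of Computer and Information, Hohai University, Nanjing, Jiangsu 211100)
  • Received: 2018-06-20 Published: 2019-01-30 Online: 2019-01-30
  • About the authors: FENG Jun (冯钧, 1969-), female, born in Changzhou, Jiangsu, professor, Ph.D., CCF member, research interests: spatio-temporal data management, intelligent data processing and data mining, water conservancy informatization; LIU Jinghua (柳菁铧, 1993-), female, master's student, research interests: information extraction, knowledge graphs, E-mail: 1576236625@qq.com; KONG Shengqiu (孔盛球, 1990-), male, master's student, research interests: information retrieval, knowledge graphs.
  • Supported by:
    National Key R&D Program of China (2017YFC0405806); General Program of the National Natural Science Foundation of China (61602151, 61370091)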

Chinese Collective Entity Linking Method Based on Multiple Features

  1. (College of Computer and Information, Hohai University, Nanjing 211100, China)
  • Received:2018-06-20 Online:2019-01-30 Published:2019-01-30

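Abstract: Entity linking is the process of correctly linking entity mentions in text to entity objects in a knowledge base, and it plays a key role in knowledge base expansion. Traditional entity linking methods rely mainly on surface features such as context similarity and ignore the semantic relatedness between co-occurring entities. To address this, a collective entity linking method that fuses multiple features is proposed. First, a candidate entity set is generated by combining a synonym table and a namesake table. Next, semantic features are extracted from multiple perspectives and fused into the constructed entity relatedness graph. Finally, the candidate entities are ranked and the top-1 entity is selected as the linking target. Experiments on the NLP&CC2013 Chinese microblog entity linking evaluation dataset yield an accuracy of 90.97%; compared with the best system in the NLP&CC2013 Chinese microblog entity linking evaluation, the proposed system shows a certain advantage.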

Key words: Chinese collective entity linking, knowledge graph, entity disambiguation

Abstract: Entity linking is the process of mapping entity mentions in a document to their corresponding entities in a knowledge base (KB), and it plays a key role in knowledge base expansion. Traditional entity linking methods mainly use surface features such as context similarity and ignore the semantic correlation between co-occurring mentions in a text. To address this, a collective entity linking method based on multiple features is proposed. First, it combines a synonym list and a namesake list to produce a set of candidate entities. Next, it extracts a variety of semantic features and builds a referent graph. Finally, it ranks the candidate entities and chooses the top-1 entity as the linking target. Evaluation on the data set of the NLP&CC2013 Chinese micro-blog entity linking track shows an average accuracy of 90.97%, which is better than the state-of-the-art result.

Key words: Chinese collective entity linking, knowledge graph, entity disambiguation
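
The three stages named in the abstract (candidate generation from a synonym list and a namesake list, feature extraction over a referent graph, and top-1 ranking) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the toy tables `SYNONYMS`, `NAMESAKES`, and `KB`, the bag-of-words cosine used as the context feature, the Jaccard overlap used as a stand-in for entity relatedness (playing the role of edge weights in the graph), and the weight `alpha`. The abstract does not specify the paper's actual features or ranking algorithm, so this is not the authors' implementation.

```python
import math
from collections import Counter

# Hypothetical toy resources standing in for the synonym list,
# the namesake list, and the knowledge base descriptions.
SYNONYMS = {"MJ": ["Michael Jordan"], "Bulls": ["Chicago Bulls"]}
NAMESAKES = {
    "Michael Jordan": ["Michael Jordan (basketball)", "Michael Jordan (scientist)"],
}
KB = {
    "Michael Jordan (basketball)": "nba basketball player chicago bulls",
    "Michael Jordan (scientist)": "machine learning professor berkeley",
    "Chicago Bulls": "nba basketball team chicago",
}


def generate_candidates(mention):
    """Expand a mention through synonyms, then through namesakes."""
    names = [mention] + SYNONYMS.get(mention, [])
    candidates = []
    for name in names:
        for entity in NAMESAKES.get(name, [name]):
            if entity in KB and entity not in candidates:
                candidates.append(entity)
    return candidates


def cosine(tokens_a, tokens_b):
    """Bag-of-words cosine similarity (assumed surface/context feature)."""
    a, b = Counter(tokens_a), Counter(tokens_b)
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def relatedness(e1, e2):
    """Jaccard overlap of descriptions (assumed entity-entity edge weight)."""
    s1, s2 = set(KB[e1].split()), set(KB[e2].split())
    return len(s1 & s2) / len(s1 | s2) if s1 | s2 else 0.0


def link(mentions, text, alpha=0.5):
    """Score each candidate by context similarity plus coherence with the
    candidates of co-occurring mentions, and keep the top-1 per mention."""
    context = text.lower().split()
    cands = {m: generate_candidates(m) for m in mentions}
    result = {}
    for m in mentions:
        best, best_score = None, -1.0
        for e in cands[m]:
            local = cosine(context, KB[e].split())
            others = [
                max((relatedness(e, o) for o in cands[n]), default=0.0)
                for n in mentions
                if n != m
            ]
            coherence = sum(others) / len(others) if others else 0.0
            score = alpha * local + (1 - alpha) * coherence
            if score > best_score:
                best, best_score = e, score
        result[m] = best  # None when no candidate exists in the KB
    return result


links = link(["MJ", "Bulls"], "MJ scored for the Bulls in the NBA finals")
print(links)
```

In this toy example the coherence term is what separates the two namesakes: the basketball entity shares description terms with the candidate for the co-occurring mention "Bulls", while the scientist entity does not. A mention with no candidate in the toy knowledge base maps to `None`.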
