Computer and Modernization ›› 2023, Vol. 0 ›› Issue (01): 30-36.

• Artificial Intelligence •

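Chinese Short Text Entity Disambiguation Based on Multi-feature Factor Fusion

  1. (School of Software, Jiangxi Normal University, Nanchang 330022, China)
  • Online: 2023-03-02 Published: 2023-03-02
  • About the authors: Wang Yongdi (王永缔, b. 1994), male, from Shenyang, Liaoning, master's student; research interest: natural language processing; E-mail: wydwork2022@163.com. Lei Gang (雷刚, b. 1974), male, from Jinxian, Jiangxi, associate professor; research interests: machine learning and natural language processing; E-mail: leigang@jxnu.edu.cn.
  • Supported by:
    National Natural Science Foundation of China (62062040); Science and Technology Project of the Jiangxi Provincial Department of Education (GJJ160315)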

Abstract: Most existing Chinese short-text entity disambiguation models rely only on the semantic matching features between the mention context and the candidate entity description. They give too little consideration to other useful disambiguation features, such as the co-occurrence features among candidate entities in the same query text and the type-similarity features between candidate entities and entity mentions. To address these problems, this paper first uses a pre-trained language model to obtain the semantic matching features between the mention context and the candidate entity description. It then proposes a co-occurrence feature and a type feature built on entity embeddings and mention type embeddings. Finally, it fuses these features to build an entity disambiguation model based on multi-feature factor fusion. The experimental results show that the proposed co-occurrence and type features are feasible and effective for entity disambiguation, and that the proposed multi-feature factor fusion method achieves better disambiguation performance.

Key words: co-occurrence feature, type feature, multi-feature factor, multi-head attention, Ernie
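
The abstract names three feature factors (semantic matching, co-occurrence, and type similarity) but does not give their formulas. The sketch below is only an illustration of how such factors might be combined to rank candidate entities, not the paper's model. Everything in it is an assumption: the function names, the cosine-similarity definitions of the co-occurrence and type features, the toy two-dimensional embeddings, and the fixed weighted sum used for fusion. The semantic score is simply supplied as a number here, whereas the paper obtains it from a pre-trained language model, and the key words suggest the paper fuses features with multi-head attention rather than fixed weights.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cooccurrence_feature(cand_emb, other_embs):
    """Assumed definition: mean similarity between one candidate entity and the
    candidate entities of the other mentions in the same query text."""
    if not other_embs:
        return 0.0
    return sum(cosine(cand_emb, e) for e in other_embs) / len(other_embs)


def fuse(semantic, cooccur, type_sim, weights=(0.6, 0.2, 0.2)):
    """Hypothetical fusion: a fixed weighted sum of the three feature factors."""
    w_sem, w_co, w_type = weights
    return w_sem * semantic + w_co * cooccur + w_type * type_sim


def disambiguate(candidates, other_embs, mention_type_emb):
    """Return the name of the candidate with the highest fused score."""
    def score(cand):
        return fuse(
            cand["semantic"],
            cooccurrence_feature(cand["emb"], other_embs),
            cosine(cand["type_emb"], mention_type_emb),
        )
    return max(candidates, key=score)["name"]


# Toy data (invented for illustration): the mention "Apple" in a text that
# also mentions a phone. The fruit reading has the higher semantic score,
# but the other two factors favour the company reading.
candidates = [
    {"name": "Apple (company)", "semantic": 0.55, "emb": [1.0, 0.0], "type_emb": [1.0, 0.0]},
    {"name": "apple (fruit)", "semantic": 0.60, "emb": [0.0, 1.0], "type_emb": [0.0, 1.0]},
]
other_embs = [[0.9, 0.1]]      # embedding of a candidate for another mention
mention_type_emb = [1.0, 0.0]  # embedding of the predicted mention type

best = disambiguate(candidates, other_embs, mention_type_emb)
print(best)  # Apple (company)
```

In this toy case the semantic score alone would pick the fruit sense, while the fused score picks the company sense, which is the kind of correction the co-occurrence and type features are meant to provide.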