Computer and Modernization ›› 2024, Vol. 0 ›› Issue (02): 69-74. doi: 10.3969/j.issn.1006-2475.2024.02.011

• Artificial Intelligence •

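Nested Named Entity Recognition Based on Semantic Segmentation

  1. (School of Computer and Information Science, Chongqing Normal University, Chongqing 401331, China)
  • Online: 2024-02-19 Published: 2024-03-19
  • About the authors: CUI Shaoguo (b. 1974), male, from Shiyan, Hubei; professor, Ph.D.; research interests: big data and artificial intelligence; E-mail: csg@cqnu.edu.cn. Corresponding author HU Guangping (b. 1994), male, from Chongqing; master's student; research interest: natural language processing; E-mail: 790974688@qq.com.
  • Funding:
    National Natural Science Foundation of China (62003065); Natural Science Foundation of the Chongqing Science and Technology Bureau (2022NSCQ-MSX2933, 2022TFII-OFX0262, cstc2019jscx-mbdxX0061); Humanities and Social Sciences Planning Fund of the Ministry of Education (22YJA870005); Key Project of the Chongqing Municipal Education Commission (KJZD-K202200510); Chongqing Social Science Planning Project (2022NDYB119); Chongqing Normal University Talent Fund Project (20XLB004)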

Abstract: Named entity recognition aims to extract entities from unstructured text, and entities are often nested within one another. However, most previous studies focus only on recognizing flat named entities and ignore nested ones. This paper therefore proposes a nested named entity recognition method based on semantic segmentation, which formulates nested named entity recognition as a semantic segmentation task. First, the element-wise similarity, cosine similarity, and bilinear similarity between each pair of words are computed. Then, the three similarity features are concatenated into an image and fed into a semantic segmentation model to obtain a word-to-word relation matrix. Finally, nested entities are extracted from the relation matrix. Experimental results show that the proposed method recognizes nested entities effectively: its F1 score on the public nested named entity recognition dataset GENIA reaches 80.0%, outperforming most existing nested entity recognition methods.

Key words: nested named entity recognition, relation matrix, semantic segmentation, correlation feature
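
The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: "element similarity" is taken to be the element-wise product of two word vectors, the bilinear weight `W` is random rather than learned, the segmentation model is replaced by a hand-written relation matrix, and the relation matrix is assumed to store a label id at `R[i, j]` for an entity spanning tokens `i` through `j`. The function names `pairwise_features` and `decode_spans` are hypothetical.

```python
import numpy as np

def pairwise_features(H, W):
    """Stack three word-pair similarity maps into an (n, n, d + 2) "image".

    H: (n, d) word representations.
    W: (d, d) bilinear weight (assumed; random in this sketch, learned in practice).
    """
    # Element-wise similarity (assumed to be the Hadamard product): (n, n, d)
    elementwise = H[:, None, :] * H[None, :, :]
    # Cosine similarity between every pair of words: (n, n)
    norms = np.linalg.norm(H, axis=1, keepdims=True)
    unit = H / np.clip(norms, 1e-12, None)
    cosine = unit @ unit.T
    # Bilinear similarity h_i^T W h_j for every pair: (n, n)
    bilinear = H @ W @ H.T
    # Concatenate along the channel axis, like the channels of an image
    return np.concatenate(
        [elementwise, cosine[..., None], bilinear[..., None]], axis=-1
    )

def decode_spans(R):
    """Read (start, end, label) triples from a relation matrix.

    Assumed encoding: R[i, j] with i <= j is the label id of the span
    covering tokens i..j (inclusive), and 0 means "no entity".
    """
    n = R.shape[0]
    return [
        (i, j, int(R[i, j]))
        for i in range(n)
        for j in range(i, n)
        if R[i, j] != 0
    ]

rng = np.random.default_rng(0)
n, d = 5, 8                      # 5 words, 8-dimensional representations
H = rng.normal(size=(n, d))
W = rng.normal(size=(d, d))
image = pairwise_features(H, W)  # input to a semantic segmentation model

# Stand-in for the segmentation model's per-cell prediction
R = np.zeros((n, n), dtype=int)
R[1, 3] = 1                      # outer entity over tokens 1..3
R[2, 3] = 2                      # inner entity nested inside it
spans = decode_spans(R)
print(image.shape, spans)
```

Because every cell of the matrix is classified independently, overlapping spans such as tokens 1..3 and 2..3 can both be labeled, which is what lets a per-cell (segmentation-style) formulation represent nested entities where a single tag sequence cannot.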

CLC Number: