Computer and Modernization ›› 2021, Vol. 0 ›› Issue (01): 105-110.


Named Entity Recognition in Requirements Documents Based on Deep Learning and Grammatical Regulations

  

  1. (The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808)
  • Online: 2021-01-28 Published: 2021-01-29
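  • About the authors: XU Mengdi (born 1995), male, from Dazhu, Sichuan, master's student; research interests: deep learning, natural language processing; E-mail: xmd_tdsl@163.com. WANG Jinhua (born 1976), male, senior engineer, master's degree; research interests: big data, data engineering, knowledge graphs, natural language processing; E-mail: 15802196002@139.com.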

Requirements Document Named Entity Recognition Based on Deep Learning and Grammatical Regulations 

  1. (The 32nd Research Institute of China Electronics Technology Group Corporation, Shanghai 201808, China)
  • Online:2021-01-28 Published:2021-01-29

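Abstract: Named entity recognition is a key task in natural language processing. Requirements documents contain overlong entities, namely virtual functions, which general-purpose conventional named entity recognition methods cannot recognize as complete entities. This paper studies entity recognition models for requirements documents in depth, introduces deep learning, and proposes combining a CNER method based on a deep residual network (ResNet) with a rule-based method to perform word segmentation on Chinese requirements documents. The named entity recognition model in this paper is an encoder-decoder model: a bidirectional long short-term memory network with an attention mechanism (BiLSTM with attention) performs the encoding and yields the context features and sentence-pattern features of the segmented text, a conditional random field (CRF) performs the decoding, and the intervention of grammatical regulations is then applied to recognize the entities in requirements documents. Experiments show that, in the requirements-document domain, the proposed method recognizes entities better than general-purpose conventional methods.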

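Keywords: named entity recognition, CNER, deep residual network, bidirectional long short-term memory network, conditional random field, grammatical regulations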

Abstract: Named entity recognition (NER) is a key task in natural language processing. Requirements documents contain overlong entities, namely virtual functions, that general-purpose conventional NER methods fail to recognize in full. This paper studies entity recognition models for requirements documents and introduces a deep learning approach: a CNER method based on a deep residual network (ResNet) is combined with a rule-based method to segment the words of Chinese requirements documents. The proposed NER model is an encoder-decoder model. A bidirectional long short-term memory network with an attention mechanism (BiLSTM with attention) encodes the segmented text to obtain its context features and sentence-pattern features, a conditional random field (CRF) decodes them, and grammatical regulations then intervene in the result to identify the entities in requirements documents. Experiments show that, in the requirements-document domain, the proposed method recognizes entities better than general-purpose conventional methods.

Key words: named entity recognition, CNER, ResNet, BiLSTM, CRF, grammatical regulations
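
The CRF decoding step named in the abstract can be illustrated with a minimal sketch of Viterbi decoding for a linear-chain CRF. This is not the authors' implementation: the `B`/`I`/`O` tag set, the emission scores (standing in for encoder output), and the transition scores are all hypothetical values chosen for illustration, and the function name `viterbi_decode` is an assumption.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
    tags: Sequence[str],
) -> List[str]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag] is the encoder's score for `tag` at position t.
    transitions[(prev, cur)] is the score of moving from `prev` to `cur`;
    transitions that are not listed default to 0.0.
    """
    if not emissions:
        return []

    # best[t][tag]: best score of any path that ends in `tag` at position t.
    best = [{tag: emissions[0][tag] for tag in tags}]
    # back[t - 1][tag]: the previous tag on that best path.
    back: List[Dict[str, str]] = []

    for t in range(1, len(emissions)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for cur in tags:
            prev = max(
                tags,
                key=lambda p: best[-1][p] + transitions.get((p, cur), 0.0),
            )
            scores[cur] = (
                best[-1][prev]
                + transitions.get((prev, cur), 0.0)
                + emissions[t][cur]
            )
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)

    # Start from the best final tag and follow the back-pointers.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Hypothetical scores for a three-token input (illustration only).
TAGS = ["B", "I", "O"]
EMISSIONS = [
    {"B": 1.0, "I": 0.0, "O": 1.2},
    {"B": 0.0, "I": 2.0, "O": 0.5},
    {"B": 0.0, "I": 0.2, "O": 2.0},
]
# Strongly penalize an entity continuing (I) right after a non-entity tag (O).
TRANSITIONS = {("O", "I"): -10.0}

print(viterbi_decode(EMISSIONS, TRANSITIONS, TAGS))  # prints ['B', 'I', 'O']
```

Picking the best tag at each position independently would give `O, I, O` here, which is not a valid entity span. Decoding over the whole sequence with transition scores selects `B, I, O` instead, which is the kind of sequence-level consistency a CRF decoder adds on top of per-token encoder scores.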