Computer and Modernization

• Algorithm Design and Analysis •

Research on Named Entity Recognition Methods in Specific Domains

  

  1. (北京交通大学计算机与信息技术学院,北京100044)
  • 收稿日期:2017-07-12 出版日期:2018-04-03 发布日期:2018-04-03
  • 作者简介:张磊(1993-),女,河北张家口人,北京交通大学计算机与信息技术学院硕士研究生,研究方向:自然语言处理。

Research on Named Entity Recognition Method in Specific Fields

  1. (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
  • Received: 2017-07-12 Online: 2018-04-03 Published: 2018-04-03

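Abstract: In named entity recognition for specific domains, different domains are handled by different recognition methods. Texts from each domain have their own distinctive textual features, so recognition methods built for existing domains are difficult to adapt to a new specific domain. To address this problem, we propose a method that combines conditional random fields (CRF), semi-supervised learning and active learning into a unified technical framework for named entity recognition in any specific domain. The method first selects basic, general-purpose features of the domain text to build a feature set and trains a CRF to perform an initial recognition of named entities in the target domain. It then actively selects the samples whose confidence falls below a chosen threshold for manual annotation and iteratively expands the training set to achieve high recognition performance. To validate the method, experiments were carried out on rail transit texts; the results show that the method is effective and achieves good recognition performance in the rail transit domain.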
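The abstract does not say how the confidence of a CRF prediction is computed. One common choice for a linear-chain CRF, assumed here, is the probability of the best (Viterbi) label path, P(y*|x) = exp(score(y*) - log Z(x)), where log Z(x) comes from the forward algorithm. The sketch below takes per-token emission scores and a label-transition matrix as given (in a real system they would come from the trained CRF's features and weights) and returns the sentences whose confidence is below the threshold. The function names `viterbi_and_confidence` and `select_for_annotation` are hypothetical, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def viterbi_and_confidence(
    emissions: List[List[float]], transitions: List[List[float]]
) -> Tuple[List[int], float]:
    """Decode one sentence with a linear-chain CRF (assumed score model).

    emissions[t][y]: score of label y at token t.
    transitions[a][b]: score of moving from label a to label b.
    Returns the best label path and its probability P(y*|x).
    """
    n_labels = len(transitions)

    # Viterbi: best path score ending in each label, plus back-pointers.
    best = list(emissions[0])
    back: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best, ptr = [], []
        for b in range(n_labels):
            cand = [best[a] + transitions[a][b] for a in range(n_labels)]
            a_star = max(range(n_labels), key=lambda a: cand[a])
            new_best.append(cand[a_star] + emissions[t][b])
            ptr.append(a_star)
        best = new_best
        back.append(ptr)
    last = max(range(n_labels), key=lambda y: best[y])
    path_score = best[last]
    path = [last]
    for ptr in reversed(back):  # follow back-pointers from the last token
        path.append(ptr[path[-1]])
    path.reverse()

    # Forward algorithm: log of the partition function Z(x).
    alpha = list(emissions[0])
    for t in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[a] + transitions[a][b] for a in range(n_labels)])
            + emissions[t][b]
            for b in range(n_labels)
        ]
    log_z = logsumexp(alpha)
    return path, math.exp(path_score - log_z)


def select_for_annotation(
    pool: List[List[List[float]]], transitions: List[List[float]], threshold: float
) -> List[int]:
    """Indices of sentences whose best-path confidence is below the threshold."""
    selected = []
    for i, emissions in enumerate(pool):
        _, conf = viterbi_and_confidence(emissions, transitions)
        if conf < threshold:
            selected.append(i)
    return selected
```

With two labels and all-zero scores, every path of a two-token sentence is equally likely, so its confidence is 0.25 and it would be sent to a human annotator under a threshold of 0.9, while a sentence with strongly peaked emission scores would not.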

Keywords: active learning, semi-supervised learning, conditional random field, named entity recognition, specific domain

Abstract: In named entity recognition for specific domains, different fields call for different identification methods. Texts from different fields have their own unique textual features, which makes existing identification methods difficult to adapt to a new specific domain. To solve this problem, this paper proposes a method based on conditional random fields (CRF), semi-supervised learning and active learning, which together form a unified technical framework for named entity recognition in each specific domain. The method constructs a feature set based on the characteristics of rail transit text, trains a CRF to recognize named entities in rail transit text, selects the samples whose confidence is below a chosen threshold for manual annotation, and iteratively extends the training samples to achieve high recognition performance. To validate the method, experiments are carried out in the rail transit domain. The experimental results show that the method is effective and achieves good recognition performance in the rail transit domain.
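The iterative procedure described above (train, recognize, send low-confidence samples to a human, retrain on the enlarged set) can be sketched as a generic pool-based loop. The callables `train`, `predict` and `annotate` are hypothetical stand-ins for CRF training, CRF decoding with a confidence score, and manual labeling. The abstract does not specify how semi-supervised learning enters the framework; the optional `auto_label_threshold` parameter, which adds high-confidence predictions to the training set without human review (self-training), is an assumption and not the paper's stated method.

```python
from typing import Callable, List, Optional, Tuple, TypeVar

X = TypeVar("X")  # an unlabeled sentence
Y = TypeVar("Y")  # its label sequence
M = TypeVar("M")  # a trained model (e.g. a CRF)


def active_learning_loop(
    labeled: List[Tuple[X, Y]],
    unlabeled: List[X],
    train: Callable[[List[Tuple[X, Y]]], M],
    predict: Callable[[M, X], Tuple[Y, float]],
    annotate: Callable[[X], Y],
    threshold: float,
    max_rounds: int = 10,
    auto_label_threshold: Optional[float] = None,
) -> Tuple[M, List[Tuple[X, Y]]]:
    """Pool-based active learning with a confidence threshold (sketch)."""
    labeled = list(labeled)
    pool = list(unlabeled)
    model = train(labeled)
    for _ in range(max_rounds):
        if not pool:
            break
        uncertain, confident, rest = [], [], []
        for x in pool:
            y_hat, conf = predict(model, x)
            if conf < threshold:
                uncertain.append(x)  # goes to the human annotator
            elif auto_label_threshold is not None and conf >= auto_label_threshold:
                confident.append((x, y_hat))  # assumed self-training step
            else:
                rest.append(x)  # stays in the pool for the next round
        if not uncertain and not confident:
            break  # nothing new to learn from
        labeled.extend((x, annotate(x)) for x in uncertain)
        labeled.extend(confident)
        pool = rest
        model = train(labeled)  # retrain on the enlarged training set
    return model, labeled
```

For example, with a toy model that simply memorizes labeled tokens and reports confidence 1.0 for seen tokens and 0.1 otherwise, starting from `[("a", "A")]` and the pool `["a", "b", "c"]` with a threshold of 0.5, only "b" and "c" are sent to `annotate` before the loop stops.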

Key words: active learning, semi-supervised learning, conditional random field (CRF), named entity recognition (NER), specific domain
