Computer and Modernization ›› 2020, Vol. 0 ›› Issue (11): 60-64.

• Database and Data Mining •

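Entity Extraction Method of Chinese Electronic Medical Record Based on CNN-BGRU-CRF

  1. (Qingdao University of Science and Technology, Qingdao 266100, Shandong, China)
  • Online: 2020-12-03 Published: 2020-12-03
  • About the authors: Feng Yunxia (冯云霞, b. 1977), female, from Weifang, Shandong, associate professor, Ph.D.; research interests: big data sharing and analysis technology, security and privacy protection, Internet of Things application technology; E-mail: cloudy_feng@163.com. Yi Peng (衣鹏, b. 1994), male, master's student; research interests: medical data mining, software engineering; E-mail: 1014853114@qq.com. Han Zhengliang (韩正亮, b. 1993), male, master's student; research interests: medical big data applications, big data analysis technology; E-mail: 1062718485@qq.com. Song Bo (宋波, b. 1978), male, professor, Ph.D.; research interests: software engineering, big data integration technology, medical big data applications.
  • Funding:
    National Natural Science Foundation of China (61572268, 61303193); Key Research and Development Program of Shandong Province (2017GSF18110, 2018GGX101029)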

Abstract: Traditional methods for entity extraction from Chinese electronic medical records (EMRs) depend heavily on dictionaries and word segmentation tools and cannot make full use of contextual features. To address these problems, this paper proposes a Chinese EMR entity extraction model that combines a convolutional neural network (CNN) over character embeddings, a bidirectional gated recurrent unit (BGRU), and a conditional random field (CRF). First, character embeddings are used to extract latent word features. Then, joint character and word features are used together with an attention mechanism that highlights specific information. Finally, the final result is obtained under rationality constraints. By making full use of character and word features, the model avoids the effect of word segmentation errors on entity extraction, reduces manual feature construction, and improves the efficiency of entity extraction. Experimental results show that the model achieves a higher F value than the traditional Bi-LSTM-CRF model when extracting entities in the diagnosis name, symptom name, and treatment method categories.

Key words: Chinese electronic medical record, entity extraction, convolutional neural network (CNN), bidirectional gated recurrent unit (BGRU), attention mechanism
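
The final step described in the abstract, obtaining the result under rationality constraints, can be illustrated with a minimal sketch. It assumes that the constraint refers to CRF-style validity of the label sequence under a BIO tagging scheme, which the abstract does not spell out. The tag set (`SYM`, `DIA`), the function names, and the per-character scores below are hypothetical, and the sketch uses hard constraints only, whereas a trained CRF would also learn transition scores. It is not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical BIO tag set (assumption): symptom and diagnosis entities.
LABELS = ["O", "B-SYM", "I-SYM", "B-DIA", "I-DIA"]
NEG_INF = float("-inf")


def allowed(prev: str, curr: str) -> bool:
    """BIO validity: I-X may only follow B-X or I-X of the same type."""
    if curr.startswith("I-"):
        kind = curr[2:]
        return prev in ("B-" + kind, "I-" + kind)
    return True


def fill(scores: Dict[str, float]) -> Dict[str, float]:
    """Complete a sparse score dict so every label has a score (default 0.0)."""
    return {y: scores.get(y, 0.0) for y in LABELS}


def viterbi(emissions: List[Dict[str, float]]) -> List[str]:
    """Return the highest-scoring label sequence that satisfies `allowed`."""
    if not emissions:
        return []
    # A sequence may not start inside an entity.
    score = {y: (NEG_INF if y.startswith("I-") else emissions[0][y]) for y in LABELS}
    backpointers = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for curr in LABELS:
            best_prev, best = None, NEG_INF
            for prev in LABELS:
                if allowed(prev, curr) and score[prev] > best:
                    best_prev, best = prev, score[prev]
            new_score[curr] = best + em[curr] if best_prev is not None else NEG_INF
            ptr[curr] = best_prev
        score = new_score
        backpointers.append(ptr)
    # Backtrack from the best final label.
    path = [max(LABELS, key=lambda y: score[y])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Made-up per-character scores, standing in for the output of an encoder.
example = [
    fill({"O": 1.0, "B-SYM": 0.8, "I-SYM": 0.1}),
    fill({"O": 0.2, "B-SYM": 0.1, "I-SYM": 2.0}),
    fill({"O": 1.5}),
]
print(viterbi(example))  # ['B-SYM', 'I-SYM', 'O']
```

Picking the best label per character independently would give `O, I-SYM, O` here, which is not a valid BIO sequence because `I-SYM` follows `O`. Constrained decoding instead returns the best valid sequence.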