Computer and Modernization, 2024, Vol. 0, Issue (01): 87-91. doi: 10.3969/j.issn.1006-2475.2024.01.014

• Artificial Intelligence •

Named Entity Recognition in Electronic Medical Record Based on BERT

  1. (1. School of Information Science and Engineering, Hunan University of Chinese Medicine, Changsha 410208, China; 2. School of Computer Science and Engineering, Central South University, Changsha 410083, China)
  • Online: 2024-01-23 Published: 2024-02-26
  • About the authors: Zheng Lirui (郑立瑞, born 1995), male, from Nanzhang, Hubei; master's student; research interest: natural language processing; E-mail: 1903482422@qq.com. Corresponding author: Xiao Xiaoxia (肖晓霞, born 1977), female, from Liuyang, Hunan; associate professor, Ph.D.; research interests: intelligent assisted diagnosis in traditional Chinese medicine, intelligent data analysis, embedded systems; E-mail: 173880937@qq.com.
  • Funding:
    National Key R&D Program of the Ministry of Science and Technology under the 13th Five-Year Plan, 2017 (2017YFC1703300); Science and Technology Innovation 2030 "New Generation Artificial Intelligence" Major Project (2018AAA0102102)

Abstract: Electronic medical records are an important resource for preserving, managing, and transmitting patients' medical records, and an important textual record of doctors' diagnosis and treatment. Named entity recognition (NER) can efficiently and intelligently extract diagnosis and treatment information such as symptoms, diseases, and drug names from electronic medical records. This helps structure the records so that machine learning and related techniques can mine them for diagnosis and treatment patterns. To recognize named entities in electronic medical records efficiently, this paper proposes an NER method that combines BERT, a bidirectional long short-term memory network (BILSTM), and a conditional random field (CRF) with FGM adversarial training, referred to as BERT-BILSTM-CRF-FGM (BBCF). After corrections and other preprocessing of the Chinese electronic medical record corpus provided by the 2017 China Conference on Knowledge Graph and Semantic Computing (CCKS2017), the BBCF model recognizes the five entity types in the corpus with an average F1 score of 92.84%. Compared with the BERT model based on a dilated convolutional neural network (BERT-IDCNN-CRF) and the BILSTM-based conditional random field model (BILSTM-CRF), the proposed method achieves a higher F1 score and faster convergence, and can therefore structure electronic medical record text more efficiently.

Key words: electronic medical record, named entity recognition, BERT, FGM, bidirectional long short-term memory network (BILSTM), conditional random field
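
The abstract names FGM adversarial training but does not spell out the perturbation. As commonly described, FGM adds to the word embeddings a step of size ε along the L2-normalized gradient of the loss, r = ε · g / ||g||₂, trains on the perturbed input, and then restores the original embeddings. The following is a minimal sketch of that perturbation only, in plain Python on a toy vector. The function names, the value of ε, and the example numbers are illustrative assumptions, not the authors' implementation.

```python
import math


def fgm_perturbation(grad, epsilon=1.0):
    """Return r = epsilon * g / ||g||_2 for a flat gradient vector.

    `epsilon` is a hypothetical step size; the abstract does not report one.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        # A zero gradient gives no direction to perturb in.
        return [0.0 for _ in grad]
    return [epsilon * g / norm for g in grad]


def fgm_attack(embedding, grad, epsilon=1.0):
    """Return the perturbed embedding together with the perturbation r."""
    r = fgm_perturbation(grad, epsilon)
    perturbed = [e + d for e, d in zip(embedding, r)]
    return perturbed, r


def fgm_restore(perturbed, r):
    """Undo the perturbation after the adversarial pass."""
    return [p - d for p, d in zip(perturbed, r)]


# Toy values standing in for one embedding vector and its loss gradient.
embedding = [0.5, -1.0, 2.0]
grad = [3.0, 0.0, 4.0]  # L2 norm is 5
perturbed, r = fgm_attack(embedding, grad, epsilon=0.5)
print(r)
print(perturbed)
```

In a full training loop the gradient would come from backpropagation through the model, and the gradients of the clean and adversarial passes would typically be accumulated before the optimizer step; that part is omitted here.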

CLC Number:
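
The abstract reports an average F1 score over five entity types without saying how the average is formed. One common reading is entity-level F1 per type followed by an unweighted (macro) average, which the sketch below illustrates. The BIO tagging scheme, the macro average, and the labels `DIS`, `SYM`, and `DRUG` are assumptions made for illustration; the paper may use a different scheme or averaging method.

```python
def extract_entities(tags):
    """Return a set of (type, start, end) spans from a BIO tag sequence."""
    entities = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes the last span
        inside = tag.startswith("I-") and etype == tag[2:]
        if not inside:
            if etype is not None:
                entities.add((etype, start, i))
            start, etype = None, None
            if tag.startswith("B-"):
                start, etype = i, tag[2:]
    return entities


def f1_by_type(gold_tags, pred_tags):
    """Entity-level F1 per type; a span counts only on an exact match."""
    gold, pred = extract_entities(gold_tags), extract_entities(pred_tags)
    scores = {}
    for etype in sorted({e[0] for e in gold | pred}):
        g = {e for e in gold if e[0] == etype}
        p = {e for e in pred if e[0] == etype}
        tp = len(g & p)
        precision = tp / len(p) if p else 0.0
        recall = tp / len(g) if g else 0.0
        total = precision + recall
        scores[etype] = 2 * precision * recall / total if total else 0.0
    return scores


# Hypothetical labels: the predicted symptom span is too short, so it misses.
gold = ["B-DIS", "I-DIS", "O", "B-SYM", "I-SYM", "O", "B-DRUG"]
pred = ["B-DIS", "I-DIS", "O", "B-SYM", "O", "O", "B-DRUG"]
scores = f1_by_type(gold, pred)
macro_f1 = sum(scores.values()) / len(scores)
print(scores, macro_f1)
```

Exact-match scoring is strict: a prediction that gets the entity type right but the boundary wrong counts as both a false positive and a false negative.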