Computer and Modernization, 2021, Vol. 0, Issue (02): 51-55.


Named Entity Recognition on Chinese Electronic Medical Records Based on RoBERTa-WWM

  

  (1. School of Nursing, Bengbu Medical College, Bengbu 233030, China;
   2. Science Island Branch, University of Science and Technology of China, Hefei 230001, China)
  Online: 2021-03-01   Published: 2021-03-01

Abstract: Electronic Medical Records (EMRs) contain abundant information, such as clinical symptoms, diagnoses and drug efficacy. Named Entity Recognition (NER), which aims to extract named entities from unstructured text, is the first step in extracting valuable information from EMRs. This paper proposes a named entity recognition method based on RoBERTa-WWM (A Robustly Optimized BERT Pretraining Approach with Whole Word Masking). RoBERTa-WWM is a pre-trained language model, used here to generate semantic representations that carry prior knowledge. Compared with those of BERT (Bidirectional Encoder Representations from Transformers), the representations generated by RoBERTa-WWM are better suited to Chinese NER because the model masks whole words, rather than individual characters, during pre-training. These representations are then fed into a Bidirectional Long Short-Term Memory (BiLSTM) layer followed by a Conditional Random Field (CRF) layer. Experimental results on the China Conference on Knowledge Graph and Semantic Computing 2019 (CCKS 2019) dataset show that the method effectively improves the F1-score, and thus the performance of NER on Chinese EMRs.
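The difference between character-level masking and whole word masking (WWM) can be illustrated with a toy sketch. It assumes the sentence has already been segmented into words (the segmenter is not shown). Character-level masking may hide only part of a word such as 头痛 ("headache"), whereas WWM hides every character of each selected word. The function names, the `[MASK]` string and the 15% ratio are illustrative assumptions, not the paper's code; real pre-training additionally replaces some selected tokens with random tokens or leaves them unchanged.

```python
import random
from typing import List

MASK = "[MASK]"  # placeholder mask token (assumption for this toy example)


def char_level_mask(words: List[str], ratio: float, rng: random.Random) -> List[str]:
    """Character-level masking: every Chinese character is a token, chosen independently."""
    chars = [c for w in words for c in w]
    n = max(1, round(len(chars) * ratio))
    picked = set(rng.sample(range(len(chars)), n))
    return [MASK if i in picked else c for i, c in enumerate(chars)]


def whole_word_mask(words: List[str], ratio: float, rng: random.Random) -> List[str]:
    """Whole word masking: pick whole words, then mask every character of each picked word."""
    total = sum(len(w) for w in words)
    budget = max(1, round(total * ratio))  # approximate number of characters to mask
    order = list(range(len(words)))
    rng.shuffle(order)
    chosen, covered = set(), 0
    for idx in order:
        if covered >= budget:
            break
        chosen.add(idx)
        covered += len(words[idx])
    masked: List[str] = []
    for i, w in enumerate(words):
        # A chosen word contributes one [MASK] per character; other words are copied as-is.
        masked.extend([MASK] * len(w) if i in chosen else list(w))
    return masked


if __name__ == "__main__":
    # Hypothetical pre-segmented sentence: 患者 / 出现 / 头痛 / 症状
    words = ["患者", "出现", "头痛", "症状"]
    print(char_level_mask(words, 0.15, random.Random(1)))
    print(whole_word_mask(words, 0.15, random.Random(1)))
```

With WWM, a masked word is never left half-visible, so the model must predict the whole word from context, which is the property the abstract relies on for Chinese NER.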
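In a BiLSTM-CRF tagger, the BiLSTM produces a score for every tag at every position, and the CRF layer selects the jointly best tag sequence using learned tag-to-tag transition scores. The following pure-Python sketch shows only that decoding step (Viterbi). It assumes a BIO tagging scheme and a hypothetical tag set (the abstract specifies neither), omits training (which maximizes the sequence log-likelihood via the forward algorithm), and uses hand-picked scores instead of outputs of a trained model.

```python
from typing import List, Optional, Sequence, Tuple


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> Tuple[List[int], float]:
    """Best tag path of a linear-chain CRF.

    emissions[t][j]   -- score of tag j at position t (in the model, a BiLSTM output)
    transitions[i][j] -- score of moving from tag i to tag j (learned by the CRF)
    """
    num_tags = len(transitions)
    score = list(emissions[0])              # best path score ending in each tag so far
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for j in range(num_tags):
            cands = [score[i] + transitions[i][j] for i in range(num_tags)]
            best_i = max(range(num_tags), key=cands.__getitem__)
            new_score.append(cands[best_i] + emissions[t][j])
            ptrs.append(best_i)
        score = new_score
        backpointers.append(ptrs)
    last = max(range(num_tags), key=score.__getitem__)
    path = [last]
    for ptrs in reversed(backpointers):      # follow back-pointers to position 0
        path.append(ptrs[path[-1]])
    path.reverse()
    return path, score[last]


def bio_spans(tags: Sequence[str]) -> List[Tuple[str, int, int]]:
    """Convert BIO tags into (label, start, end) entity spans, end exclusive."""
    spans: List[Tuple[str, int, int]] = []
    label: Optional[str] = None
    start = 0
    for i, tag in enumerate(list(tags) + ["O"]):   # sentinel "O" closes the last span
        if tag.startswith("I-") and tag[2:] == label:
            continue                                 # still inside the current entity
        if label is not None:
            spans.append((label, start, i))
        label, start = (tag[2:], i) if tag != "O" else (None, i)
    return spans


if __name__ == "__main__":
    TAGS = ["O", "B-DISEASE", "I-DISEASE"]          # hypothetical tag set
    transitions = [[0, 0, -100],                    # O -> I-DISEASE effectively forbidden
                   [0, 0, 1],
                   [0, 0, 1]]
    emissions = [[2, 1, 0], [0, 0, 3]]
    path, best = viterbi_decode(emissions, transitions)
    print([TAGS[k] for k in path], best)            # → ['B-DISEASE', 'I-DISEASE'] 5
    print(bio_spans([TAGS[k] for k in path]))       # → [('DISEASE', 0, 2)]
```

Picking the highest emission at each position independently would give `O, I-DISEASE`, an invalid BIO sequence; the transition scores let the CRF reject it in favour of `B-DISEASE, I-DISEASE`. This is the usual motivation for placing a CRF on top of the BiLSTM.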

Key words: electronic medical records, named entity recognition, RoBERTa-WWM, information extraction