Computer and Modernization ›› 2021, Vol. 0 ›› Issue (11): 100-105.

Named Entity Recognition of Medicinal Plant Texts Integrated with Attention Mechanism

  

  1. College of Computer Science and Technology, Guizhou University, Guiyang 550025, China;
  2. State Key Laboratory of Public Big Data (Guizhou University), Guiyang 550025, China
  • Online: 2021-12-13    Published: 2021-12-13

Abstract: Named entity recognition in medicinal plant texts plays an important role in information extraction and knowledge graph construction for traditional Chinese medicine. To address the semantic sparsity of long sequences in medicinal plant attribute text, a disease entity recognition method, BAC, is proposed that combines an attention mechanism with a BiLSTM and CRF model. First, the medicinal plant attribute text is preprocessed and semi-automatically annotated to construct a medicinal plant knowledge data set, and low-dimensional word vectors are obtained by pre-training. These vectors are then fed into an attention-based BiLSTM network to obtain feature vectors that better represent disease entities. Finally, the optimal tag sequence is obtained with a conditional random field (CRF). Comparative experiments show that the BAC method reaches an accuracy of 93.78%, which is 4.46% higher than the BiLSTM-CRF model, so it effectively improves the recognition of disease named entities in medicinal plant attribute text. The model trained with the BAC method was then used to identify disease named entities in 1680 text sentences, and a total of 1422 disease entities were extracted. Matching these against the names of medicinal plants yielded a total of 4316 triples relating medicinal plant names to disease entities.
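
The final step, choosing the optimal tag sequence with a CRF, is usually done by Viterbi decoding over per-token tag scores and tag-to-tag transition scores. The following is a minimal sketch of that decoding step only, not the authors' implementation. The BIO-style tag names (`O`, `B-DIS`, `I-DIS`), the function name `viterbi_decode`, and every numeric score are illustrative assumptions; the abstract does not give the tag scheme or any model parameters.

```python
from typing import List, Sequence

# Hypothetical tag set for disease entities (assumed, not from the paper).
TAGS = ["O", "B-DIS", "I-DIS"]

# TRANSITIONS[i][j]: score of moving from tag i to tag j (illustrative values).
# The large negative entry forbids the invalid move O -> I-DIS.
TRANSITIONS = [
    [0.0, 0.0, -1e4],  # from O
    [0.0, 0.0, 0.0],   # from B-DIS
    [0.0, 0.0, 0.0],   # from I-DIS
]

# EMISSIONS[t][j]: score of tag j at token t, standing in for the output
# of an attention-based BiLSTM (made-up numbers for a 4-token sentence).
EMISSIONS = [
    [2.0, 0.5, 0.1],
    [0.2, 1.5, 1.0],
    [0.3, 0.2, 1.8],
    [2.0, 0.1, 0.4],
]


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]],
                   tags: Sequence[str]) -> List[str]:
    """Return the highest-scoring tag sequence for a linear-chain model."""
    if not emissions:
        return []
    n_tags = len(tags)
    # score[j]: best score of any path that ends in tag j at the current token.
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for reaching tag j at this token.
            best_prev = max(range(n_tags),
                            key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_prev] + transitions[best_prev][j] + emit[j])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag to recover the path.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return [tags[i] for i in path]


decoded = viterbi_decode(EMISSIONS, TRANSITIONS, TAGS)
```

Unlike picking the best tag for each token independently, decoding over whole sequences lets the transition scores rule out invalid outputs such as an `I-DIS` tag directly after `O`.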

Key words: knowledge graph, attention mechanism, bidirectional long short-term memory network (BiLSTM), conditional random field (CRF), disease named entity recognition