Computer and Modernization ›› 2023, Vol. 0 ›› Issue (01): 43-48.

Tibetan Medical Entity Recognition Based on Tibetan BERT

  

  1. School of Information Science and Technology, Tibetan University, Lhasa 850000, China; 2. State Key Laboratory of Artificial Intelligence for Tibetan Information Technology in Tibet Autonomous Region, Lhasa 850000, China; 3. Ministry of Education Engineering Research Center for Tibetan Information Technology, Lhasa 850000, China
  • Online: 2023-03-02 Published: 2023-03-02

Abstract: Character embeddings of Tibetan medical text are of great significance for Tibetan medical entity recognition, but high-quality Tibetan language models are lacking. Taking the structural characteristics of Tibetan into account, a syllable-based BERT model is trained on general Tibetan news text, and a BERT-BiLSTM-CRF model is built on top of this Tibetan BERT model. First, the Tibetan BERT model learns character embeddings for Tibetan medical text, which strengthens the ability of the embeddings to represent Tibetan characters and their context. Next, a BiLSTM layer further extracts the dependencies between characters in the text. Finally, a CRF layer improves the validity of the predicted label sequence. Experimental results show that initializing the character embeddings with the Tibetan BERT model improves Tibetan medical entity recognition, with the F1 score reaching 96.18%.

Key words: Tibetan, Tibetan medicine, NER, BERT, BiLSTM
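
As a rough illustration of two ideas in the abstract, the sketch below splits Tibetan text into syllables and decodes a label sequence under BIO validity constraints. It is not the authors' implementation. The tag set (`O`, `B-MED`, `I-MED`), the function names, and the emission scores are assumptions made for the example; the scores stand in for what a BiLSTM layer would output, and the hard legality rule is a simplification of the transition scores a trained CRF layer would learn.

```python
import math

TSHEG = "\u0f0b"  # Tibetan intersyllabic mark (tsheg)
SHAD = "\u0f0d"   # Tibetan clause-ending mark (shad)

# Hypothetical tag set for illustration; the paper's label scheme is not given here.
LABELS = ["O", "B-MED", "I-MED"]


def split_syllables(text):
    """Split Tibetan text into syllables on tsheg, dropping shad and whitespace."""
    for mark in (SHAD, " ", "\n"):
        text = text.replace(mark, TSHEG)
    return [s for s in text.split(TSHEG) if s]


def is_legal(prev, curr):
    """BIO constraint: I-X may only follow B-X or I-X."""
    if curr.startswith("I-"):
        return prev in ("B-" + curr[2:], "I-" + curr[2:])
    return True


def viterbi(emissions):
    """Return the highest-scoring label sequence that satisfies the BIO constraint.

    emissions: one dict per syllable, mapping each label to a score
    (assumed to come from an upstream encoder such as a BiLSTM).
    """
    neg_inf = -math.inf
    # A sequence may not start with an I- tag.
    score = {l: (neg_inf if l.startswith("I-") else emissions[0][l]) for l in LABELS}
    backpointers = []
    for em in emissions[1:]:
        new_score, ptr = {}, {}
        for curr in LABELS:
            best_prev, best = None, neg_inf
            for prev in LABELS:
                if is_legal(prev, curr) and score[prev] > best:
                    best_prev, best = prev, score[prev]
            new_score[curr] = best + em[curr] if best_prev is not None else neg_inf
            ptr[curr] = best_prev
        score = new_score
        backpointers.append(ptr)
    # Trace back from the best final label.
    path = [max(LABELS, key=lambda l: score[l])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    return path[::-1]


if __name__ == "__main__":
    syllables = split_syllables("བོད་ཀྱི་གསོ་བ་རིག་པ།")
    print(syllables)
    # Made-up scores: picking the best label per position independently
    # would give the illegal sequence O, I-MED.
    scores = [
        {"O": 2.0, "B-MED": 1.0, "I-MED": 0.0},
        {"O": 0.0, "B-MED": 0.0, "I-MED": 3.0},
    ]
    print(viterbi(scores))
```

In the toy scores, choosing the top label at each position independently would place `I-MED` directly after `O`, which is not a valid BIO sequence; decoding over whole sequences avoids this, which is the role the abstract assigns to the CRF layer.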