Computer and Modernization, 2021, Vol. 0, Issue (07): 18-22.


Named Entity Recognition Algorithm Based on Active Learning

  

School of Computer Science and Engineering, Nanjing University of Science & Technology, Nanjing 210094, China
Online: 2021-08-02; Published: 2021-08-02

Abstract: Named entity recognition (NER) aims to identify the boundaries and categories of entities in text. Training an NER model usually requires a large number of labeled samples. This paper applies an effective sample selection algorithm that picks, from a large pool of unlabeled samples, those most suitable for updating the model, which reduces the number of samples that must be labeled. Five groups of comparison experiments verify that the selection algorithm yields a better sample set and enables targeted annotation. Experiments on a microblog (Weibo) dataset verify that the proposed active learning algorithm can select more appropriate samples from large volumes of Internet text, effectively reducing the cost of manual labeling. The method uses two models to extract and classify entities. A sequence labeling model locates each entity's position in the sequence, and an entity classification model assigns a category to each labeled span. Active learning is used to train on the unlabeled dataset. The training method was evaluated on two datasets. Experiments on the Weibo dataset show that the algorithm can learn text features from the unlabeled dataset. On the MSRA dataset, once the pre-training data exceeds 40% of the total, the model's F1 score on the test set stabilizes at about 90%, close to the result obtained with the full dataset. This indicates that the model has some ability to extract features from unlabeled data.
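The abstract does not say which criterion the selection algorithm uses. As one possible illustration, the minimal Python sketch below assumes least-confidence uncertainty sampling, a common active learning criterion that is swapped in here as an assumption and is not taken from the paper. The current tagger scores each unlabeled sentence, the k sentences it is least sure about are sent for manual labeling, and the model is retrained. The names `least_confidence`, `select_for_labeling`, and `active_learning_rounds`, and the callables `train`, `predict_probs`, and `annotate`, are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

# Assumption: the current tagger returns, for each token of a sentence,
# a probability distribution over tags (one list of floats per token).
SentenceProbs = Sequence[Sequence[float]]
S = TypeVar("S")  # an unlabeled sentence
L = TypeVar("L")  # a labeled (annotated) sentence
M = TypeVar("M")  # a trained model


def least_confidence(sentence: SentenceProbs) -> float:
    """Uncertainty = 1 - mean over tokens of the most likely tag's probability."""
    if not sentence:
        return 0.0
    mean_max = sum(max(token) for token in sentence) / len(sentence)
    return 1.0 - mean_max


def select_for_labeling(pool_probs: List[SentenceProbs], k: int) -> List[int]:
    """Indices of the k most uncertain sentences, most uncertain first."""
    ranked = sorted(range(len(pool_probs)),
                    key=lambda i: least_confidence(pool_probs[i]),
                    reverse=True)
    return ranked[:k]


def active_learning_rounds(
    labeled: List[L],
    pool: List[S],
    train: Callable[[List[L]], M],
    predict_probs: Callable[[M, S], SentenceProbs],
    annotate: Callable[[S], L],
    k: int,
    rounds: int,
) -> Tuple[M, List[L]]:
    """Hypothetical loop: score the pool, label the k least confident, retrain."""
    model = train(labeled)
    for _ in range(rounds):
        if not pool:
            break
        probs = [predict_probs(model, s) for s in pool]
        picked = set(select_for_labeling(probs, k))
        # annotate() stands in for the human annotator (the labeling cost).
        labeled = labeled + [annotate(pool[i]) for i in sorted(picked)]
        pool = [s for i, s in enumerate(pool) if i not in picked]
        model = train(labeled)
    return model, labeled
```

The sketch averages the per-token maximum probabilities instead of multiplying them. A product shrinks as sentences get longer, so it would favor long sentences for their length alone. This is a design choice of the sketch and is not necessarily the paper's.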
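The two-model design (boundary extraction, then classification) can be sketched as follows. The abstract does not state the tag scheme, so this sketch assumes that the sequence labeling model emits untyped B/I/O boundary tags per character and that the classifier receives each decoded span. `extract_spans`, `recognize`, `tag_boundaries`, and `classify` are hypothetical names. In the usage below, a toy tagger and a lookup classifier stand in for the trained Bi-LSTM and classification models.

```python
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # [start, end) token offsets


def extract_spans(tags: List[str]) -> List[Span]:
    """Decode untyped B/I/O boundary tags into [start, end) entity spans.

    Assumption: a stray "I" with no open span is treated as the start of a
    new span (lenient decoding); tags other than B/I/O are not expected.
    """
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:  # a new "B" closes the previous span
                spans.append((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.append((start, i))
            start = None
        # "I" inside an open span (or "O" outside one) needs no action.
    if start is not None:  # span running to the end of the sentence
        spans.append((start, len(tags)))
    return spans


def recognize(
    tokens: List[str],
    tag_boundaries: Callable[[List[str]], List[str]],
    classify: Callable[[List[str], int, int], str],
) -> List[Tuple[int, int, str]]:
    """Stage 1 finds entity boundaries; stage 2 assigns each span a category."""
    tags = tag_boundaries(tokens)
    return [(s, e, classify(tokens, s, e)) for s, e in extract_spans(tags)]
```

For example, take `tokens = list("张三在南京理工大学")` and a toy tagger that returns `["B", "I", "O", "B", "I", "I", "I", "I", "I"]`. Here `extract_spans` yields `[(0, 2), (3, 9)]`. With a lookup classifier built from `{"张三": "PER", "南京理工大学": "ORG"}`, `recognize` returns `[(0, 2, "PER"), (3, 9, "ORG")]`.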

Key words: named entity recognition, active learning, deep learning, Bi-LSTM