Named Entity Recognition Algorithm Based on Active Learning

Abstract

Named entity recognition aims to identify the boundaries and categories of entities in text. Training a named entity recognition model usually requires a large number of labeled samples. This paper implements an effective selection algorithm that picks, from a large pool of samples, those best suited to updating the model, thereby reducing the number of samples that must be labeled. Five sets of comparison experiments verify that the selection algorithm yields a better sample set and makes targeted annotation possible. Experiments on a microblog data set verify that the current-based active learning algorithm can select more appropriate sample sets from a large amount of Internet text data, which effectively reduces the cost of manual labeling. Two models are used to extract entity boundaries and to classify entities: a sequence labeling model extracts the position of each entity in the sequence, and an entity classification model classifies the labeling results. Active learning is used to train on the unlabeled data set. The training method is evaluated on two data sets. Experiments on the Weibo data set show that the algorithm can learn text features from unlabeled data. Experiments on the MSRA data set show that once the proportion of the pre-training data set exceeds 40%, the F1 score of the model on the test set stabilizes at about 90%, close to the result obtained with the full data set, indicating that the model has some ability to extract features from unlabeled data.
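The sample-selection step described above can be illustrated with a minimal sketch. The abstract does not state which selection criterion the paper uses, so the least-confidence score below is an assumption made for illustration only; the function names `sentence_uncertainty` and `select_for_labeling`, the `budget` parameter, and the toy probabilities are all hypothetical.

```python
from typing import List, Sequence


def sentence_uncertainty(token_probs: Sequence[Sequence[float]]) -> float:
    """Least-confidence score (an assumed criterion, not taken from the paper):
    1 minus the mean of each token's highest label probability."""
    if not token_probs:
        return 0.0
    top = [max(p) for p in token_probs]
    return 1.0 - sum(top) / len(top)


def select_for_labeling(
    pool: List[Sequence[Sequence[float]]], budget: int
) -> List[int]:
    """Return the indices of the `budget` most uncertain sentences in the pool."""
    scores = [sentence_uncertainty(s) for s in pool]
    ranked = sorted(range(len(pool)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Toy pool: each sentence is a list of per-token label distributions,
# as a sequence labeling model might output them (values are made up).
pool = [
    [[0.9, 0.05, 0.05], [0.8, 0.1, 0.1]],   # model is confident
    [[0.4, 0.3, 0.3], [0.5, 0.25, 0.25]],   # model is uncertain
    [[0.7, 0.2, 0.1]],                      # in between
]
selected = select_for_labeling(pool, budget=2)
print(selected)
```

In an active learning loop of this kind, the selected sentences would be sent for manual annotation, added to the training set, and the model retrained before the next round of selection.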

Key words: named entity recognition, active learning, deep learning, Bi-LSTM

ZHANG Cen-fang. Named Entity Recognition Algorithm Based on Active Learning[J]. Computer and Modernization, 2021, 0(07): 18-22.

