Chinese Named Entity Recognition with Fusion of Lexicon Information and Sentence Semantics

doi:10.3969/j.issn.1006-2475.2024.03.004

Abstract

Deep learning has substantially improved named entity recognition (NER), but deep networks typically depend on large amounts of labeled data, which makes it hard to extract deep features from small datasets. This paper proposes LS-NER, a Chinese NER model that fuses lexicon information with sentence semantics. First, the candidate words that each character matches in a dictionary are supplied to the model as prior lexical information, which addresses the Chinese word segmentation problem. Second, sentence embeddings that carry semantic information, normally used to compute text similarity, are applied to NER so that the model can identify similar entities in similar sentences. Finally, a feature fusion strategy lets the model learn the semantic information in the sentence embeddings effectively. Experiments on two small datasets, Resume and Weibo, show that the approach performs well. Adding sentence semantics helps the model learn deeper features without additional external information: F1 scores are 0.15 and 2.26 percentage points higher on Resume and Weibo, respectively, than those of the same model without sentence information.
Key words: named entity recognition; BERT; SoftLexicon; Sentence-BERT; CRF
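
The lexicon-matching step in the abstract can be illustrated with a small sketch. It follows the B/M/E/S grouping associated with SoftLexicon, which the key words name: for each character, collect the dictionary words in which that character is the beginning, the middle, the end, or a single-character word. Whether LS-NER matches words in exactly this way is an assumption; the function name `match_lexicon`, the `max_word_len` parameter, and the toy lexicon are hypothetical and do not come from the paper.

```python
def match_lexicon(sentence, lexicon, max_word_len=4):
    """Collect, for each character, the lexicon words that contain it.

    Returns one dict per character with four sets:
    B (character begins the word), M (character is inside the word),
    E (character ends the word), S (the word is the single character).
    """
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every substring that starts here, up to max_word_len characters.
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                sets[start]["S"].add(word)
            else:
                sets[start]["B"].add(word)
                sets[end - 1]["E"].add(word)
                for mid in range(start + 1, end - 1):
                    sets[mid]["M"].add(word)
    return sets


# Toy lexicon (hypothetical), applied to a classic ambiguous sentence.
toy_lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥", "江"}
word_sets = match_lexicon("南京市长江大桥", toy_lexicon)
```

In this toy example the character "江" ends "长江", sits inside "长江大桥", and is itself a single-character word, so a character-level tagger receives all competing segmentations as features rather than committing to one of them.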

CLC Number: TP391

WANG Tan, CHEN Jin-guang, MA Li-li. Chinese Named Entity Recognition with Fusion of Lexicon Information and Sentence Semantics [J]. Computer and Modernization, 2024, 0(03): 24-28.


