Text Matching Model Based on BERT and Self-attention Mechanism of Image

Abstract

Abstract: To improve the accuracy of text matching, this paper proposes a text matching model based on BERT (Bidirectional Encoder Representations from Transformers) and an image self-attention mechanism, addressing the limitations of the BERT and MatchPyramid models on this task. First, BERT encodes a pair of texts into word-level feature vectors. Second, a matching matrix of word-to-word similarities between the two texts is built from these word vectors and treated as a single-channel image. Third, an image self-attention mechanism generates a self-attention feature matrix from the matching matrix. Finally, the matching matrix and the self-attention feature matrix are stacked into a multi-channel image, from which a convolutional neural network extracts text matching signals; these signals are concatenated with the [CLS] encoding vector output by BERT and fed into a fully connected layer to obtain the similarity of the two texts. Experiments on the WikiQA dataset show that the model achieves higher MAP and MRR than BERT, MatchPyramid, and other text matching models, which verifies its effectiveness.
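The second and third steps of the abstract (matching matrix, then image self-attention over it) can be sketched in a few lines of plain Python. This is only an illustrative sketch under stated assumptions: random vectors stand in for BERT token encodings, cosine similarity is assumed as the word-to-word similarity, and the self-attention is a simplified non-local formulation with scalar placeholders `wq`, `wk`, `wv` in place of learned 1x1 convolutions. The abstract does not specify any of these choices, and the function names are hypothetical.

```python
import math
import random


def matching_matrix(a, b):
    """Cosine similarity between every token vector in a and every one in b."""
    def cos(u, v):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
        return dot / norm if norm else 0.0
    return [[cos(u, v) for v in b] for u in a]


def image_self_attention(m, wq=1.0, wk=1.0, wv=1.0):
    """Self-attention over the cells of a single-channel 'image' m.

    wq, wk, wv stand in for learned 1x1 convolutions (scalars here because
    there is only one channel); they are placeholders, not trained values.
    """
    rows, cols = len(m), len(m[0])
    cells = [x for row in m for x in row]      # flatten to rows*cols positions
    out = []
    for q in cells:
        scores = [(wq * q) * (wk * k) for k in cells]
        top = max(scores)                      # subtract max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        # Attention-weighted sum of the value-projected cells.
        out.append(sum(e / total * (wv * v) for e, v in zip(exps, cells)))
    return [out[r * cols:(r + 1) * cols] for r in range(rows)]


random.seed(0)
dim = 8  # toy size chosen for the example, not the encoder's real width
text_a = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(4)]  # 4 tokens
text_b = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(6)]  # 6 tokens

match = matching_matrix(text_a, text_b)   # channel 1: 4 x 6 similarity "image"
attn = image_self_attention(match)        # channel 2: self-attention features
channels = [match, attn]                  # 2 x 4 x 6 input for a CNN
print(len(channels), len(match), len(match[0]))  # prints: 2 4 6
```

In the full model described above, `channels` would be passed to a convolutional network, and the resulting matching signal would be concatenated with the [CLS] vector before the fully connected layer; those trained components are omitted here.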

Keywords: matching matrix, image self-attention mechanism, feature fusion, text matching, BERT model
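The abstract reports results in terms of MAP (mean average precision) and MRR (mean reciprocal rank). As a reminder of how these ranking metrics are usually computed for answer selection, here is a minimal sketch. It assumes each question comes with its candidate answers already sorted by model score, best first, and that questions without any correct candidate have been filtered out beforehand; the function names and the toy data are illustrative and not taken from the paper.

```python
def average_precision(labels):
    """labels: relevance flags (1/0) of candidates, sorted by score, best first."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(labels, start=1):
        if rel:
            hits += 1
            total += hits / rank   # precision at each relevant position
    return total / hits if hits else 0.0


def reciprocal_rank(labels):
    """1 / rank of the first relevant candidate, or 0 if there is none."""
    for rank, rel in enumerate(labels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0


def map_mrr(ranked_lists):
    """Average both metrics over all questions."""
    n = len(ranked_lists)
    mean_ap = sum(average_precision(r) for r in ranked_lists) / n
    mean_rr = sum(reciprocal_rank(r) for r in ranked_lists) / n
    return mean_ap, mean_rr


# Toy data: two questions with ranked candidate labels.
queries = [
    [1, 0, 0],     # correct answer ranked first: AP = 1.0, RR = 1.0
    [0, 1, 0, 1],  # correct answers at ranks 2 and 4: AP = 0.5, RR = 0.5
]
mean_ap, mean_rr = map_mrr(queries)
print(mean_ap, mean_rr)  # prints: 0.75 0.75
```

MRR only looks at the first correct answer, while MAP rewards ranking every correct answer highly, so the two can diverge when a question has several correct candidates.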

SONG Shuang, LU Xin-da. Text Matching Model Based on BERT and Self-attention Mechanism of Image[J]. Computer and Modernization, 2021(11): 12-16.

