Computer and Modernization

Machine Translation System Based on Self-Attention Model

  (College of Computer and Information, Hohai University, Nanjing 211100, China)
  Received: 2019-01-11  Online: 2019-07-05  Published: 2019-07-08

Abstract: Neural machine translation (NMT) has developed rapidly in recent years. The Seq2Seq framework brought a major advance to machine translation: it can generate an arbitrary output sequence after observing the entire input sentence. However, this framework remains limited in its ability to capture long-distance dependencies. RNN variants such as the LSTM network were proposed to alleviate this problem, but the improvement is limited. The attention mechanism effectively compensates for this deficiency, and the Self-Attention model is built on that basis, with an encoder-decoder framework constructed from Self-Attention. This paper reviews previous neural network translation models, analyzes the mechanism and principle of the Self-Attention model, and implements a translation system based on the Self-Attention model with the TensorFlow deep learning framework. In English-to-Chinese translation experiments, the model achieves better translation quality than previous neural network translation models.
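To make the mechanism concrete, the sketch below shows scaled dot-product self-attention, softmax(QK^T / sqrt(d_k))V, the core operation of the Self-Attention model discussed in the abstract. It uses TensorFlow, as the paper does, but the dimensions, layer definitions, and function names are illustrative assumptions, not the authors' actual implementation.

```python
import tensorflow as tf

def scaled_dot_product_attention(q, k, v):
    """Compute softmax(QK^T / sqrt(d_k)) V over the last two axes."""
    d_k = tf.cast(tf.shape(k)[-1], tf.float32)
    scores = tf.matmul(q, k, transpose_b=True) / tf.sqrt(d_k)  # (batch, seq, seq)
    weights = tf.nn.softmax(scores, axis=-1)   # attention weights over key positions
    return tf.matmul(weights, v), weights

# Self-attention: queries, keys, and values are all linear projections
# of the same input sequence, so every position can attend to every other,
# capturing long-distance dependencies in a single step.
x = tf.random.normal([2, 5, 64])               # toy input: (batch, seq_len, d_model)
wq = tf.keras.layers.Dense(64)                 # illustrative projection sizes
wk = tf.keras.layers.Dense(64)
wv = tf.keras.layers.Dense(64)
out, attn = scaled_dot_product_attention(wq(x), wk(x), wv(x))
print(out.shape)   # (2, 5, 64) -- same shape as the input sequence
```

Unlike an RNN, which must propagate information step by step, this operation connects any two positions directly, which is why it handles long-distance information better.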

Key words: neural machine translation, Seq2Seq, attention mechanism, Self-Attention model
