基于Self-Attention模型的机器翻译系统

doi:10.3969/j.issn.1006-2475.2019.07.002

计算机与现代化 ›› 2019, Vol. 0 ›› Issue (07): 9-.doi: 10.3969/j.issn.1006-2475.2019.07.002

基于Self-Attention模型的机器翻译系统

(河海大学计算机与信息学院，江苏南京211100)

收稿日期:2019-01-11 出版日期:2019-07-05 发布日期:2019-07-08
作者简介:师岩(1993-)，男，河北巨鹿人，硕士研究生，研究方向：自然语言处理，E-mail: yansirsy@qq.com; 王宇(1979-)，男，研究员，博士，研究方向：云计算技术，E-mail: won9805@hhu.edu.cn; 吴水清(1994-)，女，硕士研究生，研究方向：目标检测与识别，E-mail: wsq30332@163.com。
基金资助:
国家自然科学基金青年科学基金资助项目(61103017); 中国科学院感知中国先导专项子课题(XDA06040504)

Machine Translation System Based on Self-Attention Model

(College of Computer and Information, Hohai University, Nanjing 211100, China)

Received:2019-01-11 Online:2019-07-05 Published:2019-07-08

摘要/Abstract

摘要： 近几年来神经机器翻译（Neural Machine Translation, NMT）发展迅速，Seq2Seq框架的提出为机器翻译带来了很大的优势，可以在观测到整个输入句子后生成任意输出序列。但是该模型对于长距离信息的捕获能力仍有很大的局限，循环神经网络（RNN）、 LSTM网络都是为了改善这一问题提出的，但是效果并不明显。注意力机制的提出与运用则有效地弥补了该缺陷。Self-Attention模型就是在注意力机制的基础上提出的，本文使用Self-Attention为基础构建编码器-解码器框架。本文通过探讨以往的神经网络翻译模型，分析Self-Attention模型的机制与原理，通过TensorFlow深度学习框架对基于Self-Attention模型的翻译系统进行实现，在英文到中文的翻译实验中与以往的神经网络翻译模型进行对比，表明该模型取得了较好的翻译效果。

关键词: 神经机器翻译, Seq2Seq框架, 注意力机制, Self-Attention模型

Abstract: In recent years, neural machine translation (NMT) has developed rapidly. The proposed Seq2Seq framework brings great advantages to machine translation. It can generate arbitrary output sequences after observing the entire input sentence. However, this model still has great limitations on the ability to capture long-distance information. The proposed recurrent neural network (RNN) and LSTM network were all proposed to improve this problem, but the effect is not obvious. The presentation of the attention mechanism effectively compensates for this deficiency. The Self-Attention model is proposed on the basis of attention mechanism, and an encoder-decoder framework is built based on Self-Attention. This paper explores the previous neural network translation model. The mechanism and principle of the Self-Attention model are analyzed. The translation system is realized based on Self-Attention model by TensorFlow deep learning framework. In the English-to-Chinese translation experiment, compared with the previous neural network translation model, it shows that the model has a good translation effect.

Key words: neural machine translation, Seq2Seq, attention mechanism, Self-Attention model

中图分类号:

TP391

师岩，王宇，吴水清. 基于Self-Attention模型的机器翻译系统[J]. 计算机与现代化, 2019, 0(07): 9-.

SHI Yan, WANG Yu, WU Shui-qing. Machine Translation System Based on Self-Attention Model[J]. Computer and Modernization, 2019, 0(07): 9-.

参考文献

［1］张家俊,宗成庆. 神经网络语言模型在统计机器翻译中的应用［J］. 情报工程, 2017,3(3):21-28.
［2］刘洋. 神经机器翻译前沿进展［J］. 计算机研究与发展, 2017,54(6):1144-1149.
［3］ SUTSKEVER I, VINYALS O, LE Q V. Sequence to sequence learning with neural networks［C］// Advances in Neural Information Processing Systems 27 (NIPS 2014). 2014:3104-3112.
［4］ BAHDANAU D, CHO K H, BENGIO Y. Neural Machine Translation by Jointly Learning to Align and Translate［J/OL］. (2014-12-19)［2018-12-10］. https://arxiv.org/pdf/1409.0473v4.pdf.
［5］ KALCHBRENNER N, BLUNSOM P. Recurrent continuous translation models［C］// Proceedings of the 2013 ACL Conference on Empirical Methods in Natural Language Processing (EMNLP). 2013:1700-1709.
［6］ CHO K H, VAN MERRIENBOER B, BAHDANAU D, et al. On the properties of neural machine translation: Encoder-Decoder approaches［C］// Proceedings of the 8th Workshop on Syntax, Semantics and Structure in Statistical Translation. 2014:103-111.
［7］ DYER C, KUNCORO A, BALLESTEROS M, et al. Recurrent neural network grammars［C］// Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016:199-209.
［8］ CHUNG J Y, GULCEHRE , CHO K H, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling［J/OL］. (2014-12-11)［2018-12-10］. https://arxiv.org/pdf/1412.3555.pdf.
［9］ GULCEHRE , FIRAT O, XU K, etal. On Using Monolingual Corpora in Neural Machine Translation［J/OL］. (2015-06-12)［2018-12-10］. https://arxiv.org/pdf/1503.03535.pdf.
［10］WU Y H, SCHUSTER M, CHEN Z F, et al. Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation［J/OL］. (2016-09-26)［2018-12-10］. https://arxiv.org/pdf/1609.08144v1.pdf.
［11］PASCANU A, MIKOLOV T, BENGIO Y. On the Difficulty of Training Recurrent Neural Networks［J/OL］. (2013-02-16)［2018-12-10］. https://arxiv.org/pdf/1211.5063.pdf.
［12］HOCHREITER S, BENGIO Y, FRASCONI P, et al. Gradient flow in recurrent nets: The difficulty of learning long-term dependencies［M］// A Field Guide to Dynamical Recurrent Neural Networks. Wiley, 2001:237-243.
［13］〖JP+2〗HOCHREITER S, SCHMIDHUBER J. Long short-term memory［J］. Neural Computation, 1997,9(8):1735-1780.
［14］HE K M, ZHANG X Y, REN S Q, et al. Deep residual learning for image recognition［C］// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. 2016:770-778.
［15］KIM Y, DENTON C, HOANG L, et al. Structured Attention Networks［J/OL］. (2017-02-16)［2018-12-10］. https://arxiv.org/pdf/1702.00887.pdf.
［16］LUONG M T, PHAM H, MANNING C D. Effective Approaches to Attention-based Neural Machine Translation［J/OL］. (2015-09-20)［2018-12-10］. https://arxiv.org/pdf/1508.04025.pdf.
［17］VASWANI A, SHAZEER N, PARMAR N, et al. Attention Is All You Need［J/OL］. (2017-06-30)［2018-12-10］. https://arxiv.org/pdf/1706.03762v4.pdf.
［18］BRITZ D, GOLDIE A, LUONG M T, et al. Massive Exploration of Neural Machine Translation Architectures［J/OL］. (2017-03-21)［2018-12-10］. https://arxiv.org/pdf/1703.03906.pdf.
［19］CHO K H, VAN MERRIENBOER B, GULCEHRE , et al. Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation［J/OL］. (2014-09-03)［2018-12-10］. https://arxiv.org/pdf/1406.1078.pdf.
［20］KAISER L, BENGIO S. Can Active Memory Replace Attention?［J/OL］. (2016-10-27)［2018-12-10］. https://arxiv.org/pdf/1610.08613v1.pdf.
［21］BA J L, KIROS J R, HINTON G E. Layer Normalization［J/OL］. (2016-07-21)［2018-12-10］. https://arxiv.org/pdf/1607.06450.pdf.
［22］GEHRING J, AULI M, GRANGIER D, et al. Convolutional Sequence to Sequence Learning［J/OL］. (2017-05-12)［2018-12-10］. https://arxiv.org/pdf/1705.03122v2.pdf.
［23］PAPINENI K, ROUKOS S, WARD T, et al. BLEU: A method for automatic evaluation of machine translation［C］// Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. 2002,7:311-318.

[1]	何思达, 陈平华. 基于意图的轻量级自注意力序列推荐模型[J]. 计算机与现代化, 2024, 0(12): 1-9.
[2]	赵晨阳, 薛涛, 刘俊华. 基于改进Stable Diffusion的时尚服饰图案生成[J]. 计算机与现代化, 2024, 0(12): 15-23.
[3]	黄庭培1, 马禄彪1, 李世宝2, 刘建航1. 基于WiFi和原型网络的手势识别方法[J]. 计算机与现代化, 2024, 0(12): 34-39.
[4]	张晓东1, 白广芝1, 李敏1, 李昊洋2. 基于经验小波变换的油气井产量预测模型 [J]. 计算机与现代化, 2024, 0(12): 53-58.
[5]	刘云海1, 冯广1, 吴晓婷2, 杨群2. 复杂施工场景下的安全帽佩戴检测算法[J]. 计算机与现代化, 2024, 0(12): 66-71.
[6]	谷岳, 邓松峰, 沈霁, 穆文涛, 赵恩棋. 基于改进YOLOv8的SAR舰船目标检测算法[J]. 计算机与现代化, 2024, 0(12): 78-83.
[7]	王艳媛, 茅正冲. 中英文场景文本图像的检测和识别算法[J]. 计算机与现代化, 2024, 0(12): 84-90.
[8]	李钧超1, 尤菲1, 张超2, 苏乐乐2, 龚龑2. 基于新型多目标浣熊优化算法的BiLSTM-Attention#br# 预测模型及误差分析[J]. 计算机与现代化, 2024, 0(11): 70-76.
[9]	张宇1, 2, 黎靖1, 2, 马铭1, 2, 王众祥1, 2, 孙妍1, 2. YOLOLW:一个新的轻量级目标检测模型[J]. 计算机与现代化, 2024, 0(11): 91-98.
[10]	祁贤, 刘大铭, 常佳鑫. 基于改进自注意力机制的多视图三维重建[J]. 计算机与现代化, 2024, 0(11): 106-112.
[11]	杨骏1, 胡为1, 朱文福2. 基于改进MobileNetV3的视觉SLAM回环检测算法[J]. 计算机与现代化, 2024, 0(10): 21-26.
[12]	魏学诚1, 江凌云1, 李研2, 何非2. 改进YOLOv5的路侧单目视角小目标检测算法[J]. 计算机与现代化, 2024, 0(10): 27-34.
[13]	杜猛俊1, 李昂1, 童俊1, 钱锦1, 康恺1, 王若丁1, 靳文星2. 基于改进极限学习算法的电力信息数据融合模型[J]. 计算机与现代化, 2024, 0(10): 61-64.
[14]	杨世军1, 狄广义1, 高军1, 陈见飞1, 王耀坤1, 季晓晗2. 跨模态注意力融合和信息感知的情感一致检测[J]. 计算机与现代化, 2024, 0(10): 113-119.
[15]	候聪颖, 杨文清, 王召, 程聪. 基于时频自注意力残差时序卷积网络的语音增强[J]. 计算机与现代化, 2024, 0(09): 20-24.

基于Self-Attention模型的机器翻译系统

Machine Translation System Based on Self-Attention Model

PDF (PC)

可视化

摘要/Abstract

引用本文

使用本文

参考文献

相关文章 15

编辑推荐

Metrics

本文评价