Low-resource Neural Machine Translation Based on ELMO

Computer and Modernization, 2021, Vol. 0, Issue (07): 38-42.


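(1. School of Computer and Information Technology, Northeast Petroleum University, Daqing, Heilongjiang 163318, China;
2. School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang 150001, China)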

Published: 2021-08-02; Online: 2021-08-02
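About the authors: WANG Hao-chang (born 1974), female, from Daqing, Heilongjiang; professor, postdoctoral researcher; research interests: artificial intelligence, natural language processing, data mining and bioinformatics; e-mail: kinghaosing@gmail.com. Corresponding author: SUN Meng-ran (born 1994), male, from Chuzhou, Anhui; master's student; research interest: neural machine translation; e-mail: sunmr@foxmail.com. ZHAO Tie-jun (born 1962), male, from Harbin, Heilongjiang; professor, doctoral supervisor, Ph.D.; research interests: machine translation and natural language processing; e-mail: tjzhao@hit.edu.cn.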
Funding: National Natural Science Foundation of China (61402099, 61702093)


Abstract

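The main difficulty in low-resource neural machine translation (NMT) is the lack of large parallel corpora for training the model. Pre-trained models have recently brought substantial improvements across a wide range of natural language processing tasks. Building on this, this paper proposes an NMT model that incorporates the pre-trained ELMO model to address the low-resource translation problem. On the Turkish-English low-resource translation task, the proposed model outperforms back-translation by more than 0.7 BLEU, and on the Romanian-English task by more than 0.8 BLEU. In addition, on four simulated low-resource tasks (Chinese-English, French-English, German-English and Spanish-English), it outperforms a conventional NMT model by 2.3, 3.2, 2.6 and 3.2 BLEU, respectively. The experiments show that incorporating ELMO is an effective way to tackle low-resource NMT.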
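The abstract does not describe how ELMO is fused into the translation model. As a hedged illustration only, the Python sketch below shows the layer-combination step that Peters et al. (2018) define for ELMo. For each token, the outputs of the pre-trained bidirectional language model (biLM) layers h_j are combined as γ · Σ_j s_j · h_j, where the s_j are softmax-normalized weights and γ is a scale factor. The function `fuse_with_embedding` is hypothetical. It concatenates the resulting vector with the NMT model's own word embedding. Input-level concatenation is one common way ELMo vectors are used downstream, but it is an assumption here and not a confirmed detail of this paper. The toy vectors are made up.

```python
import math
from typing import List

Vector = List[float]


def softmax(weights: List[float]) -> List[float]:
    """Numerically stable softmax over the raw (unnormalized) layer weights."""
    m = max(weights)
    exps = [math.exp(w - m) for w in weights]
    total = sum(exps)
    return [e / total for e in exps]


def scalar_mix(layers: List[Vector], raw_weights: List[float], gamma: float) -> Vector:
    """ELMo-style combination for one token:
    gamma * sum_j softmax(raw_weights)[j] * layers[j].

    layers[j] is the token's representation from biLM layer j
    (layer 0 being the context-independent token layer)."""
    if len(layers) != len(raw_weights):
        raise ValueError("need exactly one weight per biLM layer")
    s = softmax(raw_weights)
    dim = len(layers[0])
    mixed = [0.0] * dim
    for weight, layer in zip(s, layers):
        for i in range(dim):
            mixed[i] += weight * layer[i]
    return [gamma * x for x in mixed]


def fuse_with_embedding(word_embedding: Vector, elmo_vector: Vector) -> Vector:
    """Hypothetical fusion step (assumption, not taken from the paper):
    concatenate the NMT encoder's own word embedding with the ELMo vector."""
    return word_embedding + elmo_vector


if __name__ == "__main__":
    # Toy biLM output for one token: 3 layers, 2 dimensions each (made-up values).
    token_layers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    elmo = scalar_mix(token_layers, raw_weights=[0.0, 0.0, 0.0], gamma=1.0)
    print(elmo)
    print(fuse_with_embedding([0.5, -0.5, 0.25], elmo))
```

In the original ELMo recipe, the biLM weights stay frozen while the mixing weights and γ are learned together with the downstream model. The sketch passes them as plain arguments only to keep the arithmetic visible.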

Key words: low-resource, parallel corpus, pre-training model, neural machine, translation model



WANG Hao-chang, SUN Meng-ran, ZHAO Tie-jun. Low-resource Neural Machine Translation Based on ELMO[J]. Computer and Modernization, 2021, 0(07): 38-42.

