Computer and Modernization

• Artificial Intelligence •

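A Japanese-English Hierarchical Phrase-based Translation Model Integrating Tense Features

  1. (School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044, China)
  • Received: 2016-10-08 Online: 2017-06-23 Published: 2017-06-23
  • About the authors: Ming Fang (明芳, b. 1991), female, from Zigong, Sichuan, master's student at the School of Computer and Information Technology, Beijing Jiaotong University, research interests: natural language processing, machine translation; Xu Jin'an (徐金安, b. 1970), male, associate professor, Ph.D., research interests: natural language processing, machine translation; Wang Nan (王楠, b. 1992), female, master's student, research interests: natural language processing, machine translation; Chen Yufeng (陈钰枫, b. 1981), female, associate professor, Ph.D., research interests: natural language processing, machine translation; Zhang Yujie (张玉洁, b. 1961), female, professor, Ph.D., research interests: natural language processing, machine translation.
  • Funding:
    National Natural Science Foundation of China (61370130, 61473294); Fundamental Research Funds for the Central Universities (2015JBM033); International Science and Technology Cooperation Program of the Ministry of Science and Technology (K11F100010)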

Abstract: Statistical machine translation based on the hierarchical phrase-based (HPB) translation model uses only limited contextual information, and its tense translation quality is correspondingly low. To address this problem, this paper proposes a Japanese-English statistical machine translation method that integrates tense features. The method attaches tense classification constraints to the translation rules, so that the decoder can use the latent tense class of each rule to match a sentence of a given tense with the most suitable rules. First, tense features are extracted from the bilingual training corpus to build a maximum entropy classification model. Second, tense features are extracted from the hierarchical phrase rules that carry tense information, and these rules are classified. Finally, the tense classification result of each rule is integrated into the HPB translation system as a new feature. Experimental results show that, compared with the baseline system, the method improves translation quality on several test sets and, to some extent, resolves the tense translation problem of the Japanese-English HPB model.

Key words: hierarchical phrase-based translation model, tense features, maximum entropy classification model
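
The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tense label set, the `suffix=`/`word=` feature names, the toy training samples, and the choice of log P(tense | rule) as the added feature value are all assumptions made for illustration; the abstract does not specify any of them.

```python
import math
from collections import defaultdict

# Assumed label set; the abstract does not list the tense classes.
TENSES = ["past", "present", "future"]


def predict(weights, feats):
    """Return P(tense | feats) under a maximum entropy (softmax) model."""
    scores = {t: sum(weights.get((f, t), 0.0) for f in feats) for t in TENSES}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {t: math.exp(s - top) for t, s in scores.items()}
    total = sum(exps.values())
    return {t: e / total for t, e in exps.items()}


def train_maxent(samples, epochs=200, lr=0.5):
    """Fit the weights by batch gradient ascent on the log-likelihood."""
    weights = defaultdict(float)
    for _ in range(epochs):
        grad = defaultdict(float)
        for feats, gold in samples:
            probs = predict(weights, feats)
            for t in TENSES:
                # Gradient = observed indicator minus model expectation.
                delta = (1.0 if t == gold else 0.0) - probs[t]
                for f in feats:
                    grad[(f, t)] += delta
        for key, g in grad.items():
            weights[key] += lr * g / len(samples)
    return weights


def rule_score(base_score, weights, rule_feats, sentence_tense, lam=1.0):
    """Log-linear rule score with log P(sentence tense | rule) as an extra feature."""
    tense_feature = math.log(predict(weights, rule_feats)[sentence_tense])
    return base_score + lam * tense_feature


# Toy training data: hypothetical features taken from the Japanese side.
samples = [
    (["suffix=た"], "past"),
    (["suffix=た", "word=昨日"], "past"),
    (["suffix=る"], "present"),
    (["suffix=ている"], "present"),
    (["suffix=る", "word=明日"], "future"),
    (["suffix=だろう"], "future"),
]
weights = train_maxent(samples)

# Two competing rules with the same baseline score, for a past-tense sentence.
past_rule = rule_score(-2.0, weights, ["suffix=た"], "past")
nonpast_rule = rule_score(-2.0, weights, ["suffix=る"], "past")
print(past_rule > nonpast_rule)
```

In this sketch the tense feature only re-ranks rules whose baseline scores are close. In a real system the weight `lam` would be tuned together with the other log-linear feature weights.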

CLC number: