Computer and Modernization ›› 2022, Vol. 0 ›› Issue (05): 28-32.

Tibetan-Chinese Bidirectional Machine Translation Based on VOLT

  

  1. School of Information Science and Technology, Tibet University, Lhasa 850000, China;
    2. State Key Laboratory of Artificial Intelligence for Tibetan Information Technology in Tibet Autonomous Region, Lhasa 850000, China;
    3. Ministry of Education Engineering Research Center for Tibetan Information Technology, Lhasa 850000, China
  • Online: 2022-06-08 Published: 2022-06-08

Abstract: Generating the Tibetan-Chinese vocabulary is the first step in Tibetan-Chinese bidirectional machine translation, and it also affects translation performance. This paper improves downstream Tibetan-Chinese bidirectional translation by improving how the Tibetan-Chinese vocabulary is generated. First, it splices vocabularies, using an ordinary word-level vocabulary for high-frequency words and a byte pair encoding (BPE) vocabulary for low-frequency words, and finds the optimal word-frequency threshold through iterative training. Second, it generates the Tibetan-Chinese vocabulary with VOLT, a vocabulary learning approach based on optimal transport theory, adapts the method to the characteristics of the Tibetan language, and applies it to Tibetan-Chinese bidirectional translation. Experimental results show that the proposed method, which combines byte pair encoding with optimal-transport vocabulary learning adapted to Tibetan, works best, reaching a BLEU score of 37.35 on the Tibetan-Chinese translation task and 27.60 on the Chinese-Tibetan translation task.

Key words: Tibetan word list, byte pair encoding, two-way Tibetan-Chinese translation, VOLT
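
The vocabulary splicing described in the abstract (whole words above a frequency threshold, byte pair encoding below it) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the function names (`learn_bpe`, `apply_bpe`, `build_hybrid_vocab`, `segment`), the `</w>` end-of-word marker, the whitespace tokenization, and the toy BPE learner are all assumptions made for illustration. The iterative search for the optimal threshold and the VOLT step are not reproduced here.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` BPE merges from a {word: count} mapping (toy learner)."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties are broken by the pair itself so runs are deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Segment one word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


def build_hybrid_vocab(tokens, threshold, num_merges):
    """Keep words with count >= threshold whole; learn BPE on the remaining words."""
    freqs = Counter(tokens)
    high = {w for w, c in freqs.items() if c >= threshold}
    low = {w: c for w, c in freqs.items() if c < threshold}
    merges = learn_bpe(low, num_merges)
    vocab = set(high)
    for word in low:
        vocab.update(apply_bpe(word, merges))
    return vocab, high, merges


def segment(token, high, merges):
    """High-frequency tokens stay whole; all other tokens are split with BPE."""
    return [token] if token in high else apply_bpe(token, merges)


if __name__ == "__main__":
    # Hypothetical toy corpus; real input would be word-segmented Tibetan or Chinese text.
    corpus = "the the the cat cat sat".split()
    vocab, high, merges = build_hybrid_vocab(corpus, threshold=2, num_merges=2)
    print(sorted(vocab))
    print(segment("the", high, merges), segment("sat", high, merges))
```

In this sketch `threshold` is the only knob that moves a word between the two vocabularies, so a threshold search in the spirit of the abstract would rebuild the vocabulary for several candidate values and compare the resulting translation quality.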