Computer and Modernization ›› 2022, Vol. 0 ›› Issue (07): 85-90.

• Artificial Intelligence •

End-to-end Optical Music Recognition Method Based on Residual Gated Recurrent Convolutional Neural Network and Attention Mechanism

  1. (School of Information Engineering, East China University of Technology, Nanchang 330013, China)
  • Online: 2022-07-25 Published: 2022-07-25
  • About the authors: Sun Hongyang (孙弘扬, born 1997), male, from Nanchang, Jiangxi, master's student, research interest: computer vision, E-mail: sunhy@ecut.edu.cn; Wang Shang (王尚, born 1997), male, from Fengcheng, Jiangxi, master's student, research interest: computer vision, E-mail: 1510943347@qq.com.
  • Funding:
    National Natural Science Foundation of China (61662002); Natural Science Foundation of Jiangxi Province (20171BAB202005); Open Fund of the Jiangxi Engineering Technology Research Center of Nuclear Geoscience Data Science and System (JETRCNGDSS201802)

Abstract: Optical music recognition (OMR) plays an important role in making music intelligent and digital. Traditional OMR pipelines are long and multi-stage, so errors tend to accumulate, while current sequence-modeling approaches cannot capture note context at all scales, which leaves room for improvement in recognition accuracy. To address this, this paper proposes an end-to-end OMR method based on residual gated recurrent convolution and an attention mechanism. Residual gated recurrent convolution serves as the backbone network and strengthens the model's ability to extract contextual information. An attention-based decoder then better exploits the features of the score and their internal correlations, which enhances the model's representational power, and recognizes the notes and note sequences in score images. Experimental results show that, compared with the original convolutional recurrent neural network (CRNN) model, the improved network achieves significantly lower symbol and sequence error rates.

Key words: optical music recognition, gated recurrent convolution, attention mechanism, end-to-end
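
The abstract reports results as a symbol error rate and a sequence error rate but does not define them. The sketch below follows a convention often used for end-to-end OMR and may differ from the paper's exact definitions. It assumes that the symbol error rate is the total edit distance between predicted and reference symbol sequences divided by the total number of reference symbols, and that the sequence error rate is the fraction of sequences containing at least one error. The function names and the symbol labels in the usage example are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # drop a reference symbol
                            curr[j - 1] + 1,      # insert a predicted symbol
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rates(
    pairs: List[Tuple[Sequence[str], Sequence[str]]]
) -> Tuple[float, float]:
    """Return (symbol error rate, sequence error rate).

    Each pair is (reference sequence, predicted sequence).
    Assumed definitions, not taken from the paper:
      symbol error rate   = total edit distance / total reference symbols
      sequence error rate = sequences with any error / number of sequences
    """
    total_edits = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    total_symbols = sum(len(ref) for ref, _ in pairs)
    wrong_sequences = sum(1 for ref, hyp in pairs if list(ref) != list(hyp))
    return total_edits / total_symbols, wrong_sequences / len(pairs)


# Hypothetical symbol labels, for illustration only.
example_pairs = [
    (["clef-G2", "note-C4_quarter", "barline"],
     ["clef-G2", "note-C4_quarter", "barline"]),
    (["clef-G2", "note-D4_half", "barline"],
     ["clef-G2", "note-E4_half"]),
]
symbol_er, sequence_er = error_rates(example_pairs)
```

In this example the second prediction needs one substitution and one missing symbol to match its reference, so 2 edits over 6 reference symbols give a symbol error rate of about 0.33, and one wrong sequence out of two gives a sequence error rate of 0.5.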