End-to-end Optical Music Recognition Method Based on Residual Gated Recurrent Convolutional Neural Network and Attention Mechanism

Abstract

Optical music recognition (OMR) plays an important role in making music intelligent and digital. The traditional recognition pipeline is complicated and prone to error accumulation, while current OMR methods based on sequence modeling cannot capture note context at a global scale, so recognition accuracy still has room for improvement. To address this, this paper proposes an end-to-end OMR method based on residual gated recurrent convolution and an attention mechanism. The method uses a residual gated recurrent convolutional network as the backbone to strengthen the model's ability to extract contextual information. Combined with an attention-based decoder, it better mines the features of the score and their internal correlations, which enhances the model's representational ability and allows it to recognize the notes and note sequences in a score image. Experimental results show that, compared with the Convolutional Recurrent Neural Network (CRNN) model, the improved network achieves significantly lower symbol and sequence error rates.
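
The abstract reports results as a symbol error rate and a sequence error rate but does not define them. The sketch below shows one common way these two metrics are computed in sequence recognition, and it is an assumption that the paper uses the same definitions: the symbol error rate is taken as the total edit distance between predicted and reference symbol sequences divided by the total number of reference symbols, and the sequence error rate as the fraction of sequences containing at least one error. The function names and the token strings are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference symbol
                            curr[j - 1] + 1,      # insert a predicted symbol
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def symbol_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Total edit operations divided by total reference symbols (assumed definition)."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_symbols = sum(len(r) for r in refs)
    return total_edits / total_symbols


def sequence_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Fraction of sequences with at least one wrong symbol (assumed definition)."""
    wrong = sum(1 for r, h in zip(refs, hyps) if list(r) != list(h))
    return wrong / len(refs)


# Hypothetical token sequences for two single-staff score images.
refs = [["clef-G2", "note-C4_quarter", "barline"], ["clef-G2", "note-E4_half"]]
hyps = [["clef-G2", "note-D4_quarter", "barline"], ["clef-G2", "note-E4_half"]]

print(symbol_error_rate(refs, hyps))    # one substitution over five reference symbols
print(sequence_error_rate(refs, hyps))  # one of the two sequences contains an error
```

Under these definitions a single wrong symbol raises the sequence error rate as much as a completely wrong staff does, which is why the two metrics are usually reported together.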

Key words: optical music recognition, gated recurrent convolution, attention mechanism, end-to-end

SUN Hong-yang, WANG Shang. End-to-end Optical Music Recognition Method Based on Residual Gated Recurrent Convolutional Neural Network and Attention Mechanism[J]. Computer and Modernization, 2022, 0(07): 85-90.

