Layout Analysis Method of Multi-scale Feature Fusion

doi:10.3969/j.issn.1006-2475.2024.05.004

Abstract

To address the confusion between list and text elements, the difficulty of recognizing small-scale text in tables, and the poor preservation of spatial features in current document layout analysis, this paper proposes a bottom-up, multi-feature fusion layout analysis method based on the SegNet network. The MSCAN-SE module is introduced into SegNet to address the low recognition rate of small-scale elements in tables: the strip features in the MSCAN-SE attention mechanism improve the model's ability to extract multi-scale features, so that the network retains feature information at more scales. To address the problem that list elements and text elements have overly similar features, the dilated convolution and the channel attention branch in MSCAN-SE expand the network's receptive field during feature extraction. The proposed method is compared experimentally with classical semantic segmentation networks. On the layout analysis test set, it achieves a pixel accuracy of 97.9% and a mean intersection over union (mIoU) of 91.7%. Compared with the U-Net, FCN, DeepLabV3+, and SegNet semantic segmentation models, the mIoU increases by 7.6%, 2.4%, 2.6%, and 1.5%, respectively.
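The two reported metrics, pixel accuracy and mean intersection over union, can both be derived from a per-class confusion matrix. The following is a minimal NumPy sketch of the standard definitions, not the authors' evaluation code. The function names are hypothetical, and the choice to average IoU only over classes that appear in the labels or predictions is an assumption, since the abstract does not state the averaging convention.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    # Encode each (true, predicted) pair as a single index, then histogram.
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class equals the true class."""
    return np.trace(cm) / cm.sum()


def mean_iou(cm):
    """Mean of per-class IoU = TP / (TP + FP + FN).

    Assumption: classes absent from both labels and predictions are skipped.
    """
    tp = np.diag(cm)
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    return (tp[present] / union[present]).mean()


# Toy 2x3 label maps with three hypothetical layout classes (0, 1, 2).
labels = [[0, 0, 1], [1, 2, 2]]
preds = [[0, 1, 1], [1, 2, 0]]
cm = confusion_matrix(labels, preds, num_classes=3)
print(pixel_accuracy(cm), mean_iou(cm))
```

In this toy case four of six pixels are correct, and the per-class IoU values are 1/3, 2/3, and 1/2. Mean IoU penalizes errors on small classes more heavily than pixel accuracy does, which is one reason both figures are usually reported together for layout segmentation.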

Key words: document layout analysis, multi-scale attention, semantic segmentation, channel attention

CLC Number:

TP391.41

QIAO Jia, XU Kun, HU Peirong. Layout Analysis Method of Multi-scale Feature Fusion[J]. Computer and Modernization, 2024, 0(05): 16-21.

