Video Snapshot Compressed Sensing Reconstruction Based on Multi-scale Fusion Network

Computer and Modernization ›› 2021, Vol. 0 ›› Issue (12): 58-64.

(Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science & Technology, Nanjing 210044, China)

Online: 2021-12-24 Published: 2021-12-24
About the authors: CHEN Xun-hao (born 1996), male, from Nanjing, Jiangsu, master's student, research interests: deep learning and compressed sensing, E-mail: xunhao.c@nuist.edu.cn; YANG Ying (born 1996), female, from Huai'an, Jiangsu, master's student, research interest: compressed sensing reconstruction, E-mail: yingyang@nuist.edu.cn; HUANG Jun-ru (born 1998), female, from Xuzhou, Jiangsu, master's student, research interests: deep learning and compressed sensing, E-mail: 20201249083@nuist.edu.cn; corresponding author: SUN Yu-bao (born 1983), male, from Lianyungang, Jiangsu, associate professor, Ph.D., research interests: deep learning theory and methods, computational imaging, E-mail: sunyb@nuist.edu.cn.
Supported by:
National Natural Science Foundation of China (U2001211, 61672292)

Abstract

Video snapshot compressed sensing, built on compressed sensing theory, projects multiple frames onto a single two-dimensional snapshot measurement within one exposure, thereby enabling high-speed imaging. To recover the original video signal from the two-dimensional snapshot measurement, classical reconstruction algorithms solve an iterative optimization problem based on a sparsity prior of the video, but their reconstruction quality is low and they are time-consuming. Deep learning has attracted wide attention for its strong learning ability, and deep-learning-based methods for video snapshot compressive reconstruction have drawn interest as well. However, existing deep methods lack an effective representation of spatiotemporal features, and their reconstruction quality still needs to be improved. This paper proposes a multi-scale fusion reconstruction network (MSF-Net) for video snapshot compressed sensing reconstruction. The network unfolds along two dimensions: convolution depth (horizontal) and resolution (vertical). Along the resolution dimension, three-dimensional convolutions extract video features at different scales. Along the horizontal dimension, pseudo-three-dimensional convolution residual modules extract features hierarchically from feature maps at the same resolution scale. The network learns the spatiotemporal features of the video through cross-fusion of features at different scales. Experimental results show that the proposed method improves reconstruction quality and reconstruction speed at the same time.

Key words: video snapshot, compressed sensing, deep learning, multi-scale fusion
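
The measurement step described in the abstract, in which multiple frames are projected onto one two-dimensional snapshot during a single exposure, can be illustrated with a minimal sketch. The abstract does not give the sensing model, so the code assumes the form commonly used in the snapshot compressive imaging literature: each frame is multiplied element-wise by its own mask and the coded frames are summed. The function names `snapshot_measure` and `make_demo`, the random binary masks, the frame count of 8, and the omission of noise are all assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def snapshot_measure(frames, masks):
    """Collapse T coded frames into one 2D snapshot: Y = sum_t C_t * X_t.

    frames, masks: arrays of shape (T, H, W). Measurement noise is omitted
    (an assumption of this sketch).
    """
    frames = np.asarray(frames, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    if frames.ndim != 3 or frames.shape != masks.shape:
        raise ValueError("frames and masks must both have shape (T, H, W)")
    # Element-wise modulation of every frame, then integration over time.
    return (masks * frames).sum(axis=0)


def make_demo(num_frames=8, height=4, width=4, seed=0):
    """Build a toy video and random binary masks (hypothetical test data)."""
    rng = np.random.default_rng(seed)
    frames = rng.random((num_frames, height, width))
    masks = (rng.random((num_frames, height, width)) > 0.5).astype(np.float64)
    return frames, masks


frames, masks = make_demo()
snapshot = snapshot_measure(frames, masks)
print(snapshot.shape)  # (4, 4)
```

Reconstruction is the inverse of this step: recovering all frames of shape `(T, H, W)` from the single `(H, W)` snapshot and the known masks, which is the task the proposed network addresses.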

CHEN Xun-hao, YANG Ying, HUANG Jun-ru, SUN Yu-bao. Video Snapshot Compressed Sensing Reconstruction Based on Multi-scale Fusion Network[J]. Computer and Modernization, 2021, 0(12): 58-64.
