Computer and Modernization ›› 2021, Vol. 0 ›› Issue (12): 58-64.

• Image Processing •

Video Snapshot Compressed Sensing Reconstruction Based on Multi-scale Fusion Network

  

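  1. (Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science & Technology, Nanjing 210044, Jiangsu, China)
  • Online:2021-12-24 Published:2021-12-24
  • About the authors: CHEN Xunhao (b. 1996), male, from Nanjing, Jiangsu, master's student, research interests: deep learning, compressed sensing, E-mail: xunhao.c@nuist.edu.cn; YANG Ying (b. 1996), female, from Huai'an, Jiangsu, master's student, research interest: compressed sensing reconstruction, E-mail: yingyang@nuist.edu.cn; HUANG Junru (b. 1998), female, from Xuzhou, Jiangsu, master's student, research interests: deep learning, compressed sensing, E-mail: 20201249083@nuist.edu.cn; corresponding author: SUN Yubao (b. 1983), male, from Lianyungang, Jiangsu, associate professor, Ph.D., research interests: deep learning theory and methods, computational imaging, E-mail: sunyb@nuist.edu.cn.
  • Supported by:
    National Natural Science Foundation of China (U2001211, 61672292)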

Video Snapshot Compressed Sensing Reconstruction Based on Multi-scale Fusion Network

  1. (Jiangsu Key Laboratory of Big Data Analysis Technology, Nanjing University of Information Science & Technology, Nanjing 210044, China)
  • Online:2021-12-24 Published:2021-12-24

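Abstract: Video snapshot compressed sensing builds on compressed sensing theory: within a single exposure, it projects multiple video frames onto one two-dimensional snapshot measurement, thereby enabling high-speed imaging. To recover the original video from the two-dimensional snapshot measurement, classical reconstruction algorithms solve an iterative optimization problem based on a sparsity prior of the video, but their reconstruction quality is low and they take too long to run. Deep learning has attracted wide attention for its strong learning ability, and deep-learning-based methods for video snapshot compressed sensing reconstruction have drawn attention as well; however, existing deep methods lack an effective representation of spatiotemporal features, and their reconstruction quality still needs further improvement. This paper proposes a multi-scale fusion reconstruction network (MSF-Net) for video snapshot compressed sensing reconstruction. The network unfolds along two dimensions, a horizontal one of convolutional depth and a vertical one of resolution. Along the resolution dimension, three-dimensional convolutions extract video features at different scales; along the horizontal dimension, pseudo-three-dimensional convolution residual modules extract features hierarchically from feature maps at the same resolution scale; and the spatiotemporal features of the video are learned through cross fusion of features at different scales. Experimental results show that the proposed method improves reconstruction quality and reconstruction speed at the same time.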

Key words: video snapshot, compressed sensing, deep learning, multi-scale fusion
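
The abstract describes projecting several frames onto a single two-dimensional snapshot within one exposure, but does not spell out the sensing operator. The following is a minimal NumPy sketch of the mask-modulated model commonly assumed in video snapshot compressive imaging, where each frame is multiplied element-wise by its own mask and the results are summed. The function name `snapshot_measurement`, the binary random masks, and the sizes are illustrative assumptions, not details taken from the paper.

```python
import numpy as np


def snapshot_measurement(frames, masks):
    """Collapse T frames into one 2D snapshot: y = sum_t masks[t] * frames[t].

    Assumed model (not stated in the abstract): per-frame element-wise
    modulation followed by integration over the exposure.
    """
    frames = np.asarray(frames, dtype=np.float64)
    masks = np.asarray(masks, dtype=np.float64)
    if frames.ndim != 3 or frames.shape != masks.shape:
        raise ValueError("frames and masks must both have shape (T, H, W)")
    return (masks * frames).sum(axis=0)


# Hypothetical sizes: 8 frames of 4x4 pixels compressed into one 4x4 snapshot.
rng = np.random.default_rng(0)
T, H, W = 8, 4, 4
frames = rng.random((T, H, W))
masks = rng.integers(0, 2, size=(T, H, W))  # binary masks, an assumption
snapshot = snapshot_measurement(frames, masks)
print(snapshot.shape)  # (4, 4)
```

Reconstruction is the inverse task: given `snapshot` and `masks`, estimate all `T` frames, which is under-determined and is why a prior or a learned network is needed.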

Abstract: Video snapshot compressed sensing is based on compressed sensing theory: it projects multiple frames onto a single two-dimensional snapshot measurement during one exposure to achieve high-speed imaging. To recover the original video signal from the two-dimensional snapshot measurement, classical reconstruction algorithms perform iterative optimization based on the sparsity prior of video, but their reconstruction quality is low and they are time-consuming. Deep learning has attracted much attention because of its excellent learning ability, and video snapshot compressed sensing reconstruction methods based on it have been developed as well. However, the existing deep methods lack an effective representation of spatiotemporal features, and their reconstruction quality still needs to be improved. This paper proposes a multi-scale fusion reconstruction network (MSF-Net) for video snapshot compressed sensing reconstruction. The network expands along two dimensions: horizontal convolution depth and vertical resolution. In the resolution dimension, three-dimensional convolution extracts video features at different scales. In the horizontal dimension, pseudo-three-dimensional convolution residual modules hierarchically extract features from feature maps at the same resolution scale, and the spatiotemporal features of the video are learned through cross fusion of features at different scales. Experimental results show that the proposed method improves both reconstruction quality and reconstruction speed.

Key words: video snapshot, compressed sensing, deep learning, multi-scale fusion
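
The cross fusion of features at different scales mentioned in the abstract can be illustrated with a small NumPy sketch. It is only a schematic under stated assumptions: two branches with the same channel count, fixed 2x2 average pooling and nearest-neighbour upsampling in place of the learned three-dimensional convolutions, and fusion by plain addition. The helper names `downsample2x`, `upsample2x`, and `cross_fuse` are hypothetical and do not come from the paper.

```python
import numpy as np


def downsample2x(x):
    """Halve H and W by 2x2 average pooling; x has shape (C, T, H, W)."""
    c, t, h, w = x.shape
    if h % 2 or w % 2:
        raise ValueError("H and W must be even")
    return x.reshape(c, t, h // 2, 2, w // 2, 2).mean(axis=(3, 5))


def upsample2x(x):
    """Double H and W by nearest-neighbour repetition; x has shape (C, T, H, W)."""
    return x.repeat(2, axis=2).repeat(2, axis=3)


def cross_fuse(high, low):
    """Exchange information between a full- and a half-resolution branch.

    Assumed fusion rule: resample the other branch to the matching
    spatial size and add it. A trained network would use learned layers.
    """
    fused_high = high + upsample2x(low)
    fused_low = low + downsample2x(high)
    return fused_high, fused_low


# Hypothetical feature maps: 2 channels, 8 frames, 4x4 and 2x2 spatial grids.
high = np.ones((2, 8, 4, 4))
low = np.full((2, 8, 2, 2), 2.0)
fused_high, fused_low = cross_fuse(high, low)
print(fused_high.shape, fused_low.shape)
```

Each branch keeps its own resolution after fusion, so further same-scale processing can be stacked on both outputs.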