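Human Pose Estimation Algorithm Based on Channel Splitting

Computer and Modernization, 2021, Vol. 0, Issue (12): 27-36.

(1. School of Zhang Jian, Nantong University, Nantong, Jiangsu 226019, China;
2. School of Transportation and Civil Engineering, Nantong University, Nantong, Jiangsu 226019, China)

Online: 2021-12-24 Published: 2021-12-24
About the authors: ZHOU Kun-yang (b. 2000), male, from Yancheng, Jiangsu, undergraduate student, research interest: computer vision, E-mail: 1752465993@qq.com; ZHAO Meng-ting (b. 2001), female, from Shuyang, Jiangsu, undergraduate student, research interest: image processing, E-mail: 3248463196@qq.com; ZHANG Hai-chao (b. 2001), female, from Deyang, Sichuan, undergraduate student, research interest: image processing, E-mail: 1908966460@qq.com; corresponding author: SHAO Ye-qin (b. 1978), male, from Haining, Zhejiang, associate professor, Ph.D., research interest: computer vision, E-mail: hnsyk@ntu.edu.cn.
Funding:
General Program of the National Natural Science Foundation of China (61671255); Jiangsu Province College Students' Innovation Training Program (201910304158H, 202010304180H, 202010304122Y)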

Abstract

Abstract: To improve the accuracy and speed of human pose estimation, this paper proposes a channel-split-based algorithm named Channel-Split Residual Steps Network (Channel-Split RSN). First, a channel-split block is proposed: the feature channels are split, features are extracted from each split by convolution, and the results are fused to obtain a rich feature representation. Next, a feature enhancement block is introduced that further divides the feature channels into groups and applies a different processing strategy to each group, which reduces similar features within the feature channels. Finally, combining an improved spatial attention mechanism, a pose refine machine based on the spatial correlation of features, named Context-PRM, is proposed to obtain more accurate pose estimates. On the COCO test-dev dataset, the method reaches 75.9% AP at 55.36 FPS, with a model size of only 18.3 M parameters. Compared with the original RSN18 and RSN50, AP improves by 5 and 3.4 percentage points, respectively, and the model runs 12.08 FPS faster than RSN50. On the more challenging CrowdPose dataset, the method achieves 66.9% AP at 19.16 FPS, an AP improvement of 4.6 percentage points over RSN18. The method therefore improves the accuracy of human pose estimation while keeping recognition fast. The source code is available at https://github.com/qdd1234/Channel-Split-RSN.
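The abstract states its gains as differences from the baselines, so the baseline figures can be recovered by subtraction. The short Python sketch below does that arithmetic. The resulting baseline values are derived here and are not quoted in the abstract, so rounding in the reported numbers may shift them slightly; the dictionary keys are illustrative labels only.

```python
# Figures reported in the abstract for Channel-Split RSN.
coco_ap = 75.9        # AP (%) on COCO test-dev
coco_fps = 55.36      # FPS on COCO test-dev
crowdpose_ap = 66.9   # AP (%) on CrowdPose

# Baselines implied by the stated improvements (derived, not reported).
implied = {
    "RSN18 AP on COCO test-dev": coco_ap - 5.0,
    "RSN50 AP on COCO test-dev": coco_ap - 3.4,
    "RSN50 FPS on COCO test-dev": coco_fps - 12.08,
    "RSN18 AP on CrowdPose": crowdpose_ap - 4.6,
}

for name, value in implied.items():
    print(f"{name}: {value:.2f}")
```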

Key words: Channel-Split RSN, human pose estimation, channel-split block, feature enhancement block, Context-PRM
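The channel-split idea summarized in the abstract (split the feature channels, process each split with its own convolution branch, then fuse the results) can be illustrated with a toy sketch in plain Python. This is not the authors' implementation, which is in the linked repository. The function names `smooth` and `channel_split_block`, the flattening of each channel to 1-D, the box filter standing in for a learned convolution, and the choice of one extra filter pass per group are all assumptions made for illustration.

```python
from typing import List

Channel = List[float]        # one feature channel, flattened to 1-D for brevity
FeatureMap = List[Channel]   # a feature map is a list of channels


def smooth(channel: Channel, passes: int) -> Channel:
    """Stand-in for a conv branch: apply a zero-padded 3-tap box filter `passes` times."""
    out = list(channel)
    for _ in range(passes):
        padded = [0.0] + out + [0.0]
        out = [(padded[i] + padded[i + 1] + padded[i + 2]) / 3.0
               for i in range(len(out))]
    return out


def channel_split_block(x: FeatureMap, groups: int) -> FeatureMap:
    """Split channels into equal groups, filter each group separately, fuse by concatenation."""
    if groups <= 0 or len(x) % groups != 0:
        raise ValueError("number of channels must be divisible by groups")
    size = len(x) // groups
    fused: FeatureMap = []
    for g in range(groups):
        part = x[g * size:(g + 1) * size]
        # Hypothetical choice: group g gets g + 1 filter passes, so each
        # branch covers a different receptive field before fusion.
        fused.extend(smooth(c, g + 1) for c in part)
    return fused


if __name__ == "__main__":
    features = [[3.0, 3.0, 3.0] for _ in range(4)]
    for channel in channel_split_block(features, groups=2):
        print(channel)
```

The output keeps the same number of channels as the input; only the per-group processing differs, which is the property the fusion step relies on.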

ZHOU Kun-yang, ZHAO Meng-ting, ZHANG Hai-chao, SHAO Ye-qin. Human Pose Estimation Algorithm Based on Channel Splitting[J]. Computer and Modernization, 2021, 0(12): 27-36.
