Computer and Modernization ›› 2025, Vol. 0 ›› Issue (07): 63-68. doi: 10.3969/j.issn.1006-2475.2025.07.009

• Artificial Intelligence •

Motion Similarity Calculation from 3D Human Keypoints under Different Viewpoints

  


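  1. (School of Information Science and Technology, North China University of Technology, Beijing 100144, China)
  • Publication date: 2025-07-22 Release date: 2025-07-22
  • About the authors: Li Zihe (李子贺, b. 2000), male, from Tianjin, master's student; research interests: digital image processing and pattern recognition; E-mail: lizihe2022@mail.ncut.edu.cn. Corresponding author: Wang Yiding (王一丁, b. 1967), male, from Beijing, professor, Ph.D.; research interests: biometric recognition and computer vision; E-mail: wangyd@ncut.edu.cn.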
  • Funding:
    National Natural Science Foundation of China (62276018)

3D Human Motion Similarity Estimation from Different Perspectives 


  1. (School of Information Science and Technology, North China University of Technology, Beijing 100144, China) 
  • Online:2025-07-22 Published:2025-07-22
  • Supported by:

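Abstract: Online fitness and dance instruction videos are now plentiful, but when learners film themselves to compare their movements with the instructor's, the camera viewpoint rarely matches that of the teaching video; the resulting differences in angle and scale make motion similarity hard to assess. To address this, the paper builds on existing 3D human pose estimation techniques and proposes a motion similarity evaluation algorithm for videos shot from different viewpoints with a monocular camera. Given two videos of a person's movements taken from different viewpoints, the YOLOv8pose network first extracts 2D human keypoints, and the GraphMLP network then lifts them to 3D keypoints. A Euclidean distance matrix is computed from the two 3D keypoint sequences, and the DTW algorithm finds the corresponding frames of the two motions. The 3D keypoints of corresponding frames are then rotated, scaled, and otherwise adjusted so that motion sequences from different viewpoints face the same direction. Finally, the cosine similarity of bone vectors serves as the similarity metric. Experiments on motion-capture animations viewed from different angles verify the effectiveness of the proposed method.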


Key words: YOLOv8pose, GraphMLP, human pose estimation, DTW, cosine similarity, different viewpoints
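The frame-matching step named in the abstract (a distance matrix followed by DTW) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the Euclidean distance matrix is defined, so the sketch assumes the frame-to-frame cost is the sum of Euclidean distances between matching 3D joints. The names `frame_distance` and `dtw_align` are hypothetical.

```python
import math


def frame_distance(pose_a, pose_b):
    """Assumed cost: sum of Euclidean distances between matching 3D joints."""
    return sum(math.dist(p, q) for p, q in zip(pose_a, pose_b))


def dtw_align(seq_a, seq_b):
    """Classic DTW. Returns (total_cost, path); path lists (i, j) frame pairs."""
    n, m = len(seq_a), len(seq_b)
    # Frame-to-frame distance matrix between the two sequences.
    dist = [[frame_distance(a, b) for b in seq_b] for a in seq_a]

    # acc[i][j]: minimal accumulated cost of aligning seq_a[:i] with seq_b[:j].
    inf = float("inf")
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1][j - 1], acc[i - 1][j], acc[i][j - 1])
            acc[i][j] = dist[i - 1][j - 1] + best_prev

    # Backtrack from the end to recover which frames correspond.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [
            (acc[i - 1][j - 1], i - 1, j - 1),  # both sequences advance
            (acc[i - 1][j], i - 1, j),          # only seq_a advances
            (acc[i][j - 1], i, j - 1),          # only seq_b advances
        ]
        _, i, j = min(candidates)
    path.reverse()
    return acc[n][m], path
```

Each sequence is a list of frames, and each frame is a list of `(x, y, z)` joint coordinates. The returned path pairs every frame of one video with at least one frame of the other, which is what allows two performances at different speeds to be compared frame by frame.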

Abstract: Online fitness and dance instructional videos are abundant, but students who film themselves to compare their movements with the instructor's cannot guarantee the same camera viewpoint, and the resulting differences in angle and scale hinder accurate motion similarity comparison. To address this problem, this paper leverages existing 3D human pose estimation methods and proposes a motion similarity evaluation algorithm for videos filmed from different angles with a monocular camera. Given two videos of human actions from different perspectives, the method first extracts 2D human key points using the YOLOv8pose network, then lifts them to 3D key points using the GraphMLP network. It computes a Euclidean distance matrix from the two 3D key point sequences and uses the dynamic time warping (DTW) algorithm to identify corresponding frames between the two actions. The 3D key points of corresponding frames are then rotated and scaled so that action sequences from different perspectives are brought to the same orientation. Finally, the cosine similarity of skeletal vectors is used as the similarity evaluation metric. Experiments on motion-capture animations viewed from different perspectives demonstrate the effectiveness of the proposed method.

Key words: YOLOv8pose, GraphMLP, human pose estimation, dynamic time warping, cosine similarity, different perspectives
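The last two steps of the abstract, bringing corresponding frames to a common orientation and scoring them by the cosine similarity of skeletal vectors, might look roughly like the sketch below. The abstract only says "rotation and scaling", so the Kabsch (orthogonal Procrustes) fit used here is a swapped-in, standard technique and not necessarily the paper's procedure. The 5-joint `BONES` layout and the function names are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical 5-joint skeleton: (parent, child) joint index pairs.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]


def align_pose(source, target):
    """Center and scale-normalise both poses (J x 3 arrays), then rotate
    `source` onto `target` with the Kabsch algorithm (an assumed choice)."""
    src = source - source.mean(axis=0)
    tgt = target - target.mean(axis=0)
    # Remove scale differences by normalising to unit Frobenius norm.
    src = src / np.linalg.norm(src)
    tgt = tgt / np.linalg.norm(tgt)

    # Kabsch: SVD of the cross-covariance gives the best-fit rotation.
    u, _, vt = np.linalg.svd(src.T @ tgt)
    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) and not a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return src @ rotation.T, tgt


def bone_cosine_similarity(pose_a, pose_b, bones=BONES):
    """Mean cosine similarity between corresponding bone vectors."""
    sims = []
    for parent, child in bones:
        va = pose_a[child] - pose_a[parent]
        vb = pose_b[child] - pose_b[parent]
        sims.append(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
    return float(np.mean(sims))
```

Cosine similarity between bone vectors already ignores bone length, so scaling matters mainly for the alignment fit; rotation is what makes the score comparable across camera viewpoints. In a full pipeline, `align_pose` would be applied to each frame pair produced by the DTW matching, and the per-frame scores averaged.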

CLC Number: