计算机与现代化 ›› 2022, Vol. 0 ›› Issue (09): 68-77.

• Artificial Intelligence •

DNeStCount:数据相关的拆分注意力机制的编码器-解码器结构的人群计数方法

  

  1. (1.上海师范大学旅游学院,上海201418;2.上海旅游高等专科学校计算机教研室,上海201418)
  • 出版日期:2022-09-22 发布日期:2022-09-22
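  • About the author: MENG Xiaolong (b. 1988), male, from Hohhot, Inner Mongolia; lecturer, master's degree; research interests: machine learning and data mining. E-mail: mengxl@shnu.edu.cn.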
  • Funding:
    University (Institute) Talent Team Development Project (RS2021-CY04); University (Institute) Academic Backbone Cultivation Project (KY2020-DL13)

DNeStCount: A Data-dependent Encoder-decoder Architecture with Split-attention for Crowd Counting

  1. (1. School of Tourism, Shanghai Normal University, Shanghai 201418, China;
    2. Computer Department, Shanghai Institute of Tourism, Shanghai 201418, China)
  • Online:2022-09-22 Published:2022-09-22

摘要: 人群数量估计是人群管理系统的关键,对于预防踩踏事故和引导人群至关重要,已成为一个日益重要的任务和具有挑战性的研究方向。本文提出一种数据相关的拆分注意力机制的编码器-解码器结构的人群计数方法,称为DNeStCount。为应对视频监控的尺度变化和透视失真的挑战,将更密集的空洞采样比率应用到密集空洞空间金字塔池化模块DASPP设计中。为提升密度图估计的准确性,将可学习的、数据相关的上采样方法DUpsampling应用到特征聚合模块DFA设计中。为弥补欧几里德损失可能存在对离群值敏感、训练不稳定等缺点,采用Smooth L1损失设计损失函数。在具有挑战性的数据集上进行的实验和分析表明,本文提出的人群计数方法DNeStCount与其他主流方法相比更具有竞争力。

关键词: 人群计数, 编码器-解码器结构, 拆分注意力机制, 密集空洞空间金字塔池化, 数据相关上采样, Smooth L1损失

Abstract: Crowd counting is central to crowd management systems and is essential for preventing stampedes and guiding crowd flow; it has become an increasingly important task and a challenging research direction. This paper proposes DNeStCount, a data-dependent encoder-decoder architecture with split-attention for crowd counting. To cope with the scale variation and perspective distortion found in video surveillance, denser atrous sampling rates are used in the design of the dense atrous spatial pyramid pooling (DASPP) module. To improve the accuracy of density map estimation, the learnable, data-dependent upsampling method DUpsampling is used in the design of the data-dependent feature aggregation (DFA) module. To compensate for the sensitivity to outliers and the training instability that Euclidean loss can exhibit, the loss function is built on Smooth L1 loss. Experiments and analyses on challenging datasets show that DNeStCount is more competitive than other mainstream methods.

Key words: crowd counting, encoder-decoder architecture, split-attention mechanism, dense atrous spatial pyramid pooling, data-dependent upsampling, Smooth L1 loss
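
The abstract's motivation for replacing Euclidean loss with Smooth L1 loss can be illustrated with a minimal sketch. It uses the commonly cited Smooth L1 definition (quadratic for errors below a threshold `beta`, linear above it); the abstract does not state the threshold or reduction used in DNeStCount, so `beta=1.0`, the mean reduction, and the toy values below are assumptions for illustration only, not the paper's implementation.

```python
def smooth_l1(pred, target, beta=1.0):
    """Mean Smooth L1 loss over two equal-length sequences of floats.

    Assumed (not taken from the paper): beta=1.0 and mean reduction.
    """
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        if d < beta:
            total += 0.5 * d * d / beta   # quadratic region: small errors
        else:
            total += d - 0.5 * beta       # linear region: large errors
    return total / len(pred)


def mean_squared_error(pred, target):
    """Mean squared (Euclidean-style) loss, used here as the baseline."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("pred and target must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Hypothetical per-pixel density values; the last prediction is an outlier.
target = [0.0, 0.0, 0.0, 0.0]
clean = [0.1, -0.1, 0.2, -0.2]
outlier = [0.1, -0.1, 0.2, 10.0]

for name, pred in (("clean", clean), ("outlier", outlier)):
    print(name, "smooth_l1 =", smooth_l1(pred, target),
          "mse =", mean_squared_error(pred, target))
```

In this toy case the single outlier contributes quadratically to the squared-error loss but only linearly to Smooth L1, which is the sensitivity to outliers that the abstract refers to.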