Research on Object Detection and Content Recommendation System in Short Video Based on Deep Learning

doi:10.3969/j.issn.1006-2475.2018.11.014

Computer and Modernization, 2018, Vol. 0, Issue (11): 69-

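(1. College of Physical Science and Technology, Central China Normal University, Wuhan 430079, China; 2. Baidu.com Times Technology (Beijing) Co., Ltd., Beijing 100089, China)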

Received: 2018-04-16  Online: 2018-11-22  Published: 2018-11-23
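About the authors: SHI Yin-qiao (1993-), female, from Anqing, Anhui, is a master's student at the College of Physical Science and Technology, Central China Normal University; her research interests are machine learning and software development. LIU Shou-yin (1964-), male, from Zhoukou, Henan, Ph.D., is a professor and doctoral supervisor; his research interests are wireless communication, the Internet of Things and machine learning. MA Chao (1990-), male, from Wuhan, Hubei, is an engineer at Baidu.com Times Technology (Beijing) Co., Ltd.; his research interests are software development and machine vision.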

Abstract

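Abstract: Short videos have grown rapidly in recent years, and short-video advertising has good market prospects, but the pre-roll style of ad placement used for long videos is not suited to short videos. Guided by the principles of high relevance, low intrusiveness, and short but refined content, this paper proposes a deep-learning-based scheme for object detection and content recommendation in videos. Depending on the source of the short video and the network environment, two implementation modes are introduced: Cloud Mode and Mobile Terminal Mode. Cloud Mode consists of a server, a Content Delivery Network (CDN) and terminals: the server detects and recognizes objects in the CDN's short videos in advance and matches each short video with corresponding ad content, which is then played on the mobile client. Mobile Terminal Mode mainly processes local videos and performs object detection and content recommendation within the limited resources of the mobile device. On the algorithm side, the system uses the lightweight deep learning model MobileNet in Mobile Terminal Mode to improve detection speed and accuracy and to reduce memory usage. On the implementation side, Java and C++ code are compiled jointly to improve runtime efficiency, and a feedback system reduces the number of object categories to improve real-time performance.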

Keywords: deep learning, object detection, content recommendation, Faster R-CNN, MobileNet
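
The Cloud Mode flow described in the abstract (detect objects in CDN videos offline on the server, then pair each video with a relevant ad for playback on the client) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' recommendation algorithm: the detector output is assumed to be a list of (label, score, timestamp) records, each ad is assumed to carry a set of object labels it is relevant to, and "low intrusiveness" is modeled by allowing at most one ad per video and ignoring low-confidence detections. The names `Detection`, `Ad`, `match_ad`, `precompute_index`, the ad ids and the 0.8 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Detection:
    """One detector output for a video (hypothetical record format)."""
    label: str        # predicted object category, e.g. "cup"
    score: float      # detector confidence in [0, 1]
    timestamp: float  # seconds into the video where the object was seen


@dataclass
class Ad:
    """An ad described by the object labels it is relevant to (assumed)."""
    ad_id: str
    keywords: set = field(default_factory=set)


def match_ad(detections, ads, min_score=0.8) -> Optional[tuple]:
    """Return (ad_id, show_at_seconds) for the single most relevant ad, or None.

    Relevance is the summed confidence of confident detections whose label
    appears in the ad's keywords; at most one ad is chosen per video.
    """
    best = None  # (relevance, ad_id, show_at)
    for ad in ads:
        hits = [d for d in detections
                if d.score >= min_score and d.label in ad.keywords]
        if not hits:
            continue  # no relevant object: never show an unrelated ad
        relevance = sum(d.score for d in hits)
        # Show the ad when its most confidently detected object appears.
        show_at = max(hits, key=lambda d: d.score).timestamp
        if best is None or relevance > best[0]:
            best = (relevance, ad.ad_id, show_at)
    return None if best is None else (best[1], best[2])


def precompute_index(video_detections, ads, min_score=0.8):
    """Offline server step: map every CDN video id to its matched ad (or None)."""
    return {vid: match_ad(dets, ads, min_score)
            for vid, dets in video_detections.items()}


if __name__ == "__main__":
    ads = [Ad("sneaker-01", {"shoe"}), Ad("coffee-02", {"cup", "bottle"})]
    detections = {
        "v1": [Detection("person", 0.99, 0.0), Detection("cup", 0.93, 3.5)],
        "v2": [Detection("dog", 0.97, 1.0)],
    }
    print(precompute_index(detections, ads))
    # {'v1': ('coffee-02', 3.5), 'v2': None}
```

Because matching happens ahead of time, the client in this sketch only needs a dictionary lookup by video id at playback time, which mirrors the abstract's point that the server processes CDN videos in advance.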

Abstract: Short video has been developing rapidly in recent years, and short-video advertising has promising prospects. However, traditional advertisements are usually inserted into videos abruptly, which is inefficient and often degrades the user experience. This paper proposes a systematic scheme for video object detection and content recommendation based on the deep learning model Faster R-CNN. The scheme matches video content to the displayed advertisements according to the principles of high correlation, precision and low interruption, thereby balancing recommendation against user experience. Two system modes are available, depending on the video source and network environment: Cloud Mode and Mobile Terminal Mode. Cloud Mode is composed of a server, a Content Delivery Network (CDN) and clients. The server detects and recognizes the contents of CDN videos in advance and matches them to corresponding advertisements using recommendation algorithms; the results are then played on the mobile clients. Mobile Terminal Mode mainly processes non-CDN resources such as local videos, completing object detection, recognition and content recommendation with limited computing resources. We apply the MobileNet model to improve detection speed and accuracy and to reduce the memory footprint. To further increase efficiency and achieve real-time performance in Mobile Terminal Mode, we jointly compile Java and C++ code, adopt a self-developed player, and reduce the number of object categories based on the feedback system.
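
The abstract credits MobileNet with faster detection and a smaller memory footprint. The main source of those savings is MobileNet's depthwise separable convolution (Howard et al., 2017), which replaces a standard D_K x D_K convolution over M input and N output channels with a per-channel D_K x D_K depthwise convolution followed by a 1x1 pointwise convolution, cutting the multiply-accumulate count by a factor of 1/N + 1/D_K^2. The arithmetic below checks this for a single layer. The layer shape (3x3 kernel, 512 input and output channels, 14x14 feature map) is an illustrative assumption, not a figure from this paper, and the sketch says nothing about detection accuracy.

```python
def conv_costs(d_k, m, n, d_f):
    """Multiply-accumulates (MACs) and weights for one conv layer.

    d_k: kernel size, m: input channels, n: output channels,
    d_f: spatial size of the (square) output feature map. Stride and padding
    are assumed to keep the map at d_f x d_f, and biases are ignored.
    """
    standard_macs = d_k * d_k * m * n * d_f * d_f
    # Depthwise step (one d_k x d_k filter per input channel) + 1x1 pointwise step.
    separable_macs = d_k * d_k * m * d_f * d_f + m * n * d_f * d_f
    standard_params = d_k * d_k * m * n
    separable_params = d_k * d_k * m + m * n
    return standard_macs, separable_macs, standard_params, separable_params


if __name__ == "__main__":
    std_macs, sep_macs, std_p, sep_p = conv_costs(d_k=3, m=512, n=512, d_f=14)
    print(std_macs, sep_macs)  # 462422016 52283392
    print(std_p, sep_p)        # 2359296 266752
    print(sep_macs / std_macs, 1 / 512 + 1 / 9)  # both about 0.113
```

For this layer the separable form needs roughly 8.8 times fewer operations and weights, which is why such layers suit the limited resources of Mobile Terminal Mode.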

CLC number: TP302

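Cite this article: SHI Yin-qiao, LIU Shou-yin, MA Chao. Research on Object Detection and Content Recommendation System in Short Video Based on Deep Learning[J]. Computer and Modernization, 2018, 0(11): 69-.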
