Application of Multi-label Classification Based on Digital Content Preference

Computer and Modernization ›› 2021, Vol. 0 ›› Issue (02): 45-50.

(School of Electronic Information and Artificial Intelligence, Shaanxi University of Science & Technology, Xi'an 710021, China)

Online: 2021-03-01 Published: 2021-03-01
About the authors: LIU Bin (b. 1972), male, from Xianyang, Shaanxi, associate professor, master's degree; research interests: big data analysis and data mining; E-mail: Liubin@sust.edu.cn. LI Xiao (b. 1996), female, from Xianyang, Shaanxi, master's student; research interests: data mining and big data analysis; E-mail: 254428765@qq.com.
Funding:
National Natural Science Foundation of China (61871260)

Abstract

Abstract: Current research on digital content in the telecom industry mainly derives user-preference insights from business-defined criteria and relies largely on business experience, which hinders growth of the digital content user base. To address this, this paper uses historical data on high-traffic customers to study digital content preference with multi-label classification algorithms, identifies potential target customers for each content category, and ultimately recommends preferred content to customers through marketing, improving precision-marketing capability. First, desensitized data on the basic and consumption attributes of users of telecom company M serve as the data source; lists of users active in video, music, and reading over the most recent three months are obtained and manually labeled along these activity dimensions to produce the initial data set. Because positive and negative samples are imbalanced, repeated random down-sampling is used to draw three sample data sets, and six algorithms, including CC, ML-KNN, and RakelD, are compared experimentally. The results show that the RakelD and ML-KNN multi-label classification algorithms predict digital content user preference well. ML-KNN is therefore adopted as the base classifier of RakelD, giving the RakelD_MLKNN method, which is applied to data sets with different positive-to-negative sample ratios; in every case it outperforms the six existing common multi-label classification algorithms and the traditional experience-based selection method.

Key words: digital content preference, multi-label classification, Classifier Chains (CC) algorithm, Multi-Label K-Nearest Neighbor (ML-KNN) algorithm, Random k labelsets Disjoint (RakelD) algorithm
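
The pipeline summarized in the abstract (repeated random down-sampling, then RakelD over a base classifier) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation. The toy features and label layout are hypothetical, "negative" is assumed to mean a user with no active label, and a plain majority-vote k-NN stands in for ML-KNN as the base classifier to keep the example short.

```python
import random
from collections import Counter


def downsample(X, Y, neg_per_pos, seed):
    """Keep all positive rows plus a random subset of negative rows.

    Assumption: a row is negative when none of its labels is active.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(Y) if any(y)]
    neg = [i for i, y in enumerate(Y) if not any(y)]
    keep = pos + rng.sample(neg, min(len(neg), neg_per_pos * len(pos)))
    return [X[i] for i in keep], [Y[i] for i in keep]


def partition_labels(n_labels, k, seed):
    """Shuffle the label indices and cut them into disjoint groups of at most k."""
    idx = list(range(n_labels))
    random.Random(seed).shuffle(idx)
    return [idx[i:i + k] for i in range(0, n_labels, k)]


def knn_predict(train_X, train_classes, x, n_neighbors):
    """Plain k-NN: majority class among the nearest rows (squared Euclidean)."""
    def dist(i):
        return sum((a - b) ** 2 for a, b in zip(train_X[i], x))
    nearest = sorted(range(len(train_X)), key=dist)[:n_neighbors]
    return Counter(train_classes[i] for i in nearest).most_common(1)[0][0]


def rakeld_predict(train_X, train_Y, x, groups, n_neighbors=3):
    """Per label group, predict the whole label combination as one class."""
    pred = [0] * len(train_Y[0])
    for group in groups:
        combos = [tuple(y[j] for j in group) for y in train_Y]
        for j, value in zip(group, knn_predict(train_X, combos, x, n_neighbors)):
            pred[j] = value
    return pred


# Hypothetical toy data: features are usage of video, music, reading;
# labels are [video, music, reading] activity flags.
pos_X = [[9, 1, 0], [8, 0, 1], [1, 9, 0], [0, 8, 1],
         [1, 0, 9], [0, 1, 8], [9, 9, 0], [8, 8, 1]]
pos_Y = [[1, 0, 0], [1, 0, 0], [0, 1, 0], [0, 1, 0],
         [0, 0, 1], [0, 0, 1], [1, 1, 0], [1, 1, 0]]
neg_X = [[a, b, 0] for a in (0, 1) for b in (0, 1)] * 4  # 16 inactive users
neg_Y = [[0, 0, 0] for _ in neg_X]
X, Y = pos_X + neg_X, pos_Y + neg_Y

groups = partition_labels(n_labels=3, k=2, seed=0)
samples = [downsample(X, Y, neg_per_pos=1, seed=s) for s in range(3)]
predictions = [rakeld_predict(sx, sy, [9, 8, 0], groups) for sx, sy in samples]
print(groups, predictions)
```

On this toy data, each of the three down-sampled sets keeps all 8 positive rows and 8 of the 16 negative rows, and the query user with heavy video and music usage is predicted as active in video and music in all three. Replacing `knn_predict` with an ML-KNN implementation would bring the sketch closer to the RakelD_MLKNN combination the abstract describes.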

LIU Bin, LI Xiao. Application of Multi-label Classification Based on Digital Content Preference[J]. Computer and Modernization, 2021, 0(02): 45-50.

