Research and Application of Hesitant Fuzzy Canopy-K-means Clustering Algorithm

Computer and Modernization ›› 2022, Vol. 0 ›› Issue (11): 17-21.

(1. School of Statistics and Data Science, Qufu Normal University, Jining 273165, China; 2. College of Economics and Management, Shandong Agricultural University, Taian 271018, China)

Online: 2022-11-30 Published: 2022-11-30
About the authors: ZHANG Zi-xuan (b. 2001), female, from Linyi, Shandong, undergraduate student; research interests: decision analysis and data forecasting; E-mail: 15376096928@163.com. Corresponding author: SHA Xiu-yan (b. 1977), female, from Zaozhuang, Shandong, associate professor, Ph.D.; research interests: data forecasting and decision analysis; E-mail: shaxiuyan@sina.com.
Funding: National Natural Science Foundation of China (12071251); National Statistical Science Research Project (2019LY47); Shandong Provincial College Students' Innovation and Entrepreneurship Training Program (S202010446020)

Abstract

Abstract: The traditional K-means clustering algorithm is sensitive to its initial values and easily falls into local extrema, which leads to unsatisfactory classification results. To address this problem, this paper proposes a hesitant fuzzy Canopy-K-means clustering algorithm. First, the Canopy algorithm performs a preliminary classification of the original data, forming a set of Canopy centers whose canopies overlap in the data they cover; these centers serve as the initial cluster centers of the K-means algorithm. The K-means clustering algorithm is then run from these centers to obtain the final clustering result. Finally, a case study is carried out on evaluation data for enterprises that resumed work and production after the epidemic, assessing the development of 5 such enterprises from 6 aspects. The proposed algorithm is compared with a K-means clustering algorithm based on hierarchical analysis. The results show that the proposed method substantially reduces the number of iterations and that its clustering results are more reasonable, stable, and effective.

Key words: hesitant fuzzy set, clustering analysis, K-means clustering, Canopy algorithm
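The two-stage procedure summarized in the abstract (Canopy to pick initial centers, then K-means from those centers) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `hf_distance`, `canopy_centers`, and `kmeans`, the thresholds `t1` and `t2`, and the toy data are all hypothetical. It assumes every hesitant fuzzy element has already been sorted and padded to a common length, uses a normalized Euclidean distance as one possible distance measure, and picks the first remaining object as each canopy center instead of a random one so that the run is reproducible.

```python
import numpy as np


def hf_distance(a, b):
    """Normalized Euclidean distance between two hesitant fuzzy sets.

    a, b: arrays of shape (m, l), i.e. m attributes, each a hesitant fuzzy
    element padded to l membership values sorted in ascending order
    (the padding is an assumption of this sketch).
    """
    return float(np.sqrt(np.mean((a - b) ** 2)))


def canopy_centers(data, t1, t2):
    """Canopy pass: return center indices and the (possibly overlapping) canopies."""
    assert t1 > t2 >= 0, "the loose threshold t1 must exceed the tight threshold t2"
    remaining = list(range(len(data)))
    centers, canopies = [], []
    while remaining:
        c = remaining[0]  # assumption: first remaining object, not a random pick
        dists = {i: hf_distance(data[c], data[i]) for i in remaining}
        # Objects within t1 join this canopy; they may join later canopies too.
        canopies.append([i for i in remaining if dists[i] <= t1])
        centers.append(c)
        # Objects within t2 can no longer become centers of new canopies.
        remaining = [i for i in remaining if dists[i] > t2]
    return centers, canopies


def kmeans(data, init_centers, max_iter=100):
    """Plain K-means on padded hesitant fuzzy data, started from given centers."""
    centers = np.array(init_centers, dtype=float)
    labels = None
    for it in range(1, max_iter + 1):
        # Distance of every object to every center, shape (n, k).
        diff = data[:, None] - centers[None]
        d = np.sqrt((diff ** 2).mean(axis=(2, 3)))
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            return labels, centers, it  # assignments stopped changing
        labels = new_labels
        for k in range(len(centers)):
            members = data[labels == k]
            if len(members):  # keep the old center if a cluster is empty
                centers[k] = members.mean(axis=0)
    return labels, centers, max_iter


# Made-up toy data: 6 objects, 2 attributes, 2 membership values each.
data = np.array([
    [[0.1, 0.2], [0.2, 0.3]],
    [[0.2, 0.3], [0.1, 0.2]],
    [[0.1, 0.3], [0.2, 0.2]],
    [[0.8, 0.9], [0.7, 0.9]],
    [[0.7, 0.8], [0.8, 0.9]],
    [[0.9, 0.9], [0.8, 0.8]],
])

center_idx, canopies = canopy_centers(data, t1=0.4, t2=0.2)
labels, centers, n_iter = kmeans(data, data[center_idx])
print("canopy centers:", center_idx)
print("cluster labels:", labels.tolist())
print("iterations:", n_iter)
```

In this sketch the number of clusters is not supplied by hand: it equals the number of canopies, so the thresholds `t1` and `t2` control it indirectly. On the toy data above, the Canopy pass yields two centers, which seed two clusters in the K-means stage.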

ZHANG Zi-xuan, SHA Xiu-yan, XIAO Fei, SU Bao-chan, SUI Yu-lu, MENG Zi-chen. Research and Application of Hesitant Fuzzy Canopy-K-means Clustering Algorithm[J]. Computer and Modernization, 2022, 0(11): 17-21.

