Improved FCM Algorithm Based on Entropy and Neighborhood Constraint

Computer and Modernization, 2021, Vol. 0, Issue (11): 89-94.

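(School of Science, Nanjing University of Science and Technology, Nanjing, Jiangsu 210094, China)

Published: 2021-12-13 Online: 2021-12-13
About the authors: FENG Jun-qi (born 1997), male, from Shenyang, Liaoning, master's student, research interest: data mining, E-mail: 1193875868@qq.com; corresponding author: ZHANG Zheng-jun (born 1965), male, from Funing, Jiangsu, associate professor, master's supervisor, Ph.D., research interest: data mining, E-mail: zjzhang@njust.edu.cn; ZHANG Man (born 1998), female, from Anqing, Anhui, master's student, research interest: data mining, E-mail: 1277167538@qq.com; YAN Tao (born 1977), male, from Taixing, Jiangsu, associate professor, master's supervisor, Ph.D., research interest: optimization theory and algorithms, E-mail: tyan@njust.edu.cn.
Funding:
National Natural Science Foundation of China (11671205, 61773014)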

Abstract

Abstract: The fuzzy C-means (FCM) clustering algorithm accounts neither for the relative importance of a sample's attributes nor for neighborhood information. To address these problems, an FCM algorithm based on entropy and neighborhood constraints is proposed. First, the entropy of each attribute is computed and used to assign that attribute a weight, and the attribute weights are incorporated into the distance function. Next, neighborhood membership weights are computed from the distance between each neighboring sample and the center sample, and the neighborhood membership is obtained as the weighted combination. The neighborhood membership is then used to constrain the objective function and to modify the iterative membership update, which improves the performance of FCM clustering. Theoretical analysis and experiments on artificial data sets and several UCI data sets show that the improved algorithm outperforms the traditional FCM, PCM, KFCM, KPCM, and DSFCM algorithms in both clustering quality and robustness, demonstrating its effectiveness.

Key words: fuzzy C-means algorithm, clustering algorithm, neighborhood information, entropy weight method
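
The abstract names the building blocks of the method but gives no formulas, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It uses the textbook entropy weight method for the attribute weights, the standard FCM membership update with an attribute-weighted squared Euclidean distance, and a hypothetical inverse-distance weighting `1 / (1 + d)` over the `k` nearest neighbors for the neighborhood membership. All function names, the parameter `k`, and the toy data are illustrative. How the paper feeds the neighborhood membership back into the objective function is not stated in the abstract and is not reproduced here.

```python
import math

def entropy_weights(samples):
    """Entropy weight method: lower-entropy attributes get larger weights."""
    n, d = len(samples), len(samples[0])
    diversity = []
    for j in range(d):
        col = [row[j] for row in samples]
        lo, hi = min(col), max(col)
        # Min-max scale to [0, 1]; a constant column is treated as uniform.
        scaled = [(v - lo) / (hi - lo) if hi > lo else 1.0 for v in col]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        h = -sum(p * math.log(p) for p in probs if p > 0) / math.log(n)
        diversity.append(max(0.0, 1.0 - h))
    total = sum(diversity)
    if total <= 1e-12:  # every attribute is uninformative: fall back to equal weights
        return [1.0 / d] * d
    return [g / total for g in diversity]

def weighted_sq_dist(x, v, w):
    """Attribute-weighted squared Euclidean distance."""
    return sum(wj * (xj - vj) ** 2 for xj, vj, wj in zip(x, v, w))

def fcm_memberships(samples, centers, w, m=2.0):
    """Standard FCM membership update, using the weighted distance."""
    memberships = []
    for x in samples:
        d2 = [weighted_sq_dist(x, v, w) for v in centers]
        if min(d2) == 0.0:
            # Sample coincides with a center: give it full membership there.
            zeros = [1.0 if d == 0.0 else 0.0 for d in d2]
            memberships.append([z / sum(zeros) for z in zeros])
            continue
        inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
        total = sum(inv)
        memberships.append([i / total for i in inv])
    return memberships

def neighborhood_memberships(samples, memberships, w, k=2):
    """ASSUMPTION: inverse-distance-weighted mean of the k nearest neighbors' memberships."""
    c = len(memberships[0])
    result = []
    for i, x in enumerate(samples):
        nearest = sorted(
            (math.sqrt(weighted_sq_dist(x, y, w)), j)
            for j, y in enumerate(samples) if j != i
        )[:k]
        raw = [1.0 / (1.0 + dist) for dist, _ in nearest]
        total = sum(raw)
        result.append([
            sum(r / total * memberships[j][q] for r, (_, j) in zip(raw, nearest))
            for q in range(c)
        ])
    return result

# Toy data (made up for illustration): two attributes, two clusters.
samples = [[1.0, 5.0], [1.2, 9.0], [0.9, 2.0], [5.0, 4.0], [5.1, 8.0], [4.8, 1.0]]
centers = [[1.0, 5.0], [5.0, 5.0]]
weights = entropy_weights(samples)
u = fcm_memberships(samples, centers, weights)
u_nb = neighborhood_memberships(samples, u, weights, k=2)
```

Each row of `u` and `u_nb` sums to 1, so the neighborhood membership can be compared directly with a sample's own membership; a large gap between the two flags a sample whose assignment disagrees with its neighbors.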

FENG Jun-qi, ZHANG Zheng-jun, ZHANG Man, YAN Tao. Improved FCM Algorithm Based on Entropy and Neighborhood Constraint[J]. Computer and Modernization, 2021, 0(11): 89-94.
