Computer and Modernization (计算机与现代化)

• Database and Data Mining •

Determining the Number of Clusters by Second-Order Difference Based on Uniform Sampling

  

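  1. College of Information Science and Technology, Donghua University, Shanghai 201620, China; 2. Department of Automation, Hunan Institute of Humanities, Science and Technology, Loudi, Hunan 417000, China
  • Received: 2017-03-10 Online: 2017-10-30 Published: 2017-10-31
  • About the authors: Chen Yan (陈艳, b. 1993), female, from Yueyang, Hunan; master's student at the College of Information Science and Technology, Donghua University; research interests: computing, software-defined networking. Corresponding author: Chen Guang (陈光, b. 1957), male, from Shantou, Guangdong; professor, Ph.D.; research interests: wireless mobile communications, electromagnetic field theory.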
  • Funding:
    National Natural Science Foundation of China (61671006)

A Method for Determining Two Order Difference Cluster Number Based on Uniform Sampling

  1. College of Information Science & Technology, Donghua University, Shanghai 201620, China;
  2. Department of Automation, Hunan Institute of Humanities, Science and Technology, Loudi 417000, China
  • Received:2017-03-10 Online:2017-10-30 Published:2017-10-31

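Abstract: The second-order difference method for the objective function takes, as its decision criterion, how the gradient of the objective function value changes with the number of classes. Because it works directly with the relation between the objective function value and the number of clusters, it obtains the correct number of clusters automatically on different data sets, but computing the optimal number of clusters takes some time. When the total number of samples is large, the computational cost of finding the optimal number of clusters with this method is very high. To address this problem, this paper proposes a second-order difference method for determining the number of clusters based on uniform sampling: an improved uniform sampling design is applied first, and the second-order difference design is then carried out on the resulting data subset. Experimental results show that the method reduces the amount of computation while still reaching the expected correct decision.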

Keywords: second-order difference, optimal number of clusters, uniform sampling design

Abstract: The second-order difference method for the objective function uses, as its decision criterion, the change in the gradient of the objective function value with respect to the number of classes. The algorithm works directly with the relation between the objective function value and the number of clusters to obtain the correct number of clusters on different data sets, but computing the optimal number of clusters takes time. When the number of samples is large, the computational cost of obtaining the optimal number of clusters in this way is also very large. To solve this problem, this paper proposes a second-order difference method for determining the number of clusters based on uniform sampling. First, an improved uniform sampling design is applied; then the second-order difference design is carried out on the resulting data subset. Experimental results show that the method greatly reduces the amount of computation while still achieving the desired correct decision.

Key words: second-order difference, optimal number of clusters, uniform sampling design
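
The two-stage idea in the abstract (sample first, then apply the second-order difference to the subset) can be illustrated with a minimal sketch. Several things here are assumptions, not details taken from the paper: the objective function is taken to be the k-means sum of squared errors J(k); plain uniform random sampling without replacement stands in for the paper's "improved uniform sampling design", which the abstract does not specify; and the decision rule picks the k that maximizes D(k) = J(k-1) - 2J(k) + J(k+1). The names `uniform_sample`, `kmeans_sse`, and `estimate_k` are hypothetical.

```python
import numpy as np


def uniform_sample(X, m, rng):
    """Draw m rows uniformly at random without replacement.

    Assumption: a stand-in for the paper's improved uniform sampling design.
    """
    idx = rng.choice(len(X), size=m, replace=False)
    return X[idx]


def kmeans_sse(X, k, rng, n_init=10, n_iter=50):
    """Lowest k-means sum of squared errors found over n_init restarts."""
    best = np.inf
    for _ in range(n_init):
        # k-means++ style seeding: each new centre is drawn with probability
        # proportional to its squared distance from the nearest chosen centre.
        centres = [X[rng.integers(len(X))]]
        while len(centres) < k:
            diff = X[:, None, :] - np.array(centres)[None, :, :]
            d2 = (diff ** 2).sum(axis=-1).min(axis=1)
            centres.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        C = np.array(centres)
        # Lloyd iterations: assign points, then recompute centres.
        for _ in range(n_iter):
            d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
            labels = d2.argmin(axis=1)
            new_C = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else C[j]
                for j in range(k)
            ])
            if np.allclose(new_C, C):
                break
            C = new_C
        # Objective value for the final centres.
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=-1)
        best = min(best, d2.min(axis=1).sum())
    return best


def estimate_k(X, k_max, rng):
    """Pick k in 2..k_max that maximizes the second-order difference of J(k)."""
    ks = np.arange(1, k_max + 2)            # J is needed for k = 1 .. k_max + 1
    J = np.array([kmeans_sse(X, k, rng) for k in ks])
    D = J[:-2] - 2.0 * J[1:-1] + J[2:]      # D[i] corresponds to k = i + 2
    return int(np.argmax(D)) + 2, J


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic data (an assumption for the demo): three well-separated blobs.
    centres = np.array([[0.0, 0.0], [10.0, 0.0], [5.0, 8.66]])
    X = np.vstack([c + 0.5 * rng.standard_normal((1000, 2)) for c in centres])
    subset = uniform_sample(X, 300, rng)    # cluster only 10% of the data
    k_hat, J = estimate_k(subset, 6, rng)
    print("estimated number of clusters:", k_hat)
```

Running the clustering on the 300-point subset rather than all 3000 points is where the saving would come from in this sketch, since each k-means pass scales with the number of points. Whether the subset preserves the cluster structure depends on the sampling design and the sample size, which is the part the paper's improved design is meant to address.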