Computer and Modernization ›› 2021, Vol. 0 ›› Issue (07): 54-59.

• Database and Data Mining •

A Differential Evolution K-Medoids Clustering Algorithm Based on Dynamic Gemini (Twin) Populations

  

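  1. (School of Data Science, Guangzhou Huashang College, Guangzhou, Guangdong 511300)
  • Online: 2021-08-02 Published: 2021-08-02
  • About the authors: DENG Bintao (born 1966), male, native of Guangzhou, Guangdong; engineer; M.S.; research interests: computer networks and cloud computing; E-mail: dengbintao_2020@126.com. XU Shengchao (born 1980), male, native of Wuhan, Hubei; lecturer; M.S.; research interests: computer networks and cloud computing; E-mail: isdooropen@126.com.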
  • Funding: Guangzhou Huashang College In-School Mentorship Research Project (2020HSDS04)

A Differential Evolution K-mediods Clustering Algorithm Based on Dynamic Gemini Population

  1. (School of Data Science, Guangzhou Huashang College, Guangzhou 511300, China)
  • Online:2021-08-02 Published:2021-08-02

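Abstract: With the emergence of massive big data, clustering algorithms need new computing models to improve their speed and efficiency. This paper proposes a differential evolution K-medoids clustering algorithm based on dynamic gemini (twin) populations, called DGP-DE-K-mediods (Dynamic Gemini Population based DE-K-mediods). DGP-DE-K-mediods uses dynamic gemini populations to keep the clustering algorithm from falling into local optima while it maintains population diversity; it adopts the differential evolution (DE) algorithm to make its global optimization more robust; and it runs in parallel on the Hadoop cloud platform to speed up execution and improve efficiency. The paper also describes how the MapReduce-based parallel clustering algorithm is programmed, and evaluates DGP-DE-K-mediods by simulation on big data classification cases from the UCI datasets and on network intrusion detection as a big data application. Experimental results show that, compared with existing clustering algorithms, DGP-DE-K-mediods has clear advantages in detection accuracy and running time.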

Keywords: cloud computing, parallel processing, K-medoids clustering, differential evolution, intrusion detection system
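The abstract names the ingredients (the K-medoids objective, DE search, and two cooperating subpopulations) without giving the update rules. The sketch below is one plausible way to combine them and is not the authors' DGP-DE-K-mediods: the real-valued encoding in the hypothetical `decode` helper, the use of DE/rand/1/bin in one subpopulation and DE/best/1/bin in the other, and the periodic best-for-worst migration between them are all assumptions made for illustration, and the "dynamic" adjustment in the published scheme is not modeled. All function names are hypothetical.

```python
import math
import random


def clustering_cost(points, medoids):
    """K-medoids objective: sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def decode(vector, n):
    """Hypothetical encoding: map k real genes in [0, n) to k distinct point indices."""
    medoids = []
    for gene in vector:
        idx = int(gene) % n
        while idx in medoids:  # resolve collisions by probing the next index
            idx = (idx + 1) % n
        medoids.append(idx)
    return medoids


def de_generation(pop, fit, points, f, cr, use_best, rng):
    """One DE generation (rand/1/bin or best/1/bin) with greedy one-to-one selection."""
    n, k = len(points), len(pop[0])
    best = pop[min(range(len(pop)), key=fit.__getitem__)]
    for i in range(len(pop)):
        a, b, c = rng.sample([j for j in range(len(pop)) if j != i], 3)
        base = best if use_best else pop[a]
        j_rand = rng.randrange(k)  # guarantees at least one mutated gene
        trial = []
        for d in range(k):
            if d == j_rand or rng.random() < cr:
                v = base[d] + f * (pop[b][d] - pop[c][d])
                trial.append(min(max(v, 0.0), n - 1e-9))  # clip into [0, n)
            else:
                trial.append(pop[i][d])
        cost = clustering_cost(points, decode(trial, n))
        if cost <= fit[i]:  # keep the trial only if it is no worse
            pop[i], fit[i] = trial, cost


def migrate(src, dst):
    """Copy the best individual of src over the worst individual of dst."""
    (pop_s, fit_s), (pop_d, fit_d) = src, dst
    b = min(range(len(pop_s)), key=fit_s.__getitem__)
    w = max(range(len(pop_d)), key=fit_d.__getitem__)
    pop_d[w], fit_d[w] = list(pop_s[b]), fit_s[b]


def twin_de_kmedoids(points, k, pop_size=10, generations=40,
                     migrate_every=5, f=0.5, cr=0.9, seed=0):
    """Evolve two subpopulations (assumed roles: explore and exploit); return best medoids."""
    rng = random.Random(seed)
    n = len(points)

    def new_subpopulation():
        pop = [[rng.uniform(0, n) for _ in range(k)] for _ in range(pop_size)]
        return pop, [clustering_cost(points, decode(v, n)) for v in pop]

    explore, exploit = new_subpopulation(), new_subpopulation()
    for g in range(1, generations + 1):
        de_generation(*explore, points, f, cr, False, rng)  # DE/rand/1: keeps diversity
        de_generation(*exploit, points, f, cr, True, rng)   # DE/best/1: speeds convergence
        if g % migrate_every == 0:  # periodic exchange between the twin subpopulations
            migrate(explore, exploit)
            migrate(exploit, explore)

    pop, fit = min(explore, exploit, key=lambda s: min(s[1]))
    i = min(range(len(pop)), key=fit.__getitem__)
    return decode(pop[i], n), fit[i]


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(twin_de_kmedoids(pts, k=2))
```

The split into an exploring and an exploiting subpopulation is one common reading of "maintain diversity while avoiding local optima"; the greedy selection means the best cost found never increases, so the result is at least as good as the best random initial medoid set.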

Abstract: With the emergence of massive big data, clustering algorithms need new parallel computing models. This paper proposes a dynamic gemini population based differential evolution K-medoids clustering algorithm for cloud environments, called DGP-DE-K-mediods. In DGP-DE-K-mediods, a gemini population scheme is adopted to mitigate the tendency to become trapped in a local optimum while maintaining population diversity. The differential evolution (DE) algorithm is also used to give DGP-DE-K-mediods strong global optimization capability. DGP-DE-K-mediods is designed and implemented in parallel under the Hadoop MapReduce framework, which significantly reduces big data processing time. The MapReduce programming model of the parallel clustering algorithm is also described in detail. A series of simulation experiments was conducted using UCI datasets and big data intrusion detection processing. Experimental results show that the overall detection performance of DGP-DE-K-mediods is significantly better than that of existing intrusion detection algorithms.

Key words: cloud computing, parallel processing, K-medoids clustering, differential evolution, intrusion detection system
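The abstract states that the clustering is parallelized with Hadoop MapReduce but does not give the job design. A common way to express one K-medoids iteration as MapReduce is sketched below in plain Python, with the shuffle simulated in memory: the mapper assigns each point to its nearest medoid, and the reducer picks, per cluster, the member with the smallest summed distance to the others. This decomposition and the names `mapper`, `shuffle`, `reducer` and `mapreduce_kmedoids` are assumptions rather than the paper's implementation; on a real cluster the same two functions would be written as Hadoop Mapper/Reducer classes or run through Hadoop Streaming.

```python
import math
from collections import defaultdict


def mapper(point, medoids):
    """Map: emit (index of the nearest medoid, point)."""
    key = min(range(len(medoids)), key=lambda j: math.dist(point, medoids[j]))
    return key, point


def shuffle(pairs):
    """Group mapped values by key, as the MapReduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, cluster):
    """Reduce: pick the member minimizing the summed distance to the rest of its cluster."""
    cost, medoid = min((sum(math.dist(c, o) for o in cluster), c) for c in cluster)
    return key, medoid, cost


def mapreduce_kmedoids(points, medoids, max_iter=10):
    """Repeat map -> shuffle -> reduce until the medoids stop changing (max_iter >= 1)."""
    for _ in range(max_iter):
        groups = shuffle(mapper(p, medoids) for p in points)
        results = [reducer(k, v) for k, v in sorted(groups.items())]
        new_medoids = [m for _, m, _ in results]
        total = sum(c for _, _, c in results)  # clustering cost of this iteration
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, total


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(mapreduce_kmedoids(pts, [(0, 1), (10, 11)]))
```

Because the mapper needs only the current medoids and a single point, map tasks over different input splits are independent, which is what lets the assignment step scale with the number of nodes. The reducer's cost grows quadratically with cluster size, so very large clusters may call for a sampled or approximate medoid update.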