Computer and Modernization ›› 2015, Vol. 0 ›› Issue (12): 39-. doi: 10.3969/j.issn.1006-2475.2015.12.008

• Algorithm Design and Analysis •



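  1. NARI Group Corporation, Nanjing 210003, Jiangsu, China;
  • Received: 2015-10-28 Online: 2015-12-23 Published: 2015-12-30
  • About the authors: YANG Weiyong (杨维永, 1978-), male, born in Suqian, Jiangsu, senior engineer at NARI Group Corporation, master's degree, research interests: information security and big data analytics; HE Jun (何军, 1978-), male, associate professor at the School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, research interests: machine learning and big data analytics; ZHENG Shengjun (郑生军, 1977-), male, senior engineer at Beijing Guodiantong Network Technology Co., Ltd., master's degree, research interests: security of electric power information systems; ZHANG Xudong (张旭东, 1969-), male, senior engineer at the Information and Telecommunication Branch of State Grid Zhejiang Electric Power Company, master's degree, research interests: networks, communications, and corporate operations monitoring.
  • Supported by:
    National Natural Science Foundation of China (61203273); Science and Technology Project of State Grid Corporation of China (524681140009)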

An Outlier Detection Algorithm for Subspace Clustering

  1. NARI Group Corporation, Nanjing 210003, China;
  2. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China;
  3. Guodiantong Corporation, Beijing 100070, China;
  4. State Grid Information & Telecommunication Branch of Zhejiang Province Electric Co., Hangzhou 310007, China
  • Received:2015-10-28 Online:2015-12-23 Published:2015-12-30



Key words: big data processing, outlier detection, subspace clustering


Modern big data analytics faces several challenges, such as missing data, unstructured data, and corruption by outliers. The most important
preprocessing step is outlier detection and removal. In this paper, we address the popular subspace clustering problem in data analytics under the more challenging scenario in
which the data set is corrupted by sparse outliers. Based on the sparsity assumption, the classic k-subspace algorithm is adapted to incorporate ℓ1-norm regularization
to alleviate the side effects of outliers. To cope with the heavy computation and memory requirements of big data, the modified k-subspace clustering algorithm uses
stochastic gradient descent (SGD) for fast computation and memory efficiency. Simulation experiments show that, even when the data set is heavily corrupted by outliers, the proposed
approach accurately detects and removes the outliers and, furthermore, achieves accurate subspace clustering results.
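
The abstract does not give the exact formulation, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes each sample is modeled as x ≈ w·u + o, where u is a unit vector spanning a one-dimensional subspace (a simplification chosen to keep the example short), o is a sparse outlier vector penalized entry-wise by λ‖o‖₁, and the basis is updated by one stochastic gradient step per sample. All function names (`soft_threshold`, `fit_point`, `sgd_step`, `robust_k_subspaces_pass`) and the parameters `lam` and `lr` are hypothetical.

```python
import math

def soft_threshold(v, lam):
    """Proximal operator of lam * |v|: shrink v toward zero by lam."""
    return math.copysign(max(abs(v) - lam, 0.0), v)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def fit_point(u, x, lam, iters=20):
    """Alternate between the coefficient w and the sparse outlier vector o
    for the assumed model x ~ w * u + o, with u a unit vector."""
    o = [0.0] * len(x)
    w = 0.0
    for _ in range(iters):
        # Least-squares coefficient given the current outlier estimate.
        w = dot(u, [xi - oi for xi, oi in zip(x, o)])
        # Entry-wise soft-thresholding of the residual solves the l1 step.
        o = [soft_threshold(xi - w * ui, lam) for xi, ui in zip(x, u)]
    r = [xi - w * ui - oi for xi, ui, oi in zip(x, u, o)]
    cost = 0.5 * dot(r, r) + lam * sum(abs(oi) for oi in o)
    return w, o, cost

def sgd_step(u, x, w, o, lr):
    """One stochastic gradient step on u for a single sample, then renormalize.
    The gradient of 0.5 * ||x - w*u - o||^2 with respect to u is -w * r."""
    r = [xi - w * ui - oi for xi, ui, oi in zip(x, u, o)]
    u_new = [ui + lr * w * ri for ui, ri in zip(u, r)]
    norm = math.sqrt(dot(u_new, u_new))
    return [ui / norm for ui in u_new]

def robust_k_subspaces_pass(bases, data, lam, lr):
    """One streaming pass: assign each sample to the cheapest subspace,
    record its outlier estimate, and update only that subspace's basis."""
    labels, outliers = [], []
    for x in data:
        fits = [fit_point(u, x, lam) for u in bases]
        k = min(range(len(bases)), key=lambda j: fits[j][2])
        w, o, _ = fits[k]
        bases[k] = sgd_step(bases[k], x, w, o, lr)
        labels.append(k)
        outliers.append(o)
    return labels, outliers

if __name__ == "__main__":
    bases = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
    data = [[2.0, 0.0, 0.0], [0.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
    labels, outliers = robust_k_subspaces_pass(bases, data, lam=0.5, lr=0.1)
    # Samples with any nonzero outlier entry are flagged as corrupted.
    flagged = [i for i, o in enumerate(outliers) if any(oi != 0.0 for oi in o)]
    print(labels, flagged)
```

In this sketch a sample is flagged as corrupted when its estimated outlier vector has any nonzero entry. Because each sample is processed once and only one basis is touched per step, memory use does not grow with the number of samples, which is the property the abstract attributes to the SGD variant.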

Key words: big data analytics, outlier detection, subspace clustering
