Computer and Modernization


An Outlier Detection Algorithm for Subspace Clustering

  

  1. NARI Group Corporation, Nanjing 210003, China;
  2. School of Electronic and Information Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China;
  3. Guodiantong Corporation, Beijing 100070, China;
  4. State Grid Information & Telecommunication Branch of Zhejiang Province Electric Co., Hangzhou 310007, China
  Received: 2015-10-28   Online: 2015-12-23   Published: 2015-12-30

Abstract:

Modern big data analytics faces several challenges, such as missing data, unstructured data, and outlier corruption. The most important preprocessing step is outlier detection and removal. In this paper, we address the popular subspace clustering problem in data analytics under the more challenging scenario in which the data set is corrupted by sparse outliers. Based on the sparsity assumption, the classic k-subspace algorithm is adapted to incorporate ℓ1-norm regularization, which alleviates the side effects of the outliers. To overcome the huge computation and memory requirements of big data, the modified k-subspace clustering algorithm uses stochastic gradient descent (SGD) for fast computation and memory efficiency. Simulation experiments show that even when the data set is heavily corrupted by outliers, the proposed approach accurately detects and removes the outliers and, furthermore, achieves accurate subspace clustering results.
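The abstract does not give the update rules, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' published algorithm. It assumes each point is modelled as x ≈ U_k w + s, where U_k is an orthonormal basis of the k-th subspace and s is a sparse outlier vector penalized by λ‖s‖₁. For a fixed basis, w and s are found by alternating a least-squares projection with soft thresholding (the proximal operator of the ℓ1 norm). Each point is assigned to the subspace with the lowest objective, and that basis then takes one stochastic gradient step followed by QR re-orthonormalization. The function names (soft_threshold, fit_point, sgd_k_subspace), the alternating inner solver, the fixed step size, and the QR step are hypothetical illustrative choices.

```python
import numpy as np


def soft_threshold(v, lam):
    """Proximal operator of lam * ||v||_1: shrink every entry toward zero by lam."""
    return np.sign(v) * np.maximum(np.abs(v) - lam, 0.0)


def fit_point(x, U, lam, n_iter=20):
    """Approximately solve min_{w,s} 0.5*||x - U w - s||^2 + lam*||s||_1.

    U must have orthonormal columns, so the least-squares step for w is
    U.T @ (x - s). Returns (w, s, objective value)."""
    s = np.zeros_like(x)
    for _ in range(max(1, n_iter)):
        w = U.T @ (x - s)                    # coefficients given the current outlier
        s = soft_threshold(x - U @ w, lam)   # sparse outlier given the coefficients
    r = x - U @ w - s
    return w, s, 0.5 * (r @ r) + lam * np.abs(s).sum()


def sgd_k_subspace(X, K, d, lam=0.5, step=0.05, epochs=5, seed=0):
    """Hypothetical SGD-based k-subspace clustering with an l1 outlier term.

    X: (n, D) data; K: number of subspaces; d: dimension of each subspace.
    Returns (bases, labels, outliers)."""
    rng = np.random.default_rng(seed)
    n, D = X.shape
    # Random orthonormal initial bases, one (D, d) matrix per subspace.
    bases = [np.linalg.qr(rng.standard_normal((D, d)))[0] for _ in range(K)]

    def assign(x):
        # Fit x to every subspace and keep the one with the lowest objective.
        fits = [fit_point(x, U, lam) for U in bases]
        k = int(np.argmin([f[2] for f in fits]))
        return k, fits[k][0], fits[k][1]

    for _ in range(epochs):
        for i in rng.permutation(n):         # one sample per update (stochastic)
            k, w, s = assign(X[i])
            r = X[i] - bases[k] @ w - s      # residual with the outlier removed
            # Gradient of 0.5*||r||^2 w.r.t. U_k is -r w^T; step downhill,
            # then re-orthonormalize the basis with a reduced QR factorization.
            bases[k] = np.linalg.qr(bases[k] + step * np.outer(r, w))[0]

    results = [assign(x) for x in X]
    labels = np.array([k for k, _, _ in results])
    outliers = np.array([s for _, _, s in results])
    return bases, labels, outliers
```

Non-zero entries of a returned outlier vector flag the coordinates judged to be corrupted, and X - outliers gives the cleaned data. Each basis update touches a single sample, so the same loop could consume a data stream. The sketch keeps X in memory only for simplicity, and a random initialization like this one may need several restarts in practice.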

Key words: big data analytics, outlier detection, subspace clustering
