Computer and Modernization ›› 2022, Vol. 0 ›› Issue (11): 1-8.

Categorical Data Clustering Based on Extraction of Associations from Co-association Matrix

  

  1. (Department 8 of System, North China Institute of Computing Technology, Beijing 100083, China)
  • Online: 2022-11-30 Published: 2022-11-30

Abstract: Categorical data clustering is widely used in real-world fields such as medical science and computer science. Existing methods are usually built on a dissimilarity measure, so for data sets with different characteristics the clustering results are affected both by the characteristics of the data set itself and by noise. Methods based on representation learning, in turn, are complicated to implement, and their clustering results depend heavily on the quality of the learned representation. This paper proposes categorical data clustering based on extraction of associations from co-association matrix (CDCBCM), a method that directly considers the relationships within the original information of the categorical data. The co-association matrix can be regarded as a summary of the associations in the original data space. It is constructed by calculating the co-association frequency of different objects in each attribute subspace; some noise information is then removed from the matrix, and the clustering result is obtained by normalized cut. The method is tested from various aspects on 16 publicly available data sets, compared with 8 existing methods, and evaluated using the F1-score. The experimental results show that it achieves the best result on 7 data sets and the best average ranking, indicating that it is well suited to clustering categorical data.

Key words: categorical data, categorical data clustering, machine learning, co-association matrix, normalized cut
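
The pipeline summarized in the abstract (build a co-association matrix, remove noise, apply normalized cut) can be illustrated with a minimal sketch. It is not the CDCBCM implementation, whose details the abstract does not give. The sketch assumes that each single attribute is one subspace, that the co-association frequency of two objects is the fraction of attributes on which they share a value, that noise removal is a plain threshold (the `threshold` parameter is hypothetical), and that a two-way spectral relaxation stands in for normalized cut. All function names are illustrative.

```python
import numpy as np


def co_association_matrix(data):
    """Fraction of attributes on which each pair of objects shares a value.

    Assumption: every attribute is treated as its own subspace.
    """
    data = np.asarray(data)
    n_objects, n_attributes = data.shape
    matrix = np.zeros((n_objects, n_objects))
    for j in range(n_attributes):
        column = data[:, j]
        # 1 where two objects agree on attribute j, 0 otherwise.
        matrix += column[:, None] == column[None, :]
    return matrix / n_attributes


def remove_weak_associations(matrix, threshold):
    """Zero out entries below `threshold` (a hypothetical noise filter)."""
    return np.where(matrix >= threshold, matrix, 0.0)


def two_way_normalized_cut(affinity):
    """Two-way spectral relaxation of normalized cut.

    Assumes the affinity graph is connected, so that the second-smallest
    eigenvalue of the normalized Laplacian is well separated from zero.
    """
    degree = affinity.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degree)
    normalized = d_inv_sqrt[:, None] * affinity * d_inv_sqrt[None, :]
    laplacian = np.eye(len(affinity)) - normalized
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    indicator = eigenvectors[:, 1] * d_inv_sqrt
    # Split at zero; which side is labelled 1 is arbitrary.
    return (indicator > 0).astype(int)


# Toy data: objects 0-2 and 3-5 agree on the first three attributes
# within their group; the last two attributes add cross-group noise.
toy = [
    ["a", "x", "p", "u", "m"],
    ["a", "x", "p", "v", "n"],
    ["a", "x", "p", "w", "o"],
    ["b", "y", "r", "u", "m"],
    ["b", "y", "r", "x", "n"],
    ["b", "y", "r", "y", "q"],
]
raw = co_association_matrix(toy)
cleaned = remove_weak_associations(raw, threshold=0.3)
labels = two_way_normalized_cut(cleaned)
print(labels)
```

On this toy input the threshold removes the weakest cross-group link while leaving the graph connected, and the two groups receive different labels. Clustering into more than two groups would need recursive cuts or several eigenvectors, which the sketch leaves out.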