Computer and Modernization ›› 2021, Vol. 0 ›› Issue (09): 113-120.

S2R2: Semi-supervised Feature Selection Based on Analysis of Relevance and Redundancy

  

  1. College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  2. Collaborative Innovation Center of Novel Software Technology and Industrialization, Nanjing 210093, China
  3. College of Civil Aviation, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China
  • Online: 2021-09-14  Published: 2021-09-14

Abstract: Feature selection is one of the key problems in pattern recognition and data mining: it removes redundant and irrelevant features from a dataset to improve learning performance. Based on the max-relevance and min-redundancy criterion, a novel semi-supervised feature selection method built on relevance and redundancy analysis is proposed. The method is independent of any classification learning algorithm. First, unsupervised relevance is analyzed and extended. It is then combined with information gain to form semi-supervised measures of feature relevance and redundancy, which can effectively identify and remove irrelevant and redundant features. Finally, an incremental forward search constructs the feature subset greedily, which avoids searching an exponential solution space and improves the efficiency of the algorithm. The article also proposes FS2R2, a fast version of S2R2, for large-scale problems. Experimental results on standard data sets illustrate the effectiveness and superiority of the proposed approaches.

Key words: semi-supervised learning, feature selection, information theory, max-relevance and min-redundancy
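
The abstract does not give the S2R2 relevance and redundancy formulas, so they cannot be reproduced here. As a rough illustration of the incremental forward search it mentions, the following is a minimal sketch of the classic supervised max-relevance and min-redundancy heuristic on discrete features, not the authors' semi-supervised measure. The function names `mutual_information` and `greedy_mrmr`, the "relevance minus mean redundancy" score, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete sequences."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    pxy = Counter(zip(xs, ys))
    # Sum over observed pairs: p(x, y) * log(p(x, y) / (p(x) * p(y)))
    return sum(
        (c / n) * log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_mrmr(columns, labels, k):
    """Select up to k feature indices by incremental forward search.

    Assumed score (classic supervised mRMR, difference form):
    relevance to the labels minus mean redundancy with the features
    already selected. This is an illustration, not the S2R2 criterion.
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = list(range(len(columns)))

    while remaining and len(selected) < k:
        def score(j):
            if not selected:
                return relevance[j]
            redundancy = sum(
                mutual_information(columns[j], columns[s]) for s in selected
            ) / len(selected)
            return relevance[j] - redundancy

        # Greedy step: one pass over the remaining candidates, so the
        # exponential space of feature subsets is never enumerated.
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: the label is determined jointly by features a and b.
a = [0, 0, 0, 0, 1, 1, 1, 1]
b = [0, 0, 1, 1, 0, 0, 1, 1]
noise = [0, 1, 0, 1, 0, 1, 0, 1]
labels = [2 * i + j for i, j in zip(a, b)]

# Column 1 is an exact duplicate of column 0, i.e. fully redundant.
chosen = greedy_mrmr([a, list(a), b, noise], labels, k=2)
print(chosen)
```

On this toy input the search picks column 0 first and then column 2: the duplicate in column 1 is just as relevant as column 0, but its redundancy with the already selected feature cancels that relevance. Each round scores every remaining candidate once, so selecting k of d features needs on the order of k·d score evaluations rather than a search over all 2^d subsets.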