Computer and Modernization ›› 2020, Vol. 0 ›› Issue (10): 17-22.


Spam E-mail Recognition Based on Cluster Analysis Algorithm

  

  1. (School of Computer and Information Technology, Northeast Petroleum University, Daqing 163318, China)
  • Online:2020-10-14 Published:2020-10-14

Abstract: Earlier spam recognition methods struggle with today's rapidly changing and highly varied vocabulary: they have difficulty accurately identifying the key segmented words in an e-mail, so their practical applicability needs further improvement. To this end, a spam recognition method based on a cluster analysis algorithm is proposed. First, the e-mail samples are preprocessed to obtain the key segmented words of the e-mail text, stop words are removed, and each word is weighted according to its frequency in the e-mail text. Then, the e-mail feature space is constructed by combining these weights with the e-mail feature attributes, and the e-mail features are quantified. Finally, the e-mail features are extracted and reduced in dimensionality, the result is used as the input of the clustering algorithm, and the output is computed iteratively to complete the identification of spam. The experimental results show that the proposed method extracts keywords and segments words more accurately and identifies spam e-mails accurately, indicating that its practical applicability has improved.

Key words: clustering algorithm, spam, word segmentation, text clustering
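
The pipeline outlined in the abstract (word segmentation, stop-word removal, frequency-based weighting, dimensionality reduction, iterative clustering) can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the whitespace-style tokenizer for English text, the choice of keeping only the most frequent terms as a stand-in for dimensionality reduction, plain k-means as the clustering algorithm, and all function names are assumptions made for illustration. The additional e-mail feature attributes mentioned in the abstract are not modeled here.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the paper's actual list is not given.
STOP_WORDS = {"a", "an", "the", "to", "of", "and", "for", "on", "your"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def tf_vectors(docs, max_terms=8):
    """Weight terms by relative frequency within each message.

    Only the max_terms most frequent terms overall are kept, which is a
    crude stand-in for the dimensionality reduction step (assumption).
    """
    tokenized = [tokenize(d) for d in docs]
    totals = Counter(w for toks in tokenized for w in toks)
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [w for w, _ in ranked[:max_terms]]
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        n = len(toks) or 1  # avoid division by zero for empty messages
        vectors.append([counts[w] / n for w in vocab])
    return vocab, vectors


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k=2, max_iter=20):
    """Plain k-means; the first k vectors serve as initial centroids."""
    centroids = [list(v) for v in vectors[:k]]
    labels = []
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up sample messages, for illustration only.
docs = [
    "Win a free prize now, claim your free cash prize",
    "Meeting agenda for the project review on Monday",
    "Free cash offer: claim your prize now",
    "Project review meeting moved to Monday afternoon",
]
vocab, vectors = tf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

Clustering alone only groups similar messages; deciding which cluster corresponds to spam still requires a few labeled samples or manual inspection of each cluster's dominant terms.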