Computer and Modernization, 2020, Vol. 0, Issue (06): 89-.


Spam Text Classification Method Based on Deep Q-network

  

  1. (Suzhou Power Supply Branch, State Grid Jiangsu Electric Power Limited Company, Suzhou 215004, China)
  Received: 2019-09-20; Online: 2020-06-24; Published: 2020-06-28

Abstract: Electronic mail is widely used in daily life. However, it also serves as a carrier for spam filled with false information, malicious software, and unwanted advertisements. Spam is not only a nuisance: it also wastes network resources and can pose a serious threat to users' information security. Effectively identifying and filtering spam therefore remains an important task. Current filtering methods mainly identify spam by its source or its content; they are of limited effectiveness, require a large amount of manual labeling, and respond poorly to changes in the content or format of spam. In recent years, researchers have applied deep reinforcement learning to natural language processing with good results. This paper therefore presents a spam classification method based on a deep Q-network (DQN). The mail text is first preprocessed and segmented into words, and the words are then converted into word vectors with a Word2vec model. A deep Q-network filters spam on the basis of these word vectors, with the aim of improving both efficiency and accuracy. Specifically, the method uses the CBOW (continuous bag-of-words) model of Word2vec to obtain the word vector of each segmented word in the mail text and feeds these vectors directly into the deep Q-network, without a separate feature-extraction step, thereby avoiding the negative impact of biased feature extraction. Experimental results verify the effectiveness of the method.

Key words: electronic mail, deep Q-network, Word2vec, text classification
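The abstract does not specify how classification is cast as a reinforcement-learning problem, so the following is only a minimal sketch under stated assumptions, not the authors' method. It adopts a framing used elsewhere in the deep-RL classification literature: each mail is one step of an episode, the action is the predicted label (ham or spam), the reward is +1 for a correct label and -1 otherwise, and the next state is the next mail. Everything named here is hypothetical: `EMBED` is a toy table standing in for trained Word2vec CBOW vectors, `mail_state` averages word vectors into a single state (a simplification; the paper may feed the vector sequence itself), and a linear Q-function replaces the deep network so the sketch runs with the standard library alone. The DQN ingredients kept are epsilon-greedy exploration, an experience-replay buffer, a periodically synced target network, and the target y = r + γ·max Q_target(s′, a′).

```python
import random
from collections import deque

# HYPOTHETICAL toy embedding table standing in for Word2vec CBOW word vectors.
# In the described pipeline these vectors would come from a trained Word2vec model.
EMBED = {
    "free": [1.0, 0.1], "winner": [0.9, 0.0], "prize": [0.8, 0.2], "click": [0.9, 0.3],
    "meeting": [0.1, 1.0], "report": [0.0, 0.9], "schedule": [0.2, 0.8], "lunch": [0.1, 0.7],
}
DIM = 2
HAM, SPAM = 0, 1
ACTIONS = (HAM, SPAM)  # an action is the predicted label


def mail_state(tokens):
    """State = mean of known word vectors plus a constant 1.0 bias feature (assumption)."""
    vecs = [EMBED[t] for t in tokens if t in EMBED]
    if not vecs:
        return [0.0] * DIM + [1.0]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(DIM)] + [1.0]


def q_values(weights, state):
    """Linear Q-function, one weight row per action (stand-in for the deep network)."""
    return [sum(w * x for w, x in zip(row, state)) for row in weights]


def greedy(weights, state):
    q = q_values(weights, state)
    return max(ACTIONS, key=lambda a: q[a])


def train_dqn(data, episodes=200, gamma=0.1, lr=0.05, eps=0.1,
              batch=16, sync_every=50, seed=0):
    """data: list of (state, label). Each episode walks once through the shuffled mails."""
    rng = random.Random(seed)
    online = [[0.0] * (DIM + 1) for _ in ACTIONS]
    target = [row[:] for row in online]  # target network, synced periodically
    replay = deque(maxlen=1000)          # experience replay buffer
    step = 0
    for _ in range(episodes):
        order = data[:]
        rng.shuffle(order)
        for t, (state, label) in enumerate(order):
            # Epsilon-greedy choice of a label for the current mail.
            action = rng.choice(ACTIONS) if rng.random() < eps else greedy(online, state)
            reward = 1.0 if action == label else -1.0
            # Next state is the next mail; the episode ends after the last one.
            next_state = order[t + 1][0] if t + 1 < len(order) else None
            replay.append((state, action, reward, next_state))
            # One SGD step on the squared TD error for each sampled transition.
            for s, a, r, s2 in rng.sample(list(replay), min(batch, len(replay))):
                y = r if s2 is None else r + gamma * max(q_values(target, s2))
                err = q_values(online, s)[a] - y
                online[a] = [w - lr * err * x for w, x in zip(online[a], s)]
            step += 1
            if step % sync_every == 0:
                target = [row[:] for row in online]
    return online


def classify(weights, tokens):
    return greedy(weights, mail_state(tokens))


if __name__ == "__main__":
    train = [
        (["free", "prize", "click"], SPAM), (["winner", "free"], SPAM),
        (["click", "prize", "winner"], SPAM), (["free", "click"], SPAM),
        (["meeting", "report"], HAM), (["lunch", "schedule"], HAM),
        (["report", "schedule", "meeting"], HAM), (["meeting", "lunch"], HAM),
    ]
    weights = train_dqn([(mail_state(toks), label) for toks, label in train])
    for toks in (["free", "winner", "prize"], ["schedule", "report"]):
        print(toks, "spam" if classify(weights, toks) == SPAM else "ham")
```

In this framing the next state does not depend on the chosen label, so the bootstrapped term raises both actions' values equally and the greedy label is driven by the immediate reward; a small discount factor such as the assumed `gamma=0.1` keeps that term from dominating the targets.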
