Computer and Modernization

• Artificial Intelligence

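Internet Short-text Classification Method Based on CNNs

  1. School of Computer Science, Zhongyuan University of Technology, Zhengzhou 450007, China
  • Received: 2016-08-23 Online: 2017-04-20 Published: 2017-05-08
  • About the authors: Guo Dongliang (1991-), male, from Linzhou, Henan, master's student at the School of Computer Science, Zhongyuan University of Technology, research interest: natural language processing; Liu Xiaoming (1979-), male, from Xuchang, Henan, lecturer, Ph.D., research interests: machine learning, natural language processing; Zheng Qiusheng (1965-), male, from Zhengzhou, Henan, professor, master's degree, research interests: information security, data resource management.
  • Supported by:
    Henan Province Science and Technology Research Project (132102310284); Key Science and Technology Research Project of the Education Department of Henan Province (14A520015)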

Abstract: Classifying Internet short texts is an active research topic in natural language processing. This paper proposes an Internet short-text classification method based on convolutional neural networks (CNNs). Short-text features are first obtained with the Skip-gram model of Word2vec and then fed into the CNNs to extract higher-level features; after a K-max pooling operation, the result is passed to a Softmax classifier to obtain the classification model. In the experiments, the method is compared with machine learning methods and a DBN method. The results show that the proposed method not only solves the curse of dimensionality of text vectors and the local-optimum problem, but also effectively improves the two-level classification accuracy on Internet short texts, confirming the effectiveness of CNN-based Internet short-text classification.

Key words: CNNs, short-text classification, deep learning, machine learning
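
The convolution, K-max pooling, and Softmax steps named in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the toy word vectors, the single filter, and the weights are all hypothetical, and the sketch assumes the common form of K-max pooling that keeps the k largest activations in their original sequence order (the abstract does not specify this). In practice the word vectors would come from a trained Skip-gram model rather than being written by hand.

```python
import math
from typing import List, Sequence


def convolve(sentence: Sequence[Sequence[float]],
             filt: Sequence[Sequence[float]],
             bias: float = 0.0) -> List[float]:
    """Slide one filter (h words x d dims) over a sentence of word vectors.

    Returns one activation per window, so the output has
    len(sentence) - h + 1 entries. No nonlinearity is applied (assumption).
    """
    h = len(filt)
    return [
        sum(w * x
            for f_row, word in zip(filt, sentence[i:i + h])
            for w, x in zip(f_row, word)) + bias
        for i in range(len(sentence) - h + 1)
    ]


def k_max_pooling(feature_map: Sequence[float], k: int) -> List[float]:
    """Keep the k largest activations, preserving their original order."""
    if k <= 0:
        raise ValueError("k must be positive")
    if k >= len(feature_map):
        return list(feature_map)
    # Indices of the k largest values, then restored to sequence order.
    top = sorted(range(len(feature_map)),
                 key=lambda i: feature_map[i], reverse=True)[:k]
    return [feature_map[i] for i in sorted(top)]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(sentence, filt, weights, biases, k):
    """Convolution -> K-max pooling -> linear layer -> softmax."""
    pooled = k_max_pooling(convolve(sentence, filt), k)
    scores = [sum(w * x for w, x in zip(row, pooled)) + b
              for row, b in zip(weights, biases)]
    return softmax(scores)


if __name__ == "__main__":
    # Toy 2-dimensional "word vectors" for a 4-word text (hypothetical values).
    text = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
    filt = [[1.0, 1.0], [1.0, 1.0]]      # one filter spanning 2 words
    weights = [[0.5, -0.5], [-0.5, 0.5]]  # 2 classes x k pooled features
    print(classify(text, filt, weights, biases=[0.0, 0.0], k=2))
```

A real model would use many filters of several widths and learn the filter and classifier weights by backpropagation; the sketch only shows how the three operations compose on a single feature map.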

CLC Number: