Computer and Modernization ›› 2022, Vol. 0 ›› Issue (06): 21-26.

• Algorithm Design and Analysis •

Government Hotline Work-order Classification Fusing RoBERTa and Feature Extraction

  

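  1. (Yangtze River Delta Information Intelligence Innovation Research Institute, Wuhu, Anhui 241000, China)
  • Online: 2022-06-23 Published: 2022-06-23
  • About the author: CHEN Gang (b. 1982), male, from Huainan, Anhui; associate research fellow, Ph.D.; research interest: natural language processing. E-mail: cheng@ustc.win.
  • Funding:
    Anhui Provincial Key Research and Development Program, 2021 (202104a05020071); Wuhu Science and Technology Program, 2020 (2020yf41)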

Abstract: Government hotlines receive a large volume of citizen requests, which makes manual work-order classification time-consuming and laborious. Most existing work-order classification methods rely on machine learning or a single neural network model; they struggle to capture contextual semantic information, and their text feature extraction is incomplete. To address these problems, this paper proposes a government hotline work-order classification method that fuses RoBERTa with feature extraction. The method first uses a semantic encoding layer built on the RoBERTa pre-trained language model to obtain context-aware semantic representation vectors from the work-order text. A feature extraction layer composed of a convolutional neural network (CNN), a bidirectional gated recurrent unit (BiGRU), and a Self-Attention mechanism (CNN-BiGRU-Self-Attention) then extracts local and global features from these encodings, and processes the global features to highlight the semantic features of highest importance. Finally, the fused feature vectors are fed into a classifier to complete work-order classification. Experimental results show that the proposed method achieves better classification performance than the baseline methods it was compared with.

Keywords: government hotline, work-order classification, RoBERTa, semantic encoding, feature extraction
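
The data flow summarized in the abstract (semantic encodings, then local and global feature extraction, then fusion and classification) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several points are assumptions that the abstract does not specify: random vectors stand in for the RoBERTa token encodings, the CNN and BiGRU branches run in parallel over the same encodings, the self-attention uses identity projections followed by mean pooling, fusion is plain concatenation, and all weights are random and untrained. Every function name and dimension below is hypothetical.

```python
import math
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cnn_local_features(tokens, filters, width):
    """Local branch: 1D convolution over token windows, ReLU, max-over-time pooling."""
    windows = [sum(tokens[i:i + width], []) for i in range(len(tokens) - width + 1)]
    return [max(max(0.0, sum(w * x for w, x in zip(f, win))) for win in windows)
            for f in filters]

def gru_states(tokens, p, hidden):
    """One GRU direction (biases omitted); returns the hidden state at every step."""
    h, states = [0.0] * hidden, []
    for x in tokens:
        z = [sigmoid(a + b) for a, b in zip(matvec(p["Wz"], x), matvec(p["Uz"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(p["Wr"], x), matvec(p["Ur"], h))]
        n = [math.tanh(a + ri * b)
             for a, ri, b in zip(matvec(p["Wn"], x), r, matvec(p["Un"], h))]
        h = [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]
        states.append(h)
    return states

def bigru_states(tokens, fwd, bwd, hidden):
    """Global branch: concatenate forward and backward GRU states per token."""
    forward = gru_states(tokens, fwd, hidden)
    backward = gru_states(tokens[::-1], bwd, hidden)[::-1]
    return [f + b for f, b in zip(forward, backward)]

def self_attention(states):
    """Scaled dot-product self-attention (assumed identity Q/K/V projections)."""
    scale = math.sqrt(len(states[0]))
    attended = []
    for q in states:
        weights = softmax([sum(a * b for a, b in zip(q, k)) / scale for k in states])
        attended.append([sum(w * s[j] for w, s in zip(weights, states))
                         for j in range(len(q))])
    return attended

def build_model(dim, hidden, n_filters, width, n_classes):
    def gru_params():
        p = {}
        for gate in "zrn":
            p["W" + gate] = rand_matrix(hidden, dim)
            p["U" + gate] = rand_matrix(hidden, hidden)
        return p
    return {"filters": rand_matrix(n_filters, width * dim), "width": width,
            "fwd": gru_params(), "bwd": gru_params(), "hidden": hidden,
            "out": rand_matrix(n_classes, n_filters + 2 * hidden)}

def classify(tokens, model):
    local = cnn_local_features(tokens, model["filters"], model["width"])
    states = bigru_states(tokens, model["fwd"], model["bwd"], model["hidden"])
    attended = self_attention(states)
    global_feats = [sum(col) / len(col) for col in zip(*attended)]  # mean pooling
    fused = local + global_feats  # fusion by concatenation (an assumption)
    return fused, softmax(matvec(model["out"], fused))

DIM, HIDDEN, N_FILTERS, WIDTH, N_CLASSES = 8, 4, 3, 2, 5
model = build_model(DIM, HIDDEN, N_FILTERS, WIDTH, N_CLASSES)
tokens = rand_matrix(6, DIM)  # stand-in for RoBERTa encodings of a 6-token work order
fused, probs = classify(tokens, model)
print(len(fused), [round(p, 3) for p in probs])
```

Because the weights are untrained, the printed class probabilities carry no meaning; the sketch only shows how the local and global feature vectors are produced and joined before the classifier. In practice the stand-in encodings would come from a pre-trained RoBERTa model and all layers would be trained end to end on labeled work orders.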