Computer and Modernization ›› 2020, Vol. 0 ›› Issue (07): 71-75. doi: 10.3969/j.issn.1006-2475.2020.07.014

• Artificial Intelligence •

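E-government Text Similarity Evaluation Model Based on Do-Bi-LSTM Model

  1. (School of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan 030024, China)
  • Online: 2020-07-06 Published: 2020-07-15
  • About the authors: Li Fan (李凡, 1994-), female, from Lüliang, Shanxi, master's student, research interest: natural language processing, E-mail: 2659238942@qq.com; Bai Shangwang (白尚旺, 1964-), male, professor, Ph.D., research interests: database and software engineering technology, information management and decision support; Dang Weichao (党伟超, 1974-), male, associate professor, Ph.D., research interests: database and software engineering, information management and information systems, distributed computing; Pan Lihu (潘理虎, 1974-), male, associate professor, Ph.D., research interest: artificial intelligence.
  • Funding:
    Shanxi Science and Technology Major Special Project of the "Twelfth Five-Year Plan" (20121101001); Shanxi Province and Chinese Academy of Sciences Science and Technology Cooperation Project (20141101001); Shanxi Province Science and Technology Research Project (20141039)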

Abstract: To address the low efficiency of manual text review in current e-government systems, this paper introduces text similarity into e-government. Existing network models for text similarity generate very large word-vector matrices that take a long time to train, and they build word vectors from the surrounding context alone, ignoring the word order and semantic relationships of the document. To improve efficiency and reduce training cost, this paper proposes a text similarity calculation method based on Do-Bi-LSTM. The model first converts the texts in the training data set into vectors with the Doc2vec language model, which adds a text vector on top of the word vectors and thereby captures the relationships between sentences and between paragraphs. The resulting vectors are then used as the input to a Bi-LSTM network model for training. In experiments comparing the method with an LSTM network model and a traditional deep network model, the proposed method achieves substantially higher accuracy, which indicates that it is feasible.

Key words: text similarity, Doc2vec, bi-directional long short-term memory network
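The two-stage pipeline described in the abstract (document vectors first, then a bidirectional LSTM over them) can be illustrated with a small forward-pass sketch. This is not the authors' implementation. It assumes that each text has already been converted to a short sequence of vectors (in practice a library such as gensim's `Doc2Vec` could produce these), uses untrained random weights, and compares the two encodings with cosine similarity, which is an assumption made here because the abstract does not say how the final score is computed. All class names, sizes, and sample vectors are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """One LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        width = input_size + hidden_size
        # One weight matrix and bias per gate: input, forget, candidate, output.
        self.weights = {
            gate: [[rng.uniform(-0.1, 0.1) for _ in range(width)]
                   for _ in range(hidden_size)]
            for gate in "ifgo"
        }
        self.bias = {gate: [0.0] * hidden_size for gate in "ifgo"}

    def _gate(self, gate, xh, activation):
        return [
            activation(sum(w * v for w, v in zip(row, xh)) + b)
            for row, b in zip(self.weights[gate], self.bias[gate])
        ]

    def step(self, x, h, c):
        xh = list(x) + list(h)  # concatenate input and previous hidden state
        i = self._gate("i", xh, sigmoid)
        f = self._gate("f", xh, sigmoid)
        g = self._gate("g", xh, math.tanh)
        o = self._gate("o", xh, sigmoid)
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Process the whole sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


class BiLSTMEncoder:
    """Reads a sequence in both directions and concatenates the final states."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.forward_cell = LSTMCell(input_size, hidden_size, rng)
        self.backward_cell = LSTMCell(input_size, hidden_size, rng)

    def encode(self, sequence):
        forward = self.forward_cell.run(sequence)
        backward = self.backward_cell.run(list(reversed(sequence)))
        return forward + backward


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical 4-dimensional stand-ins for Doc2vec vectors, one per sentence.
doc_a = [[0.2, -0.1, 0.4, 0.0], [0.1, 0.3, -0.2, 0.5], [0.0, 0.1, 0.1, -0.3]]
doc_b = [[0.2, -0.1, 0.4, 0.0], [0.1, 0.3, -0.2, 0.5], [0.4, -0.4, 0.2, 0.1]]

encoder = BiLSTMEncoder(input_size=4, hidden_size=3)
vec_a = encoder.encode(doc_a)
vec_b = encoder.encode(doc_b)
score = cosine_similarity(vec_a, vec_b)
print(f"similarity score: {score:.4f}")
```

Because the weights are random, the printed score carries no meaning beyond showing the data flow. A trained model would learn the gate weights from labelled text pairs, and a deep learning framework would normally replace the hand-written cell.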

CLC Number: