E-government Text Similarity Evaluation Model Based on Do-Bi-LSTM Model

doi:10.3969/j.issn.1006-2475.2020.07.014

Abstract

Manual review of approval texts in current government systems is inefficient, so this paper introduces text similarity into e-government. Existing text-similarity network models generate very large word-vector matrices that take a long time to train, and they build word vectors only from local context, ignoring the relationship between word order and the semantics of the document. To improve efficiency and reduce training cost, this paper proposes Do-Bi-LSTM, a text similarity calculation method. It first converts the texts in the training data set into vectors with the Doc2vec language model, which adds a text vector to the word vectors and can therefore capture relationships between sentences and between paragraphs. The resulting vectors are then used as input to train a Bi-LSTM network model. Finally, comparison experiments against the LSTM network model and a traditional deep network model show that the method is feasible and greatly improves accuracy.
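The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. Several things are assumptions made only for illustration: each text is represented as a short sequence of segment vectors (hand-written stand-ins for Doc2vec output, for example one vector per sentence), the Bi-LSTM weights are random and untrained, and the two encodings are compared with cosine similarity, a scoring function the abstract does not specify. The names `LSTMCell`, `BiLSTMEncoder`, and `cosine_similarity` are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """One LSTM cell with randomly initialised (untrained) weights."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        n = input_size + hidden_size
        # One weight matrix and bias per gate:
        # i = input, f = forget, o = output, g = candidate cell state.
        self.weights = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(n)]
                   for _ in range(hidden_size)]
            for gate in "ifog"
        }
        self.bias = {gate: [0.0] * hidden_size for gate in "ifog"}

    def _affine(self, gate, z):
        return [
            sum(w * v for w, v in zip(row, z)) + b
            for row, b in zip(self.weights[gate], self.bias[gate])
        ]

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate input and previous hidden state
        i = [sigmoid(a) for a in self._affine("i", z)]
        f = [sigmoid(a) for a in self._affine("f", z)]
        o = [sigmoid(a) for a in self._affine("o", z)]
        g = [math.tanh(a) for a in self._affine("g", z)]
        c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
        return h_new, c_new

    def run(self, sequence):
        """Process the whole sequence and return the final hidden state."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        for x in sequence:
            h, c = self.step(x, h, c)
        return h


class BiLSTMEncoder:
    """Reads a sequence left-to-right and right-to-left, then concatenates."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.forward_cell = LSTMCell(input_size, hidden_size, rng)
        self.backward_cell = LSTMCell(input_size, hidden_size, rng)

    def encode(self, sequence):
        forward = self.forward_cell.run(sequence)
        backward = self.backward_cell.run(list(reversed(sequence)))
        return forward + backward


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hand-written stand-ins for Doc2vec segment vectors (assumed, not real data).
text_a = [[0.2, 0.1, -0.3], [0.4, -0.2, 0.1], [0.0, 0.3, 0.2]]
text_b = [[0.1, 0.1, -0.2], [0.5, -0.1, 0.0], [0.1, 0.2, 0.3]]

encoder = BiLSTMEncoder(input_size=3, hidden_size=4)
score = cosine_similarity(encoder.encode(text_a), encoder.encode(text_b))
print(f"similarity score: {score:.4f}")
```

Because the weights are untrained, the printed score carries no meaning; the sketch only shows how vectors would pass through a bidirectional encoder before two texts are compared. In practice the segment vectors would come from a trained Doc2vec model and the encoder would be trained on labelled text pairs.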

Key words: text similarity, Doc2vec, bi-directional long short-term memory

CLC Number: TP311

LI Fan, BAI Shang-wang, DANG Wei-chao, PAN Li-hu. E-government Text Similarity Evaluation Model Based on Do-Bi-LSTM Model[J]. Computer and Modernization, 2020, 0(07): 71-75.

