Computer and Modernization ›› 2024, Vol. 0 ›› Issue (09): 56-60. doi: 10.3969/j.issn.1006-2475.2024.09.010

• Algorithm Design and Analysis •

Text Clustering Method for Fragmented Reply Based on Dissimilarity Matrix

  (1. State Grid Fujian Electric Power Company, Fuzhou 350000, China; 2. Fujian Yirong Information Technology Co., Ltd., Fuzhou 350003, China; 3. State Grid Corporation of China, Beijing 100000, China)
  • Online: 2024-09-27 Published: 2024-09-29
  • Supported by:
    Fujian Province Science and Technology Project (SGFJ0000KXJS1700225)

Abstract: To address the problem of effectively extracting the required information from fragmented reply texts in Q&A communities, this paper proposes a clustering method for fragmented reply texts based on a dissimilarity matrix. First, cluster centers are designed according to the dissimilarity between texts, and the fragmented reply texts in the community are grouped by clustering. Then, the text features of user questions are extracted with an RNN+CNN-based question feature extraction method. Finally, combining the extracted question features, a TF-IDF-based extractive text generation algorithm is used to extract reply text information quickly and automatically. Experimental results show that the proposed method can automatically extract the required text information with high and stable accuracy, and that it can be applied to the extraction of fragmented reply texts in Q&A communities.

Key words: question-answer community, fragmented reply text, automatic extraction, clustering, dissimilarity

CLC Number:
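The abstract names the stages of the pipeline but not their details, so the following Python sketch is only an illustration of the general idea, not the authors' implementation. It assumes whitespace tokenization (Chinese text would need a word segmenter), Jaccard distance as the dissimilarity measure, a simple medoid-style choice of cluster centers, and plain question tokens in place of the RNN+CNN question features. All function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization stands in for a real segmenter.
    return text.lower().split()


def jaccard_dissimilarity(a, b):
    # Assumed dissimilarity measure: 1 - |A ∩ B| / |A ∪ B| over token sets.
    sa, sb = set(tokenize(a)), set(tokenize(b))
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def dissimilarity_matrix(texts):
    # Symmetric n x n matrix of pairwise dissimilarities.
    n = len(texts)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = jaccard_dissimilarity(texts[i], texts[j])
    return d


def assign(d, medoids):
    # Label each text with the index of its nearest cluster center.
    return [min(range(len(medoids)), key=lambda c: d[i][medoids[c]])
            for i in range(len(d))]


def cluster_by_medoids(d, k, max_iter=20):
    # Hypothetical center design: start from the text with the smallest total
    # dissimilarity, then add the text farthest from the centers chosen so far.
    n = len(d)
    medoids = [min(range(n), key=lambda i: sum(d[i]))]
    while len(medoids) < k:
        rest = [i for i in range(n) if i not in medoids]
        medoids.append(max(rest, key=lambda i: min(d[i][m] for m in medoids)))
    for _ in range(max_iter):
        labels = assign(d, medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # Keep the old center if a cluster ends up empty.
            updated.append(
                min(members, key=lambda i: sum(d[i][j] for j in members))
                if members else medoids[c]
            )
        if updated == medoids:
            break
        medoids = updated
    return assign(d, medoids), medoids


def tfidf_rank(question, replies):
    # Score each reply by the summed TF-IDF weight of the question's terms
    # (smoothed IDF), and return (score, reply) pairs, best first.
    docs = [tokenize(r) for r in replies]
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    q_terms = set(tokenize(question))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        score = 0.0
        for t in q_terms:
            if tf[t]:
                idf = math.log((1 + n) / (1 + df[t])) + 1.0
                score += (tf[t] / len(doc)) * idf
        scores.append(score)
    return sorted(zip(scores, replies), key=lambda p: p[0], reverse=True)
```

In this sketch, `cluster_by_medoids(dissimilarity_matrix(texts), k)` groups the replies, and `tfidf_rank(question, cluster_texts)` orders the replies of a chosen cluster by their relevance to the question. The number of clusters `k` is a free parameter here; the abstract does not say how it is chosen.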