Computer and Modernization ›› 2020, Vol. 0 ›› Issue (09): 19-24. doi: 10.3969/j.issn.1006-2475.2020.09.004

• Database and Data Mining •

A Multi-feature Fusion Algorithm for Label Generation of Educational Resources

  (1. School of Physics and Electronic Science, Changsha University of Science & Technology, Changsha 410114, Hunan, China;
    2. Hunan Province Higher Education Key Laboratory of Modeling and Monitoring on the Near-earth Electromagnetic Environments, Changsha University of Science & Technology, Changsha 410114, Hunan, China)
  • Received: 2020-02-11 Online: 2020-09-24 Published: 2020-09-24
  • About the authors: Li Wen (born 1995), female, from Chenzhou, Hunan, master's student, research interest: data mining and applications, E-mail: 271254918@qq.com; corresponding author: Wen Yongjun (born 1975), male, lecturer, Ph.D., research interest: network information security and applications, E-mail: micowen@csust.edu.cn; Tang Lijun (born 1963), male, professor, doctoral supervisor, research interest: signal detection and processing, E-mail: tanglj2009@163.com.
  • Funding:
    National Science and Technology Support Program of China (2014BAH08F04); Hunan Provincial Key Research and Development Program (2018GK2054); Scientific Research Project of the Hunan Provincial Department of Education (17k004); Hunan Provincial Postgraduate Research and Innovation Project (CX2018B575); Open Fund of the Hunan Province Higher Education Key Laboratory of Modeling and Monitoring on the Near-earth Electromagnetic Environments (N201907)

Abstract: Describing educational resources with labels is a simple and effective way to characterize them accurately and to classify the large, disorganized body of educational resources on the Internet efficiently, so that users can browse and retrieve resource information conveniently and the resources are used more fully. Natural language processing offers many methods for generating text labels, but each captures only part of the relevant features, which motivates a label generation method that fuses multiple features. Taking the characteristics of Chinese text into account, this paper adds TF-IDF weights and position weights to the TextRank algorithm, so that the generated labels reflect both a word's information in the corpus and its position in the article. The result is a multi-feature fusion algorithm for label generation. Tests and analysis show that the fused algorithm reaches a maximum F-measure of 0.571 and an average F-measure of 0.34, outperforming the commonly used TextRank and TF-IDF algorithms. It effectively improves the quality of educational resource labels and supports better use and management of educational resources.

Key words: educational resource label, TextRank algorithm, TF-IDF algorithm, label generation, algorithm improvement
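
The abstract does not state how the three signals are combined, so the following is only a minimal sketch of one plausible fusion, not the authors' formula. It assumes that the product of a word's TF-IDF weight and its position weight serves as the teleport (prior) distribution of a TextRank-style random walk over a co-occurrence graph. All function names, the title-boost rule, and the parameters `title_len`, `title_boost`, `window`, and `damping` are hypothetical choices made for illustration. Input is assumed to be already tokenized (for Chinese text, by a word segmenter). The `f_measure` helper shows the standard harmonic mean of precision and recall used to score a predicted label set against a reference set.

```python
import math
from collections import Counter, defaultdict


def tfidf_weights(doc, corpus):
    """Smoothed TF-IDF of every word in `doc`; `corpus` is a list of token lists."""
    tf = Counter(doc)
    weights = {}
    for word, count in tf.items():
        df = sum(1 for other in corpus if word in other)
        idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
        weights[word] = count / len(doc) * idf
    return weights


def position_weights(doc, title_len, title_boost=2.0):
    """Assumed rule: words among the first `title_len` tokens get `title_boost`."""
    title_words = set(doc[:title_len])
    return {w: title_boost if w in title_words else 1.0 for w in doc}


def fused_textrank(doc, corpus, title_len=0, window=3, damping=0.85, iters=50):
    """Rank the words of a non-empty token list; returns (word, score) pairs."""
    vocab = list(dict.fromkeys(doc))  # unique words in first-seen order
    # Undirected co-occurrence graph: each word links to the next window-1 tokens.
    neighbors = defaultdict(Counter)
    for i, w in enumerate(doc):
        for v in doc[i + 1 : i + window]:
            if v != w:
                neighbors[w][v] += 1
                neighbors[v][w] += 1
    # Assumed fusion: TF-IDF times position weight, normalized to sum to 1.
    tfidf = tfidf_weights(doc, corpus)
    pos = position_weights(doc, title_len)
    prior = {w: tfidf[w] * pos[w] for w in vocab}
    total = sum(prior.values())
    prior = {w: p / total for w, p in prior.items()}
    # Power iteration with the fused prior as the teleport distribution.
    scores = dict(prior)
    for _ in range(iters):
        scores = {
            w: (1 - damping) * prior[w]
            + damping * sum(
                scores[v] * cnt / sum(neighbors[v].values())
                for v, cnt in neighbors[w].items()
            )
            for w in vocab
        }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def f_measure(predicted, reference):
    """Harmonic mean of precision and recall between two label sets."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    if hits == 0:
        return 0.0
    precision = hits / len(predicted)
    recall = hits / len(reference)
    return 2 * precision * recall / (precision + recall)
```

Taking the first few entries of `fused_textrank(tokens, corpus, title_len=n)` gives candidate labels, which `f_measure` can then compare with manually assigned labels. Using the fused weights as a teleport prior keeps the graph itself unchanged; scaling edge weights instead would be an equally reasonable reading of the abstract.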

CLC Number: