Computer and Modernization (计算机与现代化) ›› 2021, Vol. 0 ›› Issue (07): 23-28.

• Algorithm Design and Analysis •

Recommendation Algorithm Based on User Information Vector Clustering and Improved SAMME

  

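  1. (College of Information Science and Technology (College of Internet Security), Chengdu University of Technology, Chengdu 610051, Sichuan, China)
  • Publication date: 2021-08-02 Release date: 2021-08-02
  • About the authors: Wang Shanwen (王杉文, b. 1997), male, from Suining, Sichuan, master's student, research interests: recommender systems and deep learning, E-mail: 302618914@qq.com; Ou Ou (欧鸥, b. 1977), male, professor, Ph.D., research interests: intelligent computing and spatial information technology; Ma Wanmin (马万民, b. 1998), male, master's student, research interest: machine learning; Chen Jianlin (陈建林, b. 1995), male, master's student, research interest: recommender systems.
  • Funding:
    National Key Research and Development Program of China (2018YFF01013304)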

Abstract: Mainstream recommendation algorithms often work from incomplete user information and take too long to produce recommendations. To address both problems, this paper proposes a recommendation algorithm based on user information vector clustering and an improved SAMME. The algorithm analyzes basic user information (region, time, interests, tags, etc.) to extract user information keywords, weights these keywords with the TF-IDF method to construct user information vectors, clusters the users with the K-means algorithm, and uses the clustering results as the training sample set for the improved SAMME. The improved SAMME then produces the predictions used to recommend friends to a user. Because the model is saved during training, recommendation time is greatly reduced. The algorithm is evaluated on a real Weibo user data set and compared with other mainstream algorithms; the results show that it performs well in accuracy, recall, and F-value.

Key words: recommendation system, SAMME algorithm, user information, cluster analysis
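
The pipeline summarized in the abstract (keyword weighting with TF-IDF, K-means clustering, a SAMME classifier trained on the cluster labels, and a saved model reused at recommendation time) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not describe how SAMME is improved, so the sketch uses standard multi-class SAMME with decision stumps as weak learners. The toy keyword lists, the farthest-point seeding of K-means, and the rule "recommend the other users in the predicted cluster" are likewise hypothetical choices made for the example.

```python
import math
import pickle
from collections import Counter


def tfidf_vectors(keyword_lists):
    """Turn per-user keyword lists (assumed non-empty) into TF-IDF vectors."""
    vocab = sorted({w for kws in keyword_lists for w in kws})
    n = len(keyword_lists)
    df = Counter(w for kws in keyword_lists for w in set(kws))
    vectors = []
    for kws in keyword_lists:
        tf = Counter(kws)
        vectors.append([tf[w] / len(kws) * math.log(n / df[w]) for w in vocab])
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20):
    """Plain K-means; farthest-point seeding (an assumption) keeps it deterministic."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def stump_predict(stump, x):
    _, j, t, left_cls, right_cls = stump
    return left_cls if x[j] <= t else right_cls


def fit_stump(X, y, w, classes):
    """Weighted decision stump; returns (error, feature, threshold, left, right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            votes = {True: Counter(), False: Counter()}
            for x, yi, wi in zip(X, y, w):
                votes[x[j] <= t][yi] += wi
            left_cls = max(classes, key=lambda c: votes[True][c])
            right_cls = max(classes, key=lambda c: votes[False][c])
            err = sum(wi for x, yi, wi in zip(X, y, w)
                      if (left_cls if x[j] <= t else right_cls) != yi)
            if best is None or err < best[0]:
                best = (err, j, t, left_cls, right_cls)
    return best


def samme_fit(X, y, rounds=5):
    """Standard SAMME, not the paper's improved variant; needs >= 2 classes."""
    classes = sorted(set(y))
    K, n = len(classes), len(X)
    w = [1.0 / n] * n
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(X, y, w, classes)
        err = max(stump[0], 1e-10)            # avoid division by zero
        if err >= 1 - 1.0 / K:                # no better than random guessing
            break
        alpha = math.log((1 - err) / err) + math.log(K - 1)  # SAMME weight
        stumps.append((alpha,) + stump[1:])
        # Boost the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(alpha) if stump_predict(stump, x) != yi else wi
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return classes, stumps


def samme_predict(model, x):
    classes, stumps = model
    score = Counter()
    for stump in stumps:
        score[stump_predict(stump, x)] += stump[0]
    return max(classes, key=lambda c: score[c])


# Hypothetical toy data: keywords already extracted from each user's profile.
users = [
    ["chengdu", "music", "travel"],
    ["chengdu", "music", "food"],
    ["chengdu", "travel", "food"],
    ["beijing", "coding", "gaming"],
    ["beijing", "coding", "anime"],
    ["beijing", "gaming", "anime"],
]
vocab, vectors = tfidf_vectors(users)
clusters = kmeans(vectors, k=2)
model = samme_fit(vectors, clusters)
restored = pickle.loads(pickle.dumps(model))  # save once, reload when recommending
target = 0
predicted = samme_predict(restored, vectors[target])
friends = [i for i, c in enumerate(clusters) if c == predicted and i != target]
print(friends)  # [1, 2]
```

Serializing the trained model means the expensive steps (clustering and boosting) run once, and a later recommendation only needs to load the model and score one vector, which is one plausible reading of how saving the model shortens recommendation time.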