Computer and Modernization, 2022, Vol. 0, Issue (07): 47-53.

• Algorithm Design and Analysis •


  

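  • About the authors: YANG Jin (born 1978), female, from Wuxi, Jiangsu; lecturer, master's supervisor, Ph.D.; research interests: intelligent optimization, graph theory and combinatorial optimization; E-mail: yangjin.0903@163.com. ZHANG Chen (born 1996), male, from Linfen, Shanxi; master's student; research interest: machine learning; E-mail: zc_19960701@163.com.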
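  • Funding:
    National Natural Science Foundation of China (12071293); Humanities and Social Sciences Planning Fund of the Ministry of Education of China (16YJA630037); Shanghai First-Class Discipline Construction Project (S1201YLXK)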

User Purchase Forecast Method Under Softvoting Strategy Based on Improved EasyEnsemble

  1. (School of Science, University of Shanghai for Science and Technology, Shanghai 200093, China)
  • Online: 2022-07-25 Published: 2022-07-25



Abstract: With the development of the Internet, online shopping has become an increasingly common choice. To better recommend products to customers, features are first extracted from the original data and then selected using mutual information. An improved EasyEnsemble algorithm is used to handle class imbalance. Its ensemble strategy compensates for the drawbacks of undersampling, so that the sample data are fully utilized and the influence of the disparity between positive and negative samples is reduced. Finally, soft voting combines XGBoost and random forest into a final classifier for prediction, which achieves better results than either single algorithm. On data provided by the Alibaba Tianchi competition, the method is compared with currently popular machine learning algorithms, using precision P, recall R and the F1 score as evaluation metrics, and the results verify its effectiveness.
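The pipeline summarized in the abstract can be illustrated with a small, dependency-free Python sketch. It is not the authors' code. The abstract does not say how EasyEnsemble was improved, so `easy_ensemble_subsets` follows the classic EasyEnsemble balancing step: each subset holds every minority sample plus an equal-sized random draw from the majority class. The hypothetical `fit_rate_model` is a trivial stand-in for the XGBoost and random-forest learners, which are not reimplemented here. `soft_vote` averages positive-class probabilities. In the paper's final step, the same averaging would combine XGBoost and random-forest outputs. Equal weights and a 0.5 decision threshold are assumptions of this sketch, and all function names are illustrative.

```python
import math
import random
from collections import Counter


def mutual_information(feature, labels):
    """I(X; Y) in nats between one discrete feature column X and class labels Y."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        # Sum over observed (x, y) pairs of p(x,y) * log(p(x,y) / (p(x) p(y))).
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def select_top_k(columns, labels, k):
    """Indices of the k feature columns with the highest mutual information."""
    scores = [mutual_information(col, labels) for col in columns]
    return sorted(range(len(columns)), key=lambda j: scores[j], reverse=True)[:k]


def easy_ensemble_subsets(labels, n_subsets, seed=0):
    """Classic EasyEnsemble balancing (not the paper's improved variant): each subset
    keeps every positive (minority) index plus an equal number of negative (majority)
    indices drawn at random without replacement. Assumes there are at least as many
    negatives as positives."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    return [pos + rng.sample(neg, len(pos)) for _ in range(n_subsets)]


def soft_vote(prob_lists, weights=None):
    """Weighted average of several models' positive-class probabilities, sample by sample."""
    weights = weights or [1.0] * len(prob_lists)
    total = sum(weights)
    return [sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
            for i in range(len(prob_lists[0]))]


def precision_recall_f1(y_true, y_pred):
    """Precision P, recall R and F1 for binary labels (positive class = 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def fit_rate_model(x, y):
    """Hypothetical stand-in for XGBoost / random forest: estimates P(y=1 | x) for a
    single discrete feature with add-one (Laplace) smoothing."""
    counts = {}
    for xi, yi in zip(x, y):
        pos, tot = counts.get(xi, (0, 0))
        counts[xi] = (pos + yi, tot + 1)

    def predict(v):
        pos, tot = counts.get(v, (0, 0))
        return (pos + 1) / (tot + 2)

    return predict


if __name__ == "__main__":
    # Toy data: column 0 is independent of the label, column 1 equals it.
    labels = [1, 0, 0, 0, 1, 0, 0, 0]
    columns = [[0, 0, 1, 1, 1, 1, 0, 0], [1, 0, 0, 0, 1, 0, 0, 0]]
    best = select_top_k(columns, labels, k=1)[0]
    x = columns[best]
    # EasyEnsemble: one base model per balanced subset, outputs averaged.
    subsets = easy_ensemble_subsets(labels, n_subsets=3)
    models = [fit_rate_model([x[i] for i in s], [labels[i] for i in s]) for s in subsets]
    final = soft_vote([[m(v) for v in x] for m in models])
    y_pred = [1 if p >= 0.5 else 0 for p in final]
    print(best, precision_recall_f1(labels, y_pred))
```

On this toy data the script selects the informative column and prints `1 (1.0, 1.0, 1.0)`. With real data, one would train one real base learner per balanced subset and pass the resulting probability lists to `soft_vote`.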

Key words: mutual information, class-imbalance, EasyEnsemble, XGBoost