Computer and Modernization, 2021, Vol. 0, Issue (08): 70-76.

• Artificial Intelligence •

Software Defect Prediction Based on Hybrid Sampling and Random_Stacking

  

  1. (College of Information Science & Technology, Qingdao University of Science and Technology, Qingdao, Shandong 266061, China)
  • Published: 2021-08-19  Online: 2021-08-19
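  • About the authors: YAN Lingling (闫岭岭, born 1993), female, from Wenshang, Shandong, master's student; research interests: machine learning, data mining; E-mail: 1019162802@qq.com. JIANG Feng (江峰, born 1978), male, from Pengze, Jiangxi, professor, Ph.D., CCF member; research interests: machine learning, data mining, rough sets, etc. Corresponding author: YANG Aiguang (杨爱光, born 1972), from Jixi, Heilongjiang, lecturer, master's degree; research interests: artificial intelligence, embedded systems; E-mail: yanll2802@qq.com.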
  • Funding:
    National Natural Science Foundation of China (61973180, 61671261); Natural Science Foundation of Shandong Province (ZR2018MF007)

Abstract: Existing software defect prediction methods face problems such as class imbalance in the data and the processing of high-dimensional data, and how to solve them effectively has become a research hotspot in the field. To address the class imbalance and low prediction accuracy that software defect prediction suffers from, this paper proposes DP_HSRS, a software defect prediction algorithm based on hybrid sampling and Random_Stacking. DP_HSRS first applies a hybrid sampling algorithm to balance the imbalanced data, and then uses the Random_Stacking algorithm to predict software defects on the balanced data set. Random_Stacking is an effective improvement on the traditional Stacking algorithm: it combines several classic classification algorithms with the Bagging mechanism to build multiple Stacking classifiers, merges these Stacking classifiers by voting into an ensemble classifier, and finally uses this ensemble classifier to predict software defects. Experimental results on the NASA MDP data set show that DP_HSRS achieves better defect prediction performance than the existing algorithms it is compared with.

Key words: software defect prediction, data imbalance, hybrid sampling, Random_Stacking, DP_HSRS
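The abstract names the two stages of DP_HSRS but not their exact configuration, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes binary labels (1 = defective), that "hybrid sampling" means random undersampling of the majority class combined with SMOTE-style interpolation of the minority class, and that Random_Stacking trains each Stacking member on a bootstrap sample and combines the members by majority vote. The base learners (nearest centroid, 3-nearest-neighbour, decision stump), the table-lookup meta-learner, the 70/30 holdout used in place of k-fold stacking, and every function name (`hybrid_sample`, `fit_stacking`, `fit_random_stacking`, `dp_hsrs`) are hypothetical stand-ins chosen so the example needs only the Python standard library.

```python
# Hypothetical sketch of DP_HSRS: the sampling scheme, learners and names are assumptions.
import random
from collections import Counter

def dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def hybrid_sample(X, y, k=3, rng=random):
    """Assumed hybrid sampling (binary labels): undersample the majority class and
    SMOTE-style oversample the minority class until both have the midpoint size."""
    (major, n_maj), (minor, n_min) = Counter(y).most_common(2)
    target = (n_maj + n_min) // 2
    maj_rows = [x for x, t in zip(X, y) if t == major]
    min_rows = [x for x, t in zip(X, y) if t == minor]
    kept = rng.sample(maj_rows, target)  # random undersampling
    synth = []
    while len(min_rows) + len(synth) < target:
        a = rng.choice(min_rows)
        # Interpolate towards one of a's k nearest minority neighbours.
        others = [r for r in min_rows if r is not a]
        b = rng.choice(sorted(others, key=lambda r: dist(a, r))[:k]) if others else a
        gap = rng.random()  # one gap per synthetic point, as in SMOTE
        synth.append([ai + gap * (bi - ai) for ai, bi in zip(a, b)])
    return kept + min_rows + synth, [major] * len(kept) + [minor] * (len(min_rows) + len(synth))

def fit_centroid(X, y):
    """Nearest-centroid classifier."""
    cents = {}
    for c in set(y):
        rows = [x for x, t in zip(X, y) if t == c]
        cents[c] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(cents, key=lambda c: dist(x, cents[c]))

def fit_knn(X, y, k=3):
    """k-nearest-neighbour classifier (majority label of the k closest rows)."""
    data = list(zip(X, y))
    def predict(x):
        near = sorted(data, key=lambda p: dist(x, p[0]))[:k]
        return Counter(t for _, t in near).most_common(1)[0][0]
    return predict

def fit_stump(X, y):
    """One-level decision tree: the single feature threshold with fewest errors."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            left = Counter(t for x, t in zip(X, y) if x[j] <= thr)
            right = Counter(t for x, t in zip(X, y) if x[j] > thr)
            lab_l = left.most_common(1)[0][0]
            lab_r = right.most_common(1)[0][0] if right else lab_l
            err = sum(left.values()) - left[lab_l] + sum(right.values()) - right[lab_r]
            if best is None or err < best[0]:
                best = (err, j, thr, lab_l, lab_r)
    _, j, thr, lab_l, lab_r = best
    return lambda x: lab_l if x[j] <= thr else lab_r

BASE_LEARNERS = [fit_centroid, fit_knn, fit_stump]

def fit_stacking(X, y, rng, holdout=0.3):
    """Holdout (blending-style) stacking: base learners fit on one split,
    the meta-learner on their predictions over the held-out split."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    cut = max(1, int(len(idx) * (1 - holdout)))
    tr, ho = idx[:cut], idx[cut:]
    bases = [fit([X[i] for i in tr], [y[i] for i in tr]) for fit in BASE_LEARNERS]
    # Meta-learner: majority label per pattern of base predictions; unseen
    # patterns fall back to a plain vote of the base learners.
    table = {}
    for i in ho:
        key = tuple(b(X[i]) for b in bases)
        table.setdefault(key, Counter())[y[i]] += 1
    def predict(x):
        key = tuple(b(x) for b in bases)
        votes = table.get(key) or Counter(key)
        return votes.most_common(1)[0][0]
    return predict

def fit_random_stacking(X, y, n_members=5, seed=0):
    """Bagging over stacking: each member sees a bootstrap sample; majority vote."""
    rng = random.Random(seed)
    members = []
    for _ in range(n_members):
        boot = [rng.randrange(len(y)) for _ in range(len(y))]
        members.append(fit_stacking([X[i] for i in boot], [y[i] for i in boot], rng))
    return lambda x: Counter(m(x) for m in members).most_common(1)[0][0]

def dp_hsrs(X, y, seed=0):
    """End-to-end sketch: balance the data first, then fit the voting ensemble."""
    X_bal, y_bal = hybrid_sample(X, y, rng=random.Random(seed))
    return fit_random_stacking(X_bal, y_bal, seed=seed)
```

For example, `model = dp_hsrs(X, y)` returns a callable that maps one feature vector to a predicted label. A practical implementation would more likely take the base and meta classifiers from a library such as scikit-learn; the stdlib learners here only keep the sketch self-contained.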