Computer and Modernization ›› 2023, Vol. 0 ›› Issue (04): 32-38.

• Software Engineering •

Cross-project Software Defect Number Prediction Method Based on Stacked Denoising Autoencoders

  

  1. (School of Cryptography, University of Information Engineering, Zhengzhou 450000, Henan, China)
  • Online: 2023-05-09 Published: 2023-05-09
  • About the authors: LIU Luyao (born 1993), male, from Neihuang, Henan, master's student; research interests: software quality management and software defect prediction; E-mail: 1974583081@qq.com. HAN Peisheng (born 1978), male, associate professor, Ph.D.; research interests: network security and deep learning; E-mail: hps97430031978@163.com.
  • Supported by:
    National Natural Science Foundation of China (61572517)

Abstract: In practice, the project whose defects are to be predicted may be brand new, or it may have too little historical data to train a model. One remedy is to build the model on a project with sufficient data (the source project) and apply it to the new project (the target project). Existing approaches mainly use traditional machine learning to transfer features between the source and target projects, but data distributions differ considerably across projects, the feature representations learned by traditional methods are weak, and the resulting prediction performance is poor. To address this problem, this paper proposes a deep-learning-based cross-project defect prediction method built on stacked denoising autoencoders. The method combines stacked denoising autoencoders with the maximum mean discrepancy (MMD) distance to extract deep feature representations that transfer between the source and target projects, and an effective defect number prediction model can then be trained on these features. Experiments on the Relink and AEEEM datasets compare the method with the classical cross-project defect prediction methods Burak filter, Peters filter, TCA, and TCA+; the proposed method achieves the best prediction results in most cases.

Key words: cross-project software defect prediction, stacked denoising autoencoders, maximum mean discrepancy distance, deep feature representation
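
The two ingredients named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the noise type, the kernel, or the network architecture, so masking noise and a linear-kernel MMD are assumptions made here for illustration, and the function names `add_masking_noise` and `linear_mmd2` are hypothetical.

```python
import numpy as np


def add_masking_noise(x, drop_prob, rng):
    """Corrupt the input by zeroing each entry with probability drop_prob.

    Assumption: masking noise, a common choice for denoising autoencoders;
    the abstract does not say which corruption the paper uses.
    """
    keep = rng.random(x.shape) >= drop_prob
    return x * keep


def linear_mmd2(source, target):
    """Squared MMD under a linear kernel (an assumed kernel choice).

    With a linear kernel the squared MMD reduces to the squared Euclidean
    distance between the mean feature vectors of the two samples.
    """
    delta = source.mean(axis=0) - target.mean(axis=0)
    return float(delta @ delta)


rng = np.random.default_rng(0)

# Toy "source" and "target" feature matrices (rows = modules, cols = metrics).
source = np.zeros((4, 3))
target = np.ones((6, 3))

print(linear_mmd2(source, source))  # 0.0: identical samples
print(linear_mmd2(source, target))  # 3.0: means differ by 1 in each of 3 dims

# Corrupted copy of the target features, as a denoising autoencoder's input.
noisy_target = add_masking_noise(target, drop_prob=0.3, rng=rng)
```

In a transfer-learning setup of this kind, a term such as `linear_mmd2` would typically be computed on the encoder's hidden representations of the source and target data and added to the reconstruction loss, so that training pulls the two feature distributions together. How the paper weights or places that term is not specified in the abstract.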